ralph-starter vs doing it manually
I tracked one full week of development. Half the tasks with ralph-starter, half by hand. Same sprint, same project, same me, same coffee intake (a lot).
Look, for weeks I have been telling people that ralph-starter saves time, but I realized I had never actually measured it. I was just vibing on the feeling that things were faster. That is not great. So I ran an honest experiment on myself.
12 features from the sprint backlog. All had clear specs in Linear. Endpoints, bug fixes, component updates, tests. Nothing exotic -- the kind of stuff that fills up every sprint everywhere.
I split them down the middle. 6 done manually (IDE open, ChatGPT tab, write code, run tests, fix, commit, repeat). 6 with ralph-starter.
Manual: the way I have been doing it for years
Read the ticket. Think about approach. Open files. Start coding. Hit a snag, open ChatGPT, paste context, get a suggestion back, realize it is not quite right, spend 10 minutes adapting it. Run tests. Something fails. Fix it. Run lint. Fix that too. Commit. Push. Open PR. You know this loop. You have lived this loop.
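For anyone who has somehow avoided that loop, its tail end looks roughly like this on my setup -- assuming an npm project and the GitHub CLI, and with a made-up ticket number:

npm test                       # something fails
# poke at the code until it passes
npm test
npm run lint                   # fix that too
git add -A && git commit -m "fix ENG-xx: handle the edge case"
git push
gh pr create --fill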
Average: 45 minutes per task for a total of about 4.5 hours of focused work.
And honestly, "focused" is generous. A chunk of that time was me being a human clipboard between ChatGPT and my editor. The AI was helpful but I was the integration layer. I was the glue code.
ralph-starter: the other way
Read ticket. Label it "ralph-ready". Run one command. Go do something else. Review the PR when it shows up.
$ ralph-starter run --from linear --project ENG --issue ENG-71 --commit --pr --loops 5 --test --lint
Loop 1/5
  Fetching spec from Linear: ENG-71
  "Add email validation to signup form"
  Writing code with Claude Code...
  Running tests... 4 passed, 1 failed
Loop 2/5
  Fixing: regex pattern for edge case emails...
  Running tests... 5 passed
  Running lint... clean
  Committing changes...
  Opening PR #112...

Done in 1m 38s | Cost: $0.23 | Tokens: 17,640
Average: 12 minutes of my time per task, and the tool did its looping while I worked on other stuff. Hands-on time for all 6 was about 1.2 hours: reading the PR, checking the diff, approving or leaving a comment. That is it.
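When a few tickets are labeled and ready, a plain shell loop is enough to queue them up -- the issue IDs below are placeholders, but the flags are the same ones from the run above:

for issue in ENG-71 ENG-72 ENG-75; do
  ralph-starter run --from linear --project ENG --issue "$issue" --commit --pr --loops 5 --test --lint
done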
Quality
Both approaches produced working code. Tests passed, lint passed. The ralph-starter PRs were more verbose in places -- the AI writes way more error handling and comments than I would. I would have skipped half those try/catch blocks. Probably better practice, but yeah, I am lazier than the AI.
Nothing needed major rework either way. The AI code was not worse. It was just... more cautious than I am.
Where ralph-starter won
Consistency. Every single PR had tests, passed lint, passed build. When I code manually I sometimes skip writing tests for small changes -- "it is just a tiny fix, I will add tests later." (I never add tests later.) The validation loop does not let the AI get away with that. I talked about this in the Ralph Wiggum technique post -- the loop enforces discipline that I personally do not have. (There is a rough sketch of the loop's shape just below.)
Throughput. 6 features in the time I would normally do 2, maybe 3. Going by hands-on time alone (4.5 hours manual versus 1.2 hours with ralph-starter), and factoring in all the context switching overhead of doing things manually, it is closer to 3.5x.
Cost. The 6 ralph-starter tasks cost $1.87 total in API spend -- six tasks for less than two dollars. I track costs carefully now and this is pretty typical.
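For what it is worth, my mental model of that validation loop -- the shape of it, not ralph-starter's actual source -- is something like this, assuming an npm project and the GitHub CLI:

for attempt in 1 2 3 4 5; do           # --loops 5
  # (the AI writes or fixes code here)
  if npm test && npm run lint; then
    # green: commit, open the PR, stop looping
    git add -A && git commit -m "implement the spec"
    gh pr create --fill
    break
  fi
  # red: the failures get fed back into the next attempt
done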
Where I won
Complex design decisions. One task required choosing between two data modeling approaches. The AI would have just picked one and run with it. I needed to think through tradeoffs, talk to the team, consider what happens next quarter. Ralph Wiggum vibes do not work here -- "I choo-choo-choose you!" is not a valid data modeling strategy.
Knowing stuff the AI does not. On a refactoring task, I knew a certain pattern was going to be deprecated next sprint. The AI had no idea. It would have happily written more of the old pattern and felt great about it. Sometimes you need the person who was in last week's meeting.
What I took away from this
ralph-starter is better than me at "turn this spec into code." Faster, more consistent, less lazy about error handling. I am better than the AI at "figure out what we should build."
So now I focus on three things: writing clear specs (AI input), reviewing PRs (AI output), and architecture decisions (AI blind spot). The mechanical part in between -- translating spec to code -- ralph-starter does that. And it does not get distracted by Slack.
Want to run your own comparison?
npm i -g ralph-starter
ralph-starter init
ralph-starter run --from github --issue YOUR_ISSUE --commit --pr --loops 3 --test --lint
Pick a task you would normally do by hand. Time yourself both ways. I think you will be surprised.
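If you want the ralph-starter half of that timing without a stopwatch, time will do it -- same command as above, with your issue ID in place of YOUR_ISSUE:

time ralph-starter run --from github --issue YOUR_ISSUE --commit --pr --loops 3 --test --lint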
