From AI Slop to Spec-Driven Development
9 min read
The Early Days: Vibe Coding and AI Slop
In early 2025, I started using GitHub Copilot. First for my own side projects, and then also at work once my employer unlocked access for the team. Like many developers at the time, I was pretty excited. AI-assisted coding sounded like the future, and I wanted to be part of it.
The reality was… different.
Vibe coding, as people had started calling it, was really inefficient for me. You know the pattern: you write a comment or a prompt, the AI generates something that looks reasonable, and you think “great, that was fast.” But then you actually look at the code. And it’s full of small issues - wrong edge cases, weird patterns, things that almost work but not quite.
What I got felt like a perfect example of AI slop - code that looks fine on the surface but falls apart once you actually test it or try to build on top of it. The time I saved on the initial writing was more than eaten up by all the debugging, changing, and refactoring that came after. It just wasn’t worth it.
I had this cycle where I’d try to use Copilot for something, spend 20 minutes fixing what it produced, and then think “I could have written this myself in 10 minutes.” And that happened again and again. Sometimes the suggestions were completely off, sometimes they were close but introduced bugs that were hard to spot because the code looked correct.
The professional context was even more frustrating. At work, the stakes are higher. You can’t just push something that “mostly works” and fix it later. Code needs to be clean, testable, and maintainable. And the AI output was none of those things consistently enough to be useful. My colleagues had similar experiences - some were more positive, but most shared my feeling that the productivity gain was an illusion.
I kept trying, hoping the tooling would get better and the models would improve. But for most of 2025, my honest take was: it’s not worth the effort.
The Shift: December 2025
Then, around December 2025, something changed. And it wasn’t gradual - it felt almost like it happened overnight.
The models behind Copilot had clearly improved a lot. They could now handle roughly 80% of a typical task on their own. In many cases, the AI would produce clean, working code with little to no manual fixing needed. Not perfect every time, but consistently good enough that the time investment actually paid off.
The difference wasn’t just in the code quality. The AI also got much better at understanding context. It would respect existing patterns in the codebase, use the right libraries, and follow conventions that were established in other files. In early 2025, it felt like the AI was generating code in a vacuum. By late 2025, it felt like it actually understood the project it was working in.
That shift in quality gave me the confidence to explore a more structured approach to working with AI. If the AI could now handle most tasks reliably, maybe it was time to stop using it casually and start building a real workflow around it.
Discovering Spec-Kit: Spec-Driven Development
Around the same time, I came across GitHub Spec Kit, an open-source toolkit for spec-driven development (SDD). There’s a great blog post by Den Delimarsky that explains it in detail, but here’s the short version: instead of just prompting an AI and hoping for the best, you follow a structured, step-by-step process to first define what you’re building before any code gets written.
The core idea behind SDD is something that resonated with me right away: code is not the best medium for requirements negotiation. When you skip the planning step and go straight to code - whether you write it yourself or let an AI do it - the codebase becomes the specification by default. And that leads to exactly the kind of mess I was dealing with in early 2025: disconnected components that sort of work together but are hard to maintain and debug.
How Spec-Kit Works
Spec-kit has two main parts: the Specify CLI and a set of templates and helper scripts. You bootstrap a project with a single command using uvx:
```bash
uvx --from git+https://github.com/github/spec-kit.git specify init <PROJECT_NAME>
```

This creates two folders in your project - .github (or whatever folder your coding agent uses) and .specify. The .specify folder contains all the SDD templates for specs, technical plans, and tasks, along with helper scripts. The agent-specific folder gets prompt files that you can use through slash commands.
The workflow is built around four sequential slash commands:
- /speckit.specify - You describe what you want to build, and the AI creates a structured specification. This is where you capture the requirements, edge cases, and acceptance criteria. The more detailed your initial prompt, the better the spec turns out.
- /speckit.plan - Based on the spec, the AI creates a technical plan. This covers the architecture, technology choices, data contracts, and how things should fit together.
- /speckit.tasks - The plan gets broken down into small, actionable tasks that the AI agent can pick up and implement one by one.
- /speckit.implement - The agent works through those tasks and writes the actual code.
All the artifacts are plain Markdown files in a specs folder, so you can review and edit them at any point.
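As a rough idea of what one of these artifacts looks like, a spec for a tiny Advent-of-Code-style feature might contain sections like the following. This is an illustrative sketch of the shape, not spec-kit's actual template, and the feature name and criteria are made up:

```markdown
# Spec: Day 1 - Sum the numbers (illustrative)

## User story
As a puzzle solver, I want the program to read my puzzle input and print the part 1 answer.

## Acceptance criteria
- Given the example input from the puzzle description, part 1 returns the published example answer.
- Blank lines in the input are ignored.

## Edge cases
- Empty input file
- Negative numbers in the input
```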
There’s also a neat concept called the constitution - a constitution.md file where you define non-negotiable principles for your project. Things like “all code must have unit tests” or “we use TypeScript, not JavaScript” or “every API endpoint needs proper error handling.” The constitution acts as a guardrail that stays consistent across all features you build.
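To show the shape of such a file, a hypothetical constitution.md built from those examples might look something like this (an illustrative sketch, not spec-kit's actual template):

```markdown
# Project Constitution (illustrative)

## Principles
1. All code must have unit tests.
2. We use TypeScript, not JavaScript.
3. Every API endpoint needs proper error handling.
```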
What I liked most about this approach is that the specs become living documents that evolve alongside your code. They’re not some dusty requirements document that nobody reads after the first week. Because the AI agent references them during implementation, they stay relevant and useful.
The Playground: Advent of Code 2025
I needed a good test case to try out spec-kit properly, and Advent of Code 2025 turned out to be the perfect playground.
Let’s be honest - letting an AI solve Advent of Code challenges by itself isn’t all that hard. You paste the puzzle description, the AI gives you a solution, done. But that wasn’t what I was interested in.
What I actually wanted to test was: can each Advent of Code challenge, treated as a “user story” or requirement, be solved properly through a spec-driven development approach using spec-kit? In other words, I wasn’t trying to solve the puzzles - I was using the puzzles as a controlled environment to test a workflow.
This made it a really interesting experiment. Each puzzle has a clear problem description (the “requirement”), well-defined inputs and outputs (the “acceptance criteria”), and a correct answer you can verify against (the “test”). It’s basically a perfect setup for testing spec-driven development.
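To make that mapping concrete, here is a minimal TypeScript sketch for a made-up puzzle ("sum the numbers in the input"). The puzzle, the function names, and the example values are all hypothetical, not taken from my actual solutions; the point is just that the worked example from a puzzle description doubles as an executable acceptance test:

```typescript
// Hypothetical puzzle: sum the numbers in the input.
// The puzzle text is the requirement, the worked example is the
// acceptance criterion, and the published example answer is the oracle.

function parseInput(raw: string): number[] {
  return raw
    .trim()
    .split("\n")
    .map((line) => Number(line));
}

function solvePart1(numbers: number[]): number {
  return numbers.reduce((sum, n) => sum + n, 0);
}

// Acceptance test: the example must pass before touching the real input.
const exampleInput = "1\n2\n3";
const exampleAnswer = 6;

const actual = solvePart1(parseInput(exampleInput));
if (actual !== exampleAnswer) {
  throw new Error(`Example failed: expected ${exampleAnswer}, got ${actual}`);
}
console.log("Example verified - safe to run against the real input.");
```

Once the example passes, running the same functions against the real puzzle input is a purely mechanical step, which is exactly what makes the puzzles such a clean testbed for the workflow.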
The Workflow
For each puzzle, the workflow looked like this:
Step 1: Specification and planning with GPT 5.1
I used GPT 5.1 for all the specification and planning steps - the /speckit.specify, /speckit.plan, and /speckit.tasks commands. I’d give it the Advent of Code puzzle description and let it work through the full SDD process.
GPT 5.1 was great at this because it has strong reasoning and can understand complex problem descriptions well.
Step 2: Implementation with Claude Sonnet 4.5
Then I used Claude Sonnet 4.5 for the actual implementation. It would pick up the tasks generated in step one and write the code based on the spec and plan.
I split the workflow this way on purpose. The specification and planning steps benefit from strong reasoning and understanding of the problem. The implementation step benefits from clean, precise code generation. And each of the two models performed well in its part of that split.
This separation also mirrors how real-world development should work. The person (or model) who understands the requirements doesn’t have to be the same one who writes the code. And keeping the two steps separate forces you to be explicit about what you want before any code is written.
Why This Split Works
There’s something interesting that happens when you separate specification from implementation like this. In normal AI-assisted coding, the model has to do both things at once - understand what you want AND write the code. And that’s where a lot of mistakes come from, because the model might misunderstand the requirement and then confidently implement the wrong thing.
With the SDD approach, you create natural checkpoints. You can review the spec after /speckit.specify, adjust the plan after /speckit.plan, and verify the task breakdown after /speckit.tasks - all before a single line of code is written. If something is wrong, you fix it at that stage, which is much cheaper and faster than fixing wrong code later. It’s the same principle as catching bugs early in the development lifecycle, just applied to AI workflows.
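To give a feel for what you are reviewing at that third checkpoint, a tasks file for a small Advent-of-Code-style feature might look roughly like this (an illustrative excerpt with made-up task IDs, not actual spec-kit output):

```markdown
## Tasks (illustrative excerpt)
- [ ] T001 Parse the puzzle input into a list of integers
- [ ] T002 Implement part 1: compute the required total
- [ ] T003 Implement part 2: handle the extended rule from the puzzle
- [ ] T004 Verify both parts against the example input and published example answers
```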
The Results
For the first challenges I tried with this approach, the results were clear-cut: a 100% success rate. Every solution worked. The specs were precise enough that Claude Sonnet 4.5 produced correct code on the first try, without a single manual fix.
That’s a huge difference compared to my early 2025 experience, where even simple tasks needed a lot of manual babysitting. The combination of better models and a structured specification workflow made all the difference.
What surprised me most was how little effort the whole process took once it was set up. Writing a good spec took maybe 5-10 minutes per challenge. The implementation was basically instant. And the result was correct code that I actually understood because I had written the spec for it. Compare that to the old approach of spending 30 minutes going back and forth with the AI, fixing issues, and still not being sure if the result was fully correct.
The spec-driven approach also produced code that was more consistent in style and structure. Because the spec defined not just what to do but how to approach it, the generated code followed a clear logic that was easy to read and review.
What’s Next
I’m planning to use the spec-driven approach for more of my development work going forward - not just toy problems but actual features in real projects. I’m curious how well it scales to more complex, real-world scenarios where the requirements are less clearly defined than an Advent of Code puzzle.
I’ll be writing more about spec-driven development workflows and my experiences with spec-kit in future posts. If you’ve had a similar journey with AI-assisted development - or a completely different one - I’d love to hear about it.