Vibe Coding Structure: When Every Change Creates Two New Bugs

AI-generated codebases are functional, not structural. Every change creates two new bugs because nothing has clear boundaries. Here is how I diagnose architectural drift and how I would approach the cleanup sprint.

Author: Anton de Villiers
Read time: approximately 8 min
Contents
  1. The Short Answer
  2. The Catalogue of Structural Drift
  3. Why The Obvious Fix Fails
  4. How I Diagnose Structural Drift
  5. How I Would Approach The Cleanup Sprint
  6. Honest Variables That Change The Cost Shape
  7. Why I Wrote This
  8. Frequently asked questions

The app works. Customers use it. New features ship. Each one breaks something else. A bug fix introduces two new bugs. A simple change requires touching seven files, none of which are obviously related. Onboarding a new developer takes a week of "do not touch that, it will break the homepage."

This is architectural drift in a vibe-coded codebase. The LLM wrote the happy path. It did not write the boundaries. This article is the catalogue of structural failure modes I find, the diagnostic method I use to surface them, and how I would approach the cleanup if you brought your prototype to me.

The Short Answer

AI-generated codebases are functional, not structural. Code that works ships. Code that has clear boundaries, consistent naming, and rules for how change happens does not ship from an LLM that has been asked to "build a feature" twenty times in a row.

The fix is targeted refactoring. We identify the parts of the codebase that fight every change, refactor those, and leave the rest. The goal is not pretty code. The goal is a codebase you can extend without dread.

The Catalogue of Structural Drift

These are the patterns I find in vibe-coded codebases. Not exotic. Repeated.

Mixed concerns in single files. A React component file that fetches data, transforms it, validates form input, calls a payment API, and renders three different layouts. The LLM kept adding to whatever file it was editing.

No domain layer. Business logic lives in route handlers, in component files, in random utility modules. The same business rule is implemented three times in three different ways and they have already drifted apart.

Implicit data shapes. The shape of an object is whatever the LLM happened to construct that day. The same conceptual entity (a "user" or an "order") has different field names in different parts of the app. Sometimes user.email, sometimes user.emailAddress, sometimes userEmail.
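A minimal sketch of the problem and the boundary fix. The shapes and names here are hypothetical, but the pattern is the one described above: the same entity drifted into three field-naming conventions, and the cleanup translates all of them into one canonical shape at the edge.

```typescript
// Hypothetical drifted shapes found in three parts of the app.
type ProfileUser = { emailAddress: string; fullName: string };
type OrderUser = { userEmail: string; userName: string };

// One canonical shape. Adapters translate the drifted shapes at the
// boundary, so the rest of the code only ever sees this.
type User = { email: string; name: string };

const fromProfile = (u: ProfileUser): User => ({
  email: u.emailAddress,
  name: u.fullName,
});

const fromOrder = (u: OrderUser): User => ({
  email: u.userEmail,
  name: u.userName,
});
```

Once the adapters exist, a rename inside one subsystem stops rippling through the whole app.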

Inconsistent error handling. Some functions throw. Some return null. Some return { error: ... }. The caller has to guess which is which. Errors silently propagate as null and surface six functions later.
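Condensed into one illustrative snippet (function names are invented for the example), the three conventions look like this. Nothing in the signatures tells the caller which failure style each function uses.

```typescript
// Convention 1: throw on failure.
function parsePrice(s: string): number {
  const n = Number(s);
  if (Number.isNaN(n)) throw new Error(`not a price: ${s}`);
  return n;
}

// Convention 2: return null -- is that "not found" or an error?
function findDiscount(code: string): number | null {
  return code === "SAVE10" ? 0.1 : null;
}

// Convention 3: return an object with an optional error field.
function chargeCard(amount: number): { charged?: number; error?: string } {
  return amount > 0 ? { charged: amount } : { error: "invalid amount" };
}
```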

Naming that does not survive a search. Function names are handleClick, doStuff, processData. Variable names are data, result, temp. Find-in-files returns 200 unrelated results.

Files in the wrong place. A pricing calculator lives in components/UI/. A database migration sits in the components folder. The folder structure tells you nothing about what code does.

Circular imports. File A imports B, B imports C, C imports A. Webpack tolerates it. Refactoring it is a multi-day archaeology project.
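The usual fix is mechanical, even when finding the cycle is not: extract the shared definitions into a leaf module that imports nothing, so the dependencies run one way. A single-file sketch with hypothetical entities:

```typescript
// Before (cycle): order.ts imports user.ts for the User type, and
// user.ts imports order.ts for the Order type, because each
// references the other.
//
// After: both types live in a leaf module (e.g. types.ts) that
// imports nothing. order.ts and user.ts each depend on it one-way,
// and the cycle is gone.
interface User {
  id: string;
  orderIds: string[];
}

interface Order {
  id: string;
  userId: string;
}

const orderBelongsTo = (o: Order, u: User): boolean => o.userId === u.id;
```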

Dead code. Functions defined but never called. Routes that return 404 but exist in the route table. Components that render nothing because the LLM tried two approaches and never deleted the abandoned one.

If three or more of those describe your codebase, this article is for you.

Why The Obvious Fix Fails

"I will ask the AI to refactor it." The AI does not understand what your codebase is supposed to do. It rearranges what is there. It often makes the structure worse by adding abstractions that do not match the domain. The output looks confident and is harder to maintain.

"We will add tests first." Tests are valuable. They are not a refactor. Tests will lock in the current bad structure and make changing it harder. The order is: refactor surgically, add tests for the refactored part, repeat.

"We will rewrite it." Sometimes this is the right answer. Most of the time it is the panic move. A rewrite of a working app loses every piece of business knowledge encoded in the code, plus the ten weeks it takes to get back to feature parity.

"We will hire a senior developer to clean it up." Hiring takes months. Onboarding to a vibe-coded codebase takes weeks. The senior developer is still working alone, reading uncommented code, and guessing at intent.

How I Diagnose Structural Drift

The diagnosis is what tells me whether to refactor or rebuild. I work through these in order.

Step 1: Read the codebase end-to-end

What I do. Open every file. Read it. Build a mental model of what the app does, where the boundaries are, and where they should be. This sounds slow, but it is the cheapest part of the engagement. A 5000-line codebase is a half-day. The patterns are in the structure, not in any single file.

Step 2: Identify the change-resistant zones

What I do. Look at the most recent ten commits or the last month of changes. Map which files were touched. The files touched in five or more unrelated changes are the files fighting structure. Those zones are the cleanup targets.
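The touch-count heuristic behind step 2 can be sketched in a few lines. Given the list of files changed in each recent commit (in practice, extracted from git history), rank files by how many distinct commits touched them; the high scorers are the change-resistant zones.

```typescript
// commits: one entry per commit, each listing the files it changed.
// Returns a map from file path to the number of commits touching it.
function touchCounts(commits: string[][]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const files of commits) {
    // Deduplicate within a commit so a file counts once per commit.
    for (const f of Array.from(new Set(files))) {
      counts.set(f, (counts.get(f) ?? 0) + 1);
    }
  }
  return counts;
}
```

A file that scores five or more across unrelated changes is carrying load that structure should be carrying.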

Step 3: Map the domain entities

What I do. List the conceptual entities in the app (user, order, product, inventory item). For each, identify how it is represented across the codebase. Document the inconsistencies. The cleanup unifies them.

Step 4: Identify the abstraction debt

What I do. Find places where the same logic is implemented multiple times in slightly different ways. Find places where an unhelpful abstraction has been forced (a "helper" function that is called once and is harder to read than the inline code). Both directions get cleaned.
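Abstraction debt in miniature, with invented names: a "helper" with exactly one caller that obscures a one-line operation. The cleanup inlines it, so the intent is visible at the call site.

```typescript
// Before: a grand name for a trivial, single-use transformation.
function applyDataTransformationPipeline(items: number[]): number[] {
  return items.map((x) => x * 2);
}
const doubledViaHelper = applyDataTransformationPipeline([1, 2, 3]);

// After: inlined. Same behaviour, one fewer indirection to chase.
const doubled = [1, 2, 3].map((x) => x * 2);
```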

Step 5: Identify the dead code

What I do. Use a static analyser plus manual reading to find unreferenced functions, unused exports, unreachable branches. Dead code is removed.

The diagnosis produces a written report: which zones are change-resistant, which entities are inconsistent, which abstractions are debt, and the cleanup priority order. You keep the report whether or not you proceed.

How I Would Approach The Cleanup Sprint

Phase 1: Establish the boundaries (week one)

The first work is naming. We agree on the domain entities, their canonical fields, their canonical names, and their boundaries.

What I do. Pick the right shape for each entity (often informed by what the database already says, sometimes informed by the API contracts). Pick names. Document them in one place. Add types or schemas that enforce the shape.

What you get. A canonical types file (TypeScript) or schema file (Zod, Yup, Joi) that becomes the single source of truth for entity shapes.
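A hand-rolled sketch of the idea, with a hypothetical Order entity; in practice a library like Zod or Yup would define and enforce the schema with less code. The point is the boundary: unknown input is parsed once, and everything past that point can trust the shape.

```typescript
// The canonical shape: one name per field, one unit per quantity.
interface Order {
  id: string;
  userEmail: string; // canonical name -- not email, emailAddress, ...
  totalCents: number; // canonical unit -- integer cents, not float dollars
}

// Validate at the boundary. Downstream code never re-checks.
function parseOrder(input: unknown): Order {
  if (typeof input !== "object" || input === null) {
    throw new Error("order must be an object");
  }
  const { id, userEmail, totalCents } = input as Record<string, unknown>;
  if (typeof id !== "string") throw new Error("order.id must be a string");
  if (typeof userEmail !== "string") throw new Error("order.userEmail must be a string");
  if (typeof totalCents !== "number" || !Number.isInteger(totalCents)) {
    throw new Error("order.totalCents must be integer cents");
  }
  return { id, userEmail, totalCents };
}
```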

Phase 2: Untangle the most painful zone (week one to two)

We pick the worst zone from the diagnosis. Usually it is the file that gets touched every time anyone changes anything.

What I do. Split it. Move concerns into the right places. Component files become rendering only. Data fetching moves to a fetch layer. Business logic moves to a domain module. Validation moves to a schema. The file goes from 800 lines to four files of 150 lines each, each with a single responsibility.

Why this matters. The next time you (or anyone) needs to change something in this zone, the change touches one file, not seven.
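Condensed into one snippet (file names and functions are illustrative, not from any real codebase), the four-way split looks like this. In the real cleanup each section becomes its own file.

```typescript
// --- schema.ts: validation only
const isValidEmail = (s: string): boolean => /^[^@\s]+@[^@\s]+$/.test(s);

// --- domain/pricing.ts: business logic only, no I/O
const totalCents = (items: { priceCents: number; qty: number }[]): number =>
  items.reduce((sum, i) => sum + i.priceCents * i.qty, 0);

// --- api/orders.ts: data fetching only (endpoint is illustrative)
async function fetchOrder(id: string): Promise<unknown> {
  const res = await fetch(`/api/orders/${id}`);
  return res.json();
}

// --- OrderSummary: rendering only -- receives data, returns markup
const renderSummary = (email: string, cents: number): string =>
  `${email}: $${(cents / 100).toFixed(2)}`;
```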

Phase 3: Unify error handling (week two)

We pick a single error pattern and apply it consistently. Either functions throw and route handlers catch, or every function returns a result type. Mixed is the problem; either choice is acceptable.

What I do. Refactor incrementally. Validate after each change. Add the minimum tests to lock in the new pattern.
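If the chosen convention is the result type rather than throw-and-catch, a minimal sketch looks like this (function names are illustrative). The gain is that failure is visible in every signature, so callers stop guessing.

```typescript
// One convention everywhere: a small discriminated result type.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

function parsePrice(s: string): Result<number> {
  const n = Number(s);
  return Number.isNaN(n)
    ? { ok: false, error: `not a price: ${s}` }
    : { ok: true, value: n };
}

function applyDiscount(price: Result<number>, pct: number): Result<number> {
  // Failures pass through unchanged; no silent nulls to chase
  // six functions later.
  return price.ok ? { ok: true, value: price.value * (1 - pct) } : price;
}
```

Throw-and-catch at the route handler is the equally valid alternative; the only wrong choice is mixing the two.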

Phase 4: Remove the dead code (week two to three)

We delete unreferenced functions, unused exports, unreachable branches, and abandoned UI flows.

What I do. Verify each removal does not break a path. Commit small, reversible changes. The codebase shrinks, often by 15 to 30 percent. Reading it gets faster for everyone.

Phase 5: Document what remains (week three)

The cleanup is over. The remaining work is making the new structure stick.

What I do. Write the readme that says where things go. Document the canonical entities. Add a one-page "how to add a feature" guide. Add lint rules where they help.

Phase 6: Documented handover

You keep the cleaned codebase, the canonical types, the readme, and the rules. Future development is faster because the structure carries the load that was previously carried by tribal knowledge.

Honest Variables That Change The Cost Shape

Codebase size. A 2000-line app is a one-week sprint. A 50,000-line app is several weeks or more.

Test coverage today. Existing tests make the cleanup faster because we can validate quickly. No tests means we add the minimum to validate; full coverage is a separate engagement.

Active feature development. If features are being shipped weekly, the cleanup runs alongside in a branch. If we can pause feature work for two weeks, the cleanup finishes faster.

Team continuity. If the team that will keep working on the codebase is in the engagement (pairing, code review, decision-making), the cleanup sticks. If we hand off to a different team after, we need to invest more in documentation.

True rebuild candidates. A small set of codebases are genuinely faster to rebuild than to clean. The diagnosis says so.

Why I Wrote This

Most articles on "refactoring AI-generated code" are generic refactoring advice. This article exists because vibe-coded codebases concentrate failure in specific patterns and the cleanup work is targeted, not cosmetic. If you read this and recognised your codebase, the technical assessment is the route in. It produces a written diagnosis with the change-resistant zones named, the cleanup order, and a scoped engagement quote.

Frequently asked questions


Will you refactor everything?

Only what fights every change. The cleanup is targeted, not cosmetic.

How long does the cleanup take?

Typically two to four weeks for a small app. Larger codebases take longer; the assessment scopes it.

Will the app stay working during the cleanup?

Yes. The cleanup is incremental, with each step validated before moving on.

Do I need to add tests first?

We add the minimum to lock in changes. Full test coverage is a separate engagement.

What if a rewrite is genuinely cheaper?

The assessment says so honestly. Sometimes it is. Most of the time it is not.

Have a project in mind?

I review every enquiry personally. Tell me what you want to build and I will tell you on the call whether it is a fit.

Get in touch