Vibe Coding Scaling: When The Demo Worked And Real Load Did Not

AI-built apps work for five users and fail under real load. Race conditions, N+1 queries, missing indexes, unbounded memory growth, no caching. Here is the diagnostic method and how I would harden a vibe-coded prototype for production.

Author: Anton de Villiers
Read time: approximately 7 min
Contents
  1. The Short Answer
  2. The Catalogue of Scaling Failure Modes
  3. Why The Obvious Fix Fails
  4. How I Diagnose Scaling Problems
  5. How I Would Approach The Hardening Sprint
  6. Honest Variables That Change The Cost Shape
  7. Why I Wrote This
  8. Frequently asked questions

Five users used the demo. It was fine. Twenty signed up after launch. Some pages got slow. Fifty users showed up the next week and the database started timing out. A hundred users have shown up since and you have been firefighting nightly. The LLM wrote the happy path under zero concurrency. Real load is something else.

This article is the catalogue of scaling failures vibe-coded apps ship with, the diagnostic method I use to find them, and how I would approach the production-hardening sprint.

The Short Answer

AI-generated apps fail under load in a recognisable set of ways: race conditions because nothing was written with concurrency in mind, N+1 queries because the LLM wrote the loop the way you would write it on paper, missing indexes because the schema was designed before the queries, unbounded memory growth because nothing has retention rules, and no caching because the LLM does not know what is expensive.

The fix is surgical. Most of the time the database is the bottleneck and the database fixes are well-understood. Most of the time horizontal scaling is overkill until the vertical fixes are done.

The Catalogue of Scaling Failure Modes

N+1 queries. A list of orders is fetched. For each order, a query fetches the customer. For each customer, a query fetches the address. A 100-order page does 301 database queries. The LLM wrote the loop because that is how you would write it on paper; it did not write the join.
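
The 1 + N pattern is easiest to see with a query counter. This is a minimal sketch, not a real ORM: the fake in-memory database, table names, and data are all hypothetical, and the "join" is simulated by resolving customers inside a single query call.

```javascript
// A fake database that counts queries, to make the N+1 pattern visible.
const db = {
  queries: 0,
  orders: [{ id: 1, customerId: 10 }, { id: 2, customerId: 11 }],
  customers: { 10: { name: 'Ada' }, 11: { name: 'Grace' } },
  query(fn) { this.queries += 1; return fn(); },
};

// N+1: one query for the list, then one query per row.
function ordersNPlusOne() {
  const orders = db.query(() => db.orders.slice());
  return orders.map(o => ({
    ...o,
    customer: db.query(() => db.customers[o.customerId]), // runs once per order
  }));
}

// Fixed: one "join" that resolves customers in the same query.
function ordersJoined() {
  return db.query(() =>
    db.orders.map(o => ({ ...o, customer: db.customers[o.customerId] }))
  );
}

db.queries = 0;
ordersNPlusOne();
const nPlusOneCount = db.queries; // 1 + N

db.queries = 0;
ordersJoined();
const joinedCount = db.queries;   // 1
```

With 2 orders the difference is 3 queries versus 1; with 100 orders it is 301 versus 1, which is the gap the article describes.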

Missing indexes. The schema was designed first. The queries came later. Now the most common query does a full table scan on a table with 200,000 rows. The query takes four seconds. The query runs on every page load. The page takes five.

Unbounded queries. A page lists "all customers." There is no limit. There is no pagination. The query is fast at 100 customers and times out at 10,000. Nobody noticed because the demo had 12.
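
The fix is a limit plus a cursor. This sketch uses keyset pagination over an in-memory list, standing in for a SQL `WHERE id > ? ORDER BY id LIMIT ?`; the table and page size are hypothetical. Keyset pagination stays fast at any depth, where OFFSET pagination re-scans every skipped row.

```javascript
// Keyset (cursor) pagination: fetch the next `limit` rows after `afterId`.
// Rows are assumed sorted by id, as an index on id would guarantee.
function pageAfter(rows, afterId, limit) {
  const out = [];
  for (const r of rows) {
    if (r.id > afterId) out.push(r);
    if (out.length === limit) break; // the LIMIT clause
  }
  return { rows: out, nextCursor: out.length ? out[out.length - 1].id : null };
}

const customers = Array.from({ length: 10000 }, (_, i) => ({ id: i + 1 }));
const page1 = pageAfter(customers, 0, 50);
const page2 = pageAfter(customers, page1.nextCursor, 50);
```

The client threads `nextCursor` through each request instead of asking for "all customers."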

Race conditions on counters and inventory. Two users buy the last item at the same moment. Both checkouts succeed. Inventory goes to -1. The LLM wrote inventory -= 1 without SELECT FOR UPDATE, without optimistic locking, without an idempotency key on the operation.
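
The lost update can be simulated without a database: two "concurrent" checkouts both read the same snapshot, then both write. The guarded version stands in for a conditional SQL update like `UPDATE items SET stock = stock - 1 WHERE id = ? AND stock > 0`; the item and stock numbers are hypothetical.

```javascript
// Broken: read-modify-write from a stale snapshot. Both callers read
// stock = 1, both compute 0, both "succeed" -- one item sold twice.
function naiveCheckout(item, snapshotStock) {
  item.stock = snapshotStock - 1; // writes back a value computed from stale data
  return true;
}

// Fixed: the check and the decrement happen atomically against current state,
// like the WHERE clause of a conditional UPDATE.
function guardedCheckout(item) {
  if (item.stock <= 0) return false; // reject instead of going negative
  item.stock -= 1;
  return true;
}

const itemA = { stock: 1 };
const naiveFirst = naiveCheckout(itemA, 1);  // both read stock = 1...
const naiveSecond = naiveCheckout(itemA, 1); // ...and both succeed

const itemB = { stock: 1 };
const guardedFirst = guardedCheckout(itemB);
const guardedSecond = guardedCheckout(itemB); // correctly rejected
```

In a real database the same shape comes from SELECT FOR UPDATE, a version column, or a CHECK constraint on stock.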

Memory leaks in long-lived processes. A Node process holds onto request-scoped data because closures captured it. Memory grows over the day. The process gets OOM-killed at 3 a.m. and the morning starts with "why was the site down?"
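
The shape of the leak is usually a module-level structure that per-request code appends to and nothing ever trims. A minimal sketch, with hypothetical names; the fix is a retention cap, the same principle as log rotation.

```javascript
// Leaky: a module-level log that captures whole request objects forever.
const leakyLog = [];
function handleRequestLeaky(req) {
  leakyLog.push({ req, at: Date.now() }); // grows without bound in a long-lived process
  return 'ok';
}

// Bounded: keep only what is needed, and only so much of it.
const MAX_ENTRIES = 1000;
const boundedLog = [];
function handleRequestBounded(req) {
  boundedLog.push({ at: Date.now() });                      // drop the heavy payload
  if (boundedLog.length > MAX_ENTRIES) boundedLog.shift();  // cap retention
  return 'ok';
}

for (let i = 0; i < 5000; i++) {
  handleRequestLeaky({ body: 'x'.repeat(10) });
  handleRequestBounded({ body: 'x'.repeat(10) });
}
```

After 5000 requests the leaky log holds 5000 request objects; the bounded one holds 1000 timestamps.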

No caching where caching is obvious. The home page reads from the database every request. The product list is unchanged for hours and is fetched fresh every time. The LLM does not know what is cheap to cache because it does not know your access patterns.
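
A cache with an explicit TTL and an invalidation hook is a few lines. This is a sketch with an injectable clock and hypothetical keys and fetchers; in production the store would be Redis or similar, but the shape is the same.

```javascript
// A minimal TTL cache in front of an expensive read.
function makeTtlCache(ttlMs, now = Date.now) {
  const store = new Map();
  return {
    hits: 0,
    misses: 0,
    get(key, fetch) {
      const entry = store.get(key);
      if (entry && now() - entry.at < ttlMs) { this.hits += 1; return entry.value; }
      this.misses += 1;
      const value = fetch(); // the expensive read runs only on miss or expiry
      store.set(key, { value, at: now() });
      return value;
    },
    invalidate(key) { store.delete(key); }, // hook for when the data does change
  };
}

let clock = 0;
const cache = makeTtlCache(1000, () => clock);
let dbReads = 0;
const readProducts = () => { dbReads += 1; return ['widget']; };

cache.get('products', readProducts); // miss: hits the database
cache.get('products', readProducts); // hit: database untouched
clock = 2000;
cache.get('products', readProducts); // expired: hits the database again
```

The TTL is the named tradeoff: it is the longest the page may show stale data after a change the invalidation hooks miss.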

No connection pooling. Every request opens a new database connection. At 100 concurrent requests the database hits its connection limit and returns errors to half of them.
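
A pool reuses a fixed set of connections instead of opening one per request. This sketch uses plain objects as "connections" and a hypothetical pool size; in a real Node app the driver's pool (for example pg.Pool) plays this role.

```javascript
// A minimal fixed-size connection pool.
function makePool(size) {
  const idle = Array.from({ length: size }, (_, i) => ({ id: i }));
  const created = size; // connections are created once, up front
  return {
    get created() { return created; },
    acquire() {
      // back-pressure instead of a new connection: the caller waits or fails,
      // the database never sees more than `size` connections
      if (idle.length === 0) throw new Error('pool exhausted');
      return idle.pop();
    },
    release(conn) { idle.push(conn); },
  };
}

const pool = makePool(10);
// 100 sequential requests reuse the same 10 connections.
for (let i = 0; i < 100; i++) {
  const conn = pool.acquire();
  // ...run the query on conn...
  pool.release(conn);
}
```

A real pool queues acquirers with a timeout rather than throwing; the point is that request count and connection count are decoupled.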

Synchronous external API calls in the request path. The checkout calls the payment processor, then the email sender, then the analytics service, all sequentially, all blocking. The slowest one becomes the page's response time.
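
Independent calls can at least run concurrently, so latency becomes the slowest call rather than the sum. A sketch with fake services (the names and return shapes are made up); calls with a real ordering dependency, like an email that needs the payment id, belong on a queue instead, which Phase 4 below covers.

```javascript
// Fake external services: each records that it was called and resolves.
const callOrder = [];
const fakeService = (name, result) => async () => { callOrder.push(name); return result; };

const chargeCard  = fakeService('payment',   { paymentId: 'pay_1' });
const sendReceipt = fakeService('email',     { queued: true });
const trackEvent  = fakeService('analytics', { ok: true });

// Sequential awaits would cost payment + email + analytics.
// Promise.all costs only the slowest of the three.
async function checkoutParallel() {
  const [payment, email, analytics] = await Promise.all([
    chargeCard(), sendReceipt(), trackEvent(),
  ]);
  return { payment, email, analytics };
}
```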

If three or more describe your app, this article is for you.

Why The Obvious Fix Fails

"We will buy a bigger server." Sometimes this works for a week. Then load grows again. The N+1 query that was tolerable on a small box is intolerable on a big one too; it is just intolerable later.

"We will add Redis." Caching helps when you cache the right thing. Missed or mistimed invalidations cause stale-data bugs. Adding Redis without a caching strategy is a different way to fail.

"We will rewrite it in Go." Language choice is rarely the bottleneck. The bottleneck is the database query that scans 200,000 rows on every page load. Rewriting that query in Go does not help; rewriting the query (or adding the index) does.

"We will move to microservices." Microservices solve organisational problems. They do not solve "the database is slow." They often make scaling problems worse because every service adds network latency.

How I Diagnose Scaling Problems

Step 1: Establish the baseline

What I do. Run a load test against the current app at realistic concurrency. Measure response times at the p50, p95, and p99. Note what fails first (database, application server, external API). This becomes the baseline. Without it, "the fix is faster" is opinion, not measurement.
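The reason to record p95 and p99 and not just an average: the tail is where the slow query lives. A sketch of the nearest-rank percentile method over made-up latency samples:

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are at or below it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical response times in ms: mostly fine, two hit the slow query.
const latencies = [120, 95, 110, 2400, 130, 105, 115, 98, 2600, 125];
const p50 = percentile(latencies, 50); // the median looks healthy
const p95 = percentile(latencies, 95); // the tail tells the real story
```

Here p50 is 115 ms and p95 is 2600 ms: an average would blur the two into a number that describes nobody's experience.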

Step 2: Profile the database

What I do. Enable slow query logging if it is not on. Look at the slow query log over a representative day. Rank queries by total time consumed (frequency times average duration). The top five queries are usually responsible for most of the load.
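Ranking by total time, not by single-query duration, is the step people skip. A sketch over a made-up slow log; the query strings and counts are hypothetical:

```javascript
// Rank slow-log entries by total time consumed: frequency x average duration.
function rankByTotalTime(log) {
  return log
    .map(q => ({ ...q, totalMs: q.count * q.avgMs }))
    .sort((a, b) => b.totalMs - a.totalMs);
}

const slowLog = [
  { query: 'SELECT * FROM orders WHERE status = ?', count: 50000, avgMs: 40 },   // frequent, moderate
  { query: 'SELECT * FROM reports (big join)',      count: 12,    avgMs: 9000 }, // rare, very slow
  { query: 'SELECT * FROM customers',               count: 800,   avgMs: 300 },
];

const ranked = rankByTotalTime(slowLog);
```

The 9-second report query looks like the villain, but the 40 ms query that runs 50,000 times consumes roughly twenty times more database time. Fix order follows total time.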

Step 3: Profile the application

What I do. Run an application profiler (Node Inspector, Clinic.js, or equivalent) under load. Find the functions that consume the most CPU or wall time. Often the bottleneck is data transformation in the application layer, not the database itself.

Step 4: Map the request hot path

What I do. Pick the most-trafficked endpoint. Trace one request end-to-end. Record every database query, every external API call, every cache hit and miss. This map shows where time is spent and which dependencies are blocking.

Step 5: Identify the concurrency hazards

What I do. Read the code paths that handle inventory, counters, balances, anything stateful. Identify operations that are not atomic. Race conditions almost always exist in vibe-coded apps; finding them before they corrupt data is cheaper than finding them after.

The diagnosis produces a written report: where the bottleneck is, what the fix order is, and what the expected gains look like.

How I Would Approach The Hardening Sprint

Phase 1: Database fixes (week one)

The fastest, cheapest gains come from the database.

What I do. Add the missing indexes (carefully, with size and write-cost considered). Rewrite the top five slow queries. Replace N+1 patterns with proper joins or eager loading. Add LIMIT clauses where they are missing. Add pagination where the UI implies it.

What you get. Response times that are typically 5 to 20 times faster on the affected pages.

Phase 2: Concurrency fixes (week one to two)

This is the work that prevents data corruption.

What I do. Identify every state-mutating operation. Wrap inventory and balance changes in atomic operations (SELECT FOR UPDATE, optimistic locking with version columns, or database-level constraints). Add idempotency keys to operations that should be safe to retry. Add unique constraints where the application is supposed to enforce uniqueness.
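
An idempotency key makes a retry replay the recorded result instead of applying the mutation twice. A sketch with an in-memory store and hypothetical names; in production the key lives in the database under a unique constraint, so concurrent retries collide there rather than in application memory.

```javascript
// Wrap a state-mutating operation so each key applies at most once.
function makeIdempotent(operation) {
  const seen = new Map();
  return (key, ...args) => {
    if (seen.has(key)) return seen.get(key); // replay: return the recorded result
    const result = operation(...args);
    seen.set(key, result);
    return result;
  };
}

let balance = 100;
const debit = makeIdempotent(amount => {
  balance -= amount;
  return { balance };
});

debit('req-abc', 30);                  // applies: balance is now 70
const retried = debit('req-abc', 30);  // retry: no second debit, same result
```

The client generates the key once per logical operation (not per HTTP attempt), so a timed-out request can be resent safely.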

Phase 3: Caching where it pays (week two)

We add caching only where it pays back, with invalidation rules that work.

What I do. Pick the queries that are expensive and rarely change. Cache them with explicit TTLs. Add invalidation hooks where the underlying data does change. Stale data after an invalidation gap is a tradeoff we name; we do not let it surprise anyone.

Phase 4: External API isolation (week two to three)

Slow external dependencies should not block the request path.

What I do. Move email sending, analytics, webhook fan-outs, and similar operations to a queue. The request returns once the operation is recorded; the actual work happens asynchronously. The user sees fast responses; the work still happens.
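
The request/worker split looks like this in miniature. The queue here is an in-memory array and the handler names are hypothetical; in production the queue is durable (BullMQ, SQS, or a jobs table) so recorded work survives a crash.

```javascript
// Minimal queue: the request records work, a worker does it later.
const jobs = [];
const done = [];

function enqueue(type, payload) {
  jobs.push({ type, payload }); // the request returns once the job is recorded
}

function handleCheckout(order) {
  // fast, synchronous part of the request: record the order, queue the rest
  enqueue('send-email', { to: order.email });
  enqueue('track-event', { name: 'checkout', orderId: order.id });
  return { status: 'accepted', orderId: order.id };
}

// the worker drains the queue outside the request path
function runWorker() {
  while (jobs.length) done.push(jobs.shift());
}

const response = handleCheckout({ id: 1, email: 'a@example.com' });
runWorker();
```

The response no longer waits on the email provider's latency; the worker absorbs it, and a durable queue plus the idempotency keys from Phase 2 make retries safe.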

Phase 5: Connection pooling and process limits (week three)

We tune the runtime so it does not collapse under load.

What I do. Configure the database connection pool to match the application's real concurrency. Set process-level memory limits and restart policies. Add observability so we can see when the pool is saturated.

Phase 6: Re-test and document

We run the same load tests we ran in Phase 1. We compare. We document what changed and how to re-run the tests as the app grows.

Honest Variables That Change The Cost Shape

Database size. Indexes on a 1000-row table are free. Indexes on a 10-million-row table take time and disk and have to be added carefully.

Active traffic. Adding an index on a hot table is non-trivial under load. We may need to use online index creation or a maintenance window.

Hosting platform. Vercel, Render, Fly, AWS, your own server. Each has different connection pool primitives. The diagnosis names which ones apply.

Existing observability. If you have metrics, the diagnosis is faster. If you do not, we add the minimum and back-fill from there.

True horizontal-scaling needs. Some apps eventually need horizontal scaling. The diagnosis tells you whether you are there yet. Most of the time the answer is "vertical first, horizontal later."

Why I Wrote This

Most articles on "scaling AI apps" are generic devops advice. This article exists because vibe-coded codebases concentrate scaling failure in specific places (the database, the request path, the concurrency hazards) and the fixes are surgical. If you read this and recognised your prototype, the technical assessment is the route in.

Frequently asked questions

Can you fix scaling without rewriting?

Usually yes. Most scaling problems in vibe-coded apps are caused by missing indexes, unbounded queries, and N+1 patterns. They are surgical fixes.

Will you load test it?

Yes. The diagnosis includes a baseline load test and a target load test after the fix.

What if my app needs horizontal scaling?

That is a separate engagement. Most vibe-coded apps need vertical fixes first; horizontal scaling is overkill until the vertical work is done.

Do you cover database tuning?

Yes. Index design, query rewriting, connection pooling, and read replicas where they help.

How long does the hardening take?

Typically one to three weeks. The assessment scopes it.

Have a project in mind?

I review every enquiry personally. Tell me what you want to build and I'll tell you on the call if it's a fit.

Get in touch