The freelancer who built the app is gone. The original LLM session is over. The codebase is yours. A bug breaks the checkout at 2 a.m. on a Saturday. Whoever is on call cannot read the code fast enough to understand what is happening. By the time the bug is patched, customer trust has taken a hit and the team is exhausted.
This is the support gap in a vibe-coded codebase. The LLM did not write a runbook. It did not write the comments that explain why a function exists. It did not write the architecture diagram that would let a new developer orient in an hour instead of a week. The code works for the path it was written for and is opaque for everything else.
This article covers the diagnostic method I use to identify a support gap and the takeover engagement I would propose if you brought your app to me.
The Short Answer
Supportability is the property of a codebase that lets someone other than the original author resolve an incident in minutes instead of days. AI-generated codebases default to low supportability because the LLM does not know which parts will need to be supported in production. The fix is a takeover engagement that produces a runbook, fills the documentation gaps, sets up monitoring and alerting that work, and (where useful) takes ongoing support off your plate.
This is not a retainer. It is a fixed takeover plus an optional, scoped support agreement.
What Makes A Codebase Hard To Support
No runbook. When something breaks, there is no documented procedure. Whoever is on call has to read the code, the logs, and the deployment configuration in real time. A 30-minute incident becomes a five-hour incident.
No monitoring or alerting. The first sign of a problem is a customer email. Errors silently propagate for hours before anyone notices. The on-call person finds out about the issue ninety minutes after it started.
Logs that say nothing. Application logs are missing the context that would help diagnose the problem. Errors are logged without the user, the request, or the data that triggered them. "Something failed" is the entry; the cause is not. A sketch of the difference follows this list.
Deployments that nobody understands. The deployment was set up by the original developer. Nobody else has done one. Rolling back is a guess. Pushing a hotfix at midnight is dangerous.
Secrets management that is opaque. Credentials live in environment variables on the server, in a .env file on someone's laptop, and in a Notion page that one person can access. Rotating a key requires three people in three places.
Backups that have never been tested. There is a daily database backup. Nobody has ever restored from it. The first restoration attempt happens during the actual incident, when it is too late to discover that the backup is corrupted.
Onboarding that takes a week. A new developer joins to help support the app. There is no "how to run this locally" guide that works on the first try. There is no diagram of how the parts fit together. The first week is reading code and asking the previous developer questions over Slack.
If three or more of these describe your situation, this article is for you.
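To make the logging gap concrete, here is a minimal sketch of the difference; the function, the field names, and the checkout example are illustrative, not taken from any particular codebase.

```python
import json
import logging

logger = logging.getLogger("checkout")

# What a vibe-coded app typically logs: no user, no request, no payload.
#   logger.error("Something failed")

def log_checkout_failure(request_id: str, user_id: str, order_total_cents: int, exc: Exception) -> None:
    """Log a checkout failure with enough context to diagnose it later."""
    logger.error(json.dumps({
        "event": "checkout_failed",
        "request_id": request_id,              # ties the log line to one request
        "user_id": user_id,                    # who was affected
        "order_total_cents": order_total_cents,
        "error": repr(exc),                    # the actual exception, not just "failed"
    }))
```

With log lines like that, whoever is on call can go from an alert to the affected user and request without reading the code first.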
Why The Obvious Fix Fails
"We will hire a support team." Hiring takes months. The team that joins still cannot read a vibe-coded codebase fast at 2 a.m. unless someone has done the support-readiness work first.
"We will use a monitoring SaaS." Monitoring SaaS is useful when you know what to monitor. Plugging in Datadog or New Relic to a codebase with no instrumentation gives you generic CPU and memory graphs. The actual signal (slow database queries, failed external API calls, application-level errors) is not there until someone instruments it.
"The original developer can be on call." Sometimes. Often the original developer has moved on, is busy, or is unavailable when the incident happens. Single-person dependencies are how 2 a.m. incidents become 7 a.m. apologies.
"We will rebuild it on a managed platform." Managed platforms reduce the surface you have to support but do not eliminate it. The application-level bugs still happen. The customer-specific quirks still need to be handled.
How I Diagnose A Support Gap
Step 1: Walk through a hypothetical incident
What I do. Pick a realistic failure (the checkout returns 500, the LLM-facing endpoint stops responding, the database runs out of connections). Walk through what would happen step by step. Identify each gap (no alert fires, no logs, no runbook, no rollback). The gaps are the work.
Step 2: Audit the deployment process
What I do. Watch someone (or do it myself) push a small change end-to-end. Note every step that depends on tribal knowledge. Note every step that nobody has documented. The deployment audit reveals where the support work has to happen.
Step 3: Audit the observability surface
What I do. Check what is logged, what is monitored, what alerts fire, where the dashboard lives, and who can read it. Most vibe-coded apps have application logs only and no real alerting. We document the gap.
Step 4: Audit the secrets and credentials
What I do. Inventory every credential the app uses. Identify where each lives. Identify the rotation procedure (or lack of one). Identify who has access. The audit is uncomfortable; the result is a path forward.
Step 5: Audit the backup and recovery process
What I do. Identify what is backed up, where, how often, and when the last successful restore was. If the answer to the last question is "never," that is the priority work.
The diagnosis produces a written report: the gaps, the priority order, and the engagement to close them.
How I Would Approach The Takeover
Phase 1: Discovery (week zero)
Before writing any code or runbooks, I need to understand what the app does, who uses it, what the failure tolerance is, and what "support" actually means for your business.
What I do. Walk through the app with you. Map the critical user flows. Identify what cannot fail (checkout, auth) versus what can fail temporarily (analytics, marketing emails). Establish the response-time targets you need. Document the customer-facing impact of each failure class.
Phase 2: Runbook (week one)
The runbook is the document that turns "the app is broken" into "follow these steps."
What I write. For each failure class identified in discovery: how to detect it, how to triage it, how to escalate, how to recover, how to communicate. Concrete enough that a developer who has never seen the codebase can follow it. The runbook is the single most valuable artifact of the takeover.
Phase 3: Monitoring and alerting that fire on real signal (week one to two)
We move from "logs exist somewhere" to "alerts fire when things matter."
What I build. Application-level error tracking (Sentry, Rollbar, or equivalent). Performance monitoring on the critical user flows. Alerts that fire on the patterns that matter (spike in error rate, spike in payment failures, spike in LLM token usage, p99 response time crossing a threshold). On-call rotation if needed. Incident channel setup if it does not exist.
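As an illustration of the error-tracking half of this phase, here is a minimal Sentry setup sketch; the DSN is a placeholder, the checkout handler and process_order are hypothetical, and Rollbar or an equivalent would look much the same.

```python
import sentry_sdk

# Minimal error tracking; the DSN comes from your Sentry project (placeholder below).
sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    environment="production",
    traces_sample_rate=0.1,   # sample 10% of requests for performance monitoring
)

def process_order(request):
    """Stand-in for the real order handler."""
    ...

def handle_checkout(request):
    # Attach the context the logs were missing: who was affected and which flow broke.
    sentry_sdk.set_user({"id": request.user_id})
    sentry_sdk.set_tag("flow", "checkout")
    try:
        return process_order(request)
    except Exception as exc:
        sentry_sdk.capture_exception(exc)   # grouped in Sentry with the user and tag attached
        raise
```

The alerting half (error-rate spikes, payment failures, p99 thresholds) lives in whichever tool you standardise on; the point is that alerts key off application-level events like these, not raw infrastructure graphs.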
Phase 4: Deployment and rollback (week two)
We document the deployment. We make rollback safe. We add a staging environment if one does not exist.
What I build. A documented deployment procedure that anyone on the team can follow. A rollback procedure that has been tested. A staging environment that mirrors production. Smoke tests that run after every deployment and surface obvious breakage immediately.
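A post-deploy smoke test can be as small as this sketch; the URLs and expected status codes are assumptions about your app, not a prescription.

```python
import sys

import requests

# Hypothetical endpoints for the critical flows identified in discovery.
CHECKS = [
    ("homepage", "https://app.example.com/", 200),
    ("health", "https://app.example.com/healthz", 200),
    ("checkout page", "https://app.example.com/checkout", 200),
]

def run_smoke_tests() -> int:
    """Hit each critical endpoint and count failures."""
    failures = 0
    for name, url, expected in CHECKS:
        try:
            status = requests.get(url, timeout=10).status_code
        except requests.RequestException as exc:
            print(f"FAIL {name}: {exc}")
            failures += 1
            continue
        if status == expected:
            print(f"ok   {name}")
        else:
            print(f"FAIL {name}: expected {expected}, got {status}")
            failures += 1
    return failures

if __name__ == "__main__":
    # A non-zero exit code fails the deploy step and points straight at the rollback procedure.
    sys.exit(1 if run_smoke_tests() else 0)
```

Run it from the deploy pipeline after every release; if it fails, the rollback procedure is the next step, not a debugging session.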
Phase 5: Secrets and backups (week two to three)
We close the secrets gap and prove the backups work.
What I build. A single source of truth for secrets (a vault, a managed secret store, or platform-native). A documented rotation schedule. A test restore from the most recent database backup, run end-to-end. The first restore happens as part of the engagement; after that, restores are routine.
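Proving the backup works looks roughly like this sketch, assuming a PostgreSQL custom-format dump; the dump path, the scratch database name, and the orders table used for the sanity check are placeholders.

```python
import subprocess

DUMP_PATH = "/backups/latest.dump"   # placeholder: the most recent pg_dump -Fc output
SCRATCH_DB = "restore_test"          # placeholder: throwaway database for the test

def test_restore() -> None:
    """Restore the latest backup into a scratch database and sanity-check it."""
    # Start from a clean scratch database.
    subprocess.run(["dropdb", "--if-exists", SCRATCH_DB], check=True)
    subprocess.run(["createdb", SCRATCH_DB], check=True)

    # Restore the dump; --no-owner avoids role mismatches on the scratch database.
    subprocess.run(["pg_restore", "--no-owner", "-d", SCRATCH_DB, DUMP_PATH], check=True)

    # Minimal sanity check: a core table exists and is not empty.
    result = subprocess.run(
        ["psql", "-d", SCRATCH_DB, "-t", "-c", "SELECT count(*) FROM orders;"],
        check=True, capture_output=True, text=True,
    )
    assert int(result.stdout.strip()) > 0, "restore produced an empty orders table"

if __name__ == "__main__":
    test_restore()
    print("restore test passed")
```

Scheduled monthly, this turns "we think the backups work" into "we restored one last week and it had data in it."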
Phase 6: Onboarding documentation (week three)
This phase closes the onboarding gap: a new developer should be able to orient in an hour, not a week.
What I write. A "how to run this locally" guide that actually works on the first try. A diagram of how the system fits together. A "how to add a feature" guide. A "how to handle an incident" guide that points to the runbook.
Phase 7: Optional ongoing support
If you want me on call for incidents, the engagement continues with a defined response-time agreement and a scheduled monthly check-in. If you want to handle support in-house with the runbook, the engagement ends after the takeover.
Honest Variables That Change The Cost Shape
Codebase complexity. A simple CRUD app has fewer failure modes than a payment-processing app with three external integrations.
Customer impact tolerance. A B2B tool with a 24-hour response window is different from a real-time customer-facing app where a five-minute outage costs revenue.
Existing observability. If you have Sentry installed but nobody reads the alerts, the gap is smaller than if there is no observability at all.
Team continuity. If you have an in-house team that will pair through the takeover, the work sticks faster. If we are handing off to a different vendor afterwards, we invest more in documentation.
Compliance posture. Apps that need POPIA, GDPR, HIPAA, or PCI documentation have additional support-readiness work. The diagnosis names which ones apply.
Why I Wrote This
Most articles on "managing software support" assume a codebase that was built with support in mind. This article exists because vibe-coded codebases were not, and the takeover work is the path that makes them supportable. If you read this and recognised your situation, the technical assessment is the route in.