Vibe coding is fine for a closed-loop demo. It is great for a hackathon. It is great for a prototype you throw away in a week. The moment the app touches a real user or real customer data, the security work the AI never did becomes your problem. AI-generated codebases ship with a recognisable set of gaps. The fix is not throwing the code away. The fix is an audit followed by a hardening sprint.
This article is the catalogue of gaps I find in vibe-coded codebases, the diagnostic method I use to surface them, and how I would approach the hardening sprint if you brought me your prototype. It is also a working draft of the engagement. Read it. If you recognise yourself, the technical assessment is the route in.
I am pro-AI. I run the SEO AI Toolbox on this site. I use AI in my own pipeline every day. And I read every line the AI writes. This article exists because the gap between those two practices is where most vibe-coded apps fall.
The Short Answer
AI-generated codebases tend to ship with secrets in the frontend bundle, missing authentication, missing authorisation checks, no rate limiting, unsafe dependencies, no input validation, and full stack traces returned to the browser. These appear because the LLM wrote the happy path. It did not write the abuse path. It does not know your threat model.
The fix is plumbing: secrets moved to a real secret store, real session management, authorisation checks at every endpoint, parameterised queries throughout, rate limiting on every public endpoint, dependency audit, and an audit trail you can read after an incident. Each item is a real engineering task, not a checkbox.
The existing code is rarely replaced wholesale. It is hardened.
The Catalogue of Gaps
These are the gaps I find in vibe-coded apps. They are not exotic. They are the same ten patterns, repeated.
Secrets in the wrong place. API keys for OpenAI, Anthropic, or your payment processor live in the frontend bundle. Database credentials are in the git history of the public repo. The .env.example file got committed with real values. The app calls a third-party API directly from the browser using the production key, because the LLM wrote a fetch call from a React component and nobody questioned it.
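The fix is a server-side proxy. A minimal sketch, assuming an Express backend and Node 18+ (for the global fetch); the /api/generate route name is illustrative, and the key lives in a server-side environment variable rather than the bundle:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// The browser posts { prompt } here; the key never leaves the server.
app.post("/api/generate", async (req, res) => {
  const upstream = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: String(req.body.prompt ?? "") }],
    }),
  });
  res.status(upstream.status).json(await upstream.json());
});

app.listen(3000);
```

The same proxy is also where rate limiting and input validation attach later.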
No real authentication. "Login" is a hard-coded password compared in the client. Or a single shared token. Or a JWT signed with a known weak secret. Or no session management at all (just a "logged-in" boolean in localStorage). The LLM wrote what looked like a login flow because the user asked for one.
No authorisation checks. Users are required to log in to see the dashboard, but every API endpoint that powers the dashboard is unauthenticated. Anyone who knows the URL can call it. Or the endpoints are authenticated but never check whether the user has permission for the specific resource they are asking about. User A can read User B's data by changing the ID in the URL.
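Here is what the missing check looks like in practice, as a sketch: an Express route where requireAuth, the db accessor, and the invoice shape are stand-ins for your own code.

```typescript
import express from "express";

// Stand-ins for your own middleware and data layer.
declare const requireAuth: express.RequestHandler; // populates req.user
declare const db: {
  getInvoice(id: string): Promise<{ ownerId: string; total: number } | null>;
};

const app = express();

app.get("/api/invoices/:id", requireAuth, async (req, res) => {
  const userId = (req as any).user.id as string;
  const invoice = await db.getInvoice(req.params.id);

  // Authentication says who you are. This line asks whether *this* user
  // may read *this* invoice. It is the line the LLM usually omits.
  if (!invoice || invoice.ownerId !== userId) {
    // 404 rather than 403, so a non-owner cannot confirm the ID exists.
    return res.status(404).json({ error: "Not found" });
  }
  res.json(invoice);
});
```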
Unvalidated input everywhere. User input goes straight into a SQL query, a shell command, an LLM prompt, or a file path. SQL injection. Command injection. Prompt injection. Path traversal. The LLM wrote queries with template literals because they look cleaner than parameterised queries. They are not safer.
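For the SQL case the difference is one line. A sketch using node-postgres (pg); the users table is illustrative:

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection details from PG* env vars

// What the LLM tends to write. An email of "x' OR '1'='1" returns every row.
async function findUserUnsafe(email: string) {
  return pool.query(`SELECT * FROM users WHERE email = '${email}'`);
}

// Parameterised: the input travels as a bound value, never as SQL text.
async function findUser(email: string) {
  return pool.query("SELECT * FROM users WHERE email = $1", [email]);
}
```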
No rate limiting. Every public endpoint can be hit at unlimited rate. The LLM-facing endpoint costs you money per call, and a script can spend thousands of dollars in an hour. The login endpoint accepts unlimited attempts. The signup endpoint accepts unlimited account creation.
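A minimal sketch with express-rate-limit (v7 option names), limits scaled to cost class; the numbers are placeholders to tune, not recommendations:

```typescript
import express from "express";
import rateLimit from "express-rate-limit";

const app = express();

// Cheap endpoints: a generous global ceiling.
app.use(rateLimit({ windowMs: 60_000, limit: 300 }));

// The LLM-facing endpoint costs real money per call: much tighter.
app.post("/api/generate",
  rateLimit({ windowMs: 60_000, limit: 5 }),
  (req, res) => res.status(202).end() /* ...proxy to the model... */);

// Login: cap attempts to blunt credential stuffing.
app.post("/login",
  rateLimit({ windowMs: 15 * 60_000, limit: 10 }),
  (req, res) => res.status(401).end() /* ...real credential check here... */);
```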
Unsafe file uploads. File uploads accept any extension, any size, any content. The uploaded file is served from the same domain as the app. The MIME type comes from the request and is trusted. A user uploads pwn.html and now your domain is hosting attacker-controlled HTML.
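The content check means reading the magic bytes, not the extension or the client-declared MIME type. A sketch covering only PNG and JPEG; a real implementation needs every type you accept, plus size limits and a separate serving domain:

```typescript
// First bytes of each format we are willing to accept.
const SIGNATURES: Array<[string, number[]]> = [
  ["image/png", [0x89, 0x50, 0x4e, 0x47]], // \x89PNG
  ["image/jpeg", [0xff, 0xd8, 0xff]],      // JPEG SOI marker
];

// Returns the detected type, or null: reject anything not positively identified.
function sniffImageType(buf: Buffer): string | null {
  for (const [mime, sig] of SIGNATURES) {
    if (sig.every((byte, i) => buf[i] === byte)) return mime;
  }
  return null;
}
```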
Stack traces returned to the browser. Errors return the full stack trace, including file paths, dependency versions, database schema names, and table structure. An attacker reads the response and learns more about your stack than your team has documented internally.
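The fix is a boundary: full detail into the server log, a generic message plus a correlation id to the browser. An Express sketch (the four-argument signature is what marks an error handler):

```typescript
import express from "express";
import { randomUUID } from "node:crypto";

const app = express();

app.use((err: Error, req: express.Request, res: express.Response,
         _next: express.NextFunction) => {
  const id = randomUUID();
  // Server side: everything needed to debug, keyed by the correlation id.
  console.error({ id, path: req.path, message: err.message, stack: err.stack });
  // Client side: nothing an attacker can mine.
  res.status(500).json({ error: "Internal server error", id });
});
```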
Outdated or abandoned dependencies. Nobody has run npm audit (or the equivalent) in months. Half a dozen dependencies have known CVEs. Two are abandoned and will never be patched. The LLM picked package names that were popular when its training data was collected, which may be long before you ran your build.
No audit trail. When something is created, changed, or deleted, no record exists of who did it, when, or what the previous value was. After an incident you cannot reconstruct what happened.
No incident response. When the inevitable happens, the response procedure starts with "wait, where do we even begin?" There is no logging, no monitoring, no alerting, no documented runbook.
If you recognise three or more of those, you are reading the right article.
Why The Obvious Fix Fails
"I will just ask the AI to fix the security." The AI does not know your threat model. It does not know your data classification. It does not know your compliance requirements. It does not know which of your dependencies are abandoned. It can patch what you point it at; it cannot find what you didn't. Worse: it confidently writes patches that look plausible and miss the actual issue. Most of my early audit work is unwinding "AI fixed the security" patches.
"I will add a security plugin." Most security plugins handle one class of risk. A web application firewall. A rate limiter. A secrets scanner. A dependency auditor. Your codebase has at least five classes of gap. One plugin is one class. Adding plugins also adds dependencies, which is its own risk surface.
"It is behind a login, so it is fine." The login is part of the audit. Until it has been read by someone who knows what to look for, it is not a control. It is a button. Most "behind a login" apps fail authorisation at the API layer (the login is checked at the UI, every request that powers the UI is unauthenticated).
"It has not been hacked yet." AI-built apps with exposed secrets get found by automated scanners within days of going public. GitHub secret scanners. Public-internet scrapers. Bug-bounty hunters who run paid scanners against every domain that shows up in certificate transparency logs. "Yet" is a planning error, not a defence.
How I Diagnose This
A vibe-coded app gets audited the same way every other app does, but the priorities are different because the gap distribution is different. AI-generated codebases concentrate gaps in the cheap-to-find categories.
Step 1: Repository and bundle audit
What I do. Clone the repo. Search the entire history for credentials, API keys, JWTs, database connection strings, OAuth secrets. Check the public-facing bundle that the browser actually receives. Diff what is in the bundle against what should be in it. Most of the easy wins are in the first hour.
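A rough first pass over the bundle can be scripted. The patterns below (OpenAI-style keys, AWS access keys, JWTs, database URLs with credentials) are illustrative, not exhaustive, and a dedicated secret scanner still needs to walk the full git history:

```typescript
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

const PATTERNS: Array<[string, RegExp]> = [
  ["OpenAI-style key", /sk-[A-Za-z0-9_-]{20,}/],
  ["AWS access key", /AKIA[0-9A-Z]{16}/],
  ["JWT", /eyJ[\w-]+\.eyJ[\w-]+\./],
  ["Database URL with credentials", /postgres(?:ql)?:\/\/\S+:\S+@/],
];

function scan(dir: string): void {
  for (const name of readdirSync(dir)) {
    const path = join(dir, name);
    if (statSync(path).isDirectory()) { scan(path); continue; }
    const text = readFileSync(path, "utf8");
    for (const [label, pattern] of PATTERNS) {
      if (pattern.test(text)) console.log(`${label} found in ${path}`);
    }
  }
}

scan("dist"); // whatever directory your build actually ships to the browser
```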
Step 2: Dependency audit
What I do. Run npm audit (or the equivalent for your stack). Check package-lock.json for abandoned packages. Check major-version drift on critical libraries. Look for typosquatting and lookalike packages that the LLM may have hallucinated names for. Triage by severity and by reachability (a CVE in a transitively-included library that nobody actually calls is lower priority than a CVE in your auth middleware).
Step 3: Authentication and authorisation audit
What I do. Map every route in the app. For each route, identify whether authentication is required, where it is checked, and what the failure mode is. For each authenticated route, identify whether authorisation is checked at the resource level. Most vibe-coded apps fail at the authorisation level: authentication is implemented, authorisation is not.
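The route map can be bootstrapped from the app itself. A diagnostic sketch for Express 4; it reads the internal app._router, so treat it as audit tooling, not production code, and substitute whatever your auth middleware is actually called for requireAuth:

```typescript
import express from "express";

function listRoutes(app: express.Express): void {
  for (const layer of (app as any)._router.stack) {
    if (!layer.route) continue; // skip app-level middleware
    const methods = Object.keys(layer.route.methods).join(",").toUpperCase();
    // The middleware chain on the route, by function name.
    const chain = layer.route.stack.map((l: any) => l.name).join(" -> ");
    const flag = chain.includes("requireAuth") ? "" : "   <-- NO AUTH CHECK?";
    console.log(`${methods} ${layer.route.path}: ${chain}${flag}`);
  }
}
```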
Step 4: Input handling audit
What I do. Trace every user input from where it enters the system (a form, a query parameter, a webhook, an uploaded file) to where it is consumed (a database query, a shell command, an LLM prompt, a file path). For each path, identify whether validation happens at the boundary, whether the consumer treats the input as trusted, and what the failure mode looks like.
Step 5: Rate limiting and abuse audit
What I do. List every public endpoint. For each, identify the cost class (does it hit the database, does it hit an LLM, does it send email, does it create user accounts) and the current rate-limit posture. The LLM-facing endpoint without rate limiting is the most expensive bug a vibe-coded app ships with.
Step 6: Logging, monitoring, audit trail
What I do. Establish what is logged today, what is monitored, what alerts fire, and what an incident response procedure would look like in practice. Most vibe-coded apps have application logs only and no audit trail. The audit trail is what makes recovery possible.
The audit produces a written report with each gap rated by exploitability and business impact. You keep the report whether or not you proceed with the fix.
How I Would Approach The Hardening Sprint
Here is the engagement I would scope on a real vibe-coded app post-audit.
Phase 1: Bleeding-stop fixes (first 48 hours)
The audit will surface things that need to be patched before anything else. These are the bleeding-stop fixes: exposed secrets that are still in the bundle, unauthenticated LLM-facing endpoints with no rate limiting, file upload paths that accept any content. These are not "security improvements." They are "this is currently being exploited or about to be."
What I do. Rotate every credential that has been exposed. Move secrets to a real secret store (AWS Secrets Manager, Vault, Doppler, or platform-native, depending on where you host). Add minimum viable rate limiting on the most expensive endpoints. Lock down file uploads. Do not stop to make this elegant; make it work.
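One concrete pattern from this phase, sketched: fail-fast secret loading, where the app refuses to boot with a missing secret instead of limping along with undefined. Shown against environment variables; the same shape sits in front of Secrets Manager, Vault, or Doppler.

```typescript
// config.ts: the only module that touches process.env.
function requireSecret(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required secret: ${name}`);
  return value;
}

export const config = {
  databaseUrl: requireSecret("DATABASE_URL"),
  openaiKey: requireSecret("OPENAI_API_KEY"),
  sessionSecret: requireSecret("SESSION_SECRET"),
};
```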
Phase 2: Authentication and authorisation hardening (week one)
What I build.
- Real session management. Real token rotation. Real logout that invalidates server-side state (see the session sketch after this list).
- Authorisation checks at every endpoint, not at the UI. Hiding a control in the UI is UX, not security.
- A consistent authorisation model. Either role-based (admin, user, guest) or attribute-based (owner of resource, member of organisation), depending on the app. Consistency matters more than which model you pick.
- A clear path for staff vs customer separation, where applicable.
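A sketch of the session piece with express-session, assuming the secret comes from the store set up in Phase 1; the default in-memory store shown here must be swapped for Redis or Postgres in production:

```typescript
import express from "express";
import session from "express-session";

const app = express();
app.use(express.json());

app.use(session({
  secret: process.env.SESSION_SECRET!, // from the secret store, not the repo
  resave: false,
  saveUninitialized: false,
  cookie: { httpOnly: true, secure: true, sameSite: "lax", maxAge: 8 * 3_600_000 },
}));

app.post("/login", (req, res) => {
  // ...verify credentials against hashed passwords first...
  req.session.regenerate((err) => {   // rotate the session id on login
    if (err) return res.status(500).end();
    (req.session as any).userId = "user-id-from-db";
    res.status(204).end();
  });
});

app.post("/logout", (req, res) => {
  // Destroys server-side state: a stolen cookie stops working immediately.
  req.session.destroy(() => res.status(204).end());
});
```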
Phase 3: Input validation and parameterised queries (week one to two)
What I build.
- Validation at every system boundary (see the sketch after this list). Schema validation for incoming JSON. Type-checked query parameters. File-type checks that look at content, not extension or MIME type.
- Parameterised queries throughout. No string concatenation in SQL. No template literals in shell commands. No raw user input in LLM prompts (or, where prompts must include user input, structured input with explicit boundaries).
- Output encoding where output is rendered. HTML-escape where the output is HTML. JSON-encode where the output is JSON. Do not rely on the framework to do it.
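A boundary-validation sketch with zod (one of several schema validators that fit here); the CreateProject shape is illustrative:

```typescript
import express from "express";
import { z } from "zod";

const app = express();
app.use(express.json());

const CreateProject = z.object({
  name: z.string().min(1).max(120),
  plan: z.enum(["free", "pro"]),
  seats: z.number().int().min(1).max(500),
});

app.post("/api/projects", (req, res) => {
  const parsed = CreateProject.safeParse(req.body);
  if (!parsed.success) {
    return res.status(400).json({ error: parsed.error.flatten() });
  }
  const { name, plan, seats } = parsed.data; // typed, validated, bounded
  // ...persist with parameterised queries...
  res.status(201).json({ name, plan, seats });
});
```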
Phase 4: Rate limiting and observability (week two)
What I build.
- Rate limiting on every public endpoint, with limits scaled to the cost class.
- Application logs structured (JSON) and shipped to a log aggregator.
- Audit trail for every state-changing action: who, when, what, previous value, new value (see the sketch after this list).
- Alerts that fire on the patterns that matter: spike in error rate, spike in LLM token usage, spike in failed login attempts, new IP addresses calling expensive endpoints.
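The audit-trail shape, sketched with illustrative names; the property that matters is append-only, meaning rows are inserted, never updated or deleted:

```typescript
type AuditEntry = {
  actorId: string;    // who
  at: string;         // when, ISO 8601
  action: string;     // what, e.g. "invoice.update"
  resourceId: string;
  before: unknown;    // previous value
  after: unknown;     // new value
};

async function withAudit<T>(
  actorId: string, action: string, resourceId: string,
  before: T, mutate: () => Promise<T>,
): Promise<T> {
  const after = await mutate();
  const entry: AuditEntry = {
    actorId, action, resourceId,
    at: new Date().toISOString(), before, after,
  };
  // Structured JSON, shipped to the same aggregator as the application logs.
  console.log(JSON.stringify({ audit: entry }));
  return after;
}
```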
Phase 5: Dependency hygiene and incident response (week two to three)
What I build.
- Updated dependency tree, with abandoned packages replaced and known CVEs patched.
- A documented update path so the dependency audit is repeatable, not a one-off (see the gate script after this list).
- A written incident response runbook: how to detect, how to triage, how to escalate, how to recover, how to communicate. Concrete enough that someone other than me can run it.
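The repeatable path can be a small gate script in CI. Sketched against the JSON shape npm 8+ emits (metadata.vulnerabilities); verify the field names against your npm version:

```typescript
import { spawnSync } from "node:child_process";

// npm audit exits non-zero when it finds anything; spawnSync still hands
// us stdout, so we can parse the report and decide for ourselves.
const result = spawnSync("npm", ["audit", "--json"], { encoding: "utf8" });
const report = JSON.parse(result.stdout);
const counts = report.metadata?.vulnerabilities ?? {};
const blocking = (counts.high ?? 0) + (counts.critical ?? 0);

console.log("vulnerabilities:", counts);
if (blocking > 0) {
  console.error(`${blocking} high/critical vulnerabilities - failing the build`);
  process.exit(1);
}
```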
Phase 6: Documented handover
You keep the code. You keep the runbook. You keep the audit trail. You keep the alerting configuration. You keep the secret-rotation schedule. You leave the engagement with a system someone else can run.
Honest Variables That Change The Cost Shape
Codebase size. A 500-line app is a one-week sprint. A 50,000-line app is several weeks or more.
Hosting. Vercel, Render, Fly, Railway, AWS, your own server. Each platform has its own secret-store path and its own rate-limit primitives. The diagnosis names which.
Real customer data. A B2B SaaS with one user is different from a consumer app with ten thousand users with PII. The hardening priorities differ. The audit trail requirements differ. The compliance pressure differs.
Compliance posture. POPIA, GDPR, HIPAA, PCI, SOC 2. The hardening sprint produces the technical controls. The compliance attestation is a separate engagement and may need a specialist partner. I will not pretend otherwise.
Original developer availability. If the original developer (you, a freelancer, an in-house person) is available to walk me through intent, the audit goes faster. If not, the audit is slower because I have to infer intent from code.
Existing tests. Tests are the cheapest way to lock in fixes. If you have none, we add the minimum to validate the hardening. If you have some, we extend them.
Why I Wrote This
Most articles on "AI code security" tell you that AI-generated code is dangerous and you should hire a security consultant. That is true and not useful. This article exists because the actual fix is plumbing, the actual diagnostic method is repeatable, and the buyer who recognises the pattern in their own app is the buyer I want to work with.
If you read this and thought "yes, that is exactly what my prototype looks like," the technical assessment is the route in. It is paid. It produces a written audit report with each gap rated by exploitability and business impact, plus a hardening fix path. Most assessments take two to four hours of focused review plus a written diagnosis returned within a week. You keep the audit whether or not you book the hardening sprint.