How to Secure AI-Generated Code Before Production

AI can help you ship faster, but it can also help you ship vulnerabilities faster. Here's a practical security checklist for AI-built apps before they hit production.

AI coding tools have changed the speed of software development.

That’s the upside.

The downside is they can also speed up the creation of insecure software.

A developer with Claude Code, Codex, Gemini CLI, or whatever comes next can now move across frontend, backend, infra, and CI/CD much faster than before. That’s useful. It also means people are shipping code into production that they didn’t fully reason through, didn’t fully test, and in some cases don’t fully understand.

⚠️

AI-generated code is not inherently insecure. Blindly trusted AI-generated code is.

That’s the real issue.

The conversation should not be “should we use AI to write software?” That ship has sailed.

The better question is: what security, compliance, and governance practices do you need if AI is now part of your delivery pipeline?

If you’re building real products, the answer is not paranoia. It’s discipline.

The risk is not just bad code, it’s compressed scrutiny

The reason this matters is not that AI suddenly invented new classes of vulnerabilities.

Most of the risks are old:

  • insecure auth flows
  • bad secret handling
  • dependency issues
  • injection vulnerabilities
  • over-permissive access
  • weak session management
  • broken CI/CD controls

What’s changed is the rate.

A weak engineer used to be rate-limited by how fast they could type. Now they can generate a lot of code very quickly and mistake throughput for correctness.

A lot of teams are quietly replacing deep review with shallow confidence.

Unsafe AI shipping mindset

  • The code compiles, ship it
  • The model probably handled security
  • We'll scan it later
  • We don't need a real review for small changes
  • The package was popular so it must be fine

Secure AI shipping mindset

  • Generated code gets reviewed like junior output
  • Security controls live in the pipeline, not just in people's heads
  • Dependencies are treated as supply-chain risk
  • Auth and session flows get special scrutiny
  • Production release requires policy gates

That’s the posture shift teams need.

Start with governance, not tooling

A lot of teams jump straight to scanners.

Scanners matter, but governance comes first.

If your team is using AI to build production software, you need a simple, explicit policy for how that code is allowed to move into production.

That policy should answer questions like:

  • what classes of changes must be human-reviewed?
  • what security checks are mandatory in CI/CD?
  • what compliance baselines do we align with?
  • what model or agent is allowed to touch which environments?
  • what logs and audit trails do we retain?
  • who signs off on exceptions?

This is also where compliance becomes useful.

For Australian teams, the Essential Eight is a practical baseline for reducing common cyber risk. ASD’s guidance is explicitly risk-based and built around implementing prioritized mitigation strategies to reach an appropriate maturity level for your environment (ASD Essential Eight). For broader organizational controls, teams may also map to ISO 27001 or SOC 2 style controls depending on customer expectations.

The point is not to turn every startup into an audit bureaucracy. The point is to have a baseline that forces consistent thinking.
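A policy like this works better encoded in the repo than buried in a wiki. Here is a minimal sketch of the "what must be human-reviewed" question; the path patterns and review levels are illustrative assumptions, not a standard:

```python
import fnmatch

# Hypothetical path patterns mapped to review requirements.
# These categories are illustrative; define your own per your policy.
REVIEW_POLICY = [
    ("auth/**", "human-review-required"),
    ("**/payments/**", "human-review-required"),
    ("infra/**", "human-review-required"),
    (".github/workflows/**", "human-review-required"),
    ("docs/**", "automated-checks-only"),
]


def review_level(changed_path: str) -> str:
    """Return the first review level whose pattern matches the changed file."""
    for pattern, level in REVIEW_POLICY:
        if fnmatch.fnmatch(changed_path, pattern):
            return level
    return "standard-review"
```

In CI you would run this over the output of `git diff --name-only` and block the merge whenever any file requires human review and no approval exists.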

Secure by design matters more in the AI era

CISA’s Secure by Design guidance makes a point that more software teams need to internalize: security should be a core product requirement, not something dumped on the customer after release (CISA Secure by Design).

That hits differently in the AI coding era.

Because if AI lets you ship faster, then the pressure to defer security gets worse, not better.

A team using AI well should be increasing the amount of secure-by-default engineering they do, not reducing it.

That means things like:

  • MFA available and encouraged
  • sane default permissions
  • logging on by default
  • session expiry policies
  • strong secret management
  • safe dependency policies
  • review gates before production
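Defaults like these can live in code instead of tribal knowledge, so new services inherit them automatically. A sketch with illustrative values (the specific numbers are assumptions; tune them to your threat model):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityDefaults:
    # Illustrative values, not a recommendation for every system.
    mfa_enabled: bool = True             # MFA available and encouraged
    session_ttl_minutes: int = 30        # sessions expire by default
    access_token_ttl_minutes: int = 15   # short-lived access tokens
    cookie_secure: bool = True           # cookies only over HTTPS
    cookie_http_only: bool = True        # no JS access to session cookies
    cookie_same_site: str = "Lax"        # CSRF-resistant default
    audit_logging: bool = True           # logging on by default


DEFAULTS = SecurityDefaults()
```

Because the dataclass is frozen, a service can only loosen a default by constructing an explicit override, which is exactly the kind of change a review gate can flag.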

🛡️

The right use of AI is not “move fast and ignore security.” It’s “move fast and encode security into the system so speed doesn’t degrade quality.”

Supply-chain security is now a first-class concern

One of the easiest ways to ship a vulnerability is through your dependencies.

That was already true before AI coding tools. It’s worse now because models happily suggest packages, snippets, and integrations with very little judgment about supply-chain risk.

This is where teams need to be much more deliberate.

Minimum controls I would expect

  • dependency scanning in every repo
  • lockfiles committed and reviewed
  • automated alerts for vulnerable packages
  • provenance and integrity checks where possible
  • explicit review before introducing new critical dependencies
  • CI/CD gates on high-severity dependency issues

GitHub’s supply-chain tooling, including dependency security and Dependabot-style alerts, is a practical baseline for many teams (GitHub supply chain security docs).

For code scanning, tools like CodeQL are useful because they can surface semantic vulnerability patterns across a codebase rather than just syntax issues (CodeQL).

A practical stack might include:

  • Dependabot or Renovate for dependency visibility and updates
  • CodeQL for code scanning
  • Trivy, Snyk, or equivalent for package/container scanning
  • secret scanning in CI
  • signed releases or provenance tooling where feasible
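A CI gate on dependency findings can be as simple as parsing the scanner's report and failing the build. A sketch assuming a generic JSON report shape (real tools like Trivy and Snyk each have their own schemas):

```python
import json

# Policy choice, not a standard: which severities fail the build.
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def blocking_findings(report_json: str) -> list[dict]:
    """Return findings severe enough to fail the build.

    Assumes a generic report shape like:
    {"findings": [{"package": "...", "severity": "HIGH", "id": "CVE-..."}]}
    """
    report = json.loads(report_json)
    return [
        f for f in report.get("findings", [])
        if f.get("severity", "").upper() in BLOCKING_SEVERITIES
    ]
```

In the pipeline, pipe the scanner's JSON into this check and fail the job whenever it returns anything, instead of leaving the report as an artifact nobody reads.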

You need CI/CD gates, not just best intentions

A lot of security conversations die in Slack because everyone agrees in theory and nothing is enforced in the pipeline.

That doesn’t work.

If AI-generated code can land quickly, then your pipeline needs to be opinionated.

A sane minimum release gate for AI-built code

  1. Static analysis and linting: catch obvious issues early. Not enough on its own, but still required.
  2. Dependency and container scanning: block known vulnerable dependencies and images before they ship.
  3. Secret scanning: prevent tokens, keys, and credentials from leaking into the repo or build artifacts.
  4. Auth and permission review for sensitive changes: anything touching sessions, permissions, billing, infra, or account access gets elevated review.
  5. Human approval before production: especially for code substantially written by AI, major auth changes, or changes with high blast radius.

If a repo has none of this, it’s not serious production infrastructure yet. It’s just hopeful automation.
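To make the secret-scanning gate concrete: at its core it is a set of patterns run over the diff. Real tools ship hundreds of curated rules; this sketch only shows the shape of the check, with a few well-known token formats:

```python
import re

# A few illustrative patterns; real scanners maintain far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[str]:
    """Return the names of secret patterns found in the given text."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if pattern.search(text)]
```

The point of running this in CI rather than relying on review is that a leaked key in a generated diff looks exactly like any other string literal to a tired human.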

Prompt injection and insecure output handling are now app-layer concerns

Traditional app security is still here. But AI apps add their own patterns.

OWASP’s work on LLM application security is useful because it names the issues clearly: prompt injection, insecure output handling, supply-chain vulnerabilities, sensitive information disclosure, excessive agency, and more (OWASP GenAI Security Project).

This matters even if your product is “just a normal SaaS app” using AI in one feature.

If the output of a model is fed into another system without validation, you can create downstream security issues very quickly.

Examples:

  • model-generated SQL or filters passed through too loosely
  • model-generated Markdown or HTML rendered unsafely
  • agent outputs triggering tools without proper policy checks
  • retrieval systems pulling in malicious instructions from untrusted sources
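The mitigation for most of these is the same: treat model output as untrusted input and validate it against an allowlist before it reaches another system. A sketch for model-generated query filters (the field and operator names are hypothetical):

```python
# Allowlists for what the model is permitted to generate (hypothetical names).
ALLOWED_FILTER_FIELDS = {"status", "created_at", "owner_id"}
ALLOWED_OPERATORS = {"eq", "lt", "gt"}


def validate_filters(model_filters: list[dict]) -> list[dict]:
    """Reject model-generated filters that touch unknown fields or operators.

    Raises ValueError rather than silently dropping entries, so a prompt
    injection attempt shows up as a visible failure, not a quiet success.
    """
    for f in model_filters:
        if f.get("field") not in ALLOWED_FILTER_FIELDS:
            raise ValueError(f"disallowed filter field: {f.get('field')!r}")
        if f.get("op") not in ALLOWED_OPERATORS:
            raise ValueError(f"disallowed operator: {f.get('op')!r}")
    return model_filters
```

The same allowlist pattern applies to model-generated HTML (sanitize against a tag allowlist) and agent tool calls (check against a tool policy) before execution.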

Authentication and session hygiene deserve special attention

Auth is one of the easiest places for AI-generated code to create hidden risk.

Because auth code often looks straightforward while containing subtle problems:

  • token expiry too long
  • refresh token misuse
  • weak rotation logic
  • tokens stored in risky places
  • insufficient device/session invalidation
  • privilege escalation edge cases
  • unsafe “on behalf of” flows

Take JWTs as a concrete example.

JWTs are not inherently bad, but they are easy to misuse. A token that lives too long, is exposed too broadly, or can be replayed too easily increases risk fast. If a token can be intercepted, copied from an unsafe client context, or reused before expiry, the attacker doesn’t care that the implementation looked clean in the diff.

A few practical rules:

  • keep access tokens short-lived
  • rotate refresh tokens properly
  • prefer secure cookie patterns over risky browser storage where the architecture allows it
  • support revocation and session invalidation
  • layer MFA where appropriate
  • treat impersonation and delegated access flows as high-risk features

🔐

Anything touching auth, sessions, permissions, or account recovery should get more review than average AI-generated code, not less.

Zero trust is the right mental model

A lot of people hear “zero trust” and think it’s just enterprise jargon.

The useful version is simple: don’t assume trust because something is inside your system boundary. Verify explicitly, minimize privilege, and design for compromise.

That mindset works well for AI-built systems because it avoids the most dangerous assumption of all: “the code came from our toolchain, so it’s probably fine.”

Zero trust in practice can mean:

  • least-privilege service accounts
  • scoped tokens and expiring credentials
  • strong separation between environments
  • policy checks before tool execution
  • network and service segmentation where needed
  • auditable access paths
  • no hidden admin bypasses
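For agents, "policy checks before tool execution" can be a literal gate between the model's request and the tool call. A sketch with hypothetical tool names and environment scopes; your agent framework will have its own vocabulary:

```python
# Hypothetical tool policy: which environments each tool may touch.
TOOL_POLICY = {
    "read_file": {"allowed_envs": {"dev", "staging", "prod"}},
    "run_migration": {"allowed_envs": {"dev", "staging"}},  # never prod
    "delete_user": {"allowed_envs": set()},  # always requires a human
}


def authorize_tool_call(tool: str, env: str) -> bool:
    """Deny by default: unknown tools and out-of-scope envs are rejected."""
    policy = TOOL_POLICY.get(tool)
    if policy is None:
        return False  # zero trust: no implicit allow for unlisted tools
    return env in policy["allowed_envs"]
```

The deny-by-default shape is the whole point: when a new tool appears in the agent's toolbox, nothing works until someone consciously grants it scope.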

This is especially important when agents and coding tools start touching more of the stack.

AI can help with security too, but don’t outsource judgment

There’s a real upside here.

Newer models are getting better at spotting classes of bugs, risky flows, and insecure defaults. Security review is one of the highest-value uses of AI-assisted development.

But cyber security is still its own discipline.

It requires a different way of thinking. Not just writing code, but reasoning about:

  • attacker goals
  • attack paths
  • privilege boundaries
  • abuse cases
  • chained weaknesses
  • impact under real adversarial pressure

That’s why I would use AI as a force multiplier for review, not as an excuse to skip review.

A good workflow is:

  1. AI helps generate or refactor code
  2. automated scanners and policy checks run in CI/CD
  3. AI-assisted review helps look for suspicious patterns
  4. a human signs off before high-risk code reaches prod

My practical checklist before shipping AI-generated code

If I had to reduce this to a simple release checklist, it’d look like this.

  1. Set a compliance and governance baseline: Essential Eight, ISO-style controls, or a similar internal policy. Something real, not vibes.
  2. Treat AI-built code as review-required by default: especially for auth, infra, payments, permissions, or data handling.
  3. Harden the supply chain: dependency scanning, lockfile review, secret scanning, provenance where feasible.
  4. Enforce CI/CD gates: no vulnerable dependencies, no leaked secrets, no high-risk code merging without sign-off.
  5. Review auth and session logic separately: JWTs, refresh flows, impersonation, delegated access, MFA, and expiry policies get elevated scrutiny.
  6. Validate AI outputs before downstream use: prompt injection and insecure output handling are real risks, not theory.
  7. Use zero trust thinking: least privilege, explicit verification, narrow access, strong auditing.

If you only remember four review targets, make them these: auth, the supply chain, CI/CD gates, and AI output handling.

The bottom line

You can absolutely build secure apps in the AI coding era.

But only if you stop treating model output as trustworthy by default.

The future is not “humans code everything manually again.” That’s not happening.

The future is teams that combine AI speed with real engineering discipline: governance, review, secure defaults, supply-chain controls, auth hygiene, zero trust, and proper release gates.

That’s how you get the upside without turning your production stack into a security experiment.

Ship faster if you want.

Just don’t outsource your judgment.


If you’re using AI to build production software and want to compare notes on security, governance, or backend architecture, find me on X.

Roger Chappel

CTO and founder building AI-native SaaS at Axislabs.dev. Writing about shipping products, working with AI agents, and the solo founder grind.


#ai #security #backend #compliance


Steal this post → CC BY 4.0 · Code MIT