AI & Development

The AI-Generated UI Problem

Cursor and v0 are incredible tools. But the output they produce fails enterprise security review every time. Here is what is missing.

Austin McDaniel, Founder & CEO
May 30, 2026 · 9 min read

A prospect sent us a Loom last month. Forty seconds in, their PM was clicking through a threat console that looked, frankly, great — severity counts, a live alert feed, a tidy area chart. "We built this in two days with AI," they said. "We just need you to make it real."

That phrase — make it real — is the whole problem. The interface was 80% there. The remaining 20% was the part that decides whether an enterprise security buyer ever signs.

The demo passes. The review does not.

AI-generated interfaces optimize for the happy path in your prompt. They are confident, plausible, and almost entirely untested against the conditions a security product actually lives in: empty states, permission boundaries, 4,000-row tables, screen readers, and an assessor with a checklist.

A security UI is not judged by how it looks in the demo. It is judged by how it behaves on the worst day your customer has all year.

Here is where the generated console broke the moment we pressure-tested it:

  • Contrast: the muted-gray secondary text failed the WCAG AA contrast minimum (4.5:1 for normal-size text) against its own card background. Half the metadata was effectively invisible.
  • Focus order: tabbing through the alert table jumped to the export button and back. Keyboard-only analysts were stranded.
  • Empty + error states: there were none. A failed collector rendered as a silent blank panel — the most dangerous state a SOC tool can have.
  • Density collapse: at 200 alerts the layout held. At 4,000 it repainted on every scroll tick and the browser tab locked up.
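
The contrast failure is the easiest of these to catch mechanically. Here is a minimal sketch of the WCAG contrast-ratio calculation, written as plain TypeScript so it can run in a unit test. The function names and the sample hex values are ours for illustration; they are not taken from the prospect's console.

```typescript
// Sketch of the WCAG 2.x contrast-ratio formula. Names are illustrative.
type Rgb = readonly [number, number, number];

function parseHex(hex: string): Rgb {
  const digits = hex.startsWith("#") ? hex.slice(1) : hex;
  if (!/^[0-9a-f]{6}$/i.test(digits)) {
    throw new Error(`expected a 6-digit hex color, got "${hex}"`);
  }
  const n = parseInt(digits, 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// Convert one 8-bit sRGB channel to its linear-light value.
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex);
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// Ratio of the lighter luminance to the darker one, each offset by 0.05.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// AA asks for 4.5:1 on normal-size text and 3:1 on large text.
function meetsAA(foreground: string, background: string, largeText = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}
```

On a white card, `meetsAA("#767676", "#ffffff")` passes while `meetsAA("#777777", "#ffffff")` does not. One shade of gray is the whole margin, which is why this belongs in a test rather than in someone's eyeballs.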

Why this keeps happening

The model produces a component that satisfies the prompt, not a system that satisfies an invariant. "Make a severity badge" yields a severity badge. It does not yield the rule that severity color must be consistent everywhere, survive colorblindness, and never be the only signal carrying meaning.

```tsx
// Generated: color is the only signal. Fails colorblind + audit.
<Badge color={sevColor(sev)}>{count}</Badge>

// Shipped: severity encoded in color + shape + label,
// driven by one token map the whole app shares.
<SeverityTag level={sev}>
  <SevGlyph level={sev} aria-hidden />
  <span>{SEV_LABEL[sev]}</span>
  <Count value={count} />
</SeverityTag>
```
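
What might that shared token map look like? The sketch below is one possible shape, not the code we shipped. The four severity levels, the glyphs, and the CSS variable names are all assumptions made for illustration. The point is that `Record<Severity, …>` makes the compiler reject a level with no entry, and a small check confirms that shape and label distinguish every level without color.

```typescript
// Hypothetical severity levels; a real product would define its own.
type Severity = "critical" | "high" | "medium" | "low";

interface SeverityToken {
  readonly label: string;    // text signal
  readonly glyph: string;    // shape signal
  readonly colorVar: string; // color signal (CSS custom property name, assumed)
}

// One map for the whole app. Adding a Severity without a token is a compile error.
const SEVERITY_TOKENS: Readonly<Record<Severity, SeverityToken>> = {
  critical: { label: "Critical", glyph: "◆", colorVar: "--sev-critical" },
  high: { label: "High", glyph: "▲", colorVar: "--sev-high" },
  medium: { label: "Medium", glyph: "●", colorVar: "--sev-medium" },
  low: { label: "Low", glyph: "○", colorVar: "--sev-low" },
};

// True only if every level is distinguishable by glyph and by label alone.
function signalsAreDistinct(tokens: Readonly<Record<Severity, SeverityToken>>): boolean {
  const all = Object.values(tokens);
  const glyphs = new Set(all.map((t) => t.glyph));
  const labels = new Set(all.map((t) => t.label));
  return glyphs.size === all.length && labels.size === all.length;
}

// Text a screen reader can announce; it carries the meaning without color.
function describeSeverity(level: Severity, count: number): string {
  const noun = count === 1 ? "alert" : "alerts";
  return `${SEVERITY_TOKENS[level].label}: ${count} ${noun}`;
}
```

Every component then reads from `SEVERITY_TOKENS` instead of choosing its own color, so consistency stops depending on what any single prompt happened to remember.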

The fix is not more prompting

You cannot prompt your way to an invariant, because the model has no memory of the promise it made three components ago. Closing the gap is senior work: a token layer every severity reference resolves through, a focus-management pass, real empty/loading/error states for every async surface, and a virtualization strategy for the tables that will actually get big.
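
To make the virtualization item concrete: the core of most windowing strategies is a small calculation that decides which rows to mount for a given scroll position. The sketch below assumes a fixed row height and uses names we made up for this post. In production you would more likely reach for an existing virtualization library than hand-roll this.

```typescript
interface VisibleWindow {
  readonly start: number;       // index of the first row to render (inclusive)
  readonly end: number;         // index after the last row to render (exclusive)
  readonly offsetTop: number;   // px to push the rendered slice down by
  readonly totalHeight: number; // px height of the full scrollable area
}

// Assumes every row has the same height; variable-height rows need a different approach.
function visibleWindow(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 5, // extra rows above and below to hide blank flashes while scrolling
): VisibleWindow {
  if (rowHeight <= 0) throw new Error("rowHeight must be positive");
  const first = Math.floor(Math.max(0, scrollTop) / rowHeight);
  // +1 covers the partially visible row when scrollTop is not row-aligned.
  const visibleCount = Math.ceil(viewportHeight / rowHeight) + 1;
  const start = Math.max(0, Math.min(first, rowCount) - overscan);
  const end = Math.min(rowCount, first + visibleCount + overscan);
  return {
    start,
    end,
    offsetTop: start * rowHeight,
    totalHeight: rowCount * rowHeight,
  };
}
```

With 4,000 rows at 32px in a 640px viewport, this mounts a few dozen rows at a time instead of all 4,000, so the work per scroll tick stays roughly constant as the queue grows.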

Rule of thumb we use internally: if a state can occur in production but never occurred in the demo, it is unbuilt — no matter how finished the screen looks.
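
One way to enforce that rule in code is to model every async surface as a discriminated union, so the compiler refuses to build a panel that forgot a state. This is a sketch under our own naming (`AsyncState`, `toPanelState`, `panelCopy`), and the copy strings are placeholders.

```typescript
// Every async panel is in exactly one of these states. None of them is "blank".
type AsyncState<T> =
  | { readonly status: "loading" }
  | { readonly status: "error"; readonly message: string }
  | { readonly status: "empty" }
  | { readonly status: "ready"; readonly rows: readonly T[] };

// Compile-time guard: if a new status is added and not handled, this call fails to type-check.
function assertNever(value: never): never {
  throw new Error(`Unhandled panel state: ${JSON.stringify(value)}`);
}

// Map a settled fetch result onto an explicit state. A failure never becomes an empty list.
function toPanelState<T>(result: readonly T[] | Error): AsyncState<T> {
  if (result instanceof Error) {
    return { status: "error", message: result.message };
  }
  return result.length === 0 ? { status: "empty" } : { status: "ready", rows: result };
}

// What the panel tells the analyst in each state.
function panelCopy<T>(state: AsyncState<T>, noun: string): string {
  switch (state.status) {
    case "loading":
      return `Loading ${noun}...`;
    case "error":
      return `Could not load ${noun}: ${state.message}. Data may be stale or missing.`;
    case "empty":
      return `No ${noun} in this time range.`;
    case "ready":
      return `${state.rows.length} ${noun}`;
    default:
      return assertNever(state);
  }
}
```

The distinction that matters for a SOC tool is between "empty" and "error". A failed collector and a quiet network must never render the same way.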

Ship-ready is not a coat of polish on top of generated code. It is the moment the interface holds its invariants under conditions nobody demoed: the empty tenant, the flooded queue, the keyboard user, the assessor. AI gets you a convincing 80%. The last 20% is the only part the buyer was ever going to test.

Austin founded Good Code and leads its product vision. He writes about the gap between AI-scaffolded code and audit-ready product, and about what it takes to design software the security industry actually trusts.
