The AI-Generated UI Problem

A prospect sent us a Loom last month. Forty seconds in, their PM was clicking through a threat console that looked, frankly, great — severity counts, a live alert feed, a tidy area chart. "We built this in two days with AI," they said. "We just need you to make it real."

That phrase — make it real — is the whole problem. The interface was 80% there. The remaining 20% was the part that decides whether an enterprise security buyer ever signs.

The demo passes. The review does not.

AI-generated interfaces optimize for the happy path in your prompt. They are confident, plausible, and almost entirely untested against the conditions a security product actually lives in: empty states, permission boundaries, 4,000-row tables, screen readers, and an assessor with a checklist.

A security UI is not judged by how it looks in the demo. It is judged by how it behaves on the worst day your customer has all year.

Here is where the generated console broke the moment we pressure-tested it:

Contrast: the muted-gray secondary text failed WCAG AA against its own card. Half the metadata was effectively invisible.
Focus order: tabbing through the alert table jumped to the export button and back. Keyboard-only analysts were stranded.
Empty + error states: there were none. A failed collector rendered as a silent blank panel — the most dangerous state a SOC tool can have.
Density collapse: at 200 alerts the layout held. At 4,000 it repainted on every scroll tick and locked up the browser tab.
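A contrast failure like the first one can be caught mechanically, long before an assessor finds it. The sketch below computes the WCAG 2.x contrast ratio from each color's relative luminance and checks it against the AA minimums (4.5:1 for normal text, 3:1 for large text). It assumes opaque `#rrggbb` colors with no alpha blending. The helper names are ours for illustration, not from any library.

```typescript
type RGB = [number, number, number];

// Parse an opaque "#rrggbb" (or "rrggbb") string into 0-255 channels.
function hexToRgb(hex: string): RGB {
  const digits = hex.replace(/^#/, "");
  if (!/^[0-9a-fA-F]{6}$/.test(digits)) {
    throw new Error(`Expected #rrggbb, got "${hex}"`);
  }
  const n = parseInt(digits, 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// WCAG 2.x relative luminance: linearize each sRGB channel, then weight it.
function relativeLuminance([r, g, b]: RGB): number {
  const linearize = (channel: number): number => {
    const s = channel / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio (L_lighter + 0.05) / (L_darker + 0.05), ranging from 1 to 21.
function contrastRatio(fg: string, bg: string): number {
  const a = relativeLuminance(hexToRgb(fg));
  const b = relativeLuminance(hexToRgb(bg));
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// AA minimums: 4.5:1 for normal text, 3:1 for large text.
function passesAA(fg: string, bg: string, largeText = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}
```

Run against the token palette in CI, a check like this flags a gray such as `#777777` on white (about 4.48:1, just under AA), while `#767676` clears it at roughly 4.54:1.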

Why this keeps happening

The model produces a component that satisfies the prompt, not a system that satisfies an invariant. "Make a severity badge" yields a severity badge. It does not yield the rule that severity color must be consistent everywhere, survive colorblindness, and never be the only signal carrying meaning.

```tsx
// Generated: color is the only signal. Fails colorblind + audit.
<Badge color={sevColor(sev)}>{count}</Badge>

// Shipped: severity encoded in color + shape + label,
// driven by one token map the whole app shares.
<SeverityTag level={sev}>
  <SevGlyph level={sev} aria-hidden />
  <span>{SEV_LABEL[sev]}</span>
  <Count value={count} />
</SeverityTag>
```
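The "one token map" behind `SeverityTag` could be as small as a single typed record. This is a sketch under assumptions: the four-level scale, the glyph identifiers, and the token names are hypothetical placeholders, and only `SEV_LABEL` corresponds to a name in the snippet above. Typing the map as `Record<Severity, SeverityToken>` makes the compiler reject a missing level, and a startup check enforces the part the type system cannot: no two levels share a shape.

```typescript
// Hypothetical four-level scale; the tuple is the single source for the type.
const LEVELS = ["critical", "high", "medium", "low"] as const;
type Severity = (typeof LEVELS)[number];

interface SeverityToken {
  label: string; // always rendered, so meaning never rides on color alone
  glyph: string; // shape identifier; must differ per level
  color: string; // design-token name, not a raw hex value
}

// Record<Severity, ...> fails to compile if any level is missing.
const SEV_TOKENS: Record<Severity, SeverityToken> = {
  critical: { label: "Critical", glyph: "octagon", color: "sev-critical" },
  high: { label: "High", glyph: "triangle", color: "sev-high" },
  medium: { label: "Medium", glyph: "diamond", color: "sev-medium" },
  low: { label: "Low", glyph: "circle", color: "sev-low" },
};

// Derived from the token map, so labels cannot drift from it.
function deriveLabels(tokens: Record<Severity, SeverityToken>): Record<Severity, string> {
  const out = {} as Record<Severity, string>;
  for (const level of LEVELS) out[level] = tokens[level].label;
  return out;
}
const SEV_LABEL = deriveLabels(SEV_TOKENS);

// Runtime guard for an invariant the type system cannot express:
// no two severity levels may share a glyph.
function assertDistinctGlyphs(tokens: Record<Severity, SeverityToken>): void {
  const seen: string[] = [];
  for (const level of LEVELS) {
    const glyph = tokens[level].glyph;
    if (seen.indexOf(glyph) !== -1) {
      throw new Error(`Severity "${level}" reuses glyph "${glyph}"`);
    }
    seen.push(glyph);
  }
}
assertDistinctGlyphs(SEV_TOKENS);
```

Components then read color, glyph, and label from `SEV_TOKENS` (or `SEV_LABEL`) and never hard-code them, so a palette change or a new level lands everywhere at once.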

The fix is not more prompting

You cannot prompt your way to an invariant, because the model has no memory of the promise it made three components ago. Closing the gap is senior work: a token layer every severity reference resolves through, a focus-management pass, real empty/loading/error states for every async surface, and a virtualization strategy for the tables that will actually get big.
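For the tables that will actually get big, the usual starting point is windowing: render only the rows that intersect the viewport plus a small overscan, and size a spacer so the scrollbar still reflects the full list. A minimal sketch, assuming fixed row heights (variable-height rows need measured offsets); `visibleRange` and its fields are illustrative names, not a library API. Libraries such as react-window and TanStack Virtual implement the same idea with more edge cases covered.

```typescript
interface WindowInput {
  scrollTop: number; // px scrolled from the top of the container
  viewportHeight: number; // px height of the scroll container
  rowHeight: number; // px; assumed identical for every row
  rowCount: number; // total rows, e.g. 4,000 alerts
  overscan?: number; // extra rows above and below to avoid blank flashes
}

interface WindowRange {
  start: number; // first row index to render (inclusive)
  end: number; // row index to stop at (exclusive)
  offsetY: number; // px offset for the rendered slice
  totalHeight: number; // px height for the spacer that sizes the scrollbar
}

function visibleRange({
  scrollTop,
  viewportHeight,
  rowHeight,
  rowCount,
  overscan = 5,
}: WindowInput): WindowRange {
  const top = Math.max(0, scrollTop);
  // Rows that intersect [top, top + viewportHeight), clamped to the data.
  const first = Math.min(rowCount, Math.floor(top / rowHeight));
  const last = Math.min(rowCount, Math.ceil((top + viewportHeight) / rowHeight));
  // Widen by the overscan on both sides, still clamped to the data.
  const start = Math.max(0, first - overscan);
  const end = Math.min(rowCount, last + overscan);
  return { start, end, offsetY: start * rowHeight, totalHeight: rowCount * rowHeight };
}
```

With 4,000 alerts at 32 px and a 640 px viewport, the window stays near 30 rows wherever the analyst scrolls, so per-scroll work no longer grows with the queue.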

Rule of thumb we use internally: if a state can occur in production but never occurred in the demo, it is unbuilt — no matter how finished the screen looks.
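One way to make that rule enforceable is to model each async surface as a discriminated union, so empty and error are states the compiler forces you to render rather than ones the demo happened to skip. A sketch under assumptions: `PanelState`, `CollectorResult`, and the message strings are hypothetical, and a real component would return JSX instead of text.

```typescript
// Every state an async panel can be in; "blank" is deliberately not one.
type PanelState<T> =
  | { kind: "loading" }
  | { kind: "empty"; reason: string }
  | { kind: "error"; message: string }
  | { kind: "ready"; data: T };

// Hypothetical shape of a collector response.
type CollectorResult<T> =
  | { status: "ok"; items: T[] }
  | { status: "failed"; error: string };

// A failed collector maps to "error" and zero rows map to "empty",
// so neither can fall through to a silent blank panel.
function toPanelState<T>(result: CollectorResult<T>): PanelState<T[]> {
  if (result.status === "failed") {
    return { kind: "error", message: result.error };
  }
  if (result.items.length === 0) {
    return { kind: "empty", reason: "collector returned no rows" };
  }
  return { kind: "ready", data: result.items };
}

// Compile-time exhaustiveness: an unhandled kind leaves `state` non-never
// in the default branch, which fails to type-check.
function assertNever(x: never): never {
  throw new Error(`Unhandled panel state: ${JSON.stringify(x)}`);
}

function describePanel<T>(state: PanelState<T[]>): string {
  switch (state.kind) {
    case "loading":
      return "Loading alerts";
    case "empty":
      return `No alerts: ${state.reason}`;
    case "error":
      return `Collector failed: ${state.message}`;
    case "ready":
      return `${state.data.length} alerts`;
    default:
      return assertNever(state);
  }
}
```

Adding a fifth state later (say, a permission-denied variant) turns every `switch` that does not handle it into a type error rather than a silent blank panel.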

Ship-ready is not a coat of polish on top of generated code. It is the moment the interface holds its invariants under conditions nobody demoed: the empty tenant, the flooded queue, the keyboard user, the assessor. AI gets you a convincing 80%. The last 20% is the only part the buyer was ever going to test.

Austin McDaniel