
Operational, Partial, Decorative, Gap: a 4-tier framework for AI systems in production

· AI governance · framework · model risk · methodology

After auditing AI systems across procurement, product, marketing, and contractual claims, a pattern emerges: every system lives in exactly one of four states. The state isn’t a function of how impressive the technology is or how much it cost to build. It’s a function of one thing — how well the organization can defend what the AI is actually doing.

We use these four tiers as the working classification in every engagement. They’re useful internally as a way to talk about risk, and useful externally as a way to explain to boards, regulators, and acquirers what kind of governance work is needed.

The four tiers

1. Operational

The AI does what it says it does, and you can prove it.

What that looks like: documented capability claims tied to specific tests; evidence of behavioral evaluation against those claims; active monitoring of outputs in production; clear escalation paths when behavior diverges; a named owner for the system.

This is the bar most organizations think they hit. Most don’t.

An Operational system is one where, when a regulator asks hard questions, your team produces a binder rather than improvising. Board reporting on the system’s risk profile takes a day to assemble, not a quarter. A new model version gets deployed only after passing the same evaluation gates the old one passed.
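The "same evaluation gates" idea can be sketched in a few lines. This is a minimal illustration under assumptions, not a description of any particular tooling: the `Gate` type, the `passes_gates` function, and the metric names and thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """One evaluation gate: a named metric and the minimum score required."""
    metric: str
    minimum: float


def passes_gates(scores: dict[str, float], gates: list[Gate]) -> tuple[bool, list[str]]:
    """Return (ok, failed_metrics). A metric with no score counts as a failure."""
    failed = []
    for gate in gates:
        value = scores.get(gate.metric)
        if value is None or value < gate.minimum:
            failed.append(gate.metric)
    return (not failed, failed)


# Hypothetical gates and scores, for illustration only.
GATES = [Gate("accuracy", 0.90), Gate("refusal_rate_on_out_of_scope", 0.95)]

# The candidate version was never evaluated on the second gate, so it is blocked.
ok, failed = passes_gates({"accuracy": 0.93}, GATES)
# ok is False; failed == ["refusal_rate_on_out_of_scope"]
```

Treating a missing score as a failure is the design choice that matters here: a new version cannot pass simply because nobody ran the test.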

What governance work is needed: maintaining the posture as the system evolves. Monitoring the operational evidence. Periodic re-audit to verify claims still hold.

2. Partial

The AI works in some cases, but the boundary isn’t mapped.

Performance is real. The system handles known scenarios well. It demonstrably produces value. But there’s no clean line between cases the system handles and cases it doesn’t. Failure modes exist but aren’t catalogued. The system is essentially well-engineered for happy paths and silent on edge cases.

Most production AI systems we’ve audited live here. This is not a bad place to be. It’s an honest place.

The risk in Partial isn’t that the AI breaks; it’s that nobody knows when it has. A subtle drift in performance, a degradation on a particular demographic, a quiet failure on an edge case the original developers didn’t consider: these go unnoticed because there’s no instrumentation to notice them.

What governance work is needed: map the boundary. Define what the system handles and what it doesn’t. Build the monitoring that catches drift. The work is rarely about fixing the AI — it’s about understanding it.
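As a rough sketch of what "monitoring that catches drift" can mean in practice, the snippet below compares per-segment scores against a recorded baseline. The function name, the segment labels, the scores, and the 0.05 tolerance are all assumptions chosen for illustration.

```python
def drifted_segments(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.05,
) -> list[str]:
    """Return segments whose current score fell more than `tolerance`
    below baseline, or that are no longer being measured at all."""
    flagged = []
    for segment, base_score in baseline.items():
        now = current.get(segment)
        if now is None or base_score - now > tolerance:
            flagged.append(segment)
    return sorted(flagged)


# Hypothetical numbers: the overall score barely moves,
# while one segment has quietly degraded.
baseline = {"overall": 0.92, "segment_a": 0.91, "segment_b": 0.90}
current = {"overall": 0.91, "segment_a": 0.90, "segment_b": 0.81}

print(drifted_segments(baseline, current))  # ['segment_b']
```

The example is deliberately built so that the aggregate metric looks healthy. Without the per-segment breakdown, the degradation on `segment_b` would go unnoticed.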

3. Decorative

The AI exists for the appearance of using AI.

The system produces outputs. The outputs don’t change decisions. Often it’s a chatbot layered onto a process that already worked, a model trained to predict something the team already knew, or a natural-language interface where a structured form would have been faster.

Decorative AI is more common than people admit. It accumulates because AI is something organizations feel they’re supposed to have. The procurement happens, the integration happens, the launch happens, and nobody asks the question: did this change anything?

The risk in Decorative isn’t dramatic harm. It’s the slow accumulation of compliance surface area for systems that aren’t doing meaningful work. Each decorative AI is a thing your legal team has to defend, your security team has to monitor, your data team has to feed — for output nobody depends on.

What governance work is needed: an honest decision about whether to keep the system. Often the answer is no. If yes, downgrade its governance footprint to match its actual decision impact.

4. Gap

The AI is doing something the organization isn’t prepared to defend.

Behavior diverges from policy, claims, or contracts. Sometimes by design (the AI was deployed faster than the policy work could catch up). Sometimes by drift (the model was retrained on new data without a corresponding update to its claimed behavior). Sometimes by misunderstanding (the team that built it doesn’t agree with the team that sells it about what it does).

Gap is the tier that creates exposure. If a regulator asks hard questions about a Gap-tier system, the answers don’t exist. If a customer audit lands, the documentation doesn’t match the behavior. If something goes wrong publicly, the public version of the story diverges from the internal one.

What governance work is needed: immediate work. The goal is rarely to rebuild the AI; it’s to bring the organization’s claims, policies, and contracts into alignment with what the AI actually does. Sometimes the AI gets restricted to match the documentation. Sometimes the documentation gets rewritten to match the AI. Both are valid. Continuing to operate while the gap exists is not.
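The four definitions above can be read as a small decision procedure. The sketch below is one possible encoding, not a formal scoring method: the field names are hypothetical, and the order of the checks (Gap first, then Decorative, then Operational versus Partial) is an assumption made so that each system lands in exactly one tier.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    OPERATIONAL = "Operational"
    PARTIAL = "Partial"
    DECORATIVE = "Decorative"
    GAP = "Gap"


@dataclass
class SystemFacts:
    """Yes/no answers about one AI system (hypothetical field names)."""
    behavior_matches_claims: bool   # consistent with policy, claims, contracts
    outputs_change_decisions: bool  # somebody actually depends on the output
    claims_tied_to_tests: bool
    evaluated_against_claims: bool
    monitored_in_production: bool
    escalation_path_defined: bool
    has_named_owner: bool


def classify(facts: SystemFacts) -> Tier:
    # Assumed precedence: an undefended divergence outranks everything else.
    if not facts.behavior_matches_claims:
        return Tier.GAP
    if not facts.outputs_change_decisions:
        return Tier.DECORATIVE
    evidence = (
        facts.claims_tied_to_tests,
        facts.evaluated_against_claims,
        facts.monitored_in_production,
        facts.escalation_path_defined,
        facts.has_named_owner,
    )
    # Operational requires every piece of evidence; anything less is Partial.
    return Tier.OPERATIONAL if all(evidence) else Tier.PARTIAL


# A useful system with no production monitoring classifies as Partial.
example = SystemFacts(True, True, True, True, False, True, True)
print(classify(example).value)  # Partial
```

In a real assessment each yes/no answer would need supporting evidence behind it; the booleans only make the structure of the classification explicit.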

Why the framework matters

The four tiers aren’t about good vs. bad. Operational systems can have real problems. Partial systems can be hugely valuable. Decorative systems sometimes get built for legitimate reasons (proof-of-concept work, vendor pressure, internal politics). Gap systems are usually built by smart people who moved faster than governance could keep up.

The point is that each tier requires different work. Audit recommendations that ignore the tier — that prescribe the same controls regardless of which state the system is in — produce paper that nobody acts on. Tier-aware recommendations tell the organization what to do next given what they actually have.

Where most organizations land

Across the audits we’ve run, the rough distribution looks like this:

  • Operational: ~10%. Usually one or two flagship systems, well-resourced, well-documented, owned by a serious team.
  • Partial: ~50-60%. The honest majority. Systems that work but whose boundaries aren’t mapped.
  • Decorative: ~20%. Often older systems that got built early in the organization’s AI investment and never influenced meaningful decisions.
  • Gap: ~10-15%. The systems that create the most exposure. Usually fewer in count, but each one represents real risk.

Most engagements involve a portfolio of systems across all four tiers. Where the work starts depends on which tier creates the most exposure relative to the organization’s risk appetite. For most clients, that’s Gap first, Partial second, Decorative last.
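The default ordering described above can be written as a simple sort over a portfolio. This is a sketch under assumptions: the system names are invented, and leaving Operational systems out of the remediation queue (they stay on periodic re-audit instead) is an interpretation rather than something the ordering above states.

```python
# Default order for most clients: Gap first, Partial second, Decorative last.
REMEDIATION_ORDER = {"Gap": 0, "Partial": 1, "Decorative": 2}


def remediation_queue(portfolio: list[tuple[str, str]]) -> list[str]:
    """Order (system_name, tier) pairs for remediation work.

    Operational systems are skipped here (assumption: they are handled by
    periodic re-audit). Systems in the same tier keep their input order
    because Python's sort is stable.
    """
    in_scope = [(name, tier) for name, tier in portfolio if tier in REMEDIATION_ORDER]
    in_scope.sort(key=lambda item: REMEDIATION_ORDER[item[1]])
    return [name for name, _ in in_scope]


# Hypothetical portfolio, for illustration only.
portfolio = [
    ("support-chatbot", "Decorative"),
    ("credit-scoring", "Operational"),
    ("pricing-model", "Gap"),
    ("lead-ranking", "Partial"),
    ("churn-predictor", "Partial"),
]

print(remediation_queue(portfolio))
# ['pricing-model', 'lead-ranking', 'churn-predictor', 'support-chatbot']
```

A client with a different risk appetite would change the numbers in `REMEDIATION_ORDER` rather than the sorting logic.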


How to use this framework

If you’re trying to assess your own AI portfolio, the Diagnostic is the cheapest first read. Ten questions, ten minutes, and you’ll get a written tier classification with two or three prioritized next steps.

If you already know roughly which tier your systems fall into and want a structured audit that produces defensible documentation, tell us about the system and we’ll scope an engagement.

If you’re working through this for a board or regulator and need the framework formally documented for that audience, the language above is yours to use — credit appreciated but not required.

The four tiers are the working tool. The work is what we do with them.