Exceptional Work
Enterprise Context Management
Over the past decade, companies spent trillions on data infrastructure. Cloud warehouses. Data lakes. Cleaner pipelines. Now every transaction flows into Snowflake tables and Databricks lakehouses.
Yet when these companies try to deploy AI on this data, projects stall at the pilot stage. The models reason and generate text at near-human levels. The data is cleaner than it has ever been.
But clean data turns out to be the easier half of the problem.
Intelligence vs Perception
Intelligence is the ability to reason: process information and draw conclusions. LLMs have this. Give GPT-5.3 or Opus 4.6 clear facts and they reason through them effectively.
Perception is the ability to understand facts inside operational context. This is where AI falls short.
Take a bank processing cross-border payments. A transaction gets flagged for review. The model can read the record. But it still can’t answer the questions an experienced operator resolves in minutes:
Is this a known pattern for this customer or jurisdiction?
Who can override the block?
What additional checks apply at this amount and destination?
What logs and audit trails update if approved?
Those answers live in escalation runbooks, ticket comments, email threads, and institutional practice accumulated over years.
They don’t live in the data warehouse.
Without systems that capture it, enterprises accumulate context debt. When a senior compliance officer with two decades of experience retires, they take that perceptual understanding with them. The replacement knows the process (documented in SOPs) but not the principles behind it. Why exceptions get made. When to escalate vs. resolve. How to read incomplete data.
That’s the environment most enterprises are actually deploying into. And AI doesn’t eliminate the problem. It accelerates the production of plausible answers that still require humans to catch what the system cannot perceive.
Why the Standard Fixes Fail
Enterprises have tried retrieval and structure. Both help at the margins. Neither produces operational understanding.
RAG breaks on relevance and state. Vector search retrieves text that looks related, not information that governs decisions. A query for “high value customers” pulls brand language about “value,” not the logic Finance uses to define top accounts. Even when retrieval is accurate, it’s mostly static: policies, manuals, old decisions. Exceptions require live state + precedent, not paragraphs that score well on similarity.
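A toy sketch of the gap (all data and the similarity function are illustrative stand-ins, not a real embedding model): surface-level similarity surfaces the text that *sounds* related, while the governing definition requires applying a rule to live state.

```python
def similarity(query, doc):
    """Crude stand-in for embedding similarity: token overlap."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

documents = [
    "Our brand delivers exceptional value to every customer",  # marketing copy
    "Finance definition: top accounts have trailing twelve month revenue above 250000",
]

query = "high value customers"
best = max(documents, key=lambda d: similarity(query, d))
# The marketing copy shares more surface tokens ("value", "customer"),
# so similarity search surfaces it over the operative Finance definition.

def top_accounts(accounts):
    """Decision-relevant lookup: apply the governing rule to live state."""
    return [a["name"] for a in accounts if a["ttm_revenue"] > 250_000]

accounts = [
    {"name": "Acme", "ttm_revenue": 900_000},
    {"name": "Globex", "ttm_revenue": 40_000},
]
print(best)                  # the marketing copy wins on similarity
print(top_accounts(accounts))  # ['Acme'] — the answer Finance actually means
```

The point isn't that embeddings are bad at similarity; it's that "what governs this decision" is a different question than "what reads alike."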
MDM and traditional knowledge graphs break on context. In Sales, “customer” is an account. In Finance, a billing entity. In Support, a ticket submitter. In Compliance, a risk profile. The meaning shifts by function and moment. Forcing a single definition removes the nuance operators depend on.
Both approaches try to make Systems of Record “smarter.” Neither provides what actually matters: judgment. A governed way to interpret records in context and convert them into decisions.
Rules → Principles
Standard enterprise automation is rules-based. If Condition A, then Action B. It works until reality drifts. That drift shows up as exceptions. And exceptions are the boundary of your rules. They’re also where your business principles already exist, applied informally by humans when the system can’t decide.
A rules system says “stop if condition X is met.”
A principle-based system says “prioritize validation based on customer relationship strength, transaction history, and regulatory risk.”
The difference is judgment. Principles work because they capture how a business actually operates. But AI can only exercise those judgments when it has operational context: what “high-value relationship” means in practice, what constitutes “unusual” for this specific customer, what “multiple risk signals” looks like in real-time.
That context has to be represented as operations, constraints, and permissions. Documents and entities alone won’t carry the weight.
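A minimal sketch of the contrast, with hypothetical signal names and weights (the 0.4/0.3/0.5 weighting and thresholds are assumptions for illustration, not a real policy):

```python
def rule_check(amount, threshold=10_000):
    """Rules-based: stop if condition X is met."""
    return "block" if amount > threshold else "allow"

def principle_check(txn, customer):
    """Principle-based: weigh relationship strength, history, and risk.
    Signal names, weights, and thresholds here are illustrative assumptions."""
    score = (
        0.4 * customer["relationship_strength"]   # 0..1: tenure, revenue, breadth
        + 0.3 * customer["history_consistency"]   # 0..1: fit with past behavior
        - 0.5 * txn["regulatory_risk"]            # 0..1: jurisdiction/amount risk
    )
    if score > 0.4:
        return "allow"
    if score > 0.1:
        return "review"  # route to a human, with the signals attached
    return "block"

txn = {"amount": 25_000, "regulatory_risk": 0.2}
vip = {"relationship_strength": 0.9, "history_consistency": 0.8}
print(rule_check(txn["amount"]))   # block: the raw amount alone trips the rule
print(principle_check(txn, vip))   # allow: context outweighs the raw amount
```

The second function only works if someone has encoded what "relationship strength" and "regulatory risk" mean operationally. That encoding is the context layer.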
Exceptions
If the shift from rules to principles is the conceptual foundation, exceptions are where it gets funded.
Exceptions are cases where rules-based automation fails and humans step in to supply missing context. They concentrate three things enterprises care about: real operational spend, measurable throughput, and high-consequence judgment. That makes them the natural entry point for any infrastructure that encodes operational understanding.
Here’s what that looks like in $$$$$$:
🏦 Financial Services: Cross-border payments have 2–5% exception rates. With 53 million SWIFT messages daily, that means 1–2.5 million manual investigations at $50–75 each. The global financial sector loses over $1 billion annually to payment exception handling inefficiencies.
🏥 Healthcare: Insurance claim denials average 16–20% of all submissions, with some payers rejecting up to 32%. Reworking a single claim costs approximately $25 in labor and overhead. For hospitals processing tens of thousands of claims monthly, this represents millions in operational drag and delayed revenue.
🏭 Logistics: Delivery exceptions can hit 12% on weekends. Wrong address, business closed, recipient unavailable. Every failed delivery incurs costs in fuel, driver time, and customer support.
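The payments arithmetic above is worth making explicit. Plugging in the stated figures (53 million daily SWIFT messages, 2–5% exception rate):

```python
daily_swift_messages = 53_000_000
exception_rate_low, exception_rate_high = 0.02, 0.05

exceptions_low = daily_swift_messages * exception_rate_low    # ~1.06M/day
exceptions_high = daily_swift_messages * exception_rate_high  # ~2.65M/day
print(f"{exceptions_low/1e6:.2f}M to {exceptions_high/1e6:.2f}M manual investigations per day")
```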
Exceptions compound as automation scales. The remaining cases are the hardest and require the most context. Automate 95% of invoice processing and the remaining 5% becomes more expensive. Higher-cost labor. More context-switching overhead. Edge cases that demand deeper institutional knowledge.
The more enterprises automate, the more painful exceptions become.
Timing
The problem isn’t new. What’s changed is that the conditions to solve it have finally converged. Three shifts have made a new approach practical.
First, LLMs can reason through messy operational scenarios but cannot run unsupervised. That makes bounded reasoning architectures viable. Both model capability and human oversight are required. We’re in the sweet spot.
Second, exceptions get more expensive as automation improves. The problem concentrates rather than dissipates. The business case strengthens every quarter.
Third, enterprises are no longer shopping for “assistants.” They want systems that take actions safely. This requires infrastructure to delegate operational decisions. Retrieval alone won’t cut it.
These conditions have created space for a new discipline: Enterprise Context Management.
The Context Layer
The gap between “data” and “operational judgment” is real, and the major infrastructure vendors have noticed. Each is making architectural bets on how to close it.
Palantir’s Ontology is the most mature attempt. A semantic layer that maps entities, relationships, and actions into a unified model your applications can query. When you deploy an agent on Palantir’s AIP, it reasons over a structured representation of your operations: what actions exist, what permissions govern them, what happens when they execute.
The limitation is implementation cost. Palantir’s forward-deployed engineers spend months mapping a business before the system becomes useful. That works for defense contracts and large enterprises with existential problems. It doesn’t scale to mid-market companies or use cases where time-to-value matters.
Snowflake’s Cortex bets that the data warehouse itself should become the context layer. If your structured data already lives in Snowflake, Cortex layers LLM capabilities directly on top: permissions, lineage, and governance included.
The weakness: warehouses are optimized for analytics, not operations. They excel at “what happened” questions. “What should happen next” requires a different architecture.
Databricks Mosaic plays the model layer. The bet that context problems are actually capability problems. Train custom models on proprietary data. Build enterprise-specific reasoning into the weights.
This works for pattern recognition. But operational context shifts constantly based on who’s asking, what state the system is in, what happened five minutes ago. That dynamism requires runtime infrastructure, not frozen weights.
MongoDB’s Atlas Vector Search represents the RAG maximalist position. The tooling has matured. Sub-millisecond queries, smart chunking strategies.
But vector search optimizes for semantic similarity, not decision relevance. “What’s our refund policy?” and “Can this customer get a refund?” look nearly identical to an embedding model. Answering them requires completely different information.
What’s actually required: structured operational knowledge (Palantir) without multi-month implementations; tight data integration (Snowflake) without being limited to analytical queries; semantic retrieval (MongoDB) augmented with structured reasoning over live state. And governance throughout. Auditable, reversible, explainable. Table stakes for regulated industries. Most AI vendors treat it as an afterthought.
Hybrid Loops
The dominant paradigm in enterprise AI right now is agent chains. Probabilistic model A passes output to model B, which passes to model C.
This works for demos. It falls apart in production.
LLMs are non-deterministic. When you chain them, errors compound. A three-agent chain where each agent has 90% accuracy delivers 72% system accuracy. Extend the workflow to ten steps and system accuracy falls to 34%.
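The compounding is just exponentiation, assuming step errors are independent:

```python
def chain_accuracy(p, steps):
    """End-to-end accuracy of a chain with per-step accuracy p."""
    return p ** steps

print(round(chain_accuracy(0.9, 3), 3))   # 0.729 — the 72% three-agent figure
print(round(chain_accuracy(0.9, 10), 3))  # 0.349 — the 34% ten-step figure
```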
The emerging alternative is the Hybrid Loop. The key insight is separation of concerns: use the LLM for what it’s good at (reasoning) and wrap it in deterministic code that handles execution, state management, and safety.
AI One, a New York-based startup founded in 2024, is making one of the more deliberate bets on this architecture. Their implementation breaks the hybrid loop into five components, each with a clear division of labor:
Planner — built on a frontier LLM, handles strategic reasoning. Given a goal, available tools, and constraints, it produces a structured plan: an executable object with phases, success criteria, and fallback conditions. This is where AI reasoning capability actually matters.
Supervisor — handles execution using conventional software, not another model. It tracks state, manages parallel tool calls, enforces budgets and timeouts, and orchestrates the workflow. Zero understanding of natural language. It executes the plan and keeps score.
Verifier — provides quality control with a narrow mandate: judge outputs, never take action. When the planner generates a SQL query or the supervisor receives tool output, the verifier labels it as valid, blocked, or needs review. These labels feed back into the supervisor’s decision logic.
Tool Layer — where work actually happens: APIs, databases, and external services. In most workflows, tool calls dominate both cost and latency. The architecture treats them as first-class concerns, handling them asynchronously and in parallel where possible.
Context Manager — orchestrates information flow. Rather than dumping the full context into every model call, it delivers targeted information to each component. Just enough to make the current decision, nothing more. This is where tight integration with a context layer matters: the context manager knows what information is relevant because it has a structured model of the operational domain.
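The division of labor above can be sketched in a few dozen lines. Everything here is illustrative (the function names, tools, and risk thresholds are assumptions, not AI One's API); the point is the shape: the planner proposes a structured plan, deterministic code executes and audits, and the verifier labels without ever acting.

```python
def planner(goal):
    """Stand-in for the LLM call: returns a structured plan, not free text."""
    return [{"tool": "lookup_customer", "args": {"id": "C-17"}},
            {"tool": "release_payment", "args": {"id": "C-17"}}]

def verifier(step, output):
    """Judges outputs, never acts. Labels feed the supervisor's logic."""
    if output.get("risk", 0) > 0.8:
        return "blocked"
    if output.get("risk", 0) > 0.5:
        return "needs_review"
    return "valid"

TOOLS = {  # the tool layer: where work actually happens
    "lookup_customer": lambda args: {"id": args["id"], "risk": 0.3},
    "release_payment": lambda args: {"id": args["id"], "risk": 0.1, "done": True},
}

def supervisor(goal, budget=10):
    """Deterministic execution: tracks state, enforces budgets, keeps an audit trail."""
    audit, state = [], {}
    for step in planner(goal)[:budget]:
        output = TOOLS[step["tool"]](step["args"])
        label = verifier(step, output)
        audit.append({"step": step, "label": label})
        if label == "blocked":
            break  # fail closed; the audit trail shows exactly where and why
        if label == "needs_review":
            audit[-1]["escalated"] = True
            break  # hand off to a human with full state attached
        state[step["tool"]] = output
    return state, audit

state, audit = supervisor("resolve payment exception C-17")
print([entry["label"] for entry in audit])  # ['valid', 'valid']
```

Note that the supervisor and verifier contain no model calls at all: failure handling, budgets, and the audit log live entirely in deterministic code.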
The result: LLM reasoning happens in bounded contexts with deterministic code controlling data access, action execution, and audit logging. The model reasons. The system acts.
When something fails, you trace exactly what happened and why.
This pattern will show up across multiple vendors and implementations. What matters is the principle: enterprise AI has to clear a higher bar than “good outputs.” Explainable decisions. Reversible actions. Auditable proof that the system behaves within defined boundaries.
Agent chains can’t deliver that. Hybrid loops can.
AI-Shoring
Offshoring was a decision under labor constraints. Many enterprises pushed exception handling to low-cost centers because the work was manual and repetitive.
AI changes the equation. A well-governed agent can resolve a chunk of exceptions with lower latency, tighter system integration, and better data controls than a distributed human workflow. At lower cost than offshore teams.
That puts structural pressure on the $200+ billion global BPO model. What was pushed offshore for cost reasons is coming back inside the enterprise. Executed by AI rather than remote humans.
Where This is Headed
“Context Engineer” is a plausible job title because somebody has to encode operational meaning: definitions, permissions, precedents, and decision boundaries currently scattered across systems and people.
Enterprises won’t get durable AI operators by buying more copilots. They’ll get there by owning the layer that turns records into decisions.
Organizations that encode their judgment as machine-readable infrastructure will capture 10x more value from AI than those still running pilots. The window to build this capability is narrow. Maybe 12–18 months before it becomes a baseline capability.
The limiting factor in enterprise AI has never been data or models.
It’s the absence of a system that encodes judgment, context, and operational understanding.
That’s the layer being built right now.
✌🏽SR



