Quick Facts
- Category: Software Tools
- Published: 2026-05-19 16:54:43
AI agents are everywhere—boards mandate them, teams prototype them, and demos impress. Yet when it's time to go live, many sputter and stall. The model isn't the culprit; the surrounding workflow is. Drawing on two decades of deploying software in regulated industries, this article breaks down why the easy part is the AI, and the hard part is everything else. Through a series of questions, we'll explore the real bottlenecks: domain knowledge, monitoring, rollback plans, and onboarding agents like human engineers.
# Why do AI agents that work perfectly in sandboxes fail so often in production?
A sandbox is a controlled, predictable environment. An AI agent can be tested with clean data, clear boundaries, and no real consequences. But production is messy—there are edge cases, integration quirks, and unplanned user behaviors. In my experience, the model itself usually performs fine. What breaks is everything around it: no monitoring to detect drift, no defined ownership when something goes wrong, and no rollback plan. In regulated industries, every release is preceded by risk analysis, metrics collection, and a clear path back to a safe state. If you skip those for an AI agent, you're essentially flying blind. The result is that the agent stalls, fails, or hallucinates without anyone realizing until it's too late. The sandbox gave false confidence because it tested only the model, not the workflow.

# What matters more: the AI model itself or the workflow around it?
The model is the easy part. You can swap one large language model for another in an afternoon, because models are becoming commoditized. The workflow underneath is what you cannot replace. That includes how the agent decides, what data it reads, how outputs are validated, and where fallbacks exist. After 20 years of building software in sectors where a hallucination can ground planes or halt payments, I've learned that domain knowledge baked into the decision logic is far more valuable than any set of model weights. The workflow is the product: the collection of processes, human reviews, monitoring dashboards, and escalation paths that make an agent reliable. Without a solid workflow, even the best model will produce unreliable results. The industry often focuses on model accuracy, but production success depends on integration and governance.
# What specific requirements must AI agents meet in regulated environments?
Three things are non-negotiable: control, traceability, and safety. First, the agent's decision logic must be explicit and auditable. You need to know why it chose a particular action. Second, every input and output must be logged, with clear metrics collected from day one. Without that, you can't answer questions when something goes wrong—and something will. Third, there must be a defined way to return to a safe state when the agent fails. This could be a manual override, a fallback to a simpler rule, or a complete stop. In addition, you need ownership: a person or team responsible for the agent's behavior. None of this changes just because the code is written by an AI rather than a human. The discipline we use for critical software applies directly to agents.
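To make the three requirements concrete, here is a minimal sketch of a wrapper that logs every input and output, records the rationale for each decision, and falls back to a safe action when the agent fails. Every name in it (`AuditedAgent`, `AuditRecord`, `approve_small_refunds`, the `payments-team` owner, the 100 limit) is hypothetical and chosen for illustration; a real deployment would write to durable storage rather than an in-memory list.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class AuditRecord:
    """One logged decision: what went in, what came out, and why."""
    timestamp: float
    owner: str
    agent_input: Any
    agent_output: Any
    rationale: str
    outcome: str  # "ok" or "safe_state"


@dataclass
class AuditedAgent:
    # Assumption: the decision function returns (action, rationale).
    decide: Callable[[Any], Tuple[Any, str]]
    safe_action: Any  # what to do when the agent fails
    owner: str        # person or team accountable for this agent
    log: List[AuditRecord] = field(default_factory=list)

    def run(self, agent_input: Any) -> Any:
        try:
            action, rationale = self.decide(agent_input)
            outcome = "ok"
        except Exception as exc:
            # Any failure returns the system to the predefined safe state.
            action, rationale, outcome = self.safe_action, f"error: {exc}", "safe_state"
        self.log.append(
            AuditRecord(time.time(), self.owner, agent_input, action, rationale, outcome)
        )
        return action

    def export_log(self) -> str:
        """Serialize the audit trail as JSON Lines for later review."""
        return "\n".join(json.dumps(asdict(record)) for record in self.log)


def approve_small_refunds(request: dict) -> Tuple[str, str]:
    """Hypothetical explicit decision logic with a stated reason."""
    if request["amount"] < 0:
        raise ValueError("negative amount")
    if request["amount"] <= 100:
        return "approve", "amount within the 100 limit"
    return "escalate", "amount above the 100 limit"


if __name__ == "__main__":
    agent = AuditedAgent(approve_small_refunds, safe_action="halt", owner="payments-team")
    agent.run({"amount": 40})
    agent.run({"amount": -5})  # triggers the safe state
    print(agent.export_log())
```

The design choice worth noting is that logging happens on both the success and the failure path, so the audit trail can answer "why did it do that?" even for the cases that went wrong.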
# Why is domain knowledge so critical for building AI agents that work in production?
Domain knowledge is the accumulated understanding of which systems interact, which processes are fragile, and where a small change cascades into a big problem. Companies often stick with the same engineering teams for years because those teams know the client's business inside out. You cannot automate something you don't fully understand. Without domain expertise, you risk building agents that optimize the wrong thing or break when an unusual scenario occurs. For instance, an agent that handles invoice processing might need to know that a particular supplier always uses a non-standard format. MIT's 2025 research found that 95% of enterprise AI pilots produce no measurable business impact—largely because organizations fail to integrate AI with their actual workflows and domain realities. Domain knowledge bridges that gap.
# How should organizations onboard AI agents, and what can we learn from onboarding humans?
You wouldn't let a new developer commit code to the main branch on day one. There's a ramp-up: start with small tasks, review their work closely, and increase scope as trust builds. AI agents need exactly the same treatment. Give the agent a clear definition of done, evaluate outputs against known benchmarks, and have a human review results until reliability is proven. Build an escalation path for when the agent hits something it can't handle. This isn't a new idea—we've spent decades perfecting human onboarding. Yet most organizations skip it for agents and are surprised when chaos ensues. Stack Overflow's 2025 Developer Survey, with over 49,000 respondents, found that 45% of developers say debugging AI-generated code is more time-consuming than expected. The code looks right but isn't. Structured onboarding catches these issues early.
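The ramp-up described above can be sketched as a small tracker that widens an agent's scope only after a run of human-reviewed outputs passes, and narrows it again when quality drops. This is an illustration under stated assumptions: the scope names, window size, and thresholds are all hypothetical, and the pass/fail verdict is assumed to come from a human reviewer or a benchmark comparison.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical scope ladder, from most to least supervised.
SCOPES = ["shadow", "human_review", "autonomous"]


@dataclass
class OnboardingTracker:
    window: int = 20          # number of recent reviewed outputs considered
    promote_at: float = 0.95  # pass rate needed to widen scope
    demote_at: float = 0.80   # pass rate below which scope narrows
    level: int = 0            # index into SCOPES; every agent starts in "shadow"
    results: deque = field(default_factory=deque)

    def record_review(self, passed: bool) -> str:
        """Record one reviewed output and return the agent's current scope."""
        self.results.append(passed)
        if len(self.results) > self.window:
            self.results.popleft()
        # Only judge the agent once a full window of evidence exists.
        if len(self.results) == self.window:
            rate = sum(self.results) / self.window
            if rate >= self.promote_at and self.level < len(SCOPES) - 1:
                self.level += 1
                self.results.clear()  # trust must be re-earned at the new scope
            elif rate < self.demote_at and self.level > 0:
                self.level -= 1
                self.results.clear()
        return SCOPES[self.level]


if __name__ == "__main__":
    tracker = OnboardingTracker(window=5, promote_at=1.0, demote_at=0.6)
    for verdict in [True, True, True, True, True]:
        scope = tracker.record_review(verdict)
    print(scope)
```

Clearing the window after each promotion or demotion means the agent has to build a fresh track record at every scope, which mirrors how a new engineer's review burden is relaxed step by step rather than all at once.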
# What do recent surveys and research reveal about enterprise AI adoption failures?
Two data points illustrate the problem clearly. MIT's 2025 study on enterprise AI pilots found that 95% of them fail to create measurable business impact. The root cause isn't the technology; it's how organizations adopt, integrate, and govern AI. They treat AI as a magic box instead of fitting it into existing workflows. Separately, Stack Overflow's 2025 Developer Survey, with more than 49,000 respondents, showed that 45% of developers say debugging AI-generated code is more time-consuming than expected. The output looks plausible, so trust is misplaced. Both studies point to the same conclusion: the hard work is not in the model but in the surrounding process. Without proper governance, monitoring, and human oversight, even the best AI will fail to deliver value.
# How can you ensure your AI agent has a safe fallback when things go wrong?
Every production system needs a plan for failure. For AI agents, the fallback must be designed upfront. First, define what constitutes a failure: output out of bounds, timeout, low confidence score. Then, create an automatic response—for example, switch to a rule-based system, ask for human approval, or halt the process with an alert. The agent should also have clear boundaries: it should know its own limitations and not attempt tasks it's not trained for. In my work, we build a safe state into every agent, much like a circuit breaker. When confidence dips below a threshold, the agent passes control back to a human operator. This requires logging the context so the human can quickly decide. Without such a mechanism, a small hallucination can cascade into a costly error. The discipline of release engineering—rollback plans, canary launches, and monitoring—applies just as much to agents as to any software.
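The failure definitions and responses above can be sketched as a single guard around the agent call. This is a minimal sketch, not the author's implementation: `guarded_call`, `AgentResult`, `Decision`, the 0.8 confidence threshold, and the numeric bounds are all hypothetical, and the agent is assumed to report a confidence score between 0 and 1.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class AgentResult:
    value: float
    confidence: float  # assumption: the agent reports a score in [0, 1]


@dataclass
class Decision:
    value: Optional[float]
    source: str  # "agent", "rule", or "human"
    reason: str  # logged so a human operator can decide quickly


def guarded_call(
    agent: Callable[[Any], AgentResult],
    rule_fallback: Callable[[Any], float],
    request: Any,
    min_confidence: float = 0.8,
    bounds: Tuple[float, float] = (0.0, 10_000.0),
) -> Decision:
    """Run the agent, but return to a safe state on any defined failure."""
    # Failure 1: the agent raises, which includes TimeoutError.
    try:
        result = agent(request)
    except Exception as exc:
        return Decision(rule_fallback(request), "rule", f"agent error: {exc!r}")

    # Failure 2: the output is outside the allowed range.
    low, high = bounds
    if not (low <= result.value <= high):
        return Decision(
            rule_fallback(request), "rule", f"output {result.value} out of bounds"
        )

    # Failure 3: low confidence hands control to a human, with context attached.
    if result.confidence < min_confidence:
        return Decision(
            None,
            "human",
            f"confidence {result.confidence:.2f} below {min_confidence}; context={request}",
        )

    return Decision(result.value, "agent", "ok")


if __name__ == "__main__":
    def unsure_agent(request):
        return AgentResult(value=120.0, confidence=0.4)

    decision = guarded_call(unsure_agent, lambda request: 0.0, {"invoice": "A-1"})
    print(decision.source, "-", decision.reason)
```

Errors and out-of-bounds outputs fall back to the simpler rule because they can be handled mechanically, while low confidence escalates to a person, since that is the case where judgment is needed.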