AI agents are getting better at doing real work. They can inspect repositories, propose infrastructure changes, summarize incidents, draft replies, orchestrate multi-step tasks, and move through a routine workflow much faster than a human operator. That is exactly why teams are becoming more comfortable putting them closer to production systems. It is also why the risks are getting more serious.
In the early phase of AI tooling, most failures looked embarrassing rather than catastrophic. A chatbot invented facts. A code assistant wrote mediocre boilerplate. A summarizer missed nuance. Those were annoying problems, but they were usually recoverable. Production infrastructure is different. Once an AI agent is allowed to run commands, modify state, touch cloud resources, or operate around databases and backups, the failure mode changes from inconvenience to destruction.
That is the core lesson behind a March 6, 2026 post by Alexey Grigorev, who described how an AI-assisted Terraform workflow contributed to wiping production infrastructure behind the DataTalks.Club course platform. According to his own write-up, the chain involved a missing Terraform state file, an overly trusted agent workflow, an automated destroy path, and deleted snapshots that forced an urgent AWS support escalation before the database was restored roughly 24 hours later. The point is not that one tool is uniquely dangerous. The point is that AI agents become dangerous whenever humans confuse speed with safety.
For Brndle readers, this is not just a DevOps story. It is a software, operations, and product-management story. AI is now being positioned as a productivity layer across engineering, support, content, analytics, and automation. We recently covered practical AI workflows in 10 AI Tools to Automate Your Regular Tasks in 2026 and skill-building paths in 10 Free AI Courses to Take in 2026. This article looks at the other side of that conversation: what changes when AI stops being a writing assistant and starts acting inside production-adjacent systems?
- AI agents can speed up infrastructure and operations work, but speed without approval control is a liability.
- The real risk is not one bad command. It is a chain of small assumptions that nobody stops in time.
- Human approval still matters because production systems punish confident mistakes much faster than chat interfaces do.
What the Terraform incident actually teaches
The easiest version of this story is the wrong one. It is tempting to reduce the incident to a viral headline such as “AI destroyed production,” then move on. But that framing hides the real lesson. The incident was not caused by magic, malice, or spontaneous machine rebellion. It was caused by an automation chain that was given too much trust and too little friction.
In Grigorev’s own account, the process began with a migration task and a Terraform setup that had been reused across multiple concerns. The key technical problem was that the active machine did not have the correct Terraform state available locally. Terraform therefore interpreted the infrastructure as if little or nothing existed. A human noticed a warning sign when the plan looked wrong, but the workflow continued into cleanup logic. At that point the assistant suggested a cleaner deletion path through terraform destroy, which appeared logical in context. The command then destroyed real production infrastructure instead of only temporary duplicates.
That is a far more useful lesson than “never use AI.” The real lesson is that AI tools often make local reasoning look smoother than global reasoning actually is. A specific step can sound tidy and rational while still being disastrous in the full system context. That risk is not unique to Claude Code, Terraform, or AWS. The same pattern could happen with GitHub Actions, database scripts, Kubernetes manifests, CI pipelines, or homegrown automation wrappers if the system allows powerful actions without enough review boundaries.
It is also worth noting that this was not just an infrastructure deletion event. It was a backup assumptions event. AWS documentation makes an important distinction between automated backups, retained automated backups, final snapshots, and manual snapshots. AWS also states that if you do not retain automated backups when deleting a DB instance, those automated backups cannot be recovered, while manual snapshots are not deleted by the DB instance deletion itself. That means backup design is not a vague comfort blanket. It is a concrete architectural choice. If your recovery model depends on assumptions you have not actively tested, you do not really have a recovery model.
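To make that concrete, here is a minimal Terraform sketch of the arguments that decide which backups survive an RDS instance deletion, assuming the AWS provider. The resource name and identifiers are illustrative, and unrelated required settings are omitted.

```hcl
# Illustrative fragment: the aws_db_instance arguments that govern what survives deletion.
# Names are hypothetical; storage, credential, and networking settings are omitted.
resource "aws_db_instance" "app" {
  identifier     = "app-production"
  engine         = "postgres"
  instance_class = "db.t3.medium"

  # Automated backups exist only while this retention window and the instance do.
  backup_retention_period = 7

  # On deletion, take a final snapshot instead of skipping it.
  skip_final_snapshot       = false
  final_snapshot_identifier = "app-production-final"

  # Keep automated backups as "retained automated backups" after the instance is gone.
  delete_automated_backups = false
}
```

Manual snapshots taken separately are not touched by the instance deletion at all, which is exactly why knowing the difference between these backup types matters before anything destructive runs.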
Why AI agents feel safer than they are
One reason teams over-trust AI agents is that they communicate in a way that feels structured, legible, and calm. Traditional scripts fail bluntly. Humans notice stack traces, broken outputs, and command-line friction. AI agents often present the next step in polished natural language. They sound less like raw execution machinery and more like a competent coworker. That changes how people assess risk.
Natural-language fluency can create a dangerous illusion of system comprehension. An agent may explain why a step seems clean, consistent, or efficient, but that does not mean it has reliable situational awareness. It may be making a reasonable move from incomplete context. It may be anchoring on the wrong state. It may be optimizing for local neatness instead of global safety. And it may be doing all of that while sounding impressively confident.
Anthropic’s own Claude Code security documentation implicitly reflects this risk. The product uses explicit permissions for edits and commands, notes that bash execution requires approval by default, and states clearly that users are responsible for reviewing proposed commands and code for safety before approval. That is not just legal language. It is the correct operational model. The human is supposed to remain the authority for side effects.
The moment a team starts treating approval prompts as annoying friction rather than as a core safety boundary, the tool’s design intent has already been defeated. That is where things go wrong. Not because the AI suddenly became autonomous in a science-fiction sense, but because humans normalized away the checkpoints that were supposed to slow down destructive action.
Production systems are not chat environments
A lot of organizations are still emotionally calibrating AI tools as if they were advanced chat products. That mindset is outdated. A production AI agent is closer to a junior operator with terminal access, broad confidence, uneven caution, and variable context retention. In some situations it can produce extraordinary leverage. In the wrong situation it can act faster than your normal human hesitation layer.
That matters because production systems are asymmetric. A single bad read of state, permissions, naming, region, environment, or dependency scope can delete resources, expose data, corrupt workflows, or trigger a long chain of downstream failure. Recovery can take hours or days even when restoration succeeds. During that time the business impact keeps growing: customer trust, support load, team fatigue, delayed releases, damaged reporting, or broken integrations.
Software teams already know this principle in other contexts. We use code review because local logic is not enough. We use staging because working code is not the same as safe code. We use backups because healthy systems still fail. We use feature flags because controlled rollout beats blind deployment. AI agents do not remove the need for those practices. They increase it.
That is especially important for teams that are rapidly layering AI into broader operations. A startup that is also improving CRM, analytics, and automation workflows can easily treat AI as another efficiency lever alongside systems we already trust, such as those discussed in CRM for startups or analytics workflows that measure what matters. But production safety has to be more conservative than productivity experimentation. The cost of a wrong move is too high.
The real failure pattern: compound trust
The most important concept here is compound trust. Catastrophic automation failures rarely come from one absurdly reckless decision. They come from several individually understandable decisions that compound into a dangerous path.
A team trusts the tool because it performed well on safer tasks. They trust the command because the explanation sounds plausible. They trust the environment because it worked yesterday. They trust the backup because they assume it exists. They trust the workflow because nobody wants to interrupt momentum for another manual check. By the time a destructive command runs, the real mistake has already happened several layers earlier.
This is why postmortems that focus only on the final command are often too shallow. The last command matters, but the deeper problem is usually a stack of unreviewed assumptions. In the Terraform case, those assumptions included state visibility, scope boundaries, agent context, deletion path safety, and recovery readiness. In another environment, the same pattern could show up as cross-environment credential mix-ups, mistaken production database targets, unintended CI/CD promotion, or deleting the wrong cloud account resources.
AI agents amplify compound trust because they reduce perceived friction between steps. A human operator is more likely to pause when context-switching between plan review, state inspection, CLI cleanup, backup verification, and production approval. An agent workflow can glide from one step to the next with language that feels coherent and continuous. That is good for throughput. It is bad for safety if no one reasserts control at the point where side effects become irreversible.
Where human approval must remain non-negotiable
Human approval is not equally necessary for every AI-assisted task. Drafting internal documentation does not carry the same risk as modifying infrastructure. The practical question is where approval should remain mandatory no matter how good the tooling becomes.
The first category is destructive infrastructure actions. Any command that can delete, destroy, terminate, reinitialize, detach, or overwrite real cloud resources should remain behind manual approval. That includes obvious cases like terraform destroy, but also more subtle workflows involving replacement plans, state surgery, forceful recreation, backup retention changes, and credential scope changes.
The second category is database actions with irreversible impact. Schema drops, destructive migrations, production restores, direct write scripts, retention changes, and bulk cleanup commands should not be delegated to an agent operating on trust alone. A human should confirm the target, the environment, the exact statement or command, the rollback path, and the recovery assumption.
The third category is anything involving state, identity, or permissions. Terraform state, IAM roles, CI/CD secrets, API keys, deployment credentials, and environment selection are foundational control layers. If those are wrong, a sequence of otherwise reasonable actions can become destructive quickly.
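One way to enforce that boundary, sketched below with assumed names, is to attach an explicit deny policy to whatever credentials agent-driven automation uses, so that even a fluent, confident workflow cannot reach the most destructive API calls.

```hcl
# Hypothetical guardrail: the role used by agent-driven automation is explicitly denied
# the most destructive database calls and deletes against the remote state bucket.
resource "aws_iam_policy" "agent_deny_destructive" {
  name = "agent-deny-destructive"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "DenyDestructiveActions"
      Effect = "Deny"
      Action = [
        "rds:DeleteDBInstance",
        "rds:DeleteDBCluster",
        "rds:DeleteDBSnapshot",
        "s3:DeleteObject", # protects remote state objects
        "s3:DeleteBucket"
      ]
      Resource = "*"
    }]
  })
}
```

Because an explicit deny wins over any allow in IAM policy evaluation, destructive actions stay reserved for human-held credentials no matter what the automation proposes.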
The fourth category is automated execution beyond a sandbox. Anthropic’s own guidance emphasizes permissions, review, and sandboxing. That should be interpreted literally. If a workflow can be tested in a sandbox or staging boundary, it should be. If it cannot, then manual review should be even stricter.
What mature teams do differently
Mature teams do not reject AI because it is risky. They design around the risk. The question is not whether to use AI. The question is whether the environment makes unsafe behavior hard enough to trigger by accident.
The first difference is state discipline. HashiCorp’s documentation on backends exists for a reason. Shared, durable backend configuration gives Terraform a consistent view of infrastructure that is not trapped on one laptop. If a team still relies on local state in production-adjacent workflows, that is not just a convenience issue. It is a structural risk issue.
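A minimal remote backend sketch, with hypothetical bucket and table names, shows how little configuration it takes to keep state shared and locked rather than local:

```hcl
# Shared, durable state with locking, so every machine plans against the same reality.
# Bucket and table names are placeholders.
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"
    key            = "platform/production/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "example-terraform-locks" # prevents concurrent state writes
  }
}
```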
The second difference is explicit destroy friction. HashiCorp documents prevent_destroy as a protection against accidentally replacing costly objects such as database instances. It is not a perfect shield, and the documentation is careful about its limits, but it exists because some resources are too expensive to trust to default behavior. Mature teams add friction intentionally around those resources.
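In practice that friction can be layered, as in this sketch of the same illustrative database resource: prevent_destroy makes Terraform refuse any plan that would delete it, while the AWS-level deletion_protection flag blocks the delete call at the API even if someone works around Terraform.

```hcl
# Two layers of destroy friction on a critical resource (illustrative fragment).
resource "aws_db_instance" "app" {
  # ... engine, size, credentials, and backup settings omitted ...

  deletion_protection = true # AWS refuses the delete call until this is turned off

  lifecycle {
    prevent_destroy = true # Terraform errors on any plan that would destroy this resource
  }
}
```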
The third difference is backup realism. AWS documentation makes clear that manual snapshots, final snapshots, retention settings, and automated backups behave differently. Serious teams do not merely “have backups.” They know which backup type survives which deletion path, whether the backup is restorable, whether it is regionally isolated, and how long it takes to bring a critical service back.
The fourth difference is approval design. Mature teams do not optimize away every prompt. They separate low-risk autonomy from high-risk authority. AI can summarize incidents, prepare plans, inspect configs, draft runbooks, or propose remediation steps. But the actual production-impacting command path remains gated, observable, and attributable.
How to use AI agents safely in production-adjacent work
The goal should not be to ban AI from production workflows. That would be unrealistic and, in many cases, counterproductive. The goal should be to use AI where it creates leverage without handing it the final say over destructive side effects.
A practical model is to let the agent handle research, inspection, summarization, diff generation, explanation, and first-pass planning. Those are high-value tasks where speed helps and mistakes are still catchable. The agent can inspect logs, compare configs, draft Terraform changes, highlight likely issues, summarize an incident timeline, or produce a proposed runbook for a migration. That is already substantial value.
The next step is human-reviewed planning. Instead of letting the agent jump from analysis to execution, the workflow should require a human to review the plan output, environment target, state alignment, and rollback path. In Terraform specifically, the plan stage should be treated as an approval object, not just a precursor to apply.
Execution should then be narrow, explicit, and reversible wherever possible. Sandboxed tasks can be more autonomous. Production tasks should be less autonomous. Read-only inspection can be widely allowed. Write access can be scoped tightly. Destructive access should demand clear human intervention.
This layered model mirrors broader software discipline. It is the same mindset behind staging, peer review, and gradual rollout. AI does not break that model. It makes the model more necessary.
A practical checklist before any AI-assisted production action
- Confirm the environment explicitly: production, staging, development, account, region, and project (a small provider guardrail sketch follows this list).
- Verify the source of truth: current Terraform state, backend configuration, and any recent machine changes.
- Review the exact plan or diff manually before any apply or destroy path.
- Check deletion protection and lifecycle guardrails on critical resources.
- Confirm backup type, retention behavior, snapshot visibility, and tested restore path.
- Ensure the agent is not operating with broad unattended write or shell authority where side effects matter.
- Require a human to approve any destructive command, database operation, or permission change.
- Log the action path so the team can audit what happened if something goes wrong.
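For the first item on this checklist, one small guardrail, sketched here with a placeholder account ID, is to pin the production root module to its own AWS account, so a run with the wrong credentials fails immediately instead of planning against the wrong environment:

```hcl
# Hypothetical provider guardrail: refuse to run against anything but the intended account.
provider "aws" {
  region              = "eu-west-1"
  allowed_account_ids = ["111111111111"] # placeholder production account ID
}
```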
None of these steps are glamorous. That is exactly the point. Safety in production is rarely glamorous. It is procedural, repetitive, and sometimes annoying. But that friction is often the last thing standing between a recoverable mistake and a business-level incident.
FAQs about AI agents and production approval
Are AI coding agents too dangerous for production work?
No, but they are dangerous when teams give them broad execution authority without strong review boundaries. They are most useful in planning, summarization, inspection, and first-pass automation, while destructive actions still require human control.
Was the Terraform incident caused only by the AI agent?
No. Based on the developer’s own account, the incident involved missing state context, over-trust in the automation path, destructive command execution, and backup assumptions. The AI agent was part of the chain, not the whole explanation.
Does human approval still matter if the agent is usually right?
Yes. Production safety is not about average correctness. It is about limiting the blast radius of the rare but severe mistake. Human approval matters most precisely because high-impact failures are unacceptable even if most routine actions work.
What should always require manual approval?
Destructive infrastructure commands, production database actions, state changes, identity and permission changes, and any step that could create irreversible side effects or long recovery windows.
How should teams use AI safely in DevOps and infrastructure workflows?
Use AI for research, analysis, summarization, and draft plans first. Keep the final approval and execution path narrow, explicit, logged, and controlled by humans for high-risk actions.
What is the biggest mistake teams make with AI agents?
The biggest mistake is not the use of agents itself. It is assuming that a fluent explanation means the system fully understands production context. That confusion leads teams to remove the friction that was protecting them.
The real advantage is controlled leverage
AI agents are going to stay in production-adjacent work because the upside is real. They can compress research time, reduce repetitive toil, make complex systems easier to inspect, and help smaller teams operate at a much higher level. That opportunity is not fake, and it should not be ignored.
But the teams that benefit most will not be the ones that trust AI the most. They will be the ones that design the clearest boundaries around it. Human approval still matters because production systems are unforgiving, state is fragile, backups are nuanced, and polished language is not the same as safe judgment. In software, infrastructure, and operations, the winning model is not blind autonomy. It is controlled leverage.
