Analysis
A confused deputy is a program that holds privileges, accepts requests from a less-privileged caller, and gets tricked into using its privileges on the caller's behalf. The original example, Norm Hardy's compiler that could be pointed at its own billing file, dates from 1988. In 2026, the deputy is the AI coding agent — and a recent cluster of CVEs makes the class concrete.
The pattern
The unprivileged caller is the input the agent operates on: a GitHub repository, a user prompt, a downloaded model file, a package.json. The privileges the deputy holds are the things that make the agent useful in the first place — cloud credentials, the ability to run shell commands, the ability to publish packages, the ability to load and execute model weights. The same property that makes agents productive makes them a confused-deputy magnet.
The agent is not malicious. The agent is doing its job. It got handed an input, and the input told it to do something the agent's privileges allowed.
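To make the shape concrete before the CVEs, here is a deliberately toy sketch; the function and the convention of executing `$ `-prefixed README lines are invented for illustration, not taken from any real agent.

```python
import subprocess

# Toy deputy: it holds real privileges (a shell, whatever credentials sit in
# its environment) and takes its instructions from untrusted repo content.
def agent_setup_step(repo_readme: str) -> None:
    for line in repo_readme.splitlines():
        # The agent "helpfully" runs the setup commands it finds in the README.
        if line.startswith("$ "):
            # Attacker-controlled text, executed with the agent's privileges.
            subprocess.run(line[2:], shell=True, check=False)
```

Nothing here is a bug in the usual sense. Every line does what it was designed to do, which is exactly the point the CVEs below keep repeating.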
CVE-2026-34040 — Docker authorization plugin bypass
- Deputy — the AI coding agent's Docker daemon.
- Attacker-influenced input — a malicious GitHub repository the agent has cloned and is operating on.
- Privileges held — Docker API access, cloud credentials mounted at container start, kubeconfig.
- Confusion mechanism — the malicious repository contains instructions, in README, devcontainer config, or postinstall hook, that cause the agent itself to execute a crafted Docker API call that bypasses the authorization plugin.
The pattern in its purest form. The agent is doing what it was asked to do; the request was attacker-controlled.
CVE-2026-7733 — LangChain PythonREPLTool escape
- Deputy — the LangChain `PythonREPLTool`.
- Attacker-influenced input — a prompt that reaches the REPL through the agent's normal control flow.
- Privileges held — Python execution in the agent's process.
- Confusion mechanism — a missing check on `__import__` lets a prompt-controlled call break out of the documented sandbox.
The REPL was designed under the assumption that prompts are bounded data. They aren't. They are code-equivalent input that should not have been treated as data in the first place — true of every prompt-driven tool, not just this one. The CVE is the specific bug; the class is "tools designed under the wrong threat model."
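A minimal sketch of that wrong threat model, assuming nothing about LangChain's actual implementation: a filter that treats the prompt as bounded data and blocks the `import` statement, while `__import__` stays reachable as an ordinary builtin call.

```python
# Not LangChain's code: a deliberately naive REPL "sandbox" built under the
# wrong threat model (prompts as bounded data rather than code-equivalent input).
def naive_repl(code: str) -> None:
    if "import " in code:                       # blocks `import os` ...
        raise ValueError("imports not allowed")
    exec(code)                                  # ... but runs in-process

# A prompt-shaped payload that never uses the `import` statement:
naive_repl('__import__("os").system("id")')     # walks straight past the filter
```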
CVE-2026-12091 — npm postinstall maintainer takeover
- Deputy — `npm install`.
- Attacker-influenced input — a corrupted version of a legitimate package, signed by the legitimate maintainer because the maintainer's account was taken over.
- Privileges held — code execution at install time, often as the developer user with full env-var, SSH-key, and git-credential access.
- Confusion mechanism — there isn't one. `postinstall` does what `postinstall` always did. The change is who controls the script.
The cleanest mapping. The deputy doesn't have a bug. The pattern is the bug — running attacker-controlled code with developer privileges at every install. Patching npm doesn't help; the fix has to be structural.
CVE-2026-5760 — SGLang malicious GGUF
- Deputy — the GGUF model loader.
- Attacker-influenced input — a downloaded weight file.
- Privileges held — code execution in the inference server's process.
- Confusion mechanism — parser flaw lets a malicious model file achieve RCE via the loader.
The trust boundary used to be "downloaded binary." Anyone running an inference server understood that running attacker-supplied binaries was dangerous. The new boundary is "downloaded weights" — and that boundary moved without anyone announcing it. Most operators still treat GGUF files as data, not code-equivalent.
Why 2026 specifically
The confused-deputy pattern is nearly forty years old. What changed is two things, both about privilege.
First, privilege-bearing agents are now common. Two years ago, an AI agent was a chatbot in a sandbox with no real authority. Today, agents routinely hold long-lived cloud credentials, GitHub publish tokens, npm publish tokens, kubeconfigs, and the ability to write to file paths that matter. Agents are useful because of those privileges. They are dangerous because of them too.
Second, agent input surfaces are inherently untrusted. Agents are designed to work on user-supplied inputs — GitHub repositories from anywhere, prompts from anywhere, model weights downloaded from anywhere. The agent's job is to accept those inputs and act on them. That makes every untrusted-input boundary in the stack a confused-deputy candidate by default.
Combine the two and the picture is clear. An entire generation of infrastructure has been built on the assumption that "running in a Docker sandbox" was sufficient isolation, paired with input surfaces that include arbitrary attacker-controlled code and prompts. The four CVEs are not unusual events. They are the predictable consequence.
Privilege narrowing is the structural fix
The single biggest mistake in agent security architecture is treating "isolation" as a binary property of the runtime. It isn't. It's a property of the credential and capability surface the agent reaches into. Long-lived admin credentials mounted into agent sandboxes are an anti-pattern; the mounted credentials become the privilege surface that a successful confused-deputy attack reaches.
- Short-lived, scope-narrowed tokens issued per-task. Generate the token when the user dispatches the task; revoke it when the task completes. A sketch of per-task issuance follows this list.
- Tokens that can do only what the task needs. Per-task IAM roles in cloud, per-task GitHub fine-grained PATs, per-task npm publish tokens scoped to a single package.
- Read-only by default. Make write privileges an explicit, narrow exception.
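What per-task issuance looks like depends on the provider. A minimal sketch for AWS, using STS `AssumeRole` with an inline session policy to narrow a base role down to one task's needs; the role ARN, bucket name, and duration are placeholders.

```python
import json

import boto3

def issue_task_credentials(task_id: str, bucket: str) -> dict:
    """Mint short-lived, scope-narrowed credentials for a single task."""
    # Session policy: effective permissions are the *intersection* of the base
    # role's policy and this document — here, read-only access to one bucket.
    session_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }],
    }
    sts = boto3.client("sts")
    resp = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/agent-task-base",  # placeholder
        RoleSessionName=f"agent-{task_id}",
        Policy=json.dumps(session_policy),
        DurationSeconds=900,  # 15 minutes: dies with the task, not after it
    )
    return resp["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration
```

Because the session policy intersects with the base role, a confused deputy holding these credentials can read one bucket for fifteen minutes and nothing else.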
Isolation that survives agent subversion
Shared Docker on a host with cloud credentials is not a security boundary for untrusted code. CVE-2026-34040 makes this explicit, but it was true before the bug was disclosed.
- Per-task isolation domains — Firecracker microVMs, gVisor sandboxes, per-task Kubernetes pods with their own IAM principal.
- Design the isolation domain as the unit of compromise. A fully compromised domain still cannot reach anything you care about.
- Stepping-stone pattern for production — agent → tightly-scoped intermediary service → production. The intermediary enforces the policy the agent doesn't; a minimal sketch follows this list.
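A minimal sketch of the stepping-stone, with Flask standing in for whatever service framework is in use; the operation names and the per-task grant table are hypothetical. The structural point is that production credentials live in the intermediary, and the agent can only name operations, never perform them.

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Everything the intermediary is willing to do on an agent's behalf.
# The production credentials live here, never in the agent's sandbox.
ALLOWED_OPS = {"publish_package", "deploy_staging"}

# Hypothetical per-task grants, recorded when the user dispatches the task.
TASK_GRANTS = {"task-42": {"publish_package"}}

@app.post("/perform/<task_id>")
def perform(task_id: str):
    payload = request.get_json(silent=True) or {}
    op = payload.get("op")
    if op not in ALLOWED_OPS:                      # not something we ever do
        abort(403)
    if op not in TASK_GRANTS.get(task_id, set()):  # not what the user asked for
        abort(403)
    # ...perform `op` here with the intermediary's own credentials...
    return jsonify({"status": "accepted", "op": op})
```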
Treat all agent inputs as untrusted code-equivalent
A repository from an internal user is not safer than one from an external user, in security terms. The internal user might have been phished. Their account might have been taken over. An attacker-controlled package.json looks identical regardless of who pushed it.
- No per-source allowlisting based on "trusted internal." Every input flows through the same untrusted-input pipeline.
- Static analysis on incoming repositories before agent invocation — `package.json` install hooks, dev-container configs, GitHub Actions workflows, `.cursorrules`/`.windsurfrules`, and any file the agent will read as instructions. A sketch of such a scan follows this list.
- Prompt input is code-equivalent. If a prompt can reach a tool that executes code, the prompt is an RCE surface.
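A minimal sketch of that pre-invocation scan, covering only the files named above; the patterns and output format are illustrative.

```python
import json
from pathlib import Path

# Files the agent will read as instructions, plus hooks that execute on their own.
INSTRUCTION_FILES = [".cursorrules", ".windsurfrules"]
HOOK_SCRIPTS = ("preinstall", "install", "postinstall")

def scan_repo(repo: Path) -> list[str]:
    findings = []
    # Lifecycle hooks in any package.json run code at install time.
    for pkg in repo.rglob("package.json"):
        try:
            scripts = json.loads(pkg.read_text()).get("scripts", {})
        except (json.JSONDecodeError, OSError):
            findings.append(f"unparseable: {pkg}")
            continue
        for hook in HOOK_SCRIPTS:
            if hook in scripts:
                findings.append(f"{pkg}: {hook} -> {scripts[hook]!r}")
    # Dev-container configs and CI workflows can start arbitrary processes.
    for pattern in (".devcontainer/**/*.json", ".github/workflows/*.y*ml"):
        findings.extend(f"review: {p}" for p in repo.glob(pattern))
    # Rule files are instructions the agent will follow verbatim.
    for name in INSTRUCTION_FILES:
        if (repo / name).exists():
            findings.append(f"agent instructions: {repo / name}")
    return findings
```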
Detect confused-deputy attempts even when patched
The patch fixes the specific bug. The pattern continues. Detection has to be runtime, not advisory-driven.
- Log every privileged operation the agent performs and check it against the user-issued task; a sketch of this scope check follows the list. Cloud-API calls outside the task scope are evidence of confused deputy.
- Watch for credential-path reads — `~/.ssh`, `~/.aws`, `~/.npmrc`, kubeconfig paths — by agent processes. Almost never legitimate.
- Egress monitoring on agent isolation domains. Outbound traffic to anything other than the package registry, model registry, and explicitly-allowed APIs is suspicious.
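A minimal sketch of checking logged agent events against the user-issued task. The event schema and scope table are hypothetical; adapt them to whatever the runtime actually emits (CloudTrail, an audit webhook, eBPF file-open events).

```python
from dataclasses import dataclass

# Paths whose reads by an agent process are almost never legitimate.
CREDENTIAL_PATHS = ("/.ssh/", "/.aws/", "/.npmrc", "/.kube/")

@dataclass
class AgentEvent:
    task_id: str
    kind: str    # "cloud_api" | "file_read" | "egress" (hypothetical schema)
    target: str  # API action, file path, or destination host

# Hypothetical per-task scope, recorded when the user dispatches the task.
TASK_SCOPE = {
    "task-42": {
        "cloud_api": {"s3:GetObject"},
        "egress": {"registry.npmjs.org", "huggingface.co"},
    },
}

def flag(event: AgentEvent) -> str | None:
    """Return a finding if the event falls outside the task's scope."""
    if event.kind == "file_read" and any(p in event.target for p in CREDENTIAL_PATHS):
        return f"credential-path read: {event.target}"
    allowed = TASK_SCOPE.get(event.task_id, {}).get(event.kind)
    if allowed is not None and event.target not in allowed:
        return f"{event.kind} outside task scope: {event.target}"
    return None
```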
What this means going forward
The confused-deputy class is going to grow. Every new tool that agents can invoke is a candidate. Every new credential type that gets mounted into agent runtimes is a candidate. The question for security teams is not whether their agents will be used as deputies — they will. The question is whether the privilege surface the deputy can reach is narrow enough that a successful confusion is recoverable.
Capability-based design is older still, dating to the 1960s. It came back into fashion at the right time.