The gate worked. The database was still dumped.

June 24, 2026

A gate with an authenticated signal, with data flowing out uncontrolled past it—the gap between login and execution

Sysdig documented the first in-the-wild LLM agent intrusion: four pivots, every access gate cleared, database exfiltrated. The target had a bastion and a credential vault. This is the layer neither covers.

The target had a bastion host. It had AWS Secrets Manager. Both worked exactly as designed: the bastion authenticated the session, and the vault returned the requested secret. Neither stopped the database from being dumped in under two minutes. Gates govern entry. They don't govern execution.

That's the thesis of this post. And for the first time, there's real-world evidence to back it up.

What happened

TL;DR: An LLM agent got from an exposed notebook endpoint to a full database dump in a single real-world intrusion. The point of this post is that the failure was not at login or secret retrieval, but in what the agent was allowed to execute after it got in.

On May 10, 2026, Sysdig's Threat Research Team documented the first publicly confirmed in-the-wild intrusion driven by an LLM agent. The entry point was CVE-2026-39987 (CVSS 9.8 CRITICAL)—a missing authentication check on the /terminal/ws WebSocket endpoint in marimo, an open-source reactive Python notebook increasingly marketed as "an environment for agents." A single unauthenticated WebSocket request yielded a full PTY shell.

CVE-2026-39987 is listed in CISA's Known Exploited Vulnerabilities Catalog with a May 7 remediation deadline—for federal agencies, this wasn't a hypothetical; it was a compliance event.

What happened next is what makes this incident different from every prior attack Sysdig has reported. The agent didn't run a pre-scripted playbook. It reasoned. It improvised. It left an artifact no human operator would ever produce: at 19:31:40 UTC, a Chinese-language planning comment appeared inline in the executed command stream—看还能做什么—"See what else we can do." Sysdig's analysis is direct: "A pre-built script has no internal monologue."

Sysdig's incident report traces the full chain across four pivots. The timestamps tell a more precise story: initial shell at 18:23:44 UTC, database dump complete at approximately 19:32:23 UTC—just over 68 minutes. But the headline isn't the clock; it's what happened inside it. After about 30 seconds of credential harvesting, roughly 62 minutes elapsed before the agent began its Secrets Manager pivot—a gap the report doesn't explain. Then Stages 2 through 4 ran in under six minutes. The database dump itself completed in under two minutes.
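The arithmetic behind those figures is easy to check. The sketch below uses only the three timestamps quoted in this post (the dump-complete time is approximate, as stated above); the variable and function names are ours, not Sysdig's.

```python
from datetime import datetime, timedelta, timezone

# Timestamps as quoted in this post (May 10, 2026, UTC).
initial_shell = datetime(2026, 5, 10, 18, 23, 44, tzinfo=timezone.utc)
planning_comment = datetime(2026, 5, 10, 19, 31, 40, tzinfo=timezone.utc)
dump_complete = datetime(2026, 5, 10, 19, 32, 23, tzinfo=timezone.utc)  # approximate


def fmt(delta: timedelta) -> str:
    """Render a timedelta as minutes and seconds."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes}m {seconds:02d}s"


total = dump_complete - initial_shell
comment_to_done = dump_complete - planning_comment

print("shell to dump complete:", fmt(total))                      # 68m 39s
print("planning comment to dump complete:", fmt(comment_to_done))  # 0m 43s
```

The second number is the one to sit with: the planning comment quoted earlier landed 43 seconds before the dump finished.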

68-minute attack timeline: from initial shell to database dump

The four pivots, mapped

Pivot 1—marimo RCE to AWS credential harvest

The agent entered through the unauthenticated marimo terminal, got a PTY shell, and immediately ran a broad credential grep across environment files—/app/.env*, ~/.aws/credentials, /proc/*/environ. AWS access keys were in hand within 30 seconds.

Where Alpacon comes in: This pivot is beyond the reach of Alpacon, the execution control layer by AlpacaX. On a server secured by Alpacon, it would still have succeeded. The marimo CVE is an application vulnerability: a missing authentication check on a WebSocket endpoint. Alpacon doesn't govern the marimo process, so this pivot sits outside the PAM layer entirely, and it would be dishonest to claim otherwise. The entry point was an exposed developer tool, not a privileged access failure. CISOs should patch CVE-2026-39987 and audit internet-reachable notebook instances; that is a separate hygiene problem.
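For the hygiene side, a defender can run the same search the agent ran and see what a shell on that host would find. The following is a minimal sketch, not a complete secret scanner. It assumes the commonly documented shape of AWS access key IDs (an `AKIA` or `ASIA` prefix followed by 16 uppercase alphanumerics), and the function names are hypothetical. The default paths are the ones named in Pivot 1.

```python
import glob
import os
import re

# Assumption: long-lived AWS access key IDs start with "AKIA" and temporary
# ones with "ASIA", followed by 16 uppercase alphanumerics.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

# The same locations the post says the agent searched.
DEFAULT_PATTERNS = ["/app/.env*", "~/.aws/credentials", "/proc/*/environ"]


def find_access_key_ids(text: str) -> list[str]:
    """Return the distinct access key IDs found in text, sorted."""
    return sorted(set(ACCESS_KEY_ID.findall(text)))


def audit(patterns=DEFAULT_PATTERNS) -> dict[str, list[str]]:
    """Map each readable file that contains a key ID to the IDs found in it."""
    findings = {}
    for pattern in patterns:
        for path in glob.glob(os.path.expanduser(pattern)):
            try:
                with open(path, "rb") as handle:
                    # /proc/*/environ is NUL-separated; treat NULs as newlines.
                    raw = handle.read().replace(b"\0", b"\n")
            except OSError:
                continue  # not readable by this user, so skip it
            ids = find_access_key_ids(raw.decode("utf-8", "replace"))
            if ids:
                findings[path] = ids
    return findings


if __name__ == "__main__":
    for path, ids in audit().items():
        print(f"{path}: {len(ids)} access key ID(s) readable from this shell")
```

Run it as the same user the notebook process runs as. Anything it prints is what an unauthenticated PTY shell would have had in hand.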

Pivot 2—AWS credentials to SSH key from Secrets Manager

Using the harvested AWS keys, the agent fired 12 API calls distributed across 11 distinct IPs in 22 seconds—IP-fanning that outruns any IP reputation feed update cycle and makes IP-based rate limiting unreliable as a defense. It called secretsmanager:GetSecretValue and retrieved an SSH private key stored in AWS Secrets Manager.
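If the source IP is the unreliable signal, the credential is the stable one: all 12 calls used the same harvested keys. One way to picture a control keyed on the credential instead of the address is the sketch below. It is an illustration under stated assumptions, not an AWS or Sysdig feature: the event fields (`access_key_id`, `source_ip`, `timestamp`) are a hypothetical flattened record, the thresholds are arbitrary, and the addresses come from a documentation range.

```python
from collections import defaultdict, deque


def fanned_out_keys(events, window_seconds=60.0, max_distinct_ips=3):
    """Return the access key IDs seen from more than max_distinct_ips
    source IPs within any window of window_seconds."""
    recent = defaultdict(deque)  # key ID -> deque of (timestamp, source_ip)
    flagged = set()
    for event in sorted(events, key=lambda e: e["timestamp"]):
        window = recent[event["access_key_id"]]
        window.append((event["timestamp"], event["source_ip"]))
        # Drop calls that have aged out of the window.
        while event["timestamp"] - window[0][0] > window_seconds:
            window.popleft()
        if len({ip for _, ip in window}) > max_distinct_ips:
            flagged.add(event["access_key_id"])
    return flagged


# Shaped like the burst described above: 12 calls, 11 distinct IPs, 22 seconds.
burst = [
    {"access_key_id": "AKIAIOSFODNN7EXAMPLE",
     "source_ip": f"203.0.113.{i % 11}",
     "timestamp": float(i * 2)}
    for i in range(12)
]
# A second, hypothetical key making the same number of calls from one address.
steady = [
    {"access_key_id": "AKIAI44QH8DHBEXAMPLE",
     "source_ip": "198.51.100.7",
     "timestamp": float(i * 2)}
    for i in range(12)
]

print(fanned_out_keys(burst + steady))  # {'AKIAIOSFODNN7EXAMPLE'}
```

No single IP in the burst makes more than two calls, so a per-IP rate limit never trips. The per-key view flags it on the fourth call.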

Where Alpacon comes in: In an Alpacon-managed environment, just-in-time auto-expiring sessions mean no standing SSH key exists in Secrets Manager. The credential the agent retrieved would not exist, so there would be nothing to retrieve. This is a clean architectural block: not a detection, not a rule match, but a structural absence.
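To make "just-in-time" and "auto-expiring" concrete, here is a generic sketch of a grant that is minted per session and stops being valid on its own. This is not Alpacon's API or data model. Every name, the five-minute TTL, and the host name are assumptions chosen for illustration.

```python
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SessionGrant:
    """A hypothetical per-session grant that expires without any revocation step."""
    user: str
    host: str
    issued_at: float    # epoch seconds
    ttl_seconds: float
    token: str = field(default_factory=lambda: secrets.token_urlsafe(32))

    def is_valid(self, now: float) -> bool:
        return self.issued_at <= now < self.issued_at + self.ttl_seconds


def issue_grant(user: str, host: str, now: float, ttl_seconds: float = 300.0) -> SessionGrant:
    """Mint a grant at request time; nothing is stored in a vault beforehand."""
    return SessionGrant(user=user, host=host, issued_at=now, ttl_seconds=ttl_seconds)


grant = issue_grant("deploy", "db-bastion", now=1_000.0)
print(grant.is_valid(1_000.0))            # True: usable when issued
print(grant.is_valid(1_000.0 + 62 * 60))  # False: long expired 62 minutes later
```

The contrast with a standing key is the point: a key that sits in a vault is exactly as valid an hour later as on the day it was stored.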

Pivot 3—SSH key to bastion access

The agent authenticated to the internal SSH bastion using the retrieved private key. From the bastion's perspective, the session was legitimate: valid key, valid user. The gate passed it through.

Where Alpacon comes in: Alpacon's outbound-only architecture means the bastion exposes zero inbound ports. Even with a valid SSH key in hand, the agent cannot initiate an inbound session—there is no port to connect to. This is an architectural claim, not a signature-match claim. The bastion doesn't get bypassed; it simply isn't reachable from the direction the agent is coming from.

Pivot 4—bastion to database dump

Eight parallel SSH sessions were opened from six distinct Cloudflare Workers IPs. The agent enumerated the schema and improvised a dump target: a credential table that does not exist in the application schema. It had reasoned from general knowledge about where credentials usually live, found the table absent, and pivoted to a full database dump. It exfiltrated the full PostgreSQL database using machine-optimized HEREDOC syntax: quoted-EOF blocks, with output capped and separated for programmatic parsing. No human operator structures commands that way. They were formatted for the agent's own context window, not a terminal screen. Schema ignorance is no longer a barrier: the agent didn't need a map. It guessed at one, and when the guess missed, it took everything anyway. The dump was done in under two minutes.

Where Alpacon comes in: This is the only stage where a runtime intervention was even possible—and it is the stage where every other control in this kill chain had already passed the agent through.

Real-time command validation and in-session kill chain detection are the only controls that operate at the execution layer: after authentication, on the commands themselves. The pattern the agent executed—lateral move, schema enumeration, bulk data read across eight parallel sessions—is exactly the multi-step sequence that kill chain detection fires on. Not any single command. The pattern.
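What "the pattern, not the command" means can be sketched in a few lines. The example below is a toy, not Alpacon's detection logic. The stage rules are hypothetical regexes, the host and database names are made up, and a real policy would be tuned to the environment. State is keyed on the principal, not the session, so a chain spread across parallel sessions still adds up.

```python
import re

# Hypothetical stage rules, for illustration only.
STAGE_RULES = [
    ("lateral_move", re.compile(r"^\s*(ssh|scp)\s")),
    ("schema_enumeration", re.compile(r"information_schema|\\dt|pg_catalog", re.I)),
    ("bulk_read", re.compile(r"\bpg_dump\b|\bCOPY\b.+\bTO\b|SELECT\s+\*\s+FROM", re.I)),
]
KILL_CHAIN = [name for name, _ in STAGE_RULES]


def stages_of(command: str) -> set[str]:
    """Return every stage whose rule matches the command."""
    return {name for name, pattern in STAGE_RULES if pattern.search(command)}


class KillChainDetector:
    def __init__(self):
        self.progress = {}  # principal -> index of the next expected stage

    def observe(self, principal: str, command: str) -> str:
        """Return 'deny' when the command would complete the chain, else 'allow'."""
        expected = self.progress.get(principal, 0)
        if KILL_CHAIN[expected] in stages_of(command):
            if expected == len(KILL_CHAIN) - 1:
                return "deny"  # final stage reached after the earlier ones
            self.progress[principal] = expected + 1
        return "allow"


detector = KillChainDetector()
print(detector.observe("harvested-key", "ssh db-internal"))   # allow
print(detector.observe("harvested-key", r"psql -c '\dt'"))    # allow
print(detector.observe("harvested-key", "pg_dump appdb"))     # deny
print(detector.observe("dba", "pg_dump appdb"))               # allow
```

The last line is the design choice in miniature: the same `pg_dump` is allowed for a principal with no preceding chain and denied for one that has just moved laterally and enumerated the schema.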

To be precise: Alpacon's execution control layer is the right place to intervene at this stage. Whether any specific intervention would have stopped this specific dump is a function of the rules and policies configured—not a claim this post will make for you. What this post will claim is structural: this was the only layer where intervention was possible at all. Every prior control had already succeeded at its job and failed at the outcome.

The gap that mattered

TL;DR: The target had access controls and detection, but neither could stop what the agent did after it got in. The real gap was not authentication, but the lack of execution-layer control between "session opened" and "data exfiltrated."

Two-layer gap: login controls vs. empty execution layer

Step back from the four pivots and look at what the target had. A bastion host. A credential vault. Both deployed. Both functioning. Both irrelevant to the outcome.

This is the structural failure the incident exposes. Every control existed at the login layer. The execution layer—the only layer where a runtime intervention was possible after the agent was inside with valid-looking credentials—had nothing.

Sysdig's runtime detection—syscall-level behavioral telemetry that fires on patterns without requiring CVE-specific signatures—is what made this incident visible and documentable at all. That capability is genuinely impressive, and this post wouldn't exist without it. But detection and prevention are different layers. What the telemetry produced was a complete, timestamped record of what occurred. What no detection system can do is retroactively prevent an exfiltration that has already completed. Detection tells you what occurred. Execution control determines what is allowed to occur.

This gap isn't unique to the marimo incident. Bessemer has identified "targeted in-flight intervention" as the most underdeveloped area in AI agent security. That's third-party validation of the same gap: the layer between "authenticated" and "exfiltrated" has no incumbent. Legacy PAM vendors built their architecture for human operators who could be trusted to stay within a session's intended scope. An LLM agent operating at machine speed, improvising targets, fanning out across parallel sessions—that's a different problem class.

The target's stack did what PAM has always done: it governed who got in. No one governed what they executed.

The question your PAM can't answer

Michael Clark, Sr. Director of Sysdig's Threat Research Team, put it plainly: "We are not watching AI replace attackers. We are watching attackers replace their scripts with AI."

The Sysdig / marimo incident is the first real-world case where that replacement produced a documented, timestamped kill chain. Four pivots. Sixty-eight minutes start to finish, with Stages 2 through 4 completing in under six. A database dumped while the bastion and the vault stood ready and waiting.

The right question for your next PAM evaluation isn't whether the product authenticates sessions—every PAM on the market does that. The question is: after authentication, after the session is open, after the agent is inside with credentials that passed every gate—what governs what it executes?

If the answer is "nothing," you have the same gap the May 10 target had.

Read the Sysdig incident report for the full timeline.