Agents go rogue. Don't just block them. Quarantine them.
Inline blocking is a binary. Adaptive Defense is a continuum — escalate monitoring when behavior drifts, inject canaries when sensitive data flows, quarantine when provenance is violated. Five stages. Three modes. One outcome: the agent stops before damage.
Adaptive Defense is two coordinated subsystems. Canary Injection embeds zero-width markers — invisible to the LLM, meaningless to a human, but cryptographically traceable — into tool outputs that touch sensitive data. When those markers reappear in the arguments of an outbound tool call, provenance has been violated and the agent is exfiltrating. Quarantine Intensity is the response system: monitoring escalates through three modes (Normal → Enhanced → Lockdown), each with a defined action set. The Behavioral Causality Engine drives the escalation by classifying every anomaly as LIKELY_ATTACK, LIKELY_HALLUCINATION, DEGRADATION, or INCONCLUSIVE.
From normal behavior to operator review.
One canonical sequence. Most agent sessions stay at stage 01. The ones that escalate traverse all five — visibly, auditably, signed at every step.
Zero-width markers track every byte.
When a tool call returns sensitive data — customer records, financial figures, credentials — the response is augmented with a cryptographically-derived zero-width marker before it reaches the agent.
The marker is invisible to the LLM (it parses past it without acknowledgment) and meaningless to a human reading the output. But if it ever appears in the argument of a downstream tool call — say, an agent attempting to http_post a body containing the marker — the proxy detects provenance violation immediately.
Markers are scoped to the originating call so the proxy can answer not just did exfiltration occur, but which tool call was the source, and which destination tool was the egress. Full chain-of-custody.
Three modes. One escalation.
Mode escalation is one-way without operator intervention. ENHANCED can step up to LOCKDOWN automatically; stepping down requires a signed audit-chain entry from a human.
Not every anomaly is an attack.
The Behavioral Causality Engine classifies each anomaly into one of five categories. Only two trigger active mode change. The rest are logged for trend analysis. False positives are expensive — escalation is restrained by design.
Compromise happens.
Damage doesn't have to.
ToolGuard blocks. Adaptive Defense quarantines. The first stops the call. The second stops the agent.