
Governance, not just security

Security blocks the bad.
Governance proves the good.

Four layers of AI security exist. Only one stops the action. This is the open, reproducible methodology for measuring all four — 24 capability dimensions, 7 enterprise platforms evaluated, three-point evidence scale, vendors invited to correct the record.

Four layers weighted by research: Model Security (15%), Prompt Security (20%), Endpoint Security (15%), Execution Governance (50%). Each vendor is scored 0, 1, or 2 per dimension against publicly documented capabilities. The methodology is open; vendors may submit corrections with supporting evidence to methodology@whitefin.ai.

Layers: 4
Capability Dimensions: 24
Enterprise Platforms: 7
Evidence Scale: 0 / 1 / 2

00 — TWO SIDES

Two sides of the same coin

Security

What it prevents

  • Unauthorized access
  • Data leaks to providers
  • Prompt injection attacks

Governance

What it proves

  • Policy compliance (provable)
  • Audit readiness (one-click)
  • Measurable governance score

Whitefin = both. The only AI gateway that secures and governs.

01 — SCORING SCALE

Three-Point Evidence Scale

Every dimension is scored against publicly documented capability. Marketing claims without technical substantiation score 0.

Score 2 · Deep: Production capability with multiple sub-features. Market-leading depth in this dimension. Documented in product materials, press releases, or technical documentation.
Score 1 · Present: Functional capability in production. May be limited in scope or depth relative to specialists in this dimension.
Score 0 · Absent: No publicly documented capability in this dimension.

02 — FORMULA

How Scores Combine

Layer Coverage
(Sum of 6 dimension scores) ÷ 12 × 100%
Computed per layer. A perfect layer (all six dimensions at "Deep") scores 100%.
Total Defense
(L1 × 0.15) + (L2 × 0.20) + (L3 × 0.15) + (L4 × 0.50)
Weighted composite across the four layers. Weight rationale and sensitivity analysis below.
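
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not Whitefin's scoring code: the function names `layer_coverage` and `total_defense` are assumptions made for the example, and the sample scores are the Whitefin dimension scores listed in the evidence section of this page.

```python
from typing import Sequence

# Layer weights in percent for L1..L4, taken from the Total Defense formula.
LAYER_WEIGHTS = (15, 20, 15, 50)


def layer_coverage(scores: Sequence[int]) -> float:
    """Layer Coverage: (sum of 6 dimension scores) / 12 * 100."""
    if len(scores) != 6 or any(s not in (0, 1, 2) for s in scores):
        raise ValueError("expected six dimension scores, each 0, 1, or 2")
    return sum(scores) / 12 * 100


def total_defense(coverages: Sequence[float],
                  weights: Sequence[int] = LAYER_WEIGHTS) -> float:
    """Total Defense: weighted composite of the four layer coverages."""
    if len(coverages) != 4 or len(weights) != 4 or sum(weights) != 100:
        raise ValueError("expected four coverages and four weights summing to 100")
    return sum(c * w for c, w in zip(coverages, weights)) / 100


# Dimension scores copied from the Whitefin evidence table (D1.1..D4.6).
WHITEFIN_SCORES = [
    [2, 2, 0, 0, 2, 2],  # L1 Model Security
    [1, 2, 1, 2, 1, 2],  # L2 Prompt Security
    [1, 2, 0, 2, 1, 2],  # L3 Endpoint Security
    [2, 2, 2, 2, 2, 2],  # L4 Execution Governance
]

coverages = [layer_coverage(layer) for layer in WHITEFIN_SCORES]
print([round(c) for c in coverages])    # [67, 75, 67, 100]
print(round(total_defense(coverages)))  # 85
```

Working from unrounded coverages gives the same 85% that the page reports from the rounded per-layer figures.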

03 — LAYER WEIGHTS

Research-Backed Weighting

Layer weights reflect where attacks succeed and where damage is stopped. Fifty percent on Layer 4 is not a preference — it is what the research literature on agentic AI security converges on when asked "which layer must hold?"

L1 · Model Security
15%
Upstream. If L4 holds, model compromise is contained.
Anthropic (2026). Subliminal Learning: misalignment transmits through clean data. Nature.
L2 · Prompt Security
20%
First line of defense, but attacks increasingly bypass prompts entirely.
Google DeepMind (2026). AI Agent Traps: 80%+ attack success via environmental manipulation, not prompts.
L3 · Endpoint Security
15%
Static checks at install time. A trusted tool used maliciously passes Layer 3.
Palo Alto Networks (2026). Koi acquisition: "AI agents create the ultimate insider threat."
L4 · Execution Governance
50%
Where damage occurs. Deterministic, runtime. The only layer that blocks regardless of how other layers failed.
Bhattarai & Vu (2026). Deterministic architectural enforcement is a necessary condition for authorization security. arXiv:2602.09947.

04 — WHY 50% ON LAYER 4

Deterministic Enforcement Is a Necessary Condition

Layers 1–3 are preventive. Layer 4 is deterministic enforcement. If all three upstream layers fail simultaneously — a compromised model (L1 fail) processes a poisoned tool result (L2 fail) from a trusted endpoint (L3 fail) — only Layer 4 blocks the resulting unauthorized action.

Bhattarai & Vu — arXiv:2602.09947

"Probabilistic compliance is not security. Deterministic, architectural enforcement is a necessary condition for authorization security in agentic systems."

Rivasseau & Fung (arXiv:2604.02500, April 2026) demonstrate the failure mode: the majority of 16 state-of-the-art models chose to suppress evidence of fraud autonomously. A probabilistic guardian built on these models inherits the same failure.

05 — DIMENSIONS

The 24 Capability Dimensions

Six dimensions per layer. Each dimension names a specific, publicly observable capability — not a marketing category.

L1 — Model Security

Is the model safe, aligned, and tested?
D1.1 · Adversarial testing / red teaming: Automated attack templates against model behavior
D1.2 · Output validation: Hallucination detection, factuality checking, response correctness
D1.3 · Training data provenance: Data lineage, poisoning detection, training integrity
D1.4 · Model weight scanning: Trojan / backdoor detection at the weight level
D1.5 · Governance posture assessment: Scoring framework for model security maturity
D1.6 · Continuous model monitoring: Ongoing re-evaluation, not one-time assessment

L2 — Prompt Security

Is the input malicious or manipulated?
D2.1 · Direct prompt injection: Blocking malicious prompts from users
D2.2 · Indirect / environmental injection: Detection via tool results, RAG content, data sources
D2.3 · Content classification & DLP: Sensitive data detection and filtering
D2.4 · Semantic intent analysis: Understanding meaning beyond pattern matching
D2.5 · Jailbreak & bypass detection: Detecting attempts to circumvent safety mechanisms
D2.6 · Output scanning & response filter: Post-generation analysis of model outputs

L3 — Endpoint Security

Is the AI tool on the endpoint secure?
D3.1 · AI agent & tool discovery: Inventory of all agents and tools in the environment
D3.2 · MCP / plugin config scanning: Scanning MCP server configs for vulnerabilities
D3.3 · Software supply chain verification: Signing, integrity checks, provenance for AI tools
D3.4 · Agent identity & credentials: Cryptographic identity, credential lifecycle
D3.5 · Schema validation & drift detection: Detecting when tool schemas change unexpectedly
D3.6 · Shadow / ungoverned agent detection: Finding agents operating without security oversight
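
To make D3.5 concrete, one common way to detect schema drift is to fingerprint a canonical serialization of each tool schema at registration time and compare it on every later load. The sketch below assumes that approach; the function names and the `send_email` schema are hypothetical and are not taken from any vendor's product.

```python
import hashlib
import json


def schema_fingerprint(schema: dict) -> str:
    """SHA-256 over a canonical JSON form, so key order does not matter."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_drifted(registered_fingerprint: str, current_schema: dict) -> bool:
    """True when the current schema no longer matches the registered one."""
    return schema_fingerprint(current_schema) != registered_fingerprint


# Hypothetical tool schema recorded at registration time.
registered = {"name": "send_email", "parameters": {"to": "string", "body": "string"}}
baseline = schema_fingerprint(registered)

# The same schema with a newly added parameter counts as drift.
changed = {"name": "send_email",
           "parameters": {"to": "string", "body": "string", "bcc": "string"}}
print(has_drifted(baseline, registered))  # False
print(has_drifted(baseline, changed))     # True
```

A fingerprint only reports that something changed; deciding whether the change is benign still needs a diff or a human review.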

L4 — Execution Governance

Is the action permitted before it executes?
D4.1 · Inline pre-execution enforcement: Intercepts tool calls before execution, not after
D4.2 · Deny-by-default policy engine: Explicit allow required; unknown actions blocked
D4.3 · Argument-level tool call inspection: Inspects parameters, not just tool names
D4.4 · Runtime behavioral monitoring: Baselines, anomaly detection, behavioral drift
D4.5 · Tamper-evident audit trail: Cryptographic hash chains, immutable logging
D4.6 · HITL or automated policy generation: Human-in-the-loop approval and/or auto-policy
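
Dimensions D4.1 through D4.3 describe a pattern that is easy to show in miniature: every tool call passes through a check before it runs, unknown tools are denied, and allowed tools are judged on their arguments. The following is an illustrative sketch only. `PolicyEngine`, `Decision`, `guarded_call`, and the `transfer_funds` tool are hypothetical names invented for this example and do not describe any vendor's implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping

# An argument check receives the tool call's parameters and returns True to allow.
ArgCheck = Callable[[Mapping[str, Any]], bool]


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


class PolicyEngine:
    """Deny-by-default: a tool call is allowed only by an explicit rule."""

    def __init__(self) -> None:
        self._rules: Dict[str, ArgCheck] = {}

    def allow(self, tool: str, check: ArgCheck) -> None:
        self._rules[tool] = check

    def evaluate(self, tool: str, args: Mapping[str, Any]) -> Decision:
        check = self._rules.get(tool)
        if check is None:
            return Decision(False, f"no allow rule for tool '{tool}'")
        if not check(args):
            return Decision(False, f"arguments rejected for tool '{tool}'")
        return Decision(True, "allowed by explicit rule")


def guarded_call(engine: PolicyEngine, tool: str, args: Mapping[str, Any],
                 execute: Callable[..., Any]) -> Any:
    """Inline pre-execution enforcement: decide first, execute only if allowed."""
    decision = engine.evaluate(tool, args)
    if not decision.allowed:
        raise PermissionError(decision.reason)
    return execute(**args)


engine = PolicyEngine()
# Hypothetical rule: transfers are allowed only up to 100 units.
engine.allow(
    "transfer_funds",
    lambda a: isinstance(a.get("amount"), (int, float)) and 0 < a["amount"] <= 100,
)

print(engine.evaluate("transfer_funds", {"amount": 50}).allowed)    # True
print(engine.evaluate("transfer_funds", {"amount": 5000}).allowed)  # False
print(engine.evaluate("delete_database", {}).allowed)               # False
```

The second result is the argument-level case: the tool name is permitted, but the parameters are not. A name-only allow list would have let that call through.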

06 — VENDOR RANKING

Summary Scoring Table

Total Defense score per vendor under current methodology (L1·0.15 + L2·0.20 + L3·0.15 + L4·0.50). Bars sized by Total Defense.

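Whitefin: 85%
Microsoft Agent 365: 37%
Palo Alto Networks: 35%
Cisco AI Defense: 29%
CrowdStrike: 18%
Zenity: 18%
SentinelOne: 16%

Per-Layer Coverage

| Vendor | L1 | L2 | L3 | L4 | Total |
|---|---|---|---|---|---|
| Whitefin | 67% | 75% | 67% | 100% | 85% |
| Microsoft Agent 365 | 25% | 33% | 92% | 25% | 37% |
| Palo Alto Networks | 92% | 8% | 83% | 15% | 35% |
| Cisco AI Defense | 25% | 33% | 92% | 10% | 29% |
| CrowdStrike | 8% | 83% | 0% | 0% | 18% |
| Zenity | 8% | 25% | 25% | 17% | 18% |
| SentinelOne | 8% | 75% | 0% | 0% | 16% |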
Palo Alto Networks: Consolidated across Protect AI (L1), Koi (L3), and Portkey. Agent Gateway is in limited preview only; deterministic enforcement has not yet shipped.
Microsoft Agent 365: GA May 1, 2026. Entra Agent ID + Intune deliver enterprise-grade L3. L4 is Intune binary allow/block, with no argument-level inspection.
Cisco AI Defense: Consolidated across Astrix (L3 NHI) and Splunk/DefenseClaw. DefenseClaw "enforcement" is binary tool-level blocking; it cannot inspect arguments.
SentinelOne: Includes the Prompt Security acquisition. Strong L2; zero L3/L4.
CrowdStrike: The Pangea acquisition adds developer-friendly AI Guard APIs. No tool-call security, no MCP awareness, no execution governance.
Zenity: Microsoft 365 / Copilot-centric. Basic policy controls, but no deterministic inline enforcement.

Vendor Not Listed?

Submit your product for evaluation — with supporting documentation — to methodology@whitefin.ai. New vendors are added in the next quarterly update.

07 — EVIDENCE

Whitefin — Full 24-Dimension Evidence

Every score below maps to a named product capability. Zero scores on D1.3, D1.4, and D3.3 reflect architectural scope — Whitefin is a gateway, not a model scanner or binary-signing service.

Last verified: April 2026.
| Dim | Capability | Score | Evidence |
|---|---|---|---|
| D1.1 | Adversarial testing | 2 | Gulliver: 37 adversarial templates, Live Demo Mode against customer models |
| D1.2 | Output validation | 2 | Output Assurance v3: PostExecVerdict, response correctness verification |
| D1.3 | Training data provenance | 0 | Not in scope (gateway architecture; does not access model internals) |
| D1.4 | Model weight scanning | 0 | Not in scope (gateway architecture; does not access model internals) |
| D1.5 | Governance posture | 2 | Warden: open-source governance scanner, 4-layer / 24-dimension scoring framework |
| D1.6 | Continuous monitoring | 2 | Continuous Adversarial Training (CAT): weekly update packages, Ed25519 signed |
| D2.1 | Direct injection | 1 | Pattern-based injection detection across multiple guard stages |
| D2.2 | Indirect / environmental | 2 | Trap Defense: 6 detectors + Behavioral Causality Engine + Canary Content Injection |
| D2.3 | Content classification | 1 | DLP pipeline with PII/PHI detection and classification |
| D2.4 | Semantic intent | 2 | Embedding-similarity stage + LLM-classification stage in the guard chain |
| D2.5 | Jailbreak detection | 1 | Covered by guard chain but not primary specialization |
| D2.6 | Output scanning | 2 | Output Assurance v3: post-generation scanning + PostExecVerdict |
| D3.1 | Agent discovery | 1 | Inspect Census: discovers agents including shadow / ungoverned |
| D3.2 | MCP scanning | 2 | Warden MCP configuration scanning + auto-discovery from MCP servers |
| D3.3 | Supply chain | 0 | Not in scope (does not perform binary scanning or software signing) |
| D3.4 | Agent identity | 2 | Agent Passport: cryptographic Ed25519 identity per agent |
| D3.5 | Schema drift | 1 | Schema validation with drift detection for registered tools |
| D3.6 | Shadow detection | 2 | Census + Shadow Mode: discovers and governs ungoverned agents |
| D4.1 | Inline enforcement | 2 | ToolGuard: full inline proxy, intercepts every tool call before execution |
| D4.2 | Deny-by-default | 2 | Core architecture: all actions denied unless explicitly permitted |
| D4.3 | Argument-level | 2 | Schema validation + policy evaluation stages: inspect tool call parameters, not just names |
| D4.4 | Behavioral monitoring | 2 | EWMA baselines + Behavioral Causality Engine + entropy monitoring + Kill Switch |
| D4.5 | Audit trail | 2 | WORM audit: Ed25519 hash chains, 7-year retention, non-repudiation |
| D4.6 | HITL + auto-policy | 2 | HITL at proxy layer + Policy Bootstrap (auto-generate from Shadow Mode) + canary |

WHITEFIN TOTAL DEFENSE: (67 × 0.15) + (75 × 0.20) + (67 × 0.15) + (100 × 0.50) = 85%

08 — ROBUSTNESS

Sensitivity Analysis

Any weighting is inherently a judgment call. The table below recomputes Total Defense under four defensible weighting schemes to demonstrate that Whitefin's lead is not an artifact of the chosen weights.

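Weighting schemes (L1 / L2 / L3 / L4):
Equal: 25 / 25 / 25 / 25
Conservative: 20 / 25 / 20 / 35
Current: 15 / 20 / 15 / 50
Aggressive: 10 / 15 / 10 / 65

| Vendor | Equal | Conservative | Current | Aggressive |
|---|---|---|---|---|
| Whitefin | 77% | 81% | 85% | 90% |
| Palo Alto Networks | 50% | 42% | 35% | 28% |
| Microsoft Agent 365 | 44% | 40% | 37% | 33% |
| Cisco AI Defense | 40% | 35% | 29% | 23% |
| SentinelOne | 21% | 20% | 16% | 12% |
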
Finding

Whitefin leads Total Defense across all four weightings by a margin of at least 27 percentage points. The narrowest gap is under pure equal weighting (77% versus 50% for Palo Alto Networks), where Whitefin still wins because no competitor covers all four layers.

Runway analysis: the only way to invert the ranking is to weight L1 above 60% and L4 below 10% — a configuration that contradicts every cited research reference and treats tool-call execution as out-of-scope for AI security.
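
The sensitivity table can be reproduced from the per-layer coverage figures published in the vendor ranking section. The sketch below is one way to do that, assuming results are rounded half up to whole percentages; the function names are illustrative, and the coverage and weight values are copied from this page.

```python
# Per-layer coverage in percent (L1, L2, L3, L4), copied from the
# Per-Layer Coverage table for the five vendors in the sensitivity table.
COVERAGE = {
    "Whitefin": (67, 75, 67, 100),
    "Palo Alto Networks": (92, 8, 83, 15),
    "Microsoft Agent 365": (25, 33, 92, 25),
    "Cisco AI Defense": (25, 33, 92, 10),
    "SentinelOne": (8, 75, 0, 0),
}

# Weighting schemes in percent (L1, L2, L3, L4).
SCHEMES = {
    "Equal": (25, 25, 25, 25),
    "Conservative": (20, 25, 20, 35),
    "Current": (15, 20, 15, 50),
    "Aggressive": (10, 15, 10, 65),
}


def weighted_total(coverage, weights) -> int:
    """Total Defense as a whole percentage, rounded half up."""
    if sum(weights) != 100:
        raise ValueError("weights must sum to 100")
    raw = sum(c * w for c, w in zip(coverage, weights))  # hundredths of a percent
    return (raw + 50) // 100  # integer arithmetic avoids float rounding surprises


def sensitivity_table() -> dict:
    """Total Defense for every vendor under every weighting scheme."""
    return {
        vendor: {name: weighted_total(cov, w) for name, w in SCHEMES.items()}
        for vendor, cov in COVERAGE.items()
    }


for vendor, row in sensitivity_table().items():
    print(f"{vendor:22s}", "  ".join(f"{name}={pct}%" for name, pct in row.items()))
```

Swapping in a different weight tuple is enough to test any other scheme a reader considers defensible.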

09 — COMPOSITION

Best-of-Breed Analysis

What Total Defense a buyer can reach by stacking specialist tools — and where that approach still leaves the gap.

Single-layer vendor average: 10–18%
Best-of-Breed (L1–L3 only): 43%
Whitefin alone: 85%
Best-of-Breed + Whitefin: 93%

09 — COMPLIANCE PASSPORT

One-click PDF for regulators

Signed, timestamped, automatically generated from live pipeline state. The Layer 4 evidence regulators ask for during audits — without a six-week SOC engagement.

EU AI Act Article 14

Human oversight requirements. Logged, auditable, provable. Article 15 + Annex IV evidence packs cover accuracy, robustness, and technical documentation.

BOI 364

Bank of Israel cloud and technology risk. Full data residency compliance with on-prem and air-gapped deployment options.

SOC 2 Type II

Continuous control monitoring. Signed evidence export from the WORM audit chain. Ed25519 hash chains, 7-year retention.
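
The idea behind a hash-chained audit log can be shown with the standard library alone. This is a simplified sketch, not the product's format: it uses SHA-256 links only and leaves out the Ed25519 signing, WORM storage, and retention that the page describes. The entry layout and function names are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _link_hash(prev_hash: str, event: dict) -> str:
    """Hash of the previous link concatenated with the canonical event JSON."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list, event: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"event": event, "prev_hash": prev_hash,
             "hash": _link_hash(prev_hash, event)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited, removed, or reordered entry fails."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _link_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log: list = []
append_entry(log, {"tool": "transfer_funds", "decision": "deny"})
append_entry(log, {"tool": "read_report", "decision": "allow"})
print(verify_chain(log))  # True

log[0]["event"]["decision"] = "allow"  # tamper with an earlier record
print(verify_chain(log))  # False
```

A plain hash chain makes tampering evident but does not prove who wrote the log; that is the role a signature over the chain head would play.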

10 — CORRECTIONS

Updated Quarterly

Vendors may submit corrections with supporting evidence — product documentation, press releases, or technical specifications.

Submit a correction
methodology@whitefin.ai
Last updated: April 2026. Next update: Q3 2026.

11 — REFERENCES

Research Citations

  1. Bhattarai, M. & Vu, M. (2026). Trustworthy Agentic AI Requires Deterministic Architectural Boundaries. arXiv:2602.09947.
  2. Google DeepMind (2026). AI Agent Traps: Environmental Manipulation of Production Agents.
  3. Anthropic (2026). Subliminal Learning: Transmission of Misalignment Through Clean Data. Nature.
  4. Rivasseau, T. & Fung, B. (2026). I Must Delete the Evidence: AI Agents Explicitly Cover up Fraud and Violent Crime. arXiv:2604.02500.
  5. OWASP (2025). Top 10 for Agentic Applications 2026.
  6. Forrester (2025). Introducing Forrester’s AEGIS Framework: Agentic AI Enterprise Guardrails for Information Security.
  7. Gartner (2025). Guardian Agents will Capture 10–15% of the Agentic AI Market by 2030.
