AI security scanners are drowning us. Not in actionable alerts, but in a deluge of “hallucinated” findings—plausible-sounding but utterly false—or worse, complete misses.
This noise is precisely the wall the industry has been banging its head against. But what if the solution isn’t about coaxing more truth out of the models themselves, but about building a more robust process around them? That’s the bet one security engineer made, and the results are compelling.
He didn’t chase the perfect prompt; he engineered what he calls a “quality-control cage” for his Large Language Models (LLMs). The project, dubbed Veritas, treats AI agents not as infallible oracles, but as stochastic machines with an inherent, expected error rate.
The Philosophy of Designed Distrust
The core idea here is radical, yet profoundly simple: assume the AI will be wrong. Don’t try to eliminate its variation; design a system that tolerates it and still produces reviewable output. This isn’t about finding the “perfect” prompt, but about building a workflow that filters out the noise.
This is the “Yin and Yang” of Veritas: a deliberate tug-of-war between opposing forces, designed to tease out reliable findings from inherently unreliable components.
The Yang: Expansion and Recall
The first agent in Veritas’s pipeline is the hypothesis agent. Its directive is simple: recall. Think of it as a wide-angle lens scanning the code architecture, identifying every plausible entry point, trust boundary, and potential threat model. At this stage, the cost of a false positive—a hypothesis that gets refuted later—is deliberately kept low. The cost of a false negative, however, the missed vulnerability, is astronomically higher.
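To make this concrete, here’s a minimal sketch of what a recall-first hypothesis stage could look like. The `Hypothesis` fields, the prompt wording, and the `run_hypothesis_agent` helper are illustrative assumptions rather than Veritas’s actual code, and the `llm` callable stands in for whatever model client the pipeline uses.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Hypothesis:
    """One candidate finding from the recall-first pass."""
    file: str           # source file the hypothesis points at
    entry_point: str    # e.g. an HTTP handler or a CLI argument parser
    threat: str         # suspected vulnerability class, e.g. "SQL injection"
    reasoning: str      # the agent's "why" (stripped before verification)
    confidence: float   # the model's own 0..1 estimate

def run_hypothesis_agent(llm: Callable[[str], str], code_context: str) -> List[Hypothesis]:
    """Ask the model to over-generate: every plausible entry point and threat.

    False positives are cheap at this stage; the evidence agent prunes them later.
    """
    prompt = (
        "You are a security reviewer in brainstorm mode. List EVERY plausible "
        "entry point, trust boundary, and potential vulnerability in the code "
        "below, including low-confidence ones. Respond with a JSON list of "
        "objects with keys: file, entry_point, threat, reasoning, confidence.\n\n"
        + code_context
    )
    raw = llm(prompt)  # assumes the model returns well-formed JSON
    return [Hypothesis(**item) for item in json.loads(raw)]
```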
The Yin: Contraction and Precision
Then comes the evidence agent. Its mission is precision. This is the skeptical auditor, mandated to actively refute findings unless they are backed by specific, cited source-code evidence. It’s not looking for plausibility; it demands concrete proof. It operates directly against the file content, checking for sanitizers or validation logic that the hypothesis might have overlooked.
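A corresponding sketch of the precision-oriented pass, again with assumed names (`Verdict`, `run_evidence_agent`) and an adversarial prompt that defaults to refutation unless the model can cite the vulnerable lines:

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Verdict:
    """Output of the precision-oriented verification pass."""
    confirmed: bool            # True only if concrete evidence was cited
    cited_lines: List[str]     # the exact source lines the model points to
    refutation: Optional[str]  # e.g. "input is escaped by sanitize() first"
    confidence: float

def run_evidence_agent(llm: Callable[[str], str],
                       slim_hypothesis: dict,
                       file_content: str) -> Verdict:
    """Verify or refute one reasoning-free hypothesis against the actual file."""
    prompt = (
        "You are a skeptical security auditor. Your default verdict is REFUTED. "
        "Confirm the finding below only if you can cite the exact lines that "
        "form an exploitable path, and check for sanitizers or validation logic "
        "that would break it. Respond with a JSON object with keys: confirmed, "
        "cited_lines, refutation, confidence.\n\n"
        f"Finding: {json.dumps(slim_hypothesis)}\n\nFile:\n{file_content}"
    )
    return Verdict(**json.loads(llm(prompt)))  # assumes well-formed JSON again
```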
The Secret Sauce: The Information Bottleneck
Here’s where it gets counter-intuitive. To make the overall system smarter, the second agent is intentionally made “dumber.” Before a hypothesis even reaches the evidence agent, a crucial step occurs: the reasoning behind the hypothesis—the “why”—is stripped away. This is done via a function called slim_hypotheses_for_evidence().
Stripping the reasoning between the stages forces the evidence agent into a lower-bias verification state: it must find the exploit path itself using the provided code context, or the finding is discarded.
Why delete data? To combat anchoring bias. If an LLM sees a plausible-sounding explanation for a bug, it can latch onto that explanation and anchor its verification to the initial hypothesis rather than to the code. By removing the “why,” the evidence agent starts from a blank slate, using only the code context to verify or refute the potential finding. The result is a more objective verification pass.
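The write-up names `slim_hypotheses_for_evidence()` but doesn’t show its body, so here is a plausible minimal version; the exact fields that survive the cut are an assumption:

```python
def slim_hypotheses_for_evidence(hypotheses: list[dict]) -> list[dict]:
    """Drop the hypothesis agent's narrative before verification.

    Only the bare "what and where" survives; the "why" and the first agent's
    own confidence are withheld so the evidence agent cannot anchor on them
    and must rediscover the exploit path from the code itself.
    """
    keep = {"file", "entry_point", "threat"}  # assumed field names
    return [{k: v for k, v in h.items() if k in keep} for h in hypotheses]
```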
The Deterministic Anchor: The Policy Gate
Crucially, the AI never has the final say. The ultimate verdict is rendered by a mechanical policy gate. This gate doesn’t reason; it applies a set of deterministic rules. Findings below a certain confidence score are automatically flagged as “Inconclusive.” More importantly, any critical or high-severity finding that remains “inconclusive” is flagged for human review, rather than being silently discarded.
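In code, such a gate is just a handful of if-statements, which is exactly the point. The threshold, labels, and field names below are illustrative, not the project’s actual policy:

```python
def policy_gate(finding: dict, min_confidence: float = 0.7) -> str:
    """Deterministic final verdict: no model call, just rules.

    The decision is mechanical and reproducible, so the same inputs always
    produce the same outcome.
    """
    severity = finding["severity"]      # "critical", "high", "medium", "low"
    confirmed = finding["confirmed"]    # from the evidence agent's verdict
    confidence = finding["confidence"]

    if confirmed and confidence >= min_confidence:
        return "REPORT"           # solid evidence, goes into the report
    if severity in ("critical", "high"):
        return "HUMAN_REVIEW"     # never silently drop a potentially big one
    return "INCONCLUSIVE"         # low-severity, low-confidence noise
```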
This whole architectural shift represents a departure from the prompt-engineering arms race, which often feels like trying to outsmart a genius who’s also prone to sudden, elaborate fantasies. Instead, it’s a pragmatic application of industrial engineering principles to a probabilistic system. It acknowledges the inherent limitations of LLMs and builds a strong, fault-tolerant workflow around them. It’s not about making the AI smarter; it’s about making the system smarter by being smarter about the AI.
Is this the future of AI security testing?
This approach holds significant promise for making AI-driven security tools more reliable and less of a burden on development teams. By embracing designed distrust and implementing rigorous quality control, Veritas demonstrates a path toward actionable insights rather than just more noise.
Frequently Asked Questions
What is Veritas? Veritas is a proof-of-concept multi-agent security pipeline designed to assist developers in manual code reviews by acting as a guide to identify potential security vulnerabilities.
How does Veritas improve AI reliability? It treats LLMs as unreliable components and builds a quality-control process around them, using a “Yin and Yang” approach where one agent generates hypotheses and another rigorously verifies them, with a final policy gate for deterministic decision-making.
Does this mean prompt engineering is dead for AI security? Not entirely, but this approach suggests that for critical applications like security, strong process design and statistical quality control might be more impactful than solely focusing on prompt optimization.