AI Security: Quality Control Trumps Prompt Engineering

Most AI security scanners drown developers in noise. The breakthrough? Treating LLMs not as oracles, but as unreliable machines on an assembly line, wrapped in a quality-control cage.

Key Takeaways

  • AI security scanners often suffer from noise (false positives and misses) due to LLM limitations.
  • Veritas, a new project, treats LLMs as unreliable machines and applies statistical quality control principles.
  • The system uses a 'Yin and Yang' approach with agents for recall (hypothesis generation) and precision (evidence verification), deliberately stripping reasoning between stages to avoid bias.

AI security scanners are drowning us. Not in actionable alerts, but in a deluge of “hallucinated” findings—plausible-sounding but utterly false—or worse, complete misses.

This noise is precisely the wall the industry has been banging its head against. But what if the solution isn’t about coaxing more truth out of the models themselves, but about building a more robust process around them? That’s the bet one security engineer made, and the results are compelling.

He didn’t chase the perfect prompt; he engineered what he calls a “quality-control cage” for his Large Language Models (LLMs). The project, dubbed Veritas, treats AI agents not as infallible oracles, but as stochastic machines with an inherent, expected error rate.

The Philosophy of Designed Distrust

The core idea here is radical, yet profoundly simple: assume the AI will be wrong. Don’t try to eliminate its variation; design a system that tolerates it and still produces reviewable output. This isn’t about finding the “perfect” prompt, but about building a workflow that filters out the noise.

This is the “Yin and Yang” of Veritas: a deliberate tug-of-war between opposing forces, designed to tease out reliable findings from inherently unreliable components.

The Yang: Expansion and Recall

The first agent in Veritas’s pipeline is the hypothesis agent. Its directive is simple: recall. Think of it as a wide-angle lens scanning the code architecture, identifying every plausible entry point, trust boundary, and potential threat model. At this stage, the cost of a false positive, a hypothesis that gets refuted later, is deliberately kept low. The cost of a false negative, a missed vulnerability, is far higher.
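The asymmetry between the two error costs can be sketched as a deliberately low filtering bar at the recall stage. This is an illustrative sketch, not the Veritas source: the `Hypothesis` fields, the `keep_for_verification` helper, and the threshold value are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One candidate finding from the recall-oriented first stage.
    Field names are illustrative, not taken from the Veritas code."""
    entry_point: str       # e.g. an HTTP handler or CLI argument
    threat: str            # suspected vulnerability class
    reasoning: str         # the agent's "why" (stripped before verification)
    plausibility: float    # the agent's own 0..1 estimate

def keep_for_verification(hypotheses, floor=0.1):
    """Recall stage: the bar is deliberately low. A refuted hypothesis is
    cheap; a missed vulnerability is expensive, so only near-zero noise
    is dropped and everything else is passed downstream."""
    return [h for h in hypotheses if h.plausibility >= floor]
```

The point of the low `floor` is that false positives are recoverable: the evidence agent exists precisely to refute them later.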

The Yin: Contraction and Precision

Then comes the evidence agent. Its mission is precision. This is the skeptical auditor, mandated to actively refute findings unless they are backed by specific, cited source-code evidence. It’s not looking for plausibility; it’s verifying concrete proof. It operates directly against the file content, checking for sanitizers or validation logic that the hypothesis might have overlooked.
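A minimal, hypothetical version of the "cited evidence or it didn't happen" rule might look like the check below. The function name and logic are assumptions for illustration; the real evidence agent additionally reasons about sanitizers and validation logic in context.

```python
def verify_citation(cited_snippet: str, file_text: str) -> bool:
    """Precision stage: a finding survives only if the exact code it
    cites actually appears in the file it points at. Anything the
    model 'remembers' but cannot cite is treated as unproven."""
    return cited_snippet.strip() in file_text
```

This mechanical grounding step is what separates "plausible" from "proven": the model's claim is checked against the file content, not against its own narrative.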

The Secret Sauce: The Information Bottleneck

Here’s where it gets counter-intuitive. To make the overall system smarter, the second agent is intentionally made “dumber.” Before a hypothesis even reaches the evidence agent, a crucial step occurs: the reasoning behind the hypothesis—the “why”—is stripped away. This is done via a function called slim_hypotheses_for_evidence().

By stripping the reasoning between the stages, we force the evidence agent into a lower-bias verification state. It must find the exploit path itself using the provided code context, or the finding is discarded.

Why delete data? To combat anchoring bias. If an LLM sees a plausible-sounding explanation for a bug, it can latch onto that explanation and bend its verification toward confirming the initial hypothesis. By removing the “why,” the evidence agent is forced to operate from a blank slate, using only the code context to verify or refute the potential finding. This forces a more objective verification process.
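Only the function name `slim_hypotheses_for_evidence()` comes from the article; the sketch below is one plausible shape for it, assuming hypotheses are carried as dictionaries with a `reasoning` field.

```python
def slim_hypotheses_for_evidence(hypotheses):
    """Information bottleneck: drop the 'why' before verification so the
    evidence agent cannot anchor on the first agent's narrative.
    Field names are assumed; only the function name is from Veritas."""
    slimmed = []
    for h in hypotheses:
        slim = dict(h)               # copy, so the original audit trail survives
        slim.pop("reasoning", None)  # remove the persuasive explanation
        slimmed.append(slim)
    return slimmed
```

Note that the stripped reasoning need not be discarded entirely; keeping it in the original record preserves an audit trail while the verifier sees only the slimmed copy.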

The Deterministic Anchor: The Policy Gate

Crucially, the AI never has the final say. The ultimate verdict is rendered by a mechanical policy gate. This gate doesn’t reason; it applies a set of deterministic rules. Findings below a certain confidence score are automatically flagged as “Inconclusive.” More importantly, any critical or high-severity finding that remains “inconclusive” is flagged for human review, rather than being silently discarded.
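The gate's rules, as described above, fit in a few lines of deterministic code. This is a sketch under assumptions: the field names, the threshold value, and the verdict labels are illustrative, not taken from the Veritas source.

```python
def policy_gate(finding, min_confidence=0.7):
    """Deterministic final verdict: no model reasoning here, just rules.
    Thresholds and labels are illustrative, not from Veritas."""
    if finding["confidence"] >= min_confidence:
        return "actionable"
    # Below the confidence bar, the finding is inconclusive, but
    # high-impact items are never silently discarded.
    if finding["severity"] in ("critical", "high"):
        return "human_review"
    return "inconclusive"
```

Because the gate is plain code rather than a model, its behavior is testable, auditable, and identical on every run, which is exactly the property the probabilistic stages lack.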

This whole architectural shift represents a departure from the prompt-engineering arms race, which often feels like trying to outsmart a genius who’s also prone to sudden, elaborate fantasies. Instead, it’s a pragmatic application of industrial engineering principles to a probabilistic system. It acknowledges the inherent limitations of LLMs and builds a robust, fault-tolerant workflow around them. It’s not about making the AI smarter; it’s about making the system smarter by being smarter about the AI.

Is this the future of AI security testing?

This approach holds significant promise for making AI-driven security tools more reliable and less of a burden on development teams. By embracing a designed distrust and implementing rigorous quality control, Veritas demonstrates a path toward actionable insights rather than just more noise.


Frequently Asked Questions

What is Veritas? Veritas is a proof-of-concept multi-agent security pipeline designed to assist developers in manual code reviews by acting as a guide to identify potential security vulnerabilities.

How does Veritas improve AI reliability? It treats LLMs as unreliable components and builds a quality-control process around them, using a “Yin and Yang” approach where one agent generates hypotheses and another rigorously verifies them, with a final policy gate for deterministic decision-making.

Does this mean prompt engineering is dead for AI security? Not entirely, but this approach suggests that for critical applications like security, strong process design and statistical quality control might be more impactful than solely focusing on prompt optimization.

Written by
theAIcatchup Editorial Team


Originally reported by Towards AI
