AI Security: Quality Control Trumps Prompt Engineering

Most AI security scanners drown developers in noise. The breakthrough? Treating LLMs not as oracles, but as unreliable machines on an assembly line, wrapped in a quality-control cage.

Key Takeaways

  • AI security scanners often suffer from noise (false positives and misses) due to LLM limitations.
  • Veritas, a new project, treats LLMs as unreliable machines and applies statistical quality control principles.
  • The system uses a 'Yin and Yang' approach with agents for recall (hypothesis generation) and precision (evidence verification), deliberately stripping reasoning between stages to avoid bias.

AI security scanners are drowning us. Not in actionable alerts, but in a deluge of “hallucinated” findings—plausible-sounding but utterly false—or worse, complete misses.

This noise is precisely the wall the industry has been banging its head against. But what if the solution isn’t about coaxing more truth out of the models themselves, but about building a more robust process around them? That’s the bet one security engineer made, and the results are compelling.

He didn’t chase the perfect prompt; he engineered what he calls a “quality-control cage” for his Large Language Models (LLMs). The project, dubbed Veritas, treats AI agents not as infallible oracles, but as stochastic machines with an inherent, expected error rate.

The Philosophy of Designed Distrust

The core idea here is radical, yet profoundly simple: assume the AI will be wrong. Don’t try to eliminate its variation; design a system that tolerates it and still produces reviewable output. This isn’t about finding the “perfect” prompt, but about building a workflow that filters out the noise.

This is the “Yin and Yang” of Veritas: a deliberate tug-of-war between opposing forces, designed to tease out reliable findings from inherently unreliable components.

The Yang: Expansion and Recall

The first agent in Veritas’s pipeline is the hypothesis agent. Its directive is simple: recall. Think of it as a wide-angle lens scanning the code architecture, identifying every plausible entry point, trust boundary, and potential threat model. At this stage, the cost of a false positive, a hypothesis that gets refuted later, is deliberately kept low. The cost of a false negative, a missed vulnerability, is far higher.
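The asymmetry between the two error costs can be sketched as a deliberately low filtering bar at the recall stage. This is an illustrative sketch, not the Veritas source: the `Hypothesis` fields, the `keep_for_verification` helper, and the threshold value are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One candidate finding from the recall-oriented first stage.
    Field names are illustrative, not taken from the Veritas code."""
    entry_point: str       # e.g. an HTTP handler or CLI argument
    threat: str            # suspected vulnerability class
    reasoning: str         # the agent's "why" (stripped before verification)
    plausibility: float    # the agent's own 0..1 estimate

def keep_for_verification(hypotheses, floor=0.1):
    """Recall stage: the bar is deliberately low. A refuted hypothesis is
    cheap; a missed vulnerability is expensive, so only near-zero noise
    is dropped and everything else is passed downstream."""
    return [h for h in hypotheses if h.plausibility >= floor]
```

The point of the low `floor` is that false positives are recoverable: the evidence agent exists precisely to refute them later.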

The Yin: Contraction and Precision

Then comes the evidence agent. Its mission is precision. This is the skeptical auditor, mandated to actively refute findings unless they are backed by specific, cited source-code evidence. It’s not looking for plausibility; it’s verifying concrete proof. It operates directly against the file content, checking for sanitizers or validation logic that the hypothesis might have overlooked.
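A minimal, hypothetical version of the "cited evidence or it didn't happen" rule might look like the check below. The function name and logic are assumptions for illustration; the real evidence agent additionally reasons about sanitizers and validation logic in context.

```python
def verify_citation(cited_snippet: str, file_text: str) -> bool:
    """Precision stage: a finding survives only if the exact code it
    cites actually appears in the file it points at. Anything the
    model 'remembers' but cannot cite is treated as unproven."""
    return cited_snippet.strip() in file_text
```

This mechanical grounding step is what separates "plausible" from "proven": the model's claim is checked against the file content, not against its own narrative.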

The Secret Sauce: The Information Bottleneck

Here’s where it gets counter-intuitive. To make the overall system smarter, the second agent is intentionally made “dumber.” Before a hypothesis even reaches the evidence agent, a crucial step occurs: the reasoning behind the hypothesis—the “why”—is stripped away. This is done via a function called slim_hypotheses_for_evidence().

By stripping the reasoning between the stages, we force the evidence agent into a lower-bias verification state. It must find the exploit path itself using the provided code context, or the finding is discarded.

Why delete data? To combat anchoring bias. If an LLM sees a plausible-sounding explanation for a bug, it can latch onto that explanation and bend its verification toward confirming the initial hypothesis. By removing the “why,” the evidence agent is forced to operate from a blank slate, using only the code context to verify or refute the potential finding. This forces a more objective verification process.
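Only the function name `slim_hypotheses_for_evidence()` comes from the article; the sketch below is one plausible shape for it, assuming hypotheses are carried as dictionaries with a `reasoning` field.

```python
def slim_hypotheses_for_evidence(hypotheses):
    """Information bottleneck: drop the 'why' before verification so the
    evidence agent cannot anchor on the first agent's narrative.
    Field names are assumed; only the function name is from Veritas."""
    slimmed = []
    for h in hypotheses:
        slim = dict(h)               # copy, so the original audit trail survives
        slim.pop("reasoning", None)  # remove the persuasive explanation
        slimmed.append(slim)
    return slimmed
```

Note that the stripped reasoning need not be discarded entirely; keeping it in the original record preserves an audit trail while the verifier sees only the slimmed copy.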

The Deterministic Anchor: The Policy Gate

Crucially, the AI never has the final say. The ultimate verdict is rendered by a mechanical policy gate. This gate doesn’t reason; it applies a set of deterministic rules. Findings below a certain confidence score are automatically flagged as “Inconclusive.” More importantly, any critical or high-severity finding that remains “inconclusive” is flagged for human review, rather than being silently discarded.
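The gate's rules, as described above, fit in a few lines of deterministic code. This is a sketch under assumptions: the field names, the threshold value, and the verdict labels are illustrative, not taken from the Veritas source.

```python
def policy_gate(finding, min_confidence=0.7):
    """Deterministic final verdict: no model reasoning here, just rules.
    Thresholds and labels are illustrative, not from Veritas."""
    if finding["confidence"] >= min_confidence:
        return "actionable"
    # Below the confidence bar, the finding is inconclusive, but
    # high-impact items are never silently discarded.
    if finding["severity"] in ("critical", "high"):
        return "human_review"
    return "inconclusive"
```

Because the gate is plain code rather than a model, its behavior is testable, auditable, and identical on every run, which is exactly the property the probabilistic stages lack.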

This whole architectural shift represents a departure from the prompt-engineering arms race, which often feels like trying to outsmart a genius who’s also prone to sudden, elaborate fantasies. Instead, it’s a pragmatic application of industrial engineering principles to a probabilistic system. It acknowledges the inherent limitations of LLMs and builds a robust, fault-tolerant workflow around them. It’s not about making the AI smarter; it’s about making the system smarter by being smarter about the AI.

Is this the future of AI security testing?

This approach holds significant promise for making AI-driven security tools more reliable and less of a burden on development teams. By embracing a designed distrust and implementing rigorous quality control, Veritas demonstrates a path toward actionable insights rather than just more noise.


Frequently Asked Questions

What is Veritas? Veritas is a proof-of-concept multi-agent security pipeline designed to assist developers in manual code reviews by acting as a guide to identify potential security vulnerabilities.

How does Veritas improve AI reliability? It treats LLMs as unreliable components and builds a quality-control process around them, using a “Yin and Yang” approach where one agent generates hypotheses and another rigorously verifies them, with a final policy gate for deterministic decision-making.

Does this mean prompt engineering is dead for AI security? Not entirely, but this approach suggests that for critical applications like security, strong process design and statistical quality control might be more impactful than solely focusing on prompt optimization.

Written by
theAIcatchup Editorial Team


Originally reported by Towards AI
