AI Research

Microsoft at NSDI '26: AI, Networks, Cloud Advances

Microsoft dropped its latest research at NSDI '26, and it’s not just more cloud services. They’re weaving AI deeper into the fabric of everything, from how data centers breathe to how LLMs actually run.

Key Takeaways

  • Microsoft's NSDI '26 research focuses on smarter, more efficient large-scale networked systems.
  • DroidSpeak promises up to 4x higher throughput for LLMs by enabling KV cache sharing.
  • Eywa uses LLMs to find bugs in network protocol implementations, discovering 16 new issues.
  • Octopus offers a switch-free design for memory pods, aiming for lower cost and faster speeds.
  • AVA tackles massive video analytics tasks, achieving 75.8% accuracy on benchmark videos exceeding 10 hours.

Everyone expects tech giants to churn out more cloud services and faster chips. Standard stuff. But this year at NSDI ’26, Microsoft’s research papers suggest a more fundamental shift.

They’re not just scaling up; they’re making the very infrastructure smarter, faster, and frankly, more efficient. Think less brute force, more elegant engineering. Large-scale networked systems, the invisible backbone of everything from your streaming service to the latest AI chatbot, were the focus.

And Microsoft, a perennial sponsor, is clearly trying to remind everyone it still builds the plumbing. Eleven papers. Spanning networks, AI systems, and cloud guts. It’s a boast, sure, but with some genuinely interesting tech.

The LLM Cache Share

One gem is DroidSpeak. This isn’t about making AI talk like your grandma. It’s about making LLMs share and reuse their memory—the KV cache, if you’re fancy. The claim? Up to 4x higher throughput. Faster responses. Minimal quality hit. If this scales, it’s huge for anyone running these massive models. It means fewer GPUs, less power, more uptime. Pretty practical.

DroidSpeak enables LLMs with the same architecture to share and partially reuse KV caches across models, delivering up to 4 times higher throughput and faster responses with minimal impact on output quality.
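
To make that concrete, here’s a toy Python sketch of the core move: one model’s prefill KV cache gets handed to a second model with the same architecture, and only a few quality-sensitive layers get recomputed. Everything in it, from the layer count to the recompute set, is invented for illustration; this is not DroidSpeak’s actual code.

```python
# Toy illustration of cross-model KV-cache reuse, in the spirit of DroidSpeak.
# The models, layer counts, and "sensitive layers" heuristic are all made up.

from dataclasses import dataclass, field

@dataclass
class KVCache:
    # layer index -> opaque (key, value) tensors; strings stand in for tensors
    layers: dict[int, str] = field(default_factory=dict)

def prefill(model_name: str, prompt: str, num_layers: int) -> KVCache:
    """Pretend to run a full prefill pass, producing one KV entry per layer."""
    return KVCache({i: f"{model_name}:kv[{i}]" for i in range(num_layers)})

def reuse_prefill(donor: KVCache, model_name: str, prompt: str,
                  recompute: set[int], num_layers: int) -> KVCache:
    """Reuse the donor's cache where it's safe; recompute the rest.

    DroidSpeak's reported win: same-architecture models can share most
    layers, so only a small, quality-sensitive set needs a fresh pass.
    """
    fresh = prefill(model_name, prompt, num_layers)  # in reality, a partial pass
    return KVCache({i: fresh.layers[i] if i in recompute else donor.layers[i]
                    for i in range(num_layers)})

prompt = "Summarize this incident report..."
cache_a = prefill("model-A", prompt, num_layers=32)
# model-B shares model-A's architecture; pretend layers 0-3 are quality-sensitive
cache_b = reuse_prefill(cache_a, "model-B", prompt,
                        recompute={0, 1, 2, 3}, num_layers=32)
reused = sum(v.startswith("model-A") for v in cache_b.layers.values())
print(f"reused {reused}/32 layers of prefill work")  # -> reused 28/32 layers
```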

And here’s the thing: this isn’t just theoretical hand-waving. This is about real-world performance. Imagine your AI assistants responding instantly, not with a polite “one moment, please.” It’s the difference between a useful tool and a glorified loading screen.

When AI Cracks Codes

Then there’s Eywa. This system uses LLMs to, get this, build protocol models from natural language. It’s like teaching a computer to read network specs and then having it test the actual code. It already sniffed out 33 bugs, 16 of them brand spanking new. Network protocols are notoriously fiddly. Buggy ones can crash systems, leak data, the whole unpleasant shebang. Automating this kind of deep analysis? That’s smart. That’s the kind of AI application that actually makes things more secure and reliable. It’s AI as a meticulous, tireless auditor.
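
In miniature, that pipeline might look something like the sketch below: an LLM turns a spec excerpt into a JSON state machine, and a small harness replays it against a real implementation. The call_llm stub, the TCP snippet, and the impl.step harness are placeholders, not Eywa’s actual prompts or interfaces.

```python
# Hand-wavy sketch of the Eywa idea: spec text -> checkable model -> testing.
import json

def call_llm(prompt: str) -> str:
    """Stand-in for any chat-completion API; should return JSON text."""
    raise NotImplementedError("wire up an LLM provider here")

SPEC_EXCERPT = """
A connection in LISTEN state that receives a SYN segment moves to
SYN-RECEIVED and replies with SYN-ACK. A RST arriving in LISTEN is ignored.
"""

def build_model(spec: str) -> dict:
    """Ask the LLM for transitions as JSON: {state: {event: next_state}}."""
    reply = call_llm(
        "Extract the protocol state machine from this spec as JSON, "
        "mapping state -> event -> next_state:\n" + spec
    )
    return json.loads(reply)

def check_implementation(model: dict, impl) -> list[str]:
    """Model-based testing: drive the implementation with each modeled
    event and flag any transition that diverges from the LLM-derived model."""
    bugs = []
    for state, events in model.items():
        for event, expected in events.items():
            actual = impl.step(state, event)  # hypothetical test harness
            if actual != expected:
                bugs.append(f"{state} --{event}--> {actual}, expected {expected}")
    return bugs
```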

Goodbye, Network Switches?

Octopus is another head-scratcher. It’s a switch-free design for disaggregated memory pods. What does that even mean? Less hardware, lower cost, and faster communication between memory modules, even across racks. They’re touting speeds 3.2x faster than standard in-rack RDMA. If you’re building massive data centers, every percentage point in efficiency matters. This could be a significant cost-saver and performance booster. It’s a bold statement against the established wisdom of network switches.

Keeping the Lights On (and the Data Flowing)

HEDGE tackles optical networks, the fiber-optic links that haul massive data transfers. It fights wavelength-specific faults. When one part of the light spectrum goes wonky, HEDGE keeps the whole show running. Stable capacity. Optimized traffic. Reduced disruptions. In a world addicted to constant connectivity, this isn’t just an upgrade; it’s insurance.
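
As a toy illustration, and emphatically not HEDGE’s actual algorithm, here’s what graceful degradation on a multi-wavelength link looks like: one wavelength faults, the link sheds that slice of capacity, and traffic keeps flowing on the survivors.

```python
# Toy only. Each fiber link carries traffic on several wavelengths; when one
# degrades, the link keeps running at reduced capacity instead of failing.

def rebalance(capacity_gbps: dict[str, float], faulted: str) -> dict[str, float]:
    """Drop the faulted wavelength and report what the link can still carry."""
    survivors = {w: c for w, c in capacity_gbps.items() if w != faulted}
    total = sum(survivors.values())
    print(f"lost {faulted}: link still carries {total:.0f} Gbps "
          f"on {len(survivors)} wavelengths")
    return survivors

# Hypothetical 3-wavelength link at 400 Gbps per wavelength
link = {"1550.12nm": 400.0, "1550.92nm": 400.0, "1551.72nm": 400.0}
rebalance(link, faulted="1550.92nm")  # link stays up at 800 Gbps
```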

Video Analytics at Insane Scale

And AVA? This thing handles open-ended video analytics. It combines knowledge graphs with AI agents to sift through hours of footage. They even built a benchmark, AVA-100, with videos exceeding 10 hours, and AVA hits 75.8% accuracy on it. For that kind of data? That’s not trivial. It’s the kind of work that fuels everything from surveillance to scientific research. It’s AI tackling the truly gargantuan datasets.
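
Here’s a stripped-down sketch of that shape, with an invented graph schema and a stubbed-out describe_segment captioner standing in for the vision model; AVA’s real graph and agents are far richer.

```python
# Toy of the AVA shape: index long video into a small knowledge graph,
# then answer open-ended questions by walking it.
import networkx as nx

def describe_segment(video_path: str, start_s: int, end_s: int) -> list[str]:
    """Stand-in for a vision model that tags entities/actions in a clip."""
    raise NotImplementedError("plug in a video captioning model")

def index_video(video_path: str, length_s: int, window_s: int = 60) -> nx.DiGraph:
    """Slice the video into windows and link each segment to what it contains."""
    g = nx.DiGraph()
    for start in range(0, length_s, window_s):
        seg = f"seg@{start}s"
        g.add_node(seg, start=start)
        for tag in describe_segment(video_path, start, start + window_s):
            g.add_edge(seg, tag, relation="contains")
    return g

def answer(g: nx.DiGraph, entity: str) -> list[str]:
    """'When does X appear?' -> segments linked to that entity, in time order."""
    segs = [seg for seg, _ in g.in_edges(entity)]
    return sorted(segs, key=lambda s: g.nodes[s]["start"])
```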

Live Migration Gets Spicy

Pyrocumulus is all about fast, low-overhead live migration for storage VMs. Think moving a massive virtual machine from one server to another while it’s still running. They’re using FPGA SmartNICs and some clever protocol designs. If you’ve ever experienced downtime because of server maintenance, you’ll appreciate this. It’s about keeping services online, no matter what.
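
Pyrocumulus’s SmartNIC specifics live in the paper, but most live migration schemes build on the classic iterative pre-copy loop, sketched below: copy while the VM runs, re-copy whatever got dirtied, and pause only for the final small delta.

```python
# Generic iterative pre-copy live migration, not Pyrocumulus's protocol.
# get_dirty() returns (and clears) the set of blocks written since last call;
# copy() ships blocks to the destination host.

def live_migrate(all_blocks: set[int], get_dirty, copy, pause_vm, resume_vm,
                 stop_threshold: int = 64, max_rounds: int = 10) -> None:
    pending = set(all_blocks)
    for _ in range(max_rounds):
        copy(pending)                  # VM keeps running during this round
        pending = get_dirty()          # blocks written since the last round
        if len(pending) <= stop_threshold:
            break                      # delta small enough for a short pause
    pause_vm()                         # brief stop-and-copy phase
    copy(pending | get_dirty())        # flush the final dirty set
    resume_vm()                        # VM continues on the destination
```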

Spanning Trees Reimagined

Finally, ForestColl. It constructs optimal broadcast/aggregation spanning trees. Sounds dry, but it’s about how data is distributed and collected efficiently in large networks. It’s theoretically optimal and scalable. This is the kind of foundational network engineering that makes everything else possible. It’s the hidden gears that keep the machine turning.
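
For a feel of the baseline it improves on, here’s a plain BFS spanning tree over a toy topology, which gives every node a parent to receive broadcasts from (and, run in reverse, to aggregate toward). ForestColl’s actual contribution is constructing provably optimal trees, which this sketch makes no attempt at.

```python
# Baseline only: a breadth-first spanning tree for broadcast/aggregation.
from collections import deque

def bfs_broadcast_tree(adj: dict[str, list[str]], root: str) -> dict[str, str]:
    """Return child -> parent edges of a breadth-first spanning tree."""
    parent, frontier = {root: root}, deque([root])
    while frontier:
        node = frontier.popleft()
        for nbr in adj[node]:
            if nbr not in parent:
                parent[nbr] = node
                frontier.append(nbr)
    return parent

# Hypothetical 4-GPU topology
topology = {"gpu0": ["gpu1", "gpu2"], "gpu1": ["gpu0", "gpu3"],
            "gpu2": ["gpu0", "gpu3"], "gpu3": ["gpu1", "gpu2"]}
tree = bfs_broadcast_tree(topology, root="gpu0")
print(tree)  # {'gpu0': 'gpu0', 'gpu1': 'gpu0', 'gpu2': 'gpu0', 'gpu3': 'gpu1'}
```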

Microsoft is throwing a lot of research at the wall here. Some of it is certainly niche. But taken together, it paints a picture of a company deeply invested in the very foundations of computing. They aren’t just selling AI services; they’re building the infrastructure that makes them viable at scale. It’s a quiet, but critical, battle for the future of digital infrastructure.


Written by
theAIcatchup Editorial Team

Originally reported by Microsoft Research AI
