theAIcatchup

#AI Inference

SageMaker console displaying reserved p5 GPU capacity for AI inference endpoint
AI Hardware

AWS SageMaker Locks in GPUs for AI Inference—Ending the Capacity Nightmare

GPU shortages derailed 35% of enterprise AI inference projects last year. AWS SageMaker's new training plans fix that, letting teams reserve p-family instances exclusively for inference endpoints.

3 min read 1 week, 2 days ago
Gimlet Labs multi-silicon AI inference cloud diagram with diverse chips like NVIDIA GPU and AMD CPU
AI Hardware

$80M Gimlet Labs Cracks AI Inference with Hardware Agnosticism

AI data centers sit idle 70-85% of the time, torching billions. Gimlet Labs just raised $80M to change that—with software that juggles workloads across any silicon.

3 min read 1 week, 3 days ago
Tour of Amazon Trainium chip lab with directors Kristopher King and Mark Carroll
AI Hardware

Amazon's Trainium Lab: Nvidia Slayer or AWS Lifeline?

Amazon's flashing its Trainium chips like a shiny new toy. But after 20 years watching Valley hype, I'm asking: who's really cashing in?

4 min read 1 week, 4 days ago
Groq 3 LP30 chip rack integrated with Nvidia Vera Rubin NVL72 platform
AI Hardware

Nvidia's $20B Groq Gambit: SRAM Inferno Torches GPU-Only Inference

Nvidia just folded a startup's wild SRAM accelerator into its crown-jewel Rubin platform. Forget pure GPU racks; here's why inference is going hybrid, fast.

3 min read 2 weeks ago
Nvidia Rubin CPX GPU accelerator in disaggregated rack configuration
AI Hardware

Nvidia's Rubin CPX: Cheap Compute Muscle That Leaves Rivals Gasping

Nvidia just unveiled the Rubin CPX—a prefill beast packing 20 PFLOPS on cheap GDDR7. Competitors? They're sprinting to catch a bus that's already left the station.

3 min read 2 weeks ago
CloudWatch dashboard showing TimeToFirstToken and EstimatedTPMQuotaUsage graphs for Bedrock inference
AI Hardware

Bedrock's New Metrics End Inference Guesswork

AWS just plugged two massive holes in Bedrock monitoring. Time-to-first-token latency and quota burn rates now land in CloudWatch, making production AI far less of a guessing game.

3 min read 2 weeks ago
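For readers who want to pull the new Bedrock metrics themselves, here's a minimal boto3 sketch. The metric name (`TimeToFirstToken`) and namespace follow the story above; the `ModelId` dimension and the model identifier used in the usage note are assumptions, so check them against your own CloudWatch console before relying on this.

```python
import datetime

def ttft_query(model_id: str) -> dict:
    """Build a CloudWatch GetMetricData query for Bedrock's
    TimeToFirstToken metric. The ModelId dimension name is an
    assumption -- verify it in your CloudWatch metrics console."""
    return {
        "Id": "ttft_p90",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/Bedrock",
                "MetricName": "TimeToFirstToken",
                "Dimensions": [{"Name": "ModelId", "Value": model_id}],
            },
            "Period": 300,   # 5-minute buckets
            "Stat": "p90",   # 90th-percentile first-token latency
        },
    }

# Sending the query requires AWS credentials and boto3, e.g.:
#
# import boto3
# cw = boto3.client("cloudwatch")
# now = datetime.datetime.now(datetime.timezone.utc)
# resp = cw.get_metric_data(
#     MetricDataQueries=[ttft_query("your-model-id")],
#     StartTime=now - datetime.timedelta(hours=1),
#     EndTime=now,
# )
```

The same query shape works for the quota metric mentioned above by swapping `MetricName` to `EstimatedTPMQuotaUsage` and the stat to `Maximum`.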
theAIcatchup

AI news that actually matters.

Categories

  • Large Language Models
  • AI Tools
  • AI Research
  • Robotics
  • Computer Vision
  • AI Hardware
  • AI Business
  • AI Ethics

More

  • RSS Feed
  • Sitemap
  • About
  • AI Tools
  • Advertise

Legal

  • Privacy
  • Terms
  • Work With Us

© 2026 theAIcatchup. All rights reserved.
