theAIcatchup

Red Hat's llm-d Splits LLM Inference in Two — And IBM Fusion HCI Makes It Stick

Everyone figured LLM serving would scale simply by throwing more GPUs at monolithic deployments. Red Hat's llm-d on IBM Fusion HCI flips that script by splitting inference into separate prefill and decode pools, giving enterprises real muscle.

4 min read 2 hours ago

AI Hardware

AWS Disaggregates LLM Inference — llm-d Unlocks Scale

Inference bottlenecks are crumbling. AWS's llm-d rollout disaggregates the prefill and decode phases of inference, so variable AI workloads run more efficiently.

3 min read 2 weeks ago

#llm-d
