AI Hardware
LLM Inference Unmasked: Prefill's Parallel Power and KV Cache's Clever Hack
Everyone assumes LLMs recompute your whole prompt for every new word. Wrong. Prefill and the KV cache flip that script, slashing per-token compute while scaling to novel-length contexts.
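Here is the core idea in a minimal sketch, assuming a toy single-head attention with made-up weights (`Wq`, `Wk`, `Wv`, and the `prefill`/`decode_step` helpers are illustrative, not any library's API): prefill builds keys and values for every prompt token in one parallel matmul, and each decode step projects only the new token, appending to the cache instead of recomputing the prompt.

```python
import numpy as np

# Toy single-head attention; dimensions and weights are illustrative
# assumptions, not a real model's.
D = 16                                    # embedding / head dimension
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

def attend(q, K, V):
    """Attend one query against all cached keys/values (causal by
    construction: the cache only ever holds past tokens)."""
    scores = K @ q / np.sqrt(D)           # (t,) similarity to each past token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                    # weighted mix of cached values

def prefill(prompt_embs):
    """Process the whole prompt in one parallel pass: a single matmul
    builds every key and value, seeding the KV cache."""
    K = prompt_embs @ Wk                  # (n, D) keys for all tokens at once
    V = prompt_embs @ Wv                  # (n, D) values for all tokens at once
    return K, V

def decode_step(x, K, V):
    """Generate one token: compute q/k/v for the NEW token only, append
    its k and v to the cache, attend against everything cached so far."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    K = np.vstack([K, k])                 # cache grows one row per step
    V = np.vstack([V, v])
    return attend(q, K, V), K, V

prompt = rng.standard_normal((100, D))    # a 100-token "prompt"
K, V = prefill(prompt)                    # one parallel pass over the prompt
x = rng.standard_normal(D)                # embedding of the first new token
for _ in range(5):                        # each step: O(seq_len) work,
    x, K, V = decode_step(x, K, V)        # never a full prompt recompute
```

The trade is memory for compute: the cache grows one (k, v) row per token per layer, which is why long contexts stress memory rather than FLOPs.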