Large Language Models
LLaMA-2 70B Memory: The Math They Don't Show
Every explainer on Grouped Query Attention (GQA) says the same thing. But what is really going on under the hood of LLaMA-2 70B's architecture? We break down the math.