theAIcatchup

Apple Silicon just got the memory headroom LLMs crave. TurboQuant squeezes the KV cache 5x on MLX, freeing unified memory for longer contexts in on-device inference.
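To put the 5x figure in perspective, here is a rough back-of-the-envelope sketch of KV cache sizing. The model shape (32 layers, 8 KV heads, head dimension 128, 32K-token context) is a hypothetical example, not a model named in this piece, and the helper `kv_cache_bytes` is illustrative only; it is not part of MLX or TurboQuant. The calculation also ignores any per-group scale or metadata overhead a real quantizer would add.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bits_per_value=16, batch_size=1):
    """Approximate KV cache size in bytes.

    The cache stores one key vector and one value vector (hence the
    factor of 2) per layer, per KV head, per token.
    """
    num_values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return num_values * bits_per_value / 8


# Hypothetical model shape, chosen only for illustration.
baseline = kv_cache_bytes(num_layers=32, num_kv_heads=8,
                          head_dim=128, seq_len=32_768)

# Apply the 5x reduction claimed in the article to the 16-bit baseline.
compressed = baseline / 5

# A 5x cut from 16 bits averages out to 16 / 5 bits per stored value.
effective_bits = 16 / 5

GIB = 1024 ** 3
print(f"16-bit cache:   {baseline / GIB:.1f} GiB")
print(f"5x compressed:  {compressed / GIB:.1f} GiB")
print(f"effective bits: {effective_bits:.1f} per value")
```

Under these assumed numbers, the cache drops from 4 GiB to about 0.8 GiB, which is the kind of saving that matters when the model weights and the cache share the same pool of memory.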

#KV cache quantization