Large Language Models
DPO or GRPO? Escaping SFT's Repetitive Output Trap in LLM Fine-Tuning
Your SFT-tuned model looks perfect on paper: the loss has converged and the output format is spot-on. Then it reaches production and starts churning out repetitive, robotic responses. Time to consider DPO or GRPO.