🤖 PyTorch Flight Recorder Exposes NCCL Watchdog Timeout Nightmares

Your GPU cluster's humming, then: crash. NCCL watchdog timeout. PyTorch's new Flight Recorder turns black-box failures into crystal-clear diagnostics.

4 min read · 1 month, 3 weeks ago