The Remarkable Robustness of LLMs: Stages of Inference?

We find that deleting and swapping interventions retain 72-95% of the original model’s prediction accuracy without fine-tuning, and hypothesize the existence of four universal stages of inference across eight different models.

June 2024 · Vedang Lad, Wes Gurnee, Max Tegmark