The Remarkable Robustness of LLMs: Stages of Inference?

We find that deleting or swapping adjacent layers leaves models with 72–95% of their original prediction accuracy, without any fine-tuning, and we hypothesize four universal stages of inference shared across eight different models.

<span title='2024-06-27 00:00:00 +0000 UTC'>June 2024</span>&nbsp;&middot;&nbsp;Vedang Lad, Wes Gurnee, Max Tegmark