The Remarkable Robustness of LLMs: Stages of Inference?
We find that deleting and swapping interventions retain 72-95% of the original model’s prediction accuracy without fine-tuning, and hypothesize the existence of four universal stages of inference across eight different models.