The Remarkable Robustness of LLMs: Stages of Inference?

We find that deleting and swapping adjacent layers retains 72–95% of the original model’s prediction accuracy without fine-tuning, and we hypothesize four universal stages of inference shared across eight different models.

<span title='2024-06-27 00:00:00 +0000 UTC'>June 2024</span>&nbsp;&middot;&nbsp;Vedang Lad, Jin Hwa Lee, Wes Gurnee, Max Tegmark

Mechanistic Interpretability for Progress Towards Quantitative AI Safety

MIT Master’s thesis studying mechanistic interpretability as a path toward quantitative AI safety.

<span title='2024-05-01 00:00:00 +0000 UTC'>May 2024</span>&nbsp;&middot;&nbsp;Vedang Lad

Exploring the Integration of AI into Physics Education: Leveraging ChatGPT for Problem Generation

We explore how large language models such as ChatGPT can be used to generate physics problems for educational purposes.

<span title='2024-04-01 00:00:00 +0000 UTC'>April 2024</span>&nbsp;&middot;&nbsp;Vedang Lad, Isaac Liao, Mohamed Abdelhafez, Peter Dourmashkin, Saif El-Adawy