Letting the Neural Code Speak: Automated Characterization of Monkey Visual Neurons Through Human Language

We develop digital twins of V1 and V4 neurons and use generative models to translate neural activity patterns into semantic descriptions, driving 96.1% of V4 neurons to extreme activation levels with synthesized images based on linguistic descriptions.

<span title='2026-05-12 00:00:00 +0000 UTC'>May 2026</span>&nbsp;&middot;&nbsp;Vedang Lad, Katrin Franke, Tamar Rott Shaham, Surya Ganguli, Andreas S. Tolias, Sophia Sanborn, Nikos Karantzas

The Remarkable Robustness of LLMs: Stages of Inference?

We find that deleting and swapping adjacent layers retains 72-95% of the original model’s prediction accuracy without fine-tuning, and we hypothesize four universal stages of inference across eight different models.

<span title='2024-06-27 00:00:00 +0000 UTC'>June 2024</span>&nbsp;&middot;&nbsp;Vedang Lad, Jin Hwa Lee, Wes Gurnee, Max Tegmark

Mechanistic Interpretability for Progress Towards Quantitative AI Safety

MIT Master’s thesis studying mechanistic interpretability as a path toward quantitative AI safety.

<span title='2024-05-01 00:00:00 +0000 UTC'>May 2024</span>&nbsp;&middot;&nbsp;Vedang Lad

Exploring the Integration of AI into Physics Education: Leveraging ChatGPT for Problem Generation

We explore how large language models like ChatGPT can be leveraged to generate physics problems for educational purposes.

<span title='2024-04-01 00:00:00 +0000 UTC'>April 2024</span>&nbsp;&middot;&nbsp;Vedang Lad, Isaac Liao, Mohamed Abdelhafez, Peter Dourmashkin, Saif El-Adawy

Opening the AI Black Box: Distilling Machine-Learned Algorithms into Code

We present MIPS, a novel method for program synthesis based on automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code.

<span title='2024-02-01 00:00:00 +0000 UTC'>February 2024</span>&nbsp;&middot;&nbsp;Eric J. Michaud, Isaac Liao, Vedang Lad, Ziming Liu, Anish Mudide, Chloe Loughridge, Zifan Carl Guo, Tara Rezaei Kheirkhah, Mateja Vukelić, Max Tegmark

Estimating label quality and errors in semantic segmentation data via any model

We find that a label quality score based on the soft-minimum of the model-estimated likelihoods of each pixel’s annotated class is particularly effective at identifying mislabeled images across multiple types of annotation error.

<span title='2023-07-11 00:00:00 +0000 UTC'>July 2023</span>&nbsp;&middot;&nbsp;Vedang Lad, Jonas Mueller
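
As a rough illustration of the soft-minimum idea in the summary above, the sketch below scores one image from the model-estimated likelihoods of each pixel's annotated class. The function name `softmin_score`, the `temperature` parameter, and the exact weighting (a softmax over negated likelihoods) are assumptions made for this example and may differ from the formulation used in the paper.

```python
import math


def softmin_score(pixel_likelihoods, temperature=0.1):
    """Soft-minimum of per-pixel likelihoods of the annotated class.

    Hypothetical helper for illustration; not taken from the paper's code.
    Lower scores suggest the image is more likely to be mislabeled.
    """
    if not pixel_likelihoods:
        raise ValueError("need at least one pixel likelihood")
    # Softmax over negated likelihoods: the least likely pixels get the
    # largest weights. Shifting by the minimum keeps the exponentials stable.
    lowest = min(pixel_likelihoods)
    weights = [math.exp(-(p - lowest) / temperature) for p in pixel_likelihoods]
    total = sum(weights)
    # Weighted average of the likelihoods; approaches the hard minimum as
    # temperature goes to zero and the plain mean as it grows large.
    return sum(p * w for p, w in zip(pixel_likelihoods, weights)) / total


# Toy inputs (made up): one image with consistent labels, one with a
# single pixel whose annotated class the model finds very unlikely.
clean = [0.95, 0.90, 0.97, 0.92]
suspect = [0.95, 0.90, 0.97, 0.05]
print(softmin_score(clean) > softmin_score(suspect))
```

Unlike a plain mean, this weighting lets a small region of badly labeled pixels pull the image-level score down sharply, which is the property the summary attributes to the soft-minimum.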