Large Language Models

The Remarkable Robustness of LLMs: Stages of Inference?

We find that interventions deleting or swapping adjacent layers retain 72-95% of the original model’s prediction accuracy without fine-tuning, and we hypothesize the existence of four universal stages of inference across eight different models.

Mechanistic Interpretability for Progress Towards Quantitative AI Safety

MIT Master’s thesis studying mechanistic interpretability as a path toward quantitative AI safety.

Exploring the Integration of AI into Physics Education: Leveraging ChatGPT for Problem Generation

We explore how large language models like ChatGPT can be leveraged to generate physics problems for educational purposes.