Opening the AI black box: program synthesis via mechanistic interpretability

We present MIPS, a novel method for program synthesis based on automated mechanistic interpretability of neural networks trained to perform the desired task; MIPS automatically distills the learned algorithm into Python code.

February 2024 · Eric J. Michaud, Isaac Liao, Vedang Lad, Ziming Liu, Anish Mudide, Chloe Loughridge, Zifan Carl Guo, Tara Rezaei Kheirkhah, Mateja Vukelić, Max Tegmark

The Effect of Activation Functions On Superposition in Toy Models

An in-depth exploration of how different activation functions influence superposition in neural networks.

December 2023 · Vedang Lad, Timothy Kostolansky