At the Interface of AI Safety and Neuroscience

With Tom Burns

Date: 30 April 2024

Neuroscience is a burgeoning field with many opportunities for novel research directions. Due to experimental and physical limitations, however, theoretical progress relies on imperfect and incomplete information about the system under study. Artificial neural networks, for which perfect and complete information is possible, therefore offer those trained in the neurosciences an opportunity to study intelligence at a level of granularity that is unattainable in biological systems while remaining relevant to them. Additionally, applying neuroscience methods, concepts, and theory to AI systems offers a relatively under-explored avenue for making headway on the daunting challenges posed by AI safety, both for present-day risks, such as enshrining biases and spreading misinformation, and for future risks, including those on existential scales. In this talk, I will present two emerging examples of interactions between neuroscience and AI safety. In the direction of ideas from neuroscience being useful for AI safety, I will demonstrate how associative memory has become a tool for the interpretability of Transformer-based models. In the opposite direction, I will discuss how statistical learning theory and the developmental interpretability research program can help explain neuroscience phenomena such as perceptual invariance and representational drift. These two examples motivate my current research on broadening and deepening the interface between these fields, especially in the interpretability of biological and artificial neural networks.
