Resources
Nail down fundamentals and explore the community.
Foundational
Intro to AI Safety
Robert Miles · 2021
A clear, accessible introduction to core AI safety concepts and motivations.
A.I. Poses 'Risk of Extinction'
New York Times · 2023
Industry leaders and experts sign statement warning about existential risks from AI.
Cold Takes on AI
Holden Karnofsky
In-depth explorations of AI risks and alignment challenges from the co-CEO of Open Philanthropy.
Planned Obsolescence
Ajeya Cotra & Kelsey Piper
High-level perspectives on AI safety concerns from researchers at Open Philanthropy and Vox.
Is Power-Seeking AI an Existential Risk?
Joe Carlsmith · 2023
A rigorous analysis of existential risks from advanced AI systems seeking power.
Why Geoffrey Hinton is Scared of AI
MIT Technology Review · 2023
The "godfather of AI" explains why he left Google and his concerns about the technology he helped create.
Technical
Transformer Circuits Thread
Anthropic
A collection of articles on mechanistic interpretability: reverse-engineering neural networks by analyzing their weights.
Getting Started with Mech Interp
Neel Nanda
Beginner's guide to mechanistic interpretability with concrete steps.
The Persona Selection Model
Marks, Lindsey, Olah · Anthropic 2026
LLMs learn to simulate diverse characters during pre-training, and post-training elicits a particular Assistant persona, with implications for AI psychology and development.
Constitutional AI
Anthropic · 2022
Harmlessness from AI feedback: using a written set of principles to guide model behavior.
Evaluations for Extreme Risks
DeepMind · 2023
Framework for identifying and evaluating novel AI risks before deployment.
Goal Misgeneralization
DeepMind · 2022
Analysis of how AI systems can retain capabilities while pursuing unintended objectives.
Training + Continued Engagement
Events and Training
AI Safety
Discover upcoming AI safety events, workshops, and training programs to develop your skills.
Continued Engagement Resources
AISCI
Reading sources, Twitter lists, fellowship opportunities, technical upskilling programs, and job boards to stay engaged with AI safety.