Technical Intro Fellowship

Week 4

Threat Models

Learning Objectives

Articulate the orthogonality thesis and instrumental convergence, and explain why they are central to advanced AI risk
Connect technical alignment failures to real-world harm scenarios at scale

Core Readings

What is the Orthogonality Thesis? · AISafety.info

Why Would AI Want to do Bad Things? Instrumental Convergence · Robert Miles · 2018

Statement on AI Risk · Center for AI Safety · 2023

An Overview of Catastrophic AI Risks · Center for AI Safety · 2023

We're Not Ready for Superintelligence · AI in Context · 2025

Recommended

AI 2027 · AI Futures Project · 2025

The Problem · MIRI · 2025

Intelligence and Stupidity: The Orthogonality Thesis · Robert Miles · 2018

The Basic AI Drives · Steve Omohundro · 2008

Gradual Disempowerment · Kulveit et al. · 2025

The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents · Nick Bostrom · 2012

Existential Risk from Power-Seeking AI · Joe Carlsmith · 2022

Further Reading

Instrumental Convergence · Eliezer Yudkowsky · 2025

Why Would AI "Aim" To Defeat Humanity? · Cold Takes · 2022

AI Could Defeat All Of Us Combined · Cold Takes · 2022

Two Types of AI Existential Risk: Decisive and Accumulative · Atoosa Kasirzadeh · 2025

International AI Safety Report 2026 · International AI Safety Report · 2026

The Vulnerable World Hypothesis · Nick Bostrom · 2019

AI-Enabled Coups: How a Small Group Could Use AI to Seize Power · Forethought · 2025

Impact of AI on Cyber Threat From Now to 2027 · NCSC · 2025

How AI Threatens Democracy · Kreps & Kriner · 2023

Can Democracy Survive the Disruptive Power of AI? · Carnegie Endowment · 2024

The Authoritarian Risks of AI Surveillance · Lawfare · 2025

The Operational Risks of AI in Large-Scale Biological Attacks · RAND · 2024