AI Alignment Forum
- An issue with training schemers with supervised fine-tuning
- Live Theory Part 0: Taking Intelligence Seriously
- Instrumental vs Terminal Desiderata
- What is a Tool?
- Formal verification, heuristic explanations and surprise accounting
- Compact Proofs of Model Performance via Mechanistic Interpretability
- SAE feature geometry is outside the superposition hypothesis
- LLM Generality is a Timeline Crux
- Different senses in which two AIs can be “the same”
- Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data