AI Alignment Forum
- Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake
- Training AI agents to solve hard problems could lead to Scheming
- Why imperfect adversarial robustness doesn't doom AI control
- Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data
- Which evals resources would be good?
- Win/continue/lose scenarios and execute/replace/audit protocols
- Evolutionary prompt optimization for SAE feature visualization
- AXRP Episode 38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems
- o1 is a bad idea
- The Evals Gap