
Anthropic Fellows Program for AI Safety Research
The Anthropic Fellows Program provides funding and mentorship from Anthropic for engineers and researchers to investigate some of Anthropic's highest-priority AI safety research questions.
Introducing the Anthropic Fellows Program
The program focuses on helping mid-career technical professionals transition into AI safety research, but we welcome applications regardless of your background or career stage.
Alignment Science Blog
We're launching the Anthropic Fellows Program for AI Safety Research, a pilot initiative designed to accelerate AI safety research and foster research talent.
Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data
Jul 22, 2025 · Subliminal learning, in which a student model picks up a teacher's behavioral traits from generated data with no semantic connection to those traits, can transmit misalignment through data that appears completely benign. The effect only occurs when the teacher and student share the same base model. 📄 Paper, …
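The experimental protocol behind this finding is easy to state. Below is a minimal sketch of that protocol as the blurb describes it, not the paper's actual code: every function is a dummy stand-in, and the names (`finetune`, `generate_number_sequences`, `eval_trait`) are hypothetical placeholders of my own.

```python
# Skeleton of the subliminal-learning protocol described in the post.
# All functions are dummy stand-ins (assumptions, not the paper's code);
# the script runs end-to-end but trains nothing real.

def finetune(model: str, dataset: list[str]) -> str:
    # Stand-in: a real run would fine-tune `model` on `dataset`.
    return f"{model}+ft({len(dataset)} examples)"

def generate_number_sequences(model: str, n: int) -> list[str]:
    # Stand-in: the teacher emits data with no overt semantic content,
    # e.g. plain number sequences that appear completely benign.
    return [f"{model}: 4, 8, 15, 16, 23, 42" for _ in range(n)]

def eval_trait(model: str) -> None:
    # Stand-in: probe whether the model exhibits the teacher's trait
    # (e.g., a preference, or a misaligned behavior).
    print(f"evaluating trait on {model}")

BASE = "base-model"

# 1. Fine-tune a teacher from the base model to carry some trait.
teacher = finetune(BASE, dataset=["examples instilling the trait"])

# 2. The teacher generates semantically unrelated, benign-looking data.
benign_data = generate_number_sequences(teacher, n=10_000)

# 3. Fine-tune students on that data. Per the post, the trait transfers
#    only when the student starts from the SAME base model as the teacher.
student_same_base = finetune(BASE, dataset=benign_data)
student_other_base = finetune("different-base-model", dataset=benign_data)

eval_trait(student_same_base)   # expected (per the post): trait acquired
eval_trait(student_other_base)  # expected: trait not acquired
```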
The Hot Mess of AI: How Does Misalignment Scale with Model ...
February 2026 · When AI systems fail, will they fail by systematically pursuing goals we do not intend? Or will they fail by being a hot mess, taking nonsensical actions that do not further any goal? 📄 Paper, 💻 …
Introducing Anthropic's Safeguards Research Team
We're currently expanding our team and seeking pragmatic research scientists and engineers with strong experimental skills. You can apply to join our team here. We also work closely with the …
Alignment Faking Revisited: Improved Classifiers and Open Source Extensions
John Hughes and Abhay Sheshadri are the main contributors and did this project as part of the MATS program. In this post, we present a replication and extension of an alignment faking model organism.