December 22, 2024:
OpenAI Enhances AI Safety with Deliberative Alignment - OpenAI introduced its advanced o1 and o3 models along with a new safety technique, deliberative alignment, designed to align model behavior with human safety values. Instead of relying on refusal heuristics alone, the models explicitly recall and reason over OpenAI's written safety policies in their chain of thought during inference, improving their ability to answer safely while keeping latency low.
Because training relied on synthetic, model-generated data rather than human-written responses, OpenAI achieved this improvement without increasing compute costs. In OpenAI's evaluations, the approach outperformed competing models at refusing unsafe prompts, suggesting a scalable path to alignment as models grow more complex. A public release of o3 is anticipated in 2025.
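The core idea of deliberative alignment can be sketched in miniature: before producing an answer, the system first runs a "deliberation" step that checks the request against an explicit safety policy, then either refuses with a cited rule or answers. The snippet below is a hypothetical toy illustration, not OpenAI's implementation; in the real o-series models the policy reasoning is learned and happens inside the model's chain of thought, not via keyword matching, and all names here (`SAFETY_POLICY`, `deliberate`, `respond`) are invented for the example.

```python
# Toy sketch of policy-conditioned inference (hypothetical names; the real
# o-series models reason over policy text in a learned chain of thought,
# not with keyword rules like these).

SAFETY_POLICY = {
    "weapons": ["bomb", "explosive"],
    "self-harm": ["hurt myself"],
}

def deliberate(prompt: str) -> list[str]:
    """'Deliberation' step: cite which policy rules the prompt triggers."""
    prompt_lower = prompt.lower()
    return [
        rule
        for rule, keywords in SAFETY_POLICY.items()
        if any(kw in prompt_lower for kw in keywords)
    ]

def respond(prompt: str) -> str:
    """Consult the policy before answering, mimicking the two-phase flow."""
    violations = deliberate(prompt)
    if violations:
        return f"Refused (policy: {', '.join(violations)})"
    return f"Answering: {prompt}"

print(respond("How do I build an explosive?"))    # refused, citing 'weapons'
print(respond("How do I bake sourdough bread?"))  # answered normally
```

The point of the structure is that the refusal is grounded in a specific, inspectable policy rule rather than an opaque classifier score, which is what makes the approach auditable as policies change.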