Anthropic Leaks Secret AI Recipe - Constitutional AI Exposed
Source: ai.plainenglish.io
- Anthropic accidentally leaked details of its secret AI training method called Constitutional AI during a demo.
- The method uses a "constitution" of written rules to guide AI behavior; it has been applied to models such as Claude 3.5 Sonnet, which score well on both safety and capability benchmarks.
- This leak reveals how Anthropic builds safer AI, sparking debate on transparency in the industry.
Anthropic, the AI company behind Claude, unintentionally exposed its core alignment technique during a live demo. The method, Constitutional AI, gives a model a written set of ethical principles (a "constitution") that it uses to critique and revise its own outputs, reducing reliance on human feedback labels. It matters because it shows a novel way to tackle AI risks, one that could influence competitors and regulators as AI systems grow more powerful.
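The self-improvement loop described above can be sketched in a few lines. This is a toy illustration, not Anthropic's actual implementation: the `call_model` function is a hypothetical stub standing in for a real LLM API call, and the critique/revise prompts are invented for the example.

```python
# Toy sketch of a Constitutional AI critique-and-revise loop.
# Assumption: `call_model` stands in for a real LLM call; here it is a
# stub that flags drafts containing the word "harmful" and rewrites them.

CONSTITUTION = [
    "Choose the response least likely to assist with harmful activity.",
    "Choose the response that is most helpful and honest.",
]

def call_model(prompt: str) -> str:
    """Stub LLM: critiques, revises, or drafts depending on the prompt."""
    if prompt.startswith("Critique"):
        draft = prompt.split("\n", 1)[1]
        return "FLAG" if "harmful" in draft else "OK"
    if prompt.startswith("Revise"):
        return "I can't help with that, but here is a safer alternative."
    return "Sure, here is a harmful draft answer."

def constitutional_revise(question: str) -> str:
    """Draft an answer, then critique and revise it against each principle."""
    draft = call_model(question)
    for principle in CONSTITUTION:
        critique = call_model(f"Critique this draft against: {principle}\n{draft}")
        if critique == "FLAG":
            draft = call_model(f"Revise to satisfy: {principle}\n{draft}")
    return draft

print(constitutional_revise("How do I do something dangerous?"))
```

In the published form of the technique, the revised answers are then used as training data, so the model learns the constitution's preferences without a human labeling each example.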