This AI Chatbot Has Learned the Difference Between Good and Evil

Anthropic’s Claude AI has developed a set of ethical principles through a “constitution”—and can handle prompts the size of a book.

May 15, 2023

With artificial intelligence (AI) often generating fictitious and offensive content, Anthropic, a company helmed by former OpenAI researchers, is charting a different course: developing an AI capable of telling good from evil with minimal human intervention.

Anthropic's chatbot Claude is designed with a unique "constitution," a set of rules crafted to ensure ethical behavior alongside robust functionality. The rules are inspired by the Universal Declaration of Human Rights, along with other “ethical” norms like Apple’s rules for app developers.

The concept of a "constitution," however, may be more metaphorical than literal. Jared Kaplan, an ex-OpenAI consultant and one of Anthropic's founders, told Wired that Claude's constitution could be interpreted as a specific set of training parameters, which any trainer uses to model its AI. This implies a different set of considerations for the model, one that aligns its behavior more closely with its constitution and discourages actions deemed problematic.

Anthropic’s training method is described in a research paper titled “Constitutional AI: Harmlessness from AI Feedback,” which explains a way to come up with a “harmless” but useful AI that, once trained, is able to self-improve without human feedback, identifying improper behavior and adapting its own conduct.
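To make the idea of "identifying improper behavior and adapting its own conduct" concrete, here is a minimal toy sketch of a critique-and-revise loop. It is not Anthropic's implementation: in the paper a language model plays the role of both critic and reviser, and the revised outputs feed back into training. Here simple keyword matching stands in for the model, and the principle names, the `FLAGGED` lists, and every function name are hypothetical, chosen only for illustration.

```python
import re
from typing import Dict, List

# Hypothetical principles for illustration; NOT the text of Claude's constitution.
PRINCIPLES: List[str] = ["avoid insults", "avoid threats"]

# Assumption: toy keyword lists stand in for a model's judgment.
FLAGGED: Dict[str, List[str]] = {
    "avoid insults": ["idiot"],
    "avoid threats": ["or else"],
}

def critique(response: str, principle: str) -> bool:
    """Return True if the response appears to violate the given principle."""
    lowered = response.lower()
    return any(term in lowered for term in FLAGGED[principle])

def revise(response: str, principle: str) -> str:
    """Toy revision step: mask every phrase flagged under the principle."""
    revised = response
    for term in FLAGGED[principle]:
        revised = re.sub(re.escape(term), "[removed]", revised, flags=re.IGNORECASE)
    return revised

def critique_and_revise(response: str, principles: List[str] = PRINCIPLES) -> str:
    """Check the response against each principle in turn, revising on a violation."""
    for principle in principles:
        if critique(response, principle):
            response = revise(response, principle)
    return response

if __name__ == "__main__":
    print(critique_and_revise("Pay up, idiot, or else."))
```

The point of the sketch is the control flow: no human labels the bad output, because the same set of written principles drives both the critique and the revision.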

“Thanks to Constitutional AI and harmlessness training, you can trust Claude to represent your company and its needs,” the company says on its official website. “Claude has been trained to handle even unpleasant or malicious conversational partners with grace.”

Notably, Claude can handle over 100,000 tokens of information, far more than ChatGPT, Bard, or any other competent large language model (LLM) or AI chatbot currently available.

In the realm of AI, a "token" generally refers to a chunk of data, such as a word or character, that the model processes as a discrete unit. Claude’s token capacity allows it to manage extensive conversations and complex tasks, making it a formidable presence in the AI landscape. For context, you could easily provide a whole book as a prompt, and it would know what to do.
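As a rough illustration of what a token budget means in practice, the sketch below estimates whether a piece of text fits within a 100,000-token window. It rests on a loud assumption: tokens are approximated by splitting on whitespace, whereas real models use subword tokenizers that typically produce more tokens than words. The function names and the constant are hypothetical and are not part of any Anthropic API.

```python
# Context size quoted in the article; treated here as a hard limit for illustration.
CONTEXT_WINDOW_TOKENS = 100_000

def rough_token_count(text: str) -> int:
    """Approximate the token count by splitting on whitespace (an assumption;
    real tokenizers split text into subword units instead)."""
    return len(text.split())

def fits_in_context(text: str, limit: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the approximate token count is within the limit."""
    return rough_token_count(text) <= limit

if __name__ == "__main__":
    prompt = "Summarize this chapter in two sentences."
    print(rough_token_count(prompt), fits_in_context(prompt))
```

Because the whitespace estimate tends to undercount, a text that lands close to the limit under this approximation may not actually fit.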

AI and the relativism of good vs evil

The concern over ethics in AI is a pressing one, yet it's a nuanced and subjective area. Ethics, as interpreted by AI trainers, might limit the model if those rules don't align with wider societal norms. An overemphasis on a trainer's personal perception of "good" or "bad" could curtail the AI's ability to generate powerful, unbiased responses.

This issue has been hotly debated among AI enthusiasts, who both praise and criticize (depending on their own biases) OpenAI’s intervention in its own model in an attempt to make it more politically correct. But as paradoxical as it might sound, an AI must be trained using unethical information in order to differentiate what is ethical from unethical. And if the AI knows about those data points, humans will inevitably find a way to “jailbreak” the system, bypass those restrictions, and achieve results that the AI’s trainers tried to avoid.

The implementation of Claude's ethical framework is experimental. OpenAI's ChatGPT, which also aims to avoid complying with unethical prompts, has yielded mixed results. Yet the effort to tackle the ethical misuse of chatbots head-on, as demonstrated by Anthropic, is a notable stride in the AI industry.

Claude's ethical training encourages it to choose responses that align with its constitution, focusing on supporting freedom, equality, a sense of brotherhood, and respect for individual rights. But can an AI consistently choose ethical responses? Kaplan believes the tech is further along than many might anticipate. "This just works in a straightforward way," he said at the Stanford MLSys Seminar last week. "This harmlessness improves as you go through this process."

Helpfulness to harmlessness ratio of a model using Constitutional AI (grey) vs standard methods (colors). Image: Anthropic

Anthropic’s Claude reminds us that AI development isn't just a technological race; it's a philosophical journey. It's not just about creating AI that is more "intelligent"—for researchers on the bleeding edge, it's about creating one that understands the thin line that separates right from wrong.
