
Posts

Featured

Training a Constitutional AI model using Hugging Face

Authors: Perusha Moodley and Maria Kapros

This post is one of several documenting our joint submission for the Blue Dot Alignment project (Oct 2024-Jan 2025), in which we evaluated the impact of training Constitutional AI (CAI) models with alternative constitutions. The course write-up will be made available shortly. This first post documents our experience training a CAI model using existing code and datasets provided by Hugging Face and Anthropic. The project ran for one month, at approximately 4-5 hours a week. Given these time constraints, we reused existing code as much as possible, so the Hugging Face tutorial on implementing CAI with open LLMs formed the basis of our training process.

The CAI process in a nutshell:

CAI process flow (Source: Anthropic paper)

An overview of the original Anthropic CAI process flow is available in this blog post; however, the Hugging Face t...

Latest posts

Blue Dot AI Alignment Course - Week 3 - RLHF and CAI

Getting started with model evaluations

Blue Dot AI Alignment Course - Week 2

Blue Dot AI Alignment Course - Week 1