Training a Constitutional AI model using Hugging Face
Authors: Perusha Moodley and Maria Kapros

This post is one of several documenting our joint submission for the Blue Dot Alignment project (Oct 2024-Jan 2025), where we evaluated the impact of training Constitutional AI (CAI) models with alternative constitutions. The course write-up will be made available shortly.

This first post documents our experience training a CAI model using existing code and datasets provided by Hugging Face and Anthropic. The project ran for 1 month, at approximately 4-5 hours a week. Given the time constraints, we opted to work with existing code as much as possible, so the Hugging Face tutorial on implementing CAI with open LLMs formed the basis of our training process.

The CAI process in a nutshell:

CAI process flow (Source: Anthropic paper)

An overview of the original Anthropic CAI process flow is available in this blog post; however, the Hugging Face t...