Blue Dot AI Alignment Course - Week 1
I am working on the AI Safety Fundamentals course on AI alignment from Blue Dot, and so far it has exceeded my expectations and left me a lot more concerned than I expected it to. The course introduces concepts in AI safety and alignment that are more relevant now that AI models are becoming increasingly effective and pervasive in our lives. Transformative AI (TAI) could radically improve our lives, but it raises questions of privacy, security, bias, harmfulness, goal alignment and other crucial topics. The course walks you through the relevant AI safety (AIS) topics in a very structured way, building on concepts from week to week. It runs over 12 weeks, with 8 weeks of content accompanied by discussion and debate with a cohort, followed by 4 weeks of project work.
This post and the following posts will attempt to provide a weekly review to keep track of how my views and thoughts evolve over the duration of the course. But perhaps more importantly, it is my first attempt at writing down my thoughts and internal processing. I meant to do this during my PhD, and I have a graveyard of never-published blogposts to show for it! Part of what the course does is force you to write, and I found that I enjoyed it as it really helped clarify my thoughts.
This post covers the first week.
Week Zero:
My initial position was, I am sad to say, one of extreme ignorance, yet this is possibly a common position amongst ML researchers. The growth of the AI safety community was at the edge of my awareness while I tried to focus on my own research and the completion of my PhD dissertation. Then I dabbled with mechanistic interpretability for decision transformers to help me interpret my agent's behaviour, and this opened up the AIS can of worms.
While I was ignorant, I had opinions based on a very high-level understanding of the need for safe AI and an acknowledgement of its impact on societies. Yes, jobs would be lost initially, but that just meant we needed to adapt, and that is what humans do, right? We just needed to get ahead of this and retrain people. No one wants to unleash an incomplete and therefore dangerous set of tools into the world (goodness knows the software industry has done enough of this already!). The potential for AIs to advance the hacking of the general population, companies and infrastructure, or to help build weapons, was concerning. But I was comfortable in the knowledge that the big model players such as OpenAI, Anthropic, Google, etc. were already focused on and working on those very problems. I had not stopped to look at how hard those problems are, how they scale, and how quickly we could lose the ability to keep track of models.
My review will include my assessment of the weekly content and discussions. I have found the discussions to be very helpful, sometimes causing me to rethink my position entirely! This happened several times and is kind of what prompted me to write down my thoughts. As a result you should be aware that my views are still in development; I am a continuous learning model so expect nothing more!
PS: I should mention that it took me a lot longer than the estimated 2-3 hours a week to read the content. I often required more context (enter Perplexity), and delving into any of the optional readings takes significantly more time. If you are having doubts, I highly recommend taking the course: it is very well structured, the discussions are well facilitated and time-controlled, and it is worth your time.
Week One - AI and the years ahead
Quote of the week is from Rodney Brooks: "Every time we figure out a piece of it, it stops being magical; we say, 'Oh, that's just a computation.'"
The week starts with a gentle introduction to neural networks, a broad view of what AI is, and the kinds of things we should be thinking about when we reason about AI. Some of these include the usual human-AI interactions; the less usual cooperation of multiple models or agents embedded in all aspects of our economy and society; the slightly worrying company-AI link as it pertains to mitigation and responsibility; and how to design fail-safes.
Visualising the DL revolution demonstrates the large jumps in AI driven by scaling up compute. Much progress was made in 2022, which prompted all the key players to look very seriously at AGI safety.
After the gentle start, Why are people building AI systems? by Adam Jones delivers a sucker-punch.
A key takeaway from this post is that people underestimate how much money the companies that own the models stand to make, and the overall impact on national economies that suddenly become reliant on AI companies. Companies building AI agents for hire could earn a significant proportion of wages (the post quotes ~46% in the US at the time of writing). This is based on the proportion of jobs that can be performed remotely, which is high in the US, UK and EU. This effect would concentrate economic power in the few companies that own the best-performing models.
Such a trajectory should place immediate focus, and pressure, on the way we regulate AI models, which could in turn shape competition in the model space. The bigger players will always be able to throw enough money and resources at regulation, but we should ensure other contenders, including open-source contenders, are able to compete.
An alternative that pops up in Week 3 (spoiler: the only way for smaller players to compete is if they use AIs to help regulate other AIs) is contentious at best.
The course often asks you to read case studies or pops one into the weekly discussion. This week's case study targeted the escalation and damage that advanced models could impose on the world.
The setting: a number of the best minds in the AI model industry get together and concoct the most advanced model yet. The day before it is released to the public, the USA receives a threat from North Korea (NK) stating that it has obtained the weights for this most advanced of models and demanding weapons in exchange.
The question: what could the impact be?
- The obvious answers were related to white-box hacking, which would be a disaster for the company if NK managed to develop the jailbreaks and attacks it threatened.
- A more chilling suggestion from my cohort was that all NK had to do was threaten to release the model and its weights into the public domain! The chaos that would ensue is difficult to imagine!
So that was week one... it was a thrilling start and I was looking forward to week 2!