What happened in 2024 that is new and significant in the world of AI ethics? The new technology developments have come in fast, but what has ethical or values implications that are going to matter long-term?
I’ve been working on updates for my 2025 class on Values and Ethics in Artificial Intelligence. This course is part of the Johns Hopkins Education for Professionals program, part of the Master’s degree in Artificial Intelligence.
Overview of my changes:
I’m doing major updates on three topics based on 2024 developments, and a number of small updates, integrating other news and filling gaps in the course.
Topic 1: LLM interpretability.
Anthropic’s work in interpretability was a breakthrough in explainable AI (XAI). We will be discussing how this method can be used in practice, as well as implications for how we think about AI understanding.
Topic 2: Human-Centered AI.
Rapid AI development adds urgency to the question: How do we design AI to empower rather than replace human beings? I have added content throughout my course on this, including two new design exercises.
Topic 3: AI Law and Governance.
Major developments were the EU’s AI Act and a raft of California legislation, including laws targeting deepfakes, misinformation, intellectual property, medical communications, and minors’ use of ‘addictive’ social media, among others. For class I developed some heuristics for evaluating AI legislation, such as studying the definitions, and I explain how legislation is only one piece of the solution to the AI governance puzzle.
Miscellaneous new material:
I am integrating material from news stories into existing topics on copyright, risk, privacy, safety, and social media/smartphone harms.
Topic 1: Generative AI Interpretability
What’s new:
Anthropic’s pathbreaking 2024 work on interpretability was a fascination of mine. They published a blog post here, and there is also a paper, and there was an interactive feature browser. Most tech-savvy readers should be able to get something out of the blog and paper, despite some technical content and a daunting paper title (‘Scaling Monosemanticity’).
Below is a screenshot of one discovered feature, ‘sycophantic praise’. I like this one for its psychological subtlety; it amazes me that they could separate this abstract concept from simple ‘flattery’ or ‘praise’.
What’s important:
Explainable AI: For my ethics class, this is most relevant to explainable AI (XAI), which is a key ingredient of human-centered design. The question I will pose to the class is, how might this new capability be used to promote human understanding and empowerment when using LLMs? SAEs (sparse autoencoders) are too expensive and hard to train to be a complete solution to XAI problems, but they can add depth to a multi-pronged XAI strategy.
Safety implications: Anthropic’s work on safety is also worth a mention. They identified the ‘sycophantic praise’ feature as part of their work on safety, specifically relevant to this question: could a very powerful AI hide its intentions from humans, perhaps by flattering users into complacency? This general direction is especially salient in light of this recent work: Frontier Models are Capable of In-context Scheming.
Evidence of AI ‘Understanding’? Did interpretability kill the ‘stochastic parrot’? I have been convinced for a while that LLMs must have some internal representations of complex and inter-related concepts. They could not do what they do as one-deep stimulus-response or word-association engines (‘stochastic parrots’), no matter how many patterns were memorized. The use of complex abstractions, such as those identified by Anthropic, fits my definition of ‘understanding’, although some reserve that term only for human understanding. Perhaps we should just add a qualifier for ‘AI understanding’. This is not a topic that I explicitly cover in my ethics class, but it does come up in discussion of related topics.
SAE visualization needed. I am still looking for a good visual illustration of how complex features across a deep network are mapped onto a very thin, very wide SAE with sparsely represented features. What I have now is the PowerPoint approximation I created for class use, below. Props to Brendan Bycroft for his LLM visualizer, which has helped me understand more about the mechanics of LLMs: https://bbycroft.net/llm
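For readers who want the shape of the idea in code rather than a diagram, here is a minimal NumPy sketch of the thin-to-wide mapping. The dimensions, random (untrained) weights, and function names are my own illustrative choices, not Anthropic’s implementation; their SAEs are trained on real model activations and use millions of features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: a "thin" model activation vector is mapped onto a much
# "wider" dictionary of features, most of which stay inactive (zero).
d_model, d_features = 64, 1024

# Untrained encoder/decoder weights, purely for illustration.
W_enc = rng.normal(0, 0.1, (d_model, d_features))
b_enc = np.zeros(d_features)
W_dec = rng.normal(0, 0.1, (d_features, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode an activation vector into wide, sparse features; reconstruct it."""
    f = np.maximum(0.0, x @ W_enc + b_enc)   # ReLU zeroes out most features
    x_hat = f @ W_dec + b_dec                # reconstruction from active features
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparsity."""
    return np.sum((x - x_hat) ** 2) + l1_coeff * np.sum(np.abs(f))

x = rng.normal(0, 1, d_model)   # stand-in for one residual-stream activation
f, x_hat = sae_forward(x)
print(f"active features: {np.count_nonzero(f)} of {d_features}")
```

Training drives the L1 term down so that only a handful of features fire for any given input, which is what makes individual features (like ‘sycophantic praise’) interpretable.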
Topic 2: Human Centered AI (HCAI)
What’s new?
In 2024 it became increasingly apparent that AI will affect every human endeavor, and at a much faster rate than previous technologies such as steam power or computers. The speed of change matters almost more than the nature of change, because human culture, values, and ethics do not usually change quickly. Maladaptive patterns and precedents set now will be increasingly difficult to change later.
What’s important?
Human-Centered AI needs to become more than an academic interest; it needs to become a well-understood and widely practiced set of values, practices, and design principles. Some people and organizations that I like, along with the Anthropic explainability work already mentioned, are Stanford’s Human-Centered AI institute, Google’s People + AI effort, and Ben Shneiderman’s early leadership and community organizing.
For my class of working AI engineers, I am trying to focus on practical and specific design principles. We need to counter the dysfunctional design principles I seem to see everywhere: ‘automate everything as fast as possible’, and ‘hide everything from the users so they can’t mess it up’. I am looking for cases and examples that challenge people to step up and use AI in ways that empower humans to be smarter, wiser and better than ever before.
I wrote fictional cases for class modules on the Future of Work, HCAI and Lethal Autonomous Weapons. Case 1 is about a customer-facing LLM system that tried to do too much too fast and cut the expert humans out of the loop. Case 2 is about a high school teacher who figured out most of her students were cheating on a camp application essay with an LLM and wants to use GenAI in a better way.
The cases are on separate Medium pages here and here, and I love feedback! Thanks to Sara Bos and Andrew Taylor for comments already received.
The second case might be controversial; some people argue that it is OK for students to learn to write with AI before learning to write without it. I disagree, but that debate will no doubt continue.
I prefer real-world design cases when possible, but good HCAI cases have been hard to find. My colleague John (Ian) McCulloh recently gave me some great ideas from examples he uses in his class lectures, including the Organ Donation case, an Accenture project that helped doctors and patients make time-sensitive kidney transplant decisions quickly and well. Ian teaches in the same program that I do, and I hope to work with him to turn this into an interactive case for next year.
Topic 3: AI governance
Most people agree that AI development needs to be governed, through laws or by other means, but there’s a lot of disagreement about how.
What’s new?
The EU’s AI Act came into effect, establishing a tiered system for AI risk and prohibiting a list of highest-risk applications, including social scoring systems and remote biometric identification. The AI Act joins the EU’s Digital Markets Act and General Data Protection Regulation to form the world’s broadest and most comprehensive set of AI-related legislation.
California passed a set of AI-governance-related laws that may have national implications, in the same way that California laws on things like the environment have often set precedents. I like this (incomplete) review from the White & Case law firm.
For international comparisons on privacy, I like DLA Piper‘s website Data Protection Laws of the World.
What’s Important?
My class will focus on two things:
- How we should evaluate new legislation
- How legislation fits into the larger context of AI governance
How do you evaluate new legislation?
Given the pace of change, the most useful thing I thought I could give my class is a set of heuristics for evaluating new governance structures.
Pay attention to the definitions. Each of the new legal acts faced problems with defining exactly what would be covered: some definitions are probably too narrow (easily bypassed with small changes of approach), some too broad (inviting abuse), and some may become dated quickly.
California had to solve some difficult definitional problems in order to regulate things like ‘addictive media’ (see SB-976) and ‘AI-generated media’ (see AB-1836), and to write separate legislation for ‘generative AI’ (see SB-896). Each of these has some potentially problematic aspects worthy of class discussion. As one example, the Digital Replicas Act defines artificial intelligence as “an engineered or machine-based system that varies in its level of autonomy and that can, for explicit or implicit objectives, infer from the input it receives how to generate outputs that can influence physical or virtual environments.” There’s a lot of room for interpretation here.
Who is covered and what are the penalties? Are the penalties financial or criminal? Are there exceptions for law enforcement or government use? How does it apply across international lines? Does it have a tiered system based on an organization’s size? On the last point, technology regulation often tries to protect startups and small companies with thresholds or tiers for compliance. But California’s governor vetoed SB 1047 on AI safety for exempting small companies, arguing that “Smaller, specialized models may emerge as equally or even more dangerous”. Was this a wise move, or was he just protecting California’s tech giants?
Is it enforceable, flexible, and ‘future-proof’? Technology legislation is very difficult to get right because technology is a fast-moving target. If it is too specific, it risks quickly becoming obsolete, or worse, hindering innovation. But the more general or vague it is, the less enforceable it may be, and the more easily ‘gamed’. One strategy is to require companies to define their own risks and solutions, which provides flexibility, but will only work if the legislature, the courts, and the public later pay attention to what companies actually do. This is a gamble on a well-functioning judiciary and an engaged, empowered citizenry… but democracy always is.
How does legislation fit into the bigger picture of AI governance?
Not every problem can or should be solved with legislation. AI governance is a multi-tiered system. It includes a proliferation of AI frameworks and independent AI guidance documents that go further than legislation can, providing non-binding, sometimes idealistic goals. A few that I think are important:
- The NIST AI Risk Management Framework. NIST has a good reputation, at least within the federal government, and this framework is being used as the foundation for a lot of other work.
- The Santa Clara Principles, focused on content moderation, have some industry buy-in and specifically call out government regulators and Mark Zuckerberg for compliance.
- The AI Moratorium addresses the longer-term, existential safety risks. ‘An Overview of AI Catastrophic Risk’ is a good follow-up.
- Professional societies have been active also; IEEE is very involved in publishing standards and has one for Ethically Aligned Design, ACM has a professional Code of Ethics.
- Microsoft Responsible AI Standards is a good example of a corporate AI document with specific requirements, tools and practices.
- Here’s a partial list of many other frameworks.
Other miscellaneous topics
Here are some other news items and topics I am integrating into my class; some are new in 2024 and some are not. I will:
- Include a summary of the 2024 bestseller The Anxious Generation, an important synthesis of harms related to social media, smartphones, and associated lifestyle changes. I’m a big fan of Jonathan Haidt’s work.
- Note whether a verdict in the NY Times/OpenAI case comes out during the semester.
- Mention the cautionary tale of the OnStar/LexisNexis privacy violation in the module on privacy.
- Work in the issues of AI’s increasing power demands, and the questionable labor practices related to AI content moderators and data labelers.
- Give an assignment on the dangers of synthetic biology, starting with ‘The New Bioweapons’ article from RAND and a (paywalled) article from a Chinese research team, ‘Challenges and recent progress in the governance of biosecurity risks in the era of synthetic biology’, (DOI: 10.1016/j.jobb.2022.02.002). Students can go deeper with a ‘Special Issue on Artificial Intelligence for Synthetic Biology’.
- Include more on LLM-aided misinformation, with good starter articles ‘Combatting Misinformation in the age of LLMs’ and ‘A Survey on the use of Large Language Models (LLMs) in Fake News’. Go deeper with ‘A meta-analysis of correction effects in science-relevant misinformation.’
- Add material on the Content Authenticity Initiative and related C2PA.
- Compare student predictions to the aggregate opinions of ‘Thousands of AI Authors on the Future of AI’, an update on forecasting work that I use every year. If anyone else has a class of students and wants to compare predictions I can send a Qualtrics link.
Thanks for reading! I always appreciate making contact with other people teaching similar courses or with deep knowledge of related areas. And I also always appreciate Claps and Comments!
What I’m Updating in My AI Ethics Class for 2025 was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.