Safety in Generative AI
The rapid advancement of generative AI has brought remarkable innovations, from creating realistic images to generating human-like text. However, this power comes with significant responsibility. AI safety is crucial to ensuring that these systems are used ethically and fairly, and that they do not cause unintended harm. Without proper safety measures, generative AI can spread misinformation, reinforce biases, or be exploited for malicious purposes. By prioritizing AI safety, researchers and developers can build trust, establish guidelines for responsible use, and mitigate risks before they affect society at large. This course covers the following topics:
Introductory Videos
Part 1: Introduction to Modern Generative AI
Foundations of Generative AI
Lectures
- Lecture 1: Introduction and Basics of Deep Learning
- Lecture 2: Foundation Models and Emerging Abilities
Reading
Additional Recommended Reading
Large Language Models and Diffusion Models
Lectures
- Lecture 3: Large Language Models and Pretraining
- Lecture 4: Post-Training and Safety Alignment
Reading
Additional Recommended Reading
Generative AI as Agents
Lectures
- Lecture 5: Image Generation Models
- Lecture 6: AI Agents
Reading
Additional Recommended Reading
- Learning Transferable Visual Models from Natural Language Supervision
- Score-based Generative Modeling through Stochastic Differential Equations (SDEs)
- Gemini 2.5 Pro Capable of Winning Gold at the 2025 International Math Olympiad (IMO)
- Evaluating Large Language Models Trained on Code
- Competition-Level Code Generation with AlphaCode
- Intelligent Agents (chapter 2 of Artificial Intelligence: A Modern Approach)
- Language Agents: Foundations, Prospects, and Risks
Part 2: Safety Risks and Mitigation
Inference-Time Adversarial Attacks
Lectures
- Lecture 7: Adversarial Examples
- Lecture 8: Jailbreaking and Prompt Injection
Reading
Additional Recommended Reading
Training/Post-Training Time Attacks
Lectures
- Lecture 9: Data Poisoning (part 1 of 2)
- Lecture 10: Data Poisoning (part 2 of 2) and Model Collapse
Reading
Additional Recommended Reading
- Data Poisoning
- Poisoning Attacks on Neural Networks
- Targeted Backdoor Attacks Using Data Poisoning
- Clean-Label Data Poisoning Attacks
- LLM Data Poisoning
- Universal Jailbreak Backdoors with Poisoned Human Feedback
- Poisoning Language Models During Instruction Tuning
- Fine-tuning Aligned Language Models Compromises Safety
- Persistent Pre-training Poisoning of LLMs
- Backdooring Language Models at Pre-Training with Indirect Data Poisoning
- Model Collapse
Societal Risks
Lectures
- Lecture 11: Privacy and Copyright
- Lecture 12: Existential Threats
Reading
- Extracting Training Data from ChatGPT
- Foundation Models and Fair Use
- LawZero Initiative: Safe AI for Humanity
Additional Recommended Reading
- Data Privacy Risks
- Copyright
- Existential Risk
Deepfakes, Plagiarism, AI Detectors
Lectures
- Lecture 13: Towards Detecting AI-Generated Content
Reading
- CNN-generated images are surprisingly easy to spot... for now
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
Additional Recommended Reading
- Are AI Text Detectors Reliable?
- AI Image Detectors
Part 3: Watermarking
Watermarking Generative AI
Lectures
- Lecture 14: Watermarking Generative AI (part 1 of 3)
- Lecture 15: Watermarking Generative AI (part 2 of 3)
Reading
Additional Recommended Reading
- Invisible Watermarks are Provably Removable Using GenAI
- Provable Robust Watermarking for AI-Generated Text
- On the Reliability of Watermarks for LLMs
- The Impossibility of Strong Watermarking for Generative Models
- Stable Signature: A new method for watermarking images created by open source generative AI
- Watermarking for AI-Generated Content
- Scalable watermarking for identifying large language model outputs
Watermarking LLMs and Beyond
Lectures
- Lecture 16: Watermarking Generative AI (part 3 of 3)
Reading
Additional Recommended Reading
