Peeking Inside the Secret Lives of AI Chatbots

A team led by Adityanarayanan Radhakrishnan at the Massachusetts Institute of Technology and Mikhail Belkin at the University of California San Diego has developed a method to hunt down and manipulate hidden concepts lurking inside large language models — the AI systems behind ChatGPT, Claude and their kin. Their approach, published this week in Science, can […]

Innovative AI Steering Technique Reveals System Vulnerabilities and Paths for Enhancement

A team of scientists has unveiled a technique to steer the output of large language models (LLMs) by manipulating specific concepts encoded inside the models. The approach could make LLMs more reliable, efficient, and adaptable, while also shedding light on […]
