ABSTRACT
Large language models (LLMs) like ChatGPT are increasingly determining the course of our everyday lives. They can decide what content we are likely to see on social media, and they are already being deployed in high-stakes settings, e.g. as mental health support tools. So what happens when LLMs break? What are the risks posed by LLM behavior that is misaligned with human expectations and norms? How do we mitigate the negative effects of these failures on society and prevent discriminatory decision-making?
In this talk, Professor Saadia Gabriel discusses the growing disconnect between scalability and safety in LLMs. She walks through three recent studies that highlight the need for a community-grounded approach that bridges the gap between AI systems and the users who interact with them. First, she describes work from the UCLA Misinformation, AI & Responsible Society (MARS) lab exploring how AI agents can change the beliefs of cognitively biased users. Next, she discusses her recent work on when LLMs learn to reason with weak supervision, which highlights the need for structured exploration during reinforcement learning to elicit reliable reasoning. Lastly, she presents a multi-objective preference learning framework for ensuring that LLM-based mental health chatbots are aligned with both clinical guidance and patient perspectives.


