BEGIN:VCALENDAR
VERSION:2.0
PRODID:-// - ECPv6.16.2//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-ORIGINAL-URL:https://tilos.ai
X-WR-CALDESC:Events for 
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20240310T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20241103T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20250309T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20251102T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20260308T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20261101T090000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20251208T100000
DTEND;TZID=America/Los_Angeles:20251208T110000
DTSTAMP:20260525T062222
CREATED:20251021T125343Z
LAST-MODIFIED:20260227T214449Z
UID:7677-1765188000-1765191600@tilos.ai
SUMMARY:TILOS-HDSI Seminar: Incentivizing Emergent Behaviors for LLMs via Reinforcement Learning
DESCRIPTION:Yi Wu\, Tsinghua University\nAbstract: Reinforcement Learning (RL) has become a powerful post-training method for eliciting advanced behaviors in large language models (LLMs). This talk presents recent results showing how RL can incentivize the emergence of LLM capabilities across three domains: (1) the multi-player deduction game Werewolf\, where RL-trained LLM agents develop strategic behaviors and outperform strong human players; (2) agentic search\, where large-scale RL enables a 32B model to run multi-step search to answer non-trivial questions beyond commercial baselines; and (3) efficient reasoning\, where RL mitigates over-thinking and improves both reliability and compute efficiency.\nThe papers can be found at:\n\nWerewolf: https://arxiv.org/abs/2310.18940 (ICML24)\, https://arxiv.org/abs/2502.04686 (ICML25)\nASearcher: https://arxiv.org/abs/2508.07976\nThinking Efficiency: https://www.arxiv.org/abs/2506.07104 (NeurIPS25)\n\nAll of the projects were trained using our large-scale agentic RL system\, AReaL\, which is open source at https://github.com/inclusionAI/AReaL with its paper at https://arxiv.org/abs/2505.24298 (NeurIPS25).\n\nYi Wu is an assistant professor at the Institute for Interdisciplinary Information Sciences (IIIS)\, Tsinghua University. He obtained his Ph.D. from UC Berkeley and was a researcher at OpenAI from 2019 to 2020. His research focuses on reinforcement learning\, multi-agent learning\, and LLM agents. His representative works include the value iteration network\, the MADDPG/MAPPO algorithms\, OpenAI’s hide-and-seek project\, and the AReaL project. He received the best paper award at NIPS 2016\, was a best demo award finalist at ICRA 2024\, and received the MIT TR35 Asia Pacific 2025 award.
URL:https://tilos.ai/event/tilos-hdsi-seminar-with-yi-wu-tsinghua-university/
LOCATION:Qualcomm Conference Center Room B (Jacobs Hall first floor) and Virtual\, 9736 Engineers Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:TILOS Seminar Series
ATTACH;FMTTYPE=image/jpeg:https://tilos.ai/wp-content/uploads/2025/10/wu-yi.jpg
END:VEVENT
END:VCALENDAR