BEGIN:VCALENDAR
VERSION:2.0
PRODID:-// - ECPv6.15.18//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-ORIGINAL-URL:https://tilos.ai
X-WR-CALDESC:Events for 
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20240310T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20241103T020000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20250309T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20251102T020000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20260308T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20261101T020000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250523T110000
DTEND;TZID=America/Los_Angeles:20250523T120000
DTSTAMP:20260404T010128Z
CREATED:20250828T192125Z
LAST-MODIFIED:20260227T222820Z
UID:7272-1747998000-1748001600@tilos.ai
SUMMARY:TILOS Seminar: Optimal Quantization for LLMs and Matrix Multiplication
DESCRIPTION:Yury Polyanskiy\, MIT \nAbstract: The main building block of large language models is matrix multiplication\, which is often bottlenecked by the speed of loading matrices from memory. A number of recent quantization algorithms (SmoothQuant\, GPTQ\, QuIP\, SpinQuant\, etc.) address this issue by storing matrices at lower precision. We derive the optimal asymptotic information-theoretic tradeoff between the accuracy of the matrix product and the compression rate (number of bits per matrix entry). We also show that a non-asymptotic version of our construction (based on nested Gosset lattices and Conway-Sloane decoding)\, which we call NestQuant\, reduces perplexity deterioration almost three-fold compared to state-of-the-art algorithms (as measured on Llama-2 and Llama-3 with 8B to 70B parameters). Based on joint work with Or Ordentlich (HUJI)\, Eitan Porat\, and Semyon Savkin (MIT EECS).\n\nYury Polyanskiy is a Cutten Professor of Electrical Engineering and Computer Science\, a member of IDSS and LIDS at MIT\, and an IEEE Fellow (2024). He received an M.S. degree in applied mathematics and physics from the Moscow Institute of Physics and Technology in 2005 and a Ph.D. degree in electrical engineering from Princeton University in 2010. His research interests span information theory\, machine learning\, and statistics. Dr. Polyanskiy won the 2020 IEEE Information Theory Society James Massey Award\, the 2013 NSF CAREER Award\, and the 2011 IEEE Information Theory Society Paper Award.
URL:https://tilos.ai/event/tilos-seminar-optimal-quantization-for-llms-and-matrix-multiplication/
LOCATION:HDSI 123 and Virtual\, 3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:TILOS Seminar Series
ATTACH;FMTTYPE=image/jpeg:https://tilos.ai/wp-content/uploads/2025/04/polyanskiy-yuri.jpg
END:VEVENT
END:VCALENDAR