TITLE: Understanding the Trade-Offs Between Hallucinations and Mode Collapse in Language Generation
SPEAKER: Grigoris Velegkas (Yale University, USA)
ABSTRACT: Specifying all desirable properties of a language model is challenging, but certain requirements seem essential. Given samples from an unknown language, the trained model should produce valid strings not seen in the training set, and be expressive enough to capture the language's full breadth. Otherwise, outputting invalid strings constitutes "hallucination," and failing to capture the full breadth leads to "mode collapse." Recent work by Kleinberg and Mullainathan [KM24], building on classical work on the closely related problem of language identification by Gold [Gol67] and Angluin [Ang79, 80], provides a concrete mathematical framework to study the problem of language generation. Kleinberg and Mullainathan showed that for all countable collections of languages, it is possible to create a language model that does not hallucinate but suffers from mode collapse. They asked whether this tension between validity and breadth is inherent for language generation.
In this talk, we define various notions of breadth for language generation, and completely characterize when generation with validity and breadth is possible under each of these notions. Our results answer the question of [KM24] and show that this tension between validity and breadth is indeed inherent for language generation. Moreover, we formalize the notion of stable generation, a natural requirement derived from Gold’s work [Gol67], and discuss when this type of generation is achievable. Finally, we discuss the implications of our results in the universal rates setting of Bousquet, Hanneke, Moran, van Handel, and Yehudayoff [BHMvHY21]. The talk is based on joint work with Alkis Kalavasis and Anay Mehrotra.
SHORT BIO: Grigoris Velegkas is a final-year PhD student in Computer Science at Yale University, working with Prof. Amin Karbasi. Before that, he studied Electrical and Computer Engineering at the National Technical University of Athens, where he worked with Prof. Dimitris Fotakis. His research lies at the intersection of machine learning and theoretical computer science, and focuses on three main directions: i) understanding generalization properties of ML algorithms, ii) exploring responsible use of ML systems and designing algorithms with provable replicability guarantees, and iii) understanding the interaction between ML algorithms and mechanisms. He was a research intern at Google Research in summer 2023 and summer 2024, and a student researcher from October 2023 to May 2024.
________________________________________________________________________________
Meeting ID: 364 928 182 762
Passcode: w24fHy
________________________________________________________________________________