Archimedes Talk on "From Approximate Membership Filters to LLM Hallucination: A Rate-Distortion View" by Jingwei Li (IEOR, Columbia University)

Date: 2026-06-29, 13:00 - 14:00
Venue: Archimedes 1 - Amphitheater

Abstract: 

Large language models often hallucinate with high confidence on “random facts” that lack inferable patterns. This work formalizes the memorization of such facts as a membership testing problem, connecting the discrete error metrics of Bloom-type filters with the continuous confidence scores of LLMs. In the sparse regime, the optimal memory-error tradeoff is characterized by a rate-distortion theorem: the memory required per stored fact is determined by the minimum KL divergence between score distributions on facts and non-facts. This framework gives a distinctive explanation for hallucination in an idealized setting. Even with optimal training, perfect data, and a closed-world assumption, the information-theoretically optimal strategy under limited capacity is not simply to abstain, forget, or remain uncertain, but to assign high confidence to some non-facts. Thus, hallucination emerges as a natural consequence of lossy compression. The same theorem also recovers and sharpens classical space lower bounds for Bloom-type and two-sided filters, highlighting a fundamental frontier between hallucination, over-refusal, and memory.

 
Short bio:
 
Jingwei Li is a PhD student in the Department of Industrial Engineering and Operations Research at Columbia University. Her research focuses on algorithms and data structures. Her work includes online and approximation algorithms, with a recent focus on scheduling and related problems, as well as data structures, including approximate membership filters and concurrent data structures.
 
— Online participants: 
 
 

The project “ARCHIMEDES Unit: Research in Artificial Intelligence, Data Science and Algorithms” with code OPS 5154714 is implemented by the National Recovery and Resilience Plan “Greece 2.0” and is funded by the European Union – NextGenerationEU.


Stay connected! Subscribe to our mailing list by emailing sympa@lists.athenarc.gr
with the subject "subscribe archimedes-news Firstname LastName"
(replace with your details)