Archimedes Talk on "From Approximate Membership Filters to LLM Hallucination: A Rate-Distortion View" by Jingwei Li (IEOR, Columbia University)

Dates

2026-06-29 13:00 - 14:00

Venue

Archimedes 1 - Amphitheater

Abstract:

Large language models often hallucinate with high confidence on “random facts” that lack inferable patterns. This work formalizes the memorization of such facts as a membership testing problem, connecting the discrete error metrics of Bloom-type filters with the continuous confidence scores of LLMs. In the sparse regime, the optimal memory-error tradeoff is characterized by a rate-distortion theorem: the memory required per stored fact is determined by the minimum KL divergence between score distributions on facts and non-facts. This framework gives a distinctive explanation for hallucination under an idealized setting. Even with optimal training, perfect data, and a closed-world assumption, the information-theoretically optimal strategy under limited capacity is not simply to abstain, forget, or remain uncertain, but to assign high confidence to some non-facts. Thus, hallucination emerges as a natural consequence of lossy compression. The same theorem also recovers and sharpens classical space lower bounds for Bloom-type and two-sided filters, highlighting a fundamental frontier between hallucination, over-refusal, and memory.
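As a rough illustration of the memory-error tradeoff described in the abstract, the sketch below evaluates the KL divergence for the simplest case, where the score is a binary accept/reject decision. This is an assumption made for illustration and not the speaker's formulation: the function names `bernoulli_kl_bits` and `bits_per_fact` are hypothetical, the error rates are fixed by hand, and no minimization over score distributions is performed. With zero false negatives and false positive rate ε, the divergence reduces to log2(1/ε) bits per stored item, which is the classical space lower bound for Bloom-type filters.

```python
import math


def bernoulli_kl_bits(p: float, q: float) -> float:
    """KL divergence D(Bern(p) || Bern(q)) in bits, treating 0 * log(0 / x) as 0."""
    if not (0.0 <= p <= 1.0) or not (0.0 < q < 1.0):
        raise ValueError("need 0 <= p <= 1 and 0 < q < 1")
    total = 0.0
    if p > 0.0:
        total += p * math.log2(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log2((1.0 - p) / (1.0 - q))
    return total


def bits_per_fact(false_positive_rate: float, false_negative_rate: float = 0.0) -> float:
    """Illustrative memory cost per stored fact for binary accept/reject scores.

    Assumption: facts are accepted with probability 1 - false_negative_rate and
    non-facts are accepted with probability false_positive_rate.
    """
    accept_on_facts = 1.0 - false_negative_rate
    return bernoulli_kl_bits(accept_on_facts, false_positive_rate)


if __name__ == "__main__":
    # One-sided filter: no false negatives, 1% false positives.
    print(bits_per_fact(0.01))
    # Two-sided filter: allowing 5% false negatives lowers the cost.
    print(bits_per_fact(0.01, 0.05))
```

Under these assumptions, tolerating some false negatives (forgetting or refusing true facts) reduces the bits needed per fact, while lowering the false positive rate (confident acceptance of non-facts) raises it, which is the tradeoff between hallucination, over-refusal, and memory that the abstract refers to.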

Short bio:

Jingwei Li is a PhD student in the Department of Industrial Engineering and Operations Research at Columbia University. Her research focuses on algorithms and data structures. Her work includes online and approximation algorithms, with a recent focus on scheduling and related problems, as well as data structures, including approximate membership filters and concurrent data structures.

Online participants:

MS-Teams, https://teams.microsoft.com/meet/373205599702109?p=grtKEheeWgjT9sVqWN
