[Archimedes NLP Theme Meeting] Knowledge Awareness in LLMs


Dates

2025-04-07 17:30 - 18:30

Venue

Archimedes Amphitheater

Archimedes NLP Theme Meeting: Paper discussion, Monday 7 April, 17:30-18:30 (Greek time)

Moderator: Chryssa Zerva

Title: "Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models" (https://arxiv.org/abs/2411.14257)

Room: Amphitheater, Archimedes Unit (1 Artemidos str., ART1 building, ground floor)

and virtually via Zoom Meetings - https://zoom.us/j/96498340180?pwd=1gtGmNamcHvesTyb07nFWLkALL68zp.1

• Meeting ID: 964 9834 0180

• Passcode: I0Tggr

See the bottom of this announcement for additional ways to participate or dial in.

Abstract:

Hallucinations in large language models are a widespread problem, yet the mechanisms behind whether models will hallucinate are poorly understood, limiting our ability to solve this problem. Using sparse autoencoders as an interpretability tool, we discover that a key part of these mechanisms is entity recognition, where the model detects if an entity is one it can recall facts about. Sparse autoencoders uncover meaningful directions in the representation space, these detect whether the model recognizes an entity, e.g. detecting it doesn't know about an athlete or a movie. This suggests that models can have self-knowledge: internal representations about their own capabilities. These directions are causally relevant: capable of steering the model to refuse to answer questions about known entities, or to hallucinate attributes of unknown entities when it would otherwise refuse. We demonstrate that despite the sparse autoencoders being trained on the base model, these directions have a causal effect on the chat model's refusal behavior, suggesting that chat finetuning has repurposed this existing mechanism. Furthermore, we provide an initial exploration into the mechanistic role of these directions in the model, finding that they disrupt the attention of downstream heads that typically move entity attributes to the final token.

Stay tuned!

For news about the NLP Group and its meetings, and for the latest information about the meetings of the Archimedes NLP Theme and the AUEB NLP Group, check http://nlp.cs.aueb.gr/news.html. To subscribe to the mailing list of the AUEB NLP Group, send a message with subject "subscribe" to [email address protected]. If you have an AUEB account and want to receive announcements about AUEB NLP Group meetings through MS Teams, subscribe to the "AUEB NLP Group meetings" group on MS Teams (code: 01j65ny). Team members can also send text messages (chat) to other team members.

If you are an AI researcher or practitioner, please consider becoming a member of the Hellenic Artificial Intelligence Society (EETN, http://www.eetn.gr/en/).

Dial-in information:

One tap mobile

+302111984488,,96498340180#,,,,*156740# Greece

+302311180599,,96498340180#,,,,*156740# Greece

+351308810988,,96498340180#,,,,*156740# Portugal

+351211202618,,96498340180#,,,,*156740# Portugal

Dial by your location

• +30 211 198 4488 Greece

• +30 231 118 0599 Greece

• +351 308 810 988 Portugal

• +351 211 202 618 Portugal

• +351 308 804 188 Portugal

Meeting ID: 964 9834 0180

Passcode: 156740

Find your local number: https://zoom.us/u/aH70sCHyk

Stay connected! Subscribe to our mailing list by emailing sympa@lists.athenarc.gr with the subject "subscribe archimedes-news Firstname LastName" (replace with your details).