[AI@AUEB/ML Group Meeting - Seminar Series] Invited talk: Simplified and Generalized Masked Diffusion for Discrete Data

Date
2025-02-18 12:00 - 13:00
Venue
Microsoft Teams Meeting; Artemidos 1 - Amphitheater

 

AUEB Machine Learning Reading Group: Invited Talk, Tuesday 18 February, 12:00 - 13:00

Speaker: Michalis Titsias (https://mtitsias.github.io/)
Title: "Simplified and Generalized Masked Diffusion for Discrete Data"

Location: virtually via MS Teams:
https://teams.microsoft.com/l/meetup-join/19%3ameeting_MGYxYWE3OTQtYTAyZi00N2ZlLTg0ZmItODM1MTc4YzY3NGE0%40thread.v2/0?context=%7b%22Tid%22%3a%22ad5ba4a2-7857-4ea1-895e-b3d5207a174f%22%2c%22Oid%22%3a%22072c280e-c769-43fc-9f91-f8515818eb12%22%7d

This is the first talk of the AUEB ML Group reading group. It is combined with the AI@AUEB Seminar Series and will be live-streamed at the Archimedes Research Unit (https://archimedesai.gr/en/) for in-person attendees: Archimedes Unit Amphitheater (1 Artemidos St., Marousi, ground floor)

Abstract:

Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data. However, existing work in this area has been hindered by unnecessarily complex model formulations and unclear relationships between different perspectives, leading to suboptimal parameterization, training objectives, and ad hoc adjustments to counteract these issues. In this work, we aim to provide a simple and general framework that unlocks the full potential of masked diffusion models. We show that the continuous-time variational objective of masked diffusion models is a simple weighted integral of cross-entropy losses. Our framework also enables training generalized masked diffusion models with state-dependent masking schedules. When evaluated by perplexity, our models trained on OpenWebText surpass prior diffusion language models at GPT-2 scale and demonstrate superior performance on 4 out of 5 zero-shot language modeling tasks. Furthermore, our models vastly outperform previous discrete diffusion models on pixel-level image modeling, achieving 2.75 (CIFAR-10) and 3.40 (ImageNet 64x64) bits per dimension, which are better than those of autoregressive models of similar sizes. Our code is available at this https URL.
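The key claim in the abstract — that the continuous-time variational objective reduces to a weighted integral of cross-entropy losses — can be illustrated with a minimal numerical sketch. The linear masking schedule, the loss weight alpha'(t)/(1 - alpha(t)), and the stand-in random "denoiser" logits below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def alpha(t):
    # Assumed linear masking schedule: probability a token is still unmasked at time t.
    return 1.0 - t

def alpha_prime(t):
    # d/dt of the linear schedule above.
    return -1.0

def masked_diffusion_loss(x0, logits, t, mask):
    """Monte-Carlo estimate of the weighted cross-entropy objective:
    -alpha'(t) / (1 - alpha(t)) times the cross-entropy summed over
    the positions that were masked at time t."""
    weight = -alpha_prime(t) / (1.0 - alpha(t))
    # Log-softmax over the vocabulary dimension.
    log_probs = logits - np.logaddexp.reduce(logits, axis=-1, keepdims=True)
    ce = -log_probs[np.arange(len(x0)), x0]  # per-token cross-entropy
    return weight * np.sum(ce * mask)        # only masked tokens contribute

rng = np.random.default_rng(0)
vocab, seq_len = 8, 16
x0 = rng.integers(0, vocab, size=seq_len)            # clean discrete sequence
t = rng.uniform(0.05, 0.95)                          # random diffusion time
mask = rng.uniform(size=seq_len) < (1.0 - alpha(t))  # mask each token w.p. 1 - alpha(t)
logits = rng.normal(size=(seq_len, vocab))           # stand-in model predictions
loss = masked_diffusion_loss(x0, logits, t, mask)
print(loss)
```

Averaging this estimate over uniformly sampled times t approximates the integral in the variational bound; in practice the cross-entropy would be computed from a trained denoising network rather than random logits.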

Speaker Bio:

Michalis Titsias is a full-time Research Scientist at Google DeepMind in London, U.K. Before that, he was an Assistant Professor in the Department of Informatics at the Athens University of Economics and Business (AUEB), and a Postdoctoral Researcher at the University of Oxford and the University of Manchester. He obtained his PhD from the University of Edinburgh, U.K., and previously studied Computer Science at the University of Ioannina, Greece.


The project “ARCHIMEDES Unit: Research in Artificial Intelligence, Data Science and Algorithms” with code OPS 5154714 is implemented under the National Recovery and Resilience Plan “Greece 2.0” and is funded by the European Union – NextGenerationEU.


 

Stay connected! Subscribe to our mailing list by emailing sympa@lists.athenarc.gr
with the subject "subscribe archimedes-news Firstname LastName"
(replace with your details)