Archimedes Workshop on Dialect NLP

Announcing an upcoming workshop on Dialect NLP: “Standardization and Variation for Dialect Varieties with Universal Dependencies as an Application Framework”.

We are excited to announce that the Dialect NLP team at Archimedes, Athena Research Center, Greece, is organizing a workshop in collaboration with the MaiNLP Research Lab at Ludwig Maximilian University of Munich (LMU), Germany.

Workshop details:
Event: UniDive WG1 Workshop on Dialect Varieties
Dates: September 5–6, 2025
Location: LMU Munich
Funded by: UniDive – Universality, Diversity and Idiosyncrasy in Language Technology

Archimedes: Dialect NLP and Linguistic Diversity

At Archimedes, our Dialect NLP team focuses on the intersection of AI and linguistic diversity, with an emphasis on under-resourced and endangered language varieties, both Greek and non-Greek.

Our research includes:

• Dialectal speech-to-text and morphosyntactic modeling
• Dialect-to-standard normalization for Greek varieties
• Intra-dialectal variation analysis
• A critical perspective on annotation frameworks like Universal Dependencies (UD), especially when applied to non-standard or endangered varieties

Our resources cover:

• Standard Modern Greek: GUD Treebank
• Greek dialects: Cretan, Messinian, Lesbian, Cypriot, Griko/Greko, Aperathitika
• Non-Greek varieties spoken in Greece: Pomak, Arvanitika

All datasets, models, and tools we develop are freely available to the research community.

About Our Collaborators at MaiNLP (LMU Munich)

The MaiNLP Research Lab at Ludwig-Maximilians-Universität Munich conducts research in Natural Language Processing, combining computer science, linguistics, and cognitive science. Their focus is on human-facing NLP: developing models that are robust to variation, fair, and reflective of human annotation diversity.

MaiNLP is also leading the ERC Consolidator Grant project DIALECT, which explores natural language understanding for non-standard languages and dialects—making them an ideal partner in our shared mission to design NLP systems that work for all language varieties.

Workshop Goals

This workshop will:

• Document current practices in dialectal UD treebanks (e.g., orthographic/phonological variation, lemmatization, interference)
• Identify gaps in annotation and processing across dialects
• Develop shared methodologies for handling variation, improving consistency, and enabling knowledge transfer
• Foster community engagement with researchers working on under-resourced dialects and less-studied language families
• Ground discussions in theoretical insights from Plank (2016): What to do about non-standard language in NLP

It is an opportunity to reflect on how annotation frameworks and tools serve, or fall short of serving, dialectal and endangered varieties, and how AI can support the documentation, analysis, and preservation of linguistic diversity.

We look forward to sharing our findings and engaging with colleagues working on inclusive, culturally informed, and variation-aware NLP. Stay tuned for more updates!

Stay connected! Subscribe to our mailing list by emailing sympa@lists.athenarc.gr with the subject "subscribe archimedes-news Firstname LastName" (replace Firstname and LastName with your own details).