Archimedes Workshop on Dialect NLP
25 August 2025
We are excited to announce an upcoming workshop on Dialect NLP, “Standardization and Variation for Dialect Varieties with Universal Dependencies as an Application Framework”. The Dialect NLP team at Archimedes, Athena Research Center, Greece, is organizing the workshop in collaboration with the MaiNLP Research Lab at Ludwig Maximilian University (LMU) of Munich.

Workshop details:
• Workshop: UniDive WG1 Workshop on Dialect Varieties
• Dates: September 5–6, 2025
• Location: LMU Munich
• Funded by: UniDive – Universality, Diversity and Idiosyncrasy in Language Technology

Archimedes: Dialect NLP and Linguistic Diversity

At Archimedes, our Dialect NLP team focuses on the intersection of AI and linguistic diversity, with an emphasis on under-resourced and endangered language varieties, both Greek and non-Greek.

Our research includes:
• Dialectal speech-to-text and morphosyntactic modeling
• Dialect-to-standard normalization for Greek varieties
• Intra-dialectal variation analysis
• A critical perspective on annotation frameworks like Universal Dependencies (UD), especially when applied to non-standard or endangered varieties

Our resources cover:
• Standard Modern Greek: GUD Treebank
• Greek dialects: Cretan, Messinian, Lesbian, Cypriot, Griko/Greko, Aperathitika
• Non-Greek varieties spoken in Greece: Pomak, Arvanitika

All datasets, models, and tools we develop are freely available to the research community.

About Our Collaborators at MaiNLP (LMU Munich)

The MaiNLP Research Lab at Ludwig-Maximilians-Universität Munich conducts research in Natural Language Processing, combining computer science, linguistics, and cognitive science. Their focus is on human-facing NLP: developing models that are robust to variation, fair, and reflective of human annotation diversity.
MaiNLP is also leading the ERC Consolidator Grant project DIALECT, which explores natural language understanding for non-standard languages and dialects. This makes them an ideal partner in our shared mission to design NLP systems that work for all language varieties.

Workshop Goals

This workshop will:
• Document current practices in dialectal UD treebanks (e.g., orthographic/phonological variation, lemmatization, interference)
• Identify gaps in annotation and processing across dialects
• Develop shared methodologies for handling variation, improving consistency, and enabling knowledge transfer
• Foster community engagement with researchers working on under-resourced dialects and less-studied language families
• Ground discussions in theoretical insights from Plank (2016): What to do about non-standard language in NLP

It is an opportunity to reflect on how annotation frameworks and tools serve, or fall short of serving, dialectal and endangered varieties, and how AI can support documentation, analysis, and preservation of linguistic diversity.

We look forward to sharing our findings and engaging with colleagues working on inclusive, culturally informed, and variation-aware NLP.

Stay tuned for more updates!