Studying Multilingual Neural Models with Symbolic Approaches


Despite the notable success of neural multilingual language models (MLMs) in language AI, their internal mechanics remain largely unclear: the dominant recipe is simply to train models on data from as many languages as possible.

The community lacks answers to crucial questions: How is each language, and the interactions between languages, modeled internally? Which linguistic phenomena are represented, and why? What facilitates cross-lingual transfer? What kinds of biases do data-rich languages exert on the representations of less-resourced languages and dialects? Moreover, no language is a monolith: within a single language, variation arises from sources such as region, social class, and mode of usage. MLMs have largely ignored language variation because they rely on large amounts of data, which only a few standardised, widely spoken languages can provide. By treating less-resourced varieties as noise, they neglect both the scientific evidence these varieties encapsulate and the millions of people who speak them.

We will turn to symbolic approaches to understand the inner workings of large MLMs; a minimal illustration of this kind of model inspection appears in the sketch below. The insights gained will guide our effort to inject linguistic knowledge into neural models, with the aim of learning how to profit from human expertise and to work with sparse data. We will work with multiple low-resource language varieties, particularly the severely technologically under-served varieties of Greek.
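As a minimal sketch of what inspecting an MLM's internals can look like, the snippet below encodes a translation pair with a public multilingual model and compares the resulting sentence embeddings. The choices here are illustrative assumptions, not the project's method: XLM-R via the HuggingFace transformers library, mean-pooled hidden states as sentence vectors, and a hand-picked Greek/English pair.

```python
# Illustrative sketch only (assumed setup: HuggingFace transformers + XLM-R);
# it probes how similarly a multilingual model represents a translation pair.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Mean-pool the final hidden states into a single sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)   # mask out padding positions
    return (hidden * mask).sum(1) / mask.sum(1)

# A Greek/English translation pair (illustrative data, not project data).
el = embed("Η γλώσσα αλλάζει από περιοχή σε περιοχή.")
en = embed("Language varies from region to region.")
print(torch.cosine_similarity(el, en).item())  # higher => more shared structure
```

Running such probes over many sentence pairs, languages, and dialectal variants is one way to start quantifying which varieties share representational space and which are pushed toward the margins.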


The project “ARCHIMEDES Unit: Research in Artificial Intelligence, Data Science and Algorithms” with code OPS 5154714 is implemented under the National Recovery and Resilience Plan “Greece 2.0” and is funded by the European Union – NextGenerationEU.


Stay connected! Subscribe to our mailing list by emailing sympa@lists.athenarc.gr with the subject "subscribe archimedes-news Firstname LastName" (replacing Firstname LastName with your own name).