-
Linguistics Reimagined

DigitaL aims to foster the application of modern digital and computational methods in the language sciences.
The project is funded by the European Union through the Erasmus Plus Grant "Linguistics reimagined: Applying digital tools in language research" (Project 2025-1-MT01-KA220-HED-000353724) from 2025 to 2027. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency (nor any other funding agencies involved). Neither the European Union nor the granting authority can be held responsible for them.
-
About
Background
DigitaL is an interdisciplinary initiative that transforms how linguistics and computer science are taught and practiced. The project creates a unified learning experience that empowers students and researchers to use modern digital and machine-learning tools in language research. Through a hybrid lecture series, expert talks, collaborative meetings, and a student conference, participants develop practical skills in data creation, curation, and contextualization.
Key outputs, including an open-access textbook, research blog, and recorded lectures, promote open science and support early career development. By fostering international collaboration between Malta, Passau and Dublin through innovative teaching practices, the project strengthens digital readiness in higher education and prepares a new generation of linguists for a data-driven research landscape.
Topics of Interest
Data Creation: A crucial step in linguistic research, data creation provides the foundation for meaningful analysis. The project will develop high-quality linguistic datasets with Nathan Hill leading efforts in Automatic Speech Transcription, improving speech data collection through machine-assisted transcription methods.
Data Curation: Ensuring linguistic datasets are structured, standardized, and accessible for long-term research. Johann-Mattis List will lead this area, applying quantitative comparative linguistics to establish standardized workflows, ensuring interoperability across languages, platforms, and disciplines.
Data Contextualization: Beyond collection and organization, linguistic data must be placed within theoretical frameworks. Jessica Nieder will lead this area, integrating computational semantics and machine-learning-driven approaches to enhance linguistic analysis.
-
Events
Upcoming Events
DigitaL Talk Series to Continue with Two Talks in April (Malta and hybrid)
The talk series (see a detailed poster with information on how to access remotely here will introduce the three key topics of DigitaL — Linguistic Reimagined in three blocks, each organized by one of the three project partners. Participation will be possible in hybrid mode and in person. To enroll for the talks, please write an email to reima@digling.org in order to obtain the necessary information of how to participate virtually or in person. For more information, see also the entry at the website of the University of Malta.
Malta Talks in April
Talk 1: Claudia Borg (University of Malta)
Date April 14, 2026 Time 17:00 CET Venue University of Malta / virtual sphere Speaker Claudia Borg Title From Words to Meanings Across Languages: How Neural Models Represent Languages Abstract How can a computer represent the meaning of a word? This talk introduces the idea of word representations, starting from familiar linguistic concepts such as collocation, semantic similarity, and polysemy. I will show how modern language technologies use patterns to build representations of words and contexts, and how multilingual models extend this idea by relating meanings across languages. I will discuss what such models can reveal about language structure, variation, and cross-linguistic similarity, as well as their limitations. Talk 2: Harald Baayen (University of Tübingen)
Date April 21, 2026 Time 17:00 CET Venue University of Malta / virtual sphere Speaker Harald Baayen Title Linguistic analysis with word embeddings: methods and examples Abstract Word embeddings are high-dimensional numeric representations of word meaning, derived from large corpora. I will first introduce the core concepts underlying embeddings. I will then show, by means of a series of examples, how one can 'data mine' embeddings for linguistic analysis, using some basic but surprisingly powerful statistical tools. One set of examples will address the semantics of number in inflectional morphology. I will show that the change from singular to plural in semantic space can depend on semantic class (English) or case (Russian, Finnish), or on whether a plural is a broken plural or a sound plural (Maltese). A second set of examples will discuss recent results indicating that embeddings can be surprisingly predictive for the fine phonetic detail with which words are articulated. Past Events
DigitaL Talk Series to Start with Two Talks in February (Dublin and hybrid)
The talk series (see a detailed poster with information on how to access remotely [here](data/poster-talk-series-2026.pdf)) will introduce the three key topics of *DigitaL — Linguistic Reimagined* in three blocks, each organized by one of the three project partners. Participation will be possible in hybrid mode and in person. To enroll for the talks, please write an email to reima@digling.org in order to obtain the necessary information of how to participate virtually or in person. For more information, see also the entry at the website of the [University of Malta](https://www.um.edu.mt/newspoint/upcomingevents/2026/digitallinguisticsreimaginedtalkseries.html).
Dublin Talks in February
Talk 1: Rolando A. Cota Solano (Youtube)
Date February 10, 2026 Time 17:00 CET Venue Trinity College Dublin / virtual sphere Speaker Rolando A. Cota Solano (Dartmouth College) Title Incorporating ASR into Language Documentation Workflows Abstract Transcription of audio and video recordings is one of the main bottlenecks in the creation of corpora for language documentation and revitalization. In this presentation we will discuss the role of automatic speech recognition (ASR) in accelerating this work, as well as new challenges that might emerge when adapting ASR to the documentation workflow. We will discuss examples from three projects to document (i) Cook Islands Māori, (ii) Bribri from Costa Rica, and (iii) languages from Southern Arizona. Talk 2: Michael Bayona
Date February 17, 2026 Time 17:00 CET Venue Trinity College Dublin / virtual sphere Speaker Michael Bayona (Trinity College Dublin) Title Speech Technologies for Under-Resourced Languages: From Development to Application Abstract This talk introduces recent speech technologies with a focus on under-resourced languages, using low-resource ASR as a central case study. It will cover the basic pipeline from data and modelling choices through evaluation and deployment and will point to approaches that are particularly useful when training data are limited. Examples will be drawn from work on Philippine languages, with the aim of keeping the discussion concept-driven and accessible to an advanced undergraduate linguistics audience. Past Events
-
Resources
Publications
Lectures
Talks
Our open talk series started in February. The talks can still be inspected on YouTube, they are listed below.
Claudia Borg (University of Malta) -- From Words to Meanings Across Languages: How Neural Models Represent Languages
Michael Bayona (Trinity College Dublin) -- Speech Technologies for Under-Resourced Languages
Rolando Coto (Dartmouth) -- Incorporating ASR into Language Documentation Workflows
Tutorials
Software
Data






