The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software
This paper describes the first, three-year phase of the Indigenous Languages Technology (ILT) project, which develops software to support Indigenous communities in preserving and expanding the use of their native languages.
Kuhn, R. et al. (2020). The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software. Proceedings of the 28th International Conference on Computational Linguistics, pp. 5866–5878. Available at: https://aclanthology.org/2020.coling-main.516.pdf
Community consultations, especially with people involved with education.
Data collection, transcription, and audio segmentation.
Analysis of the linguistic features specific to each language included in the project.
Focus groups to test the effectiveness of the software, for example, meetings to test and gather feedback on the audiobooks and automatic speech recognition.
Interviews with community members, especially educators and Elders.
The Indigenous Languages Technology project focuses on providing Indigenous communities with a variety of digital technologies that support their efforts to protect, document, and expand their languages. Using collaborative methods that center Indigenous needs and concerns, researchers applied their knowledge to build tools that give Indigenous communities access to written and audio materials in their native languages and allow them to use those languages in digital interfaces.
The project was guided by an Advisory Committee made up of Indigenous language revitalization experts. During all stages of the project, Indigenous communities were the sole owners of any data collected and produced by the project.
The Indigenous Languages Technology (ILT) project provided communities with the tools and training to use and expand their own ways of preserving their languages. Researchers participating in the project were also invested in building internal capacity, with the aim of making their own role obsolete and giving Indigenous communities the autonomy to work and experiment with digital technologies to preserve their culture.
Documentation produced in the community consultations and meetings.
Language learning technologies, especially audio sources for speech-based technologies.
Community knowledge on the topic.
Indigenous languages grammar books and databases.
Academic articles on varied Indigenous languages, machine learning, and cultural preservation.
Through collaborative research, the Indigenous Languages Technology (ILT) project produced a range of tools and activities to mobilize and expand linguistic research on Indigenous languages: polysynthetic verb conjugation, word weaver software, verb conjugator for Michif, corpus and tools for Inuktut (including machine translation), predictive text software, work at CRIM on audio segmentation and speech recognition, Automatic Speech Recognition (ASR), read-along audiobooks, enhancement of online language courses for East Cree and Innu, development of online courses for Plains Cree, Kwak’wala, Michif, and Naskapi, improvements to a role-playing game with Swampy Cree content, training Indigenous language activists in data collection methodologies, and data collection efforts for Plains Cree, Kanyen’kéha, Kwak’wala, Michif, Nsyilxcn, SENĆOŦEN, Tŝilhqot’in, and Tsuut’ina.
“The resilience of Indigenous communities can be seen in the many ways that they have resisted assimilation and continued to teach, learn, and speak their languages. The benefits associated with the use of these languages are wide-ranging. For instance, there is a correlation between Indigenous language use and a decrease in youth suicide rates on reserves in British Columbia. However, many communities face decreasing numbers of first language (mother tongue) speakers, due to declining language transmission rates. Much Indigenous language revitalization work in Canada focuses on preservation of language through recording the speech of Elders: recordings and transcriptions are a vital resource for language learning by younger generations.” (p. 5868)
“There are many interesting research themes that have the potential to help under-resourced languages. Examples in this project include implementing verb conjugators, machine translation, and speech recognition for polysynthetic languages. But there are also low-hanging fruit, challenges that do not require advanced research — they can be tackled with today’s technologies — but which, when overcome, can have linguistic benefits for communities. An example is the production of Read-along audio books in Indigenous languages. Automating this process posed a technical challenge of only moderate difficulty, but educators find the end product very useful.” (p. 5875)
Indigenous Science and Technology Studies, Linguistics, Machine Learning, Digital Humanities, Information Studies, Education Studies