Technology | Diversity

“Indigenous languages don't make it easy for AI”

Michael Running Wolf worked as a software engineer on the virtual voice assistant Alexa. Now he is revitalising indigenous languages with AI

Mr. Running Wolf, you are working on revitalising indigenous languages with AI. Why?

We are losing a language every 14 days, and 60 percent of those we are losing are here in North America. There's not enough intergenerational transmission, which is a result of intentional efforts in the US and Canada to erase indigenous languages. My parents' and my grandparents' generation were taught that speaking their language was not good for them personally. But our generation didn't grow up under the oppression of anti-cultural American policies. Now we are dealing with this situation where we want to learn our language, but we have very few who still speak the language, and those who do may lack teaching skills.

What does AI have to do with it?

While working for Amazon's Alexa, I became a big data nerd. I wondered why we couldn't enable this kind of technology for indigenous languages. Then I encountered a Māori tech team that had successfully applied it for the Māori tribe with only 300 hours of audio – that inspired me. I asked myself: what if I could put on a headset, and everything around me was in Cheyenne?

Like, the words, our conversation – I would exist within the reality of where we are. We are in Montréal, and this is Mohawk territory. What if Mohawk people could live here as if it were still Mohawk territory, conversing and living in their society and interacting with the outside world?

How would you do it?

In the short term, we aim to build so-called APIs, a common kind of software interface that will help connect apps on your smartphone with a wide range of relevant knowledge archives or existing language education programs within various indigenous communities.
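
To give a rough idea of what such an API could look like in practice, here is a minimal sketch in Python using Flask. The route, field names, and data are hypothetical illustrations of the idea described above, not the project's actual interface.

```python
# Minimal sketch of a community language API (hypothetical names and data,
# not the project's actual interface). A learning app on a phone could call
# this endpoint to fetch phrases held in a community-controlled archive.
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for a community-controlled archive; in practice this would sit
# behind the community's own storage and access rules.
ARCHIVE = {
    "greetings": [
        {"id": 1,
         "audio_url": "https://example.org/clips/greeting-1.wav",
         "translation": "hello"},
    ],
}

@app.route("/v1/phrases/<category>")
def phrases(category):
    """Return the phrase entries for one category, e.g. 'greetings'."""
    entries = ARCHIVE.get(category)
    if entries is None:
        return jsonify({"error": "unknown category"}), 404
    return jsonify({"category": category, "entries": entries})

if __name__ == "__main__":
    app.run(port=8000)
```

A smartphone app would then only need a simple request such as GET /v1/phrases/greetings to pull material from whichever community archives expose a compatible endpoint.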

Our medium-term goals involve bringing indigenous languages into the metaverse, creating XR experiences such as virtual or augmented reality. Imagine having different communities with hundreds of diverse APIs, game developers, and individuals interested in building digital artefacts. This will open up possibilities that I may not be interested in creating, but am excited about because of what they could achieve.

Speaking of Amazon, how true is the prevailing perception of big tech as white men working on AI and producing these products with their biases?

Yes, but it's often East Indians and Asians who make up a large percentage of these corporations. Like, you have Satya Nadella (Microsoft), you have the head of Google, Sundar Pichai, and they are pretty representative of the workforce. But they still bring in Eurocentric ideas. From my perspective, if your workforce comes from a colonised country, they are going to act out colonial behaviours regardless of their skin colour.

The computer science space is a mostly male-dominated field, and we don't have many Indigenous people from North America in computer science, according to the CRA Taulbee Survey, which gathers data from colleges across North America: the United States, Canada, et cetera. I know of only twelve of us on the global stage who are computer scientists practising AI.

“People are conforming their linguistic cultures/patterns to the limitations of AI”

How does this affect language recognition AI?

You can’t talk to Siri or Bixby or Google Assistant in an indigenous language. Out of 7,000 languages, AI currently only serves languages that are similar to Mandarin, Hindi and English. Any languages that do not fall within those categories just don’t work. And this has been proven scientifically.

On another note, popular AI assistants require users to speak specific versions of English, Hindi, German, or French that the AI can understand, erasing not only indigenous languages but also dialects from modern technology. So what happens is that people are conforming their linguistic cultures/patterns to the limitations of AI. 

But what makes indigenous languages so difficult for language recognition systems such as “Alexa” or “Siri” to make sense of?

Firstly, there is a lack of data, as only a handful of speakers are fluent in these languages. In some communities, they are only spoken by grandmothers and aunties around the kitchen table. Such communities could never create a million hours of annotated audio, which is often required for these systems.

We need to come up with a solution that can handle sparse data sets and needs only minimal information about a language. Secondly, our languages in North America are phonetically very different from Western languages. And the key issue lies in the morphology, structure, syntax, and grammar unique to highly polysynthetic languages: a whole phrase can be condensed into one single word. That practically infinite number of potential words makes it hard for current AI to deal with.
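
As a rough illustration of that vocabulary problem (a toy Python sketch, not the interviewee's actual method; the sample string and vocabularies are invented placeholders, not real Cheyenne or Mohawk), a fixed word-level vocabulary collapses on polysynthetic input, while character- or subword-level units stay within a small closed set:

```python
# Toy illustration: why fixed word vocabularies struggle with polysynthetic
# languages, where one long word can carry a whole phrase. The sample text
# below is an invented placeholder, not a real indigenous word.

def word_level_ids(text, vocab):
    # Any word outside the vocabulary maps to <unk>; with a practically
    # unbounded set of possible words, most input becomes <unk>.
    return [vocab.get(word, vocab["<unk>"]) for word in text.split()]

def char_level_ids(text, alphabet):
    # Characters (or learned subword pieces) form a small closed set,
    # so unseen words can still be represented from sparse data.
    return [alphabet[ch] for ch in text if ch in alphabet]

vocab = {"<unk>": 0, "hello": 1, "friend": 2}
alphabet = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz' ")}

sample = "examplelongpolysyntheticword"  # placeholder for one phrase-as-word

print(word_level_ids(sample, vocab))     # -> [0], everything is just <unk>
print(char_level_ids(sample, alphabet))  # -> one id per character, no <unk>
```

Character- and subword-level modelling are among the common ways speech and language systems cope with open vocabularies and sparse data.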

How can this problem be addressed?

The eventual goal would be that we're not creating a one-size-fits-all solution, but strategically addressing technical challenges in language acquisition and language AI for each community we work with. At the moment, we are directly engaged with three communities: the Saskatchewan community in the Northwest Territories, Kwak’wala in British Columbia, and Makah in Washington State, United States. We are also in early discussions with tribes in South America and Mexico. However, we need to achieve specific technical goals before expanding further.

Do you think language is the prerequisite for integrating Indigenous knowledge and voices into AI and other fields?

Every community in North America, without exception, is deeply concerned about the safety of their language, because language conveys who they are as individuals and as a community. And it unifies them in a world-view. It's a good incentive for communities to embrace technology. Half of the AI people I know are doing natural language processing. I think it's the biggest concern currently, because we are all aware that our languages might go extinct within the next 20 years.

And there are other concerns, like climate. Ethnobotany communities are very concerned about the safety of their plants and their medicines, and there's certainly AI research and data science around that. We indigenous people have an opportunity now, while AI is so new. There are a few of us here who could set the stage for indigenous research, because no one else is going to do it. We also need more people to push for our ethics, because no one else is going to push for them.

What ethics do you want to push for?

Having sovereignty over your data. It depends on the tribe, but we put a high value on personal safety, and particularly the safety of the community. And we don't want our sovereignty, and our ability to be who we want to be, violated. The pressing need for extensive data often leads to the quick solution of purchasing data without knowing its provenance. And when working with Indigenous data, language, and AI, these ethical dilemmas become even more pronounced.

“If there are no ethical behaviours or decolonial forethought, AI becomes another mechanism of colonisation”

Isn't this an inherent problem of every AI system?

I think that AI systems, as artefacts, are not in themselves bad. However, AI is a mechanism of the West. And like with anything, if there are no ethical behaviours or decolonial forethought, AI becomes another mechanism of colonisation. Now they're working on indigenous language recognition, but using methodology that is against our ethics. Large corporations are going in and harvesting data like a colonial entity.

Just to clarify, how could any kind of indigenous data sovereignty be achieved?

It is our belief that we don't want to steal data. We don't want to use data without permission, and embedding that as a core ideology within AI research is essential. We start by getting permission and setting up agreements where we're basically just users of the data. We use the data only for our specific research, and we can't do anything else with it without renegotiating the contract. The communities also have the freedom to remove their data from our set if they want to.

This is crucial because there's not much data available for minority languages, and we need a lot for our AI. Working with different communities and earning their trust is key.  We need the coordination of many different communities to help contribute to an indigenous data set.

You have stressed repeatedly that there are few people working in this field. Do you feel a lot of responsibility to succeed?

Given our survival from genocide, revitalising our culture matters. Ten to fifteen years ago in Canada, there were federal and provincial programs under which Indigenous women were being sterilised by doctors. That also happened in the US in the 1970s. So that is vital context. Of course, I am happy that they're not actively trying to kill us all any more.

We have emerged from genocide and we still talk and look like ourselves. I am now focused on making sure our culture survives into the next phase, meanwhile, I'll let the next generation worry about making a change in society.

Interview by Atifa Qazi

The interview was conducted by Atifa Qazi as part of the Heinrich Böll Foundation's Transatlantic Fellowship