How African Languages are Going Digital
Digitising African languages is a complex but necessary process. Now, academics and information technology companies from around the world are working together to integrate new languages into various forms of software and technology.
Although thousands of languages are spoken around the world (many of them in Africa), a few, such as English, still dominate, particularly when it comes to digital tools and technology. Until recently, tools for translation, spelling and grammar were geared primarily towards mainstream Western languages. This focus is starting to shift, albeit slowly, as more African languages are brought into the digital sphere.
Development of Human Language Technology
Historically, indigenous African languages have not received the same attention as Western languages when it comes to technological integration. However, as technology continues to penetrate new African markets, more companies are seeing the need to improve Human Language Technology (HLT) on the continent. This is critical both for ongoing technological development and for preserving language, a cornerstone of culture, in a digital age.
According to the University of Arizona, language and information technology meet throughout the world on a regular basis, hence the need for proficient research and development: “Anywhere language comes in contact with information technology, or where humans need to interact with computers, language needs to be [organised] so that it can be handled and processed by computational means. This often requires broad knowledge not only about linguistics and how languages work, but also about computer science and related fields.”
Tech giants investing in African languages
As a result, many large corporations, including tech giants like Facebook and Google, are investing in HLTs for African languages (if only to increase their market reach and profits). The first noticeable change was the ability to use Google in various African languages. Some companies, such as Microsoft, are also incorporating African languages into spell checkers and grammar tools.
However, there are still serious shortfalls and a lot of work remains to be done. As researcher Maria Keet points out, integrating new languages into technology accurately and meaningfully takes more than flicking a switch; it requires an integrated approach: “What’s the point of searching the Web in, say, [South African language] isiXhosa when there are only a few online documents in isiXhosa and the search engine algorithms can’t process the words properly anyway, hence, not returning the results you’re looking for?”
More complex than it appears
Keet highlights the need for word processing tools to incorporate languages like these into spell checkers in order to assist everyone – from school children to professionals – to write papers, documents, messages and emails in their native tongue.
Digitising any language is complicated, and it requires extensive research and testing before the language can be handled by automated tools. African languages often require significantly more work than English.
Whereas relatively basic syntax rules have been used to digitise English, many African languages have sentences that depend heavily on the context of the situation, along with complex verbs and sentence structures. As a result, they are not easily handled by automatic tools that rely on structured data.
As a result, researchers must build grammar engines to generate basic sentences. These run off complex algorithms that draw from existing texts, and this brings forth a range of additional issues.
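The idea of generating word forms from rules can be sketched in a few lines. The following is a deliberately simplified illustration, not the method used by any of the researchers mentioned here: it assumes a single isiZulu present-tense pattern (subject concord + "ya" + verb stem) and a handful of hand-picked morphemes, and it ignores noun classes, object concords, tense, negation and sound changes, all of which a real grammar engine would have to handle. The names `SUBJECT_CONCORDS`, `VERB_STEMS` and `generate_verb_forms` are hypothetical.

```python
from itertools import product

# Toy data (an assumption for illustration): a few isiZulu subject concords
# and verb stems, combined using the present-tense pattern concord + "ya" + stem.
SUBJECT_CONCORDS = {
    "ngi": "I",
    "u": "you (singular)",
    "si": "we",
    "ni": "you (plural)",
    "ba": "they",
}
VERB_STEMS = {
    "bonga": "thank",
    "hamba": "go",
    "funda": "learn",
}


def generate_verb_forms(concords=SUBJECT_CONCORDS, stems=VERB_STEMS):
    """Return a dict mapping each generated verb form to a rough English gloss."""
    forms = {}
    for (concord, subject), (stem, meaning) in product(concords.items(), stems.items()):
        # Each verb is a single written word built from several morphemes.
        forms[concord + "ya" + stem] = f"{subject} + {meaning}"
    return forms


forms = generate_verb_forms()
print(len(forms))            # 5 concords x 3 stems = 15 distinct words
print(forms["ngiyabonga"])   # I + thank
```

Even this toy pattern turns eight morphemes into fifteen distinct written words, which hints at why a plain word list, the usual starting point for an English spell checker, scales poorly for languages that build words this way.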
Feeding off existing texts
Existing texts are at the core of all digital language tools, because the algorithms behind them learn from this material. Locating and collating such texts is difficult for many African languages, given the historical bias towards Western languages in publishing.
In spite of the difficulties, organisations around the continent are starting to identify and pool high quality native-language documents that not only include accurate spelling and grammar, but are also modern enough to be considered relevant today. Without this cultural context, software algorithms run the risk of being inaccurate and insensitive at best, and outright racist at worst.
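To make the dependence on existing texts concrete, here is a minimal sketch of a corpus-driven spell checker. It is an assumption-laden illustration rather than a description of any real product: the three-line isiZulu "corpus" is toy data, the function names are hypothetical, words are split on whitespace only, and suggestions are limited to words one edit away from the input.

```python
from collections import Counter


def build_lexicon(corpus_texts):
    """Count how often each whitespace-separated word occurs in the corpus."""
    counts = Counter()
    for text in corpus_texts:
        counts.update(text.lower().split())
    return counts


def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Return all strings one deletion, replacement or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + replaces + inserts)


def suggest(word, lexicon):
    """Return the word if known, else the most frequent known word one edit away."""
    word = word.lower()
    if word in lexicon:
        return word
    candidates = [w for w in edits1(word) if w in lexicon]
    if not candidates:
        return None  # nothing in the corpus is close enough
    # Prefer the most frequent candidate; break ties alphabetically.
    return max(candidates, key=lambda w: (lexicon[w], w))


# Toy corpus (assumption): a real tool would need a far larger, vetted collection.
corpus = ["sawubona umuntu", "ngiyabonga abantu", "sawubona abantu"]
lexicon = build_lexicon(corpus)

print(suggest("sawubna", lexicon))     # sawubona
print(suggest("ngiyabonga", lexicon))  # ngiyabonga
print(suggest("xyz", lexicon))         # None
```

The checker can only recognise or suggest words that appear in its corpus, so a small or outdated collection of texts translates directly into missed words and poor suggestions.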
Big data and combined effort
Much of the process, therefore, relies on big data and a combined effort from several different parties, from researchers to tech giants. As some organisations identify relevant texts, African language experts work on algorithms that can translate them accurately. Tech companies then integrate these into existing projects, and help develop new tools to assist those who need them most.
And they are already making progress – some African language spell checkers are now achieving accuracy rates of up to 80%, according to one study.
Though the process of digitising African languages is complicated and requires the cooperation of many individuals and organisations, its impact is far reaching. It will play an important role in cultural preservation, schooling and the daily lives of people using digital tools. At its most basic level, it will also have a direct impact on quality of life, particularly in fields such as healthcare, where technological advancements can help break down language barriers between doctors and patients.