Gitanjali Patel

Posted Tuesday September 9, 2014
 

Lost in translation: using Google Input Tools for searches in foreign scripts

When Mercedes-Benz marketed itself in China as “Bensi”, it didn’t initially realise that some consumers understood the new brand as meaning “rush to die”. It was not the only brand to fall at the first hurdle when launching into a new market.

Meanings are similarly lost in translation when using automated translation tools to search for names of people and companies - especially those written in different scripts.

At Arachnys, we wrestle with these complexities daily (and, by the way, we’re hiring!). One tool we recommend to customers who need an extra helping hand is Google Input Tools, a free Chrome extension from Google that lets you search in different scripts by typing a word out phonetically. Unlike Google Translate, this tool lets you switch between keyboards and converts the sounds of words from one alphabet to another; the meaning of the word is irrelevant. For example, to get 你好 using Chinese transliteration, you type n-i-h-a-o and the tool offers a list of Chinese words that sound like nihao, in order of popularity.

An analyst without a language specialism searching company or subject names in languages with non-Roman scripts will often first look to Google Translate for support. Here are some examples from Arabic, Chinese and Russian (which make up over 50% of translated queries), to show where Input Tools can help you carry out more sophisticated searches than Translate alone on Arachnys and indeed other platforms.

Chinese

Chinese names typically translate badly using machine translation. The basic problem is that multiple Chinese characters have the same pronunciation (even taking into account tones), so a romanised version of a name could correspond to a huge number of combinations of characters. Another issue is that translation tools may struggle to decide whether a term is a name or an ordinary word: the Chinese surnames Song, Sun and He, for example, can be misread as the English words “song”, “sun” and “he”.

Both the family name and the given name consist of one or two characters, though typically the family name is a single character and the given name is two. For example, actress Zhang Ziyi (章子怡) has the family name Zhang and the two-character given name Ziyi, while basketball player Yao Ming (姚明) has the family name Yao and the single-character given name Ming.

Chinese characters are used for names of individuals in mainland China, Taiwan, Hong Kong, Macao, Singapore and Malaysia, though the romanised versions of a Chinese name typically follow different conventions in each of these places. Google Input Tools matches the mainland China convention most closely, in that it maps pinyin spellings to Chinese characters.

When typing a single syllable, the input tool offers a selection of matching characters, in descending order of frequency. For example, President Xi Jinping’s name is made up of the syllables Xi, Jin and Ping. Entered individually, these syllables produce 喜, 金 and 平 as matching characters, but when ‘xijinping’ is entered as a single input, the tool uses a predictive feature to offer the correct name: 习近平.
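
The difference between syllable-by-syllable entry and whole-name entry can be sketched in a few lines of Python. This is only a toy model of the behaviour described above, not how Google Input Tools is actually implemented: the two lookup tables and the `convert` function are hypothetical, and they contain only the examples from this post.

```python
# Toy tables for illustration only (hypothetical, not Google's data).
# First character offered for each syllable when it is typed on its own.
FIRST_CHOICE = {"xi": "喜", "jin": "金", "ping": "平"}
# Whole pinyin strings that the predictive feature recognises as names.
KNOWN_NAMES = {"xijinping": "习近平", "lixianlong": "李显龙"}


def convert(pinyin_input):
    """Return the first suggestion for pinyin typed as a single input."""
    key = pinyin_input.lower()
    if key in KNOWN_NAMES:
        # The whole string matches a known name, so offer it directly.
        return KNOWN_NAMES[key]
    # Otherwise fall back to the most common character for that syllable,
    # or return the input unchanged if it is not in the toy table.
    return FIRST_CHOICE.get(key, pinyin_input)


# Typing each syllable separately yields common but wrong characters.
one_by_one = "".join(convert(s) for s in ["xi", "jin", "ping"])
# Typing the full name as one input yields the correct name.
as_one_input = convert("xijinping")

print(one_by_one)
print(as_one_input)
```

In practice this means it is usually worth typing the full name in one go rather than building it up character by character.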

A counterexample is Singapore’s prime minister Lee Hsien Loong. As names in Singapore are not typically romanised using pinyin, the input tool cannot be used in the same way to enter the name in Chinese characters. Lee’s name is written 李显龙. To use input tools to render this name, the user would need to use the pinyin input ‘lixianlong’.

Arabic

The variety of ways a name can be written in Arabic presents the biggest challenge when using translation tools. Historically, Arabic names were based on a system of long chains of names, though almost all Arabic-speaking countries have now adopted a Westernised way of naming.

Westernising an Arabic name is what produces the greatest inconsistency. There is no single accepted Arabic transliteration system, so an individual may romanise their name in any number of ways, depending on both personal preference and regional influence. For example, the name أحمد المغربي might be transliterated as Ahmed al-Mughrabi, Al Mughrabi, Al-Mughrabi, El Mughrabi, El-Mughrabi or even just Mughrabi.
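
When searching on the romanised form, one practical response is to try every variant of the surname rather than a single spelling. Below is a minimal Python sketch that generates the variants listed above. The function names are hypothetical, the list of article forms covers only the spellings mentioned in this post, and the quoted `OR` syntax is an assumption about the search engine being used.

```python
# Spellings of the definite article seen in the example above,
# plus the empty string for the bare surname.
ARTICLE_FORMS = ["al-", "Al ", "Al-", "El ", "El-", ""]


def surname_variants(stem):
    """Return the romanised surname with each form of the article."""
    return [form + stem for form in ARTICLE_FORMS]


def or_query(stem):
    """Join the variants into one query (assumes quoted OR syntax)."""
    return " OR ".join('"' + v + '"' for v in surname_variants(stem))


variants = surname_variants("Mughrabi")
print(variants)
print(or_query("Mughrabi"))
```

This only covers the article; variation in the stem itself (and in the given name) would need its own list of alternatives.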

Using machine translation to search in Arabic may cause difficulties when the translation of the Romanised name fails. The name “Mohammad”, for example, can be romanised in many ways, but not all are recognised by Google Translate. Input tools for the most part account for variations and render the correct name محمد.

Russian and other Cyrillic languages

Unlike Arabic names, Russian names are normally easy to transliterate into their original Cyrillic forms. There is only one way of spelling Андрей (which could be transliterated as “Andrei” or “Andrey”) in Russian. Simple surnames like Ivanov (Иванов) are also straightforward to convert using Arachnys’ built-in translation or Google Translate.

However, less common surnames can present challenges with automated translation. Sometimes they do not “translate” at all.

For example, the relatively unusual Russian surname “Aranzhereev” simply doesn’t transliterate, correctly or otherwise, into Cyrillic using Google Translate. Technically, machine translation technology relies on matching up originals with translations online, and this particular name simply hasn’t been seen “in the wild”.

Input tools, however, render the correct result, Аранжереев, because they take into account not just the whole word but the most probable combinations of its component letters.
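
To see why working from component letters succeeds where whole-word matching fails, here is a deliberately simplified Python sketch. It is not Google’s algorithm, which ranks probable letter combinations; this version just applies a small, hypothetical mapping table, trying two-letter combinations such as “zh” before single letters. The table covers only the letters needed for the names in this post.

```python
# Partial Latin-to-Cyrillic table (illustrative, not a full standard).
LATIN_TO_CYRILLIC = {
    "zh": "ж",
    "a": "а", "d": "д", "e": "е", "i": "и", "n": "н",
    "o": "о", "r": "р", "v": "в", "y": "й",
}


def to_cyrillic(name):
    """Transliterate letter by letter, preferring two-letter matches."""
    text = name.lower()
    out = []
    i = 0
    while i < len(text):
        for size in (2, 1):
            chunk = text[i:i + size]
            if chunk in LATIN_TO_CYRILLIC:
                out.append(LATIN_TO_CYRILLIC[chunk])
                i += len(chunk)
                break
        else:
            # Letter not in the table: keep it unchanged and move on.
            out.append(text[i])
            i += 1
    # Names are capitalised, so upper-case the first letter.
    return "".join(out).capitalize()


print(to_cyrillic("Aranzhereev"))
print(to_cyrillic("Ivanov"))
```

Because the conversion never needs to have seen the whole name before, an unusual surname is handled in exactly the same way as a common one.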

Conclusion

In short, Google’s input tools provide useful support for analysts working outside their language specialism and can enhance searches where translation flounders. It is hard to use them correctly without some approximate idea of what the target language should look like, but they can be a lifesaver in situations where other techniques draw a blank or translation fails entirely.

These tools can take analysts most of the way. Just remember though that review by a native speaker is essential if you want to avoid blunders like “Bensi”.
