## Wednesday, April 26, 2017

Translators will have nothing to eat. Soon.

As I learned from Technet.cz, Google Translate was switched to a revolutionary new version of itself on the night between April 18th and April 19th. You can probably see the improvements already. The new software should be based on the scholarly paper submitted to arXiv.org in September 2016.

Up to that moment, Google Translate relied on more or less old-fashioned computer algorithms. Now it uses deep neural networks. Google had to create its own processing units, the TPUs. Those "tensor processing units" are counterparts of GPUs, "graphics processing units", and they can perform the required computations efficiently. The neural networks running on this hardware have been trained on millions of texts, including the corpus available through Google Books. In a loose emulation of the human brain, they can "automatically" learn patterns and rules for working not only with individual words and groups of words but even with very complex sentences.

I have tried the translation between the only two languages where I am still rather certain that my knowledge of all the nuances beats all the computers in the world – a statement that may cease to be true rather soon. The flawlessness of many translations has indeed amazed me. Let me mention that the subtitle "translators will have nothing to eat" was obtained by Google Translate from the well-known quote (about glassmakers' jobs threatened by plastic cups) in the Czech comedy Cosy Dens – a 3-minute video.

Technet.cz has prepared lots of incredible examples in both directions – a soon-to-be-published text will show hundreds of them. I won't bother you with the Czech sentences – only 4% of the TRF readers, not necessarily all the homosexuals, understand Czech. But let me say that the sentence
We would like to try what is possible in the world of on-line translators and artificial intelligence. The new version of Google Translate utilizes deep neural networks to translate the whole sentences, not just short phrases. The neural networks analyze millions of different texts and then train themselves to perform better and better.
was translated into Czech almost flawlessly, with just one somewhat redundant word "it" in front of a sentence starting with "what is possible" – the Czech translation of "it" does naturally appear in similar complex sentences. The word "perform" was translated suboptimally, more or less as "may" – I am not sure why. The old Google Translate had some 8 well-defined problems in the translation of the quote above; the new one has 2, at most 3.

To test the opposite translation, they picked a piece of the Czech tax code. The translation to English ended up like this:
Taxpayers are tax residents of the Czech Republic, if they are resident or usually reside in the Czech Republic. Taxpayers of the Czech Republic have a tax liability, which applies both to income flowing from sources in the Czech Republic and to income from sources abroad.
"If they are resident" could have said "residents" instead, the comma before "if" should be omitted in English, and you might find one more similar imperfection. But such complaints of ours are basically analogous to the objections we could raise against translations done by human translators.

The translations really keep the "genre" and "style". Everyone who has followed the evolution of automatic translators must be as amazed as I am.

Technet.cz plans hundreds of examples of texts translated with the new Google Translate. Some commenters have added their own examples, often amusing ones. Mr Petr Konderla tried the following sentence on Twitter:
Supercharger availability is now displayed on your Tesla touchscreen so you can see how many stalls are open before you arrive.
On Twitter, you can get automatic translations into your language, and they are provided by the Bing Translator. If I reverse-translate the Czech text produced by Bing into English, it reads as follows:
Through the blower[,] the availability is now displayed on "Tesla touchscreen" [kept in English], in order for you to see, how much the little tents can be opened before they arrive.
On the touch screen [correct Czech phrase] named Tesla, the availability of superchargers is now being displayed, in order for you to see how many tents are opened before the arrival.
The order of the words may look strange in English but in Czech, it's actually the more natural one and the automatic translator has really rearranged rather big pieces of that sentence, just like a human would do.

Of course, creative Czech commenters were able to construct sentences where Google Translate did a lousy job. Mr Tomáš Kuchař started with a text that basically says
A šitty [literally: farted through] bumpkin in gum boots took a subway.
A flipper ripped up in his boots with a subway.
This computer-generated sentence may be translated back to Czech and we get the Czech counterpart of
In his armpit himself in shoes he extracted his paddles.
which is my back-back-translation, or
Under the armpits, the fins pulled up in his shoes.
which is Google Translate's own back-back-translation, maybe better than mine. ;-) If you continue to switch back and forth with Google Translate, the sentence stabilizes a bit, except that the fins/paddles become fingers in the next step and "in the armpit" becomes "beneath the armpit" another step later.
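The back-and-forth game above is a simple iteration: translate, translate back, and repeat until the sentence stops changing. Here is a minimal Python sketch of that loop. It doesn't call Google Translate or any other real service. The function `round_trip_history`, the two lookup tables and the placeholder strings such as `cz-A` are my own hypothetical stand-ins, and the toy phrases only mimic the fins/fingers drift described above.

```python
from typing import Callable, List

Translator = Callable[[str], str]


def round_trip_history(
    text: str,
    forward: Translator,
    backward: Translator,
    max_rounds: int = 10,
) -> List[str]:
    """Translate there and back repeatedly; stop once a version repeats."""
    history = [text]
    for _ in range(max_rounds):
        candidate = backward(forward(history[-1]))
        if candidate in history:
            # Fixed point (or a cycle): nothing new will appear anymore.
            break
        history.append(candidate)
    return history


# Toy stand-ins for a translator (assumption: NOT real translation output).
TOY_FORWARD = {
    "fins in the armpit": "cz-A",
    "fingers in the armpit": "cz-B",
    "fingers beneath the armpit": "cz-C",
}
TOY_BACKWARD = {
    "cz-A": "fingers in the armpit",
    "cz-B": "fingers beneath the armpit",
    "cz-C": "fingers beneath the armpit",
}

history = round_trip_history(
    "fins in the armpit",
    forward=lambda s: TOY_FORWARD[s],
    backward=lambda s: TOY_BACKWARD[s],
)
for step, version in enumerate(history):
    print(step, version)
```

To play with a real translator, you would replace the two lambdas with functions that call whatever translation service you have access to. The `max_rounds` cap is there because nothing guarantees that a real translator ever settles down.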

You should definitely try the translations between pairs of languages that you know – in both directions – and tell us about your impressions. It's plausible that Google's apps that translate the spoken language are already doing a similarly good job, too.

Google is clearly ahead of competitors such as Microsoft here. But if someone tries to catch up with Google and a war erupts between the competitors, computers could very well surpass humans in a few years.