June 18, 2018

“Everything has already been said, but since nobody was listening, we have to start again.”
André Gide

Last week I was checking out locations and prices of hotels in a small town in Southern Bohemia. Well, 90,000 people is a relatively small town by American standards, but a pretty big town by Southern Bohemian standards, I’d say. But then again, a small town in Bohemia is still a town, while a big town like Los Angeles is not really a town but a hundred different towns, so we are really comparing apples and oranges.

I thought that the English description of the hotel’s rooms and facilities sounded a little funny, so I started looking for the now omnipresent “Click here for translation” button to find out what the original description in Czech was like.

And there it was, in the upper right corner, exactly where I expected it to be.

But the “Click here for translation” button of this little hotel, more of what is called a pension rather than a hotel really, had more than a dozen flags to indicate the languages in which translation was available: not just in German, French, Russian, and English as one would expect, but also in Japanese, Chinese, Korean, Polish, Dutch, etc.

It was not surprising to me that when I clicked on the flag of a language I know, I saw that it was a machine translation rather than a real translation because there were mistakes in it. But when I clicked on the Czech flag, I saw that it was a machine translation too, not a text written by a real person.

The algorithms for machine translation can be designed to sound almost like a real translation for languages that have a relatively simple grammatical structure, such as English, which has no cases for nouns. Slavic languages such as Czech, Russian, or Polish do have them, and this is one reason why machine translations from English into a Slavic language are instantly recognizable by the wrong case of the noun … among many other things as well, of course, such as the wrong gender of the noun, or the iterative mode of a verb which should have been in a non-iterative mode based on the context.

For example, the correct translation of a very simple sentence in English, such as “The price includes breakfast”, would be “Cena zahrnuje snídani”, but machine translation could easily butcher the result in Czech to “Cena zahrnuje snídaně”, which leaves the noun in its dictionary (nominative) form instead of the accusative that the verb requires. Because there were so many mistakes like this in the Czech text, I saw that it could not have been written by a real person. It was a machine translation too.

So who wrote the original blurb on the hotel’s website if it was not the Czech owner of the small pension? I guess I will never know.

How can I find the original text if the website is in 16 languages? I guess I never will.

Most English speakers don’t realize that machine translations from and into complicated languages are much more difficult to design than machine translations from and into English.

We can see it all the time when we click on the “Click here for translation” button on Facebook for a language that we don’t know.

Even though I don’t speak Italian, I can usually figure out the meaning of the text if I click, for example, on Italian because I know French fairly well, and also because I studied Latin for many years as a young lad.

But if I click on a language that I don’t know and that is not related to another language that I know, for example on Arabic, the result is most of the time hilarious and completely incomprehensible nonsense.

We try to use technology to solve the problems of our civilization, and we think that it can be done in this way.

But maybe we are just fooling ourselves. It is also possible that the opposite is happening. Machine translation was supposed to help us understand each other better by letting us communicate with people we could not communicate with before because they speak a language that we don’t know. Instead, we understand each other less and less even in our own language, because everything now looks more and more like a machine translation, and it’s no longer possible to find out what the original was really about. The meaning gets lost and replaced by “alternative meanings” created by algorithms.

Few people notice, and nobody really cares.

Human translators are too expensive, so we don’t try too much to understand what things really mean. That is no longer very important.

Here is another example: this morning I gave a client an estimate for translating relatively small sections of several patents. All of the patents together would translate into about seven or eight thousand words, while the sections the client selected would translate into only about two thousand words.

There has been no answer so far from the law firm to my modest cost estimate.

And here is what I think must have happened: the patent lawyer told his client, a corporation, that the patent applications cited in opposition to a patent filed by the corporation contain important sections, and that he just can’t figure out the meaning of these sections from free machine translation.

And because the corporation does not like to spend too much money on translations, the patent lawyer proposed an alternative: use human translation only for these vital sections to reduce the cost of human translation.

But the lawyer’s client is so used to the “Click here for translation” button that he told the lawyer that the company has no budget for human translation, even for a relatively small portion of the entire material.

Or maybe I must wait because the decision about the relatively minor expense related to human translation must be made by an important bean counter who is comfortably sitting at a higher corporate position?

As I’ve said above, because human translators are expensive, we don’t try too much to understand what things really mean. That is no longer very important.

In Algorithms We Trust.



  1. “…. for languages that have a relatively simple grammatical structure, such as English …”
    Ok, have a go at colloquial Thai MTed into “English”. Only civil servant language renders more or less correctly (sometimes stating the opposite) because …. yeap.


    • What ?


