Posted by: Steve Vitek | April 23, 2012

Is Machine Translation Starting to Grow a Brain?

I find myself using machine translation (MT) more and more these days. I have been using the MT feature on the Japan Patent Office website for more than a decade; for about two years I have also been using it on the World Intellectual Property Organization (WIPO) site, and I now use it on the European Patent Office (EPO) site as well.

But most people who, unlike me, are not patent translators probably use Google Translate or Microsoft Translator.

I often use Google Translate (GT) basically as a dictionary when I translate materials other than patents. In the last few weeks, I used GT to translate Russian articles dealing with hardware and software technology, German and French legal contracts, and Czech diplomas and university transcripts.

Google Translate Is a Gift from Heaven to Translators

I have to say, GT is a gift from heaven to this translator. For example, if I have a German abbreviation of a certain law, GT immediately spits out the official name of the law in English. GT also knows the official English translations for long names of Japanese laws consisting of long strings of Japanese characters. These translations often do not correspond to the meaning of the characters; it would take me a long time to find them on the Internet, and I might not be able to find them at all. Weird names of subjects taught at Czech universities in the sixties and seventies are also correctly, or at least very plausibly, translated by GT, because humans who understand things like that translated them correctly at some point and the machine translation software knows how to access these translations.

GT still makes the really hilarious mistakes that machines often make and that no human translator would ever make. And when I translate words and sentences back from English to Japanese or Czech, I see that GT does not really understand Japanese and Czech grammar at all. The pronunciation of Japanese characters is often wrong when I click on the speaker icon (mostly because “kun-yomi” is used for the second or third character in technical and medical terms, although it should obviously be “on-yomi”), and the endings of Czech nouns and verbs are often incorrect and completely ridiculous, mostly because the system is insanely complicated. I think that Google is relying too much on programmers and mathematical formulas and that it should probably put more linguists to work on its translation software.

But when I compare GT to a typical machine translation 10 years ago, I can see that some progress has been made. Quite a bit, in fact.

GT has access to enormous amounts of data that it uses for translation. The problem is that, to make the jump to real translation, the MT software needs to grow a brain – the kind of brain that every translator has.

Can it do that?

Machine Translation Has Not Grown a Brain Yet – But It Has Already Grown a Medulla Oblongata

Let me put it like this: I think that GT has already grown a small silicon brain equivalent to the part of the human brain stem called the medulla oblongata (bulbe rachidien in French, продолговатый мозг [prodolgovatyi mozg] in Russian, and 延髄 [enzui] in Japanese). As you hopefully remember from high school biology, the medulla oblongata is a portion of the hindbrain just above the spinal cord that controls autonomic functions such as breathing, digestion and heart rate. We don’t have to think about these activities because the lower part of the brain stem takes care of them for us.

Machine translation is beginning to grow what I would call a silicon brain stem, enabling automatic access to an enormous amount of data that powerful computers can retrieve at great speed, just as the medulla oblongata takes care of breathing and digestion automatically.

But You Still Need a Real Human Brain To Evaluate and Validate Machine Translation

But the same problem that MT developers had 50 years ago remains: you need a real human brain to process, evaluate and validate this accessed data. The MT product is just a suggestion that may or may not be acceptable.

And although this MT product is very valuable to human translators, who can use their own brains to complete the translation if they know both languages and understand the context, I don’t believe that editing machine translation is a good method for obtaining good-quality translations. The result obtained by this method will always be inferior to real translations prepared independently by humans who are able to freely use their intellect and creativity in the translation process, while also taking advantage of the new capabilities available to them thanks to machine translation.

Machine translation is obviously useful to monolingual people because anybody can use it, it is free, and it is much better than the alternative – namely no translation. It is also very useful to translators because it is slowly beginning to replace dictionaries, or in fact probably not so slowly.

But there is one thing that people who happily predict the imminent demise of human translators don’t seem to understand: You will never be able to translate without a brain, and machines will never grow one.


Responses

  1. “but it has already grown a medulla oblongata”… brilliant! I also find that GT has become sooo much better than it was. It’s really helpful for checking my understanding of passages in Hungarian; not so great for coming up with pretty phrasing, but useful all the same.

  2. Merci beaucoup.

  3. One has but to pick up Harrap’s “New Standard French and English Dictionary” (actually not so new) and turn to “faire” or “être” to see that machine translation will never replace dictionaries, as you have irrefutably pointed out over and over. I take your point about usefulness, practicality, speed, et al, but that isn’t really why humans “do” languages at all, is it? I am happy every time I open a dictionary and remember how thrilled I was to buy my first “Nouveau Petit Larousse illustré,” with its colored illustrations (including of flags and costumes inside front and back covers), diagrams, “locutions latines,” maps, and so forth, to say nothing of the “Partie Arts, Lettres, Sciences” following the “Partie Langue” itself. I feel a twinge of guilt every time I open it because I have never used it fully, were that even possible. In the note to readers at the beginning, the editors observe that: “Let us see what the Larousse says” has come to replace a recommendation to consult the dictionary. (In a land of some 300 kinds of cheese and the French passion for discussion, how did this chef d’oeuvre ever come to completion?)

    I very frequently consult TVMonde’s dictionary, but almost as frequently, I go to the “real” dictionary for more precise, or just more, choices. I suppose if asked what book I would take to a desert island, I might answer “Le Petit Larousse illustré.”

  4. I will try TVMonde’s dictionary on a French chemical patent tomorrow to see how it compares to Google Translate and Harrap’s.

    I watch TVMonde5 and I have heard commercials on it for this dictionary, but I have never tried it so far.

    I was listening to a discussion of French party hacks on French TV about the election yesterday. It was horrible. They try not to let the other person finish a sentence and whoever yells the loudest wins.

    It was just like in America. I had to turn it off.

  5. The Google patent translation on the EPO site is, in fact, the general translation system Google Translate (it is simple to test that). You might find it interesting to test a patent-adapted machine translation system available at iptranslator.com and compare it to Google and Bing translations (on the same web page).
    I am a constant reader of your blog, and, for this post, I wasn’t able to hold back on suggesting this patent-adapted machine translation project we are working on.

  6. GT and MT products based on similar architecture (probabilistic MT) will reach an impressive level of usefulness, but they will fail to reach 100% because of the nature of the design.
    It might never develop into a “human intelligent” system but it will show us more and more what processes in our brains can be emulated in a computer and what are the real sparks that differentiate us.

  7. GT and MT have already reached a level that is very impressive from the viewpoint of human translators, but still very disappointing from the viewpoint of monolingual users.

    This is precisely the level where I hope they will stay at least for a few more decades, because it means that I can use GT and MT instead of or in addition to dictionaries, but the general public cannot really use them instead of me.


