Posted by: patenttranslator | January 24, 2015

The Machine Translation Conundrum

 

Several years ago I received a cost estimate inquiry through my website from a private individual who wanted me to quote my price for translating a fairly long Japanese patent for him. I remember that the cost that I quoted was about two thousand dollars and that I thought to myself that since this was a private individual rather than a company, I would have to make him pay 50% of the cost before I started working and the rest just prior to delivery.

I was mightily peeved when I received his answer, because it said: “Although I already have a pretty good machine translation of the patent, I am willing to pay you four hundred dollars for editing and cleaning it up for me.”

Obviously, I told him that I was not interested, and I was not being very polite either.

This guy had the same ingenious idea that is now promoted and heavily bet on by a certain segment of what is called the translation industry. This concept can be summarized as follows:

Let’s reclassify translators as post-processors and pay them a fraction of what they used to make so that we could keep most of the money for ourselves.

Translators are encouraged by this translation industry segment “to acquire new technical skills”, which means to agree to be “trained” in order to become post-processing factotums, and those of us who speak scornfully of this concept are called disgruntled Luddites. I suppose post-processing of machine translations could be a useful skill to have, under some circumstances. Knowing how to dig your own grave is also a useful skill to have, under some circumstances.

Is this concept going to work? I believe that the answer has to be … it depends on several factors, including the language, and on what the words “to work” mean in this case.

The fact is, some translation agencies have been using this technique already for some time, without telling the editors that the text that they are “editing” is in fact what Kevin Lossner and his followers fittingly call MpT (machine pseudo-translation) and also without telling their clients that the pig in a poke that they are being sold is the result of post-edited MpT, as evidenced by this post on the Tranix blog from 2013.

There is no question that great progress has been achieved in the last decade or so in machine translation technology, which is about 60 years old now. In my opinion, the recent progress has been achieved largely thanks to the statistical approach to machine translation, as opposed to the rule-based approach. That is why Google Translate is so popular now and why zillions of pages are translated every minute by Google Translate, which is free to most users.

To my astonishment, a few years ago a brainless, heartless, bloodless, and pulse-less machine even passed a Portuguese-to-English translation test administered by the American Translators Association to people who want to be accredited by the ATA. However, this interesting fact may be saying more about this particular ATA accreditation test than about the capabilities of machine translation. I am saying this because a friend of mine who failed this test happens to have a PhD in Japanese studies from the University of California, Berkeley, and has been translating Japanese for decades.

Is this a case of man-versus-machine competition, where man lost and machine won because it was smarter, or a case of a few men and women who put together a ridiculous test? I think it is the latter.

In any case, many people, including the bait-and-switch potential customer mentioned in the introductory part of this post, or the one mentioned on the Tranix blog, are trying to figure out how to replace translators with post-processors who would be simply asked to feast on big chunks of text already “pre-translated” for them by hardware and software.

The post-processing scheme could work if the translators who agree to engage in this kind of mind-numbing activity are able and willing to completely retranslate parts of the text that are beyond salvaging. How much of the machine-translated text will fall into this category will depend on several factors.

The complexity of the translation is obviously one of them, although, depending on the language, the statistical model of machine translation can deal with very complex and highly specialized texts on obscure and arcane subjects – provided that it is something that has been already translated and included in the database accessible to the software.

How do you find, for example, the correct spelling of the Latin names of all kinds of seaweeds and various types of algae or mushrooms, which are either written in characters or transcribed into one of the Japanese alphabets called katakana? There are many Japanese patents about products and medications based on such mysterious ingredients, and it used to be a gargantuan task for me to try to figure out the correct spelling in English. All I have to do now is to type it in katakana or characters into Google Translate to identify the correct spelling instantaneously, because there is almost always only one equivalent for the Japanese word, usually in Latin, which is already contained in the machine translation database.

But not everything can possibly be contained in a database, and regardless of how many words are already there, machine translation still frequently results in mistranslation, for example when the positive form of a verb is used instead of the negative form and vice versa, such as when “nicht” in German or “nai” in Japanese (both meaning “not”) is missed by the machine translation software. Especially in Japanese, it is quite common to use not just two, but three or four negatives to make a point in an erudite commentary, and even a human translator has to stop and think whether the result in English will be positive or negative based on the meaning of the sentence. Since a machine has no idea about the meaning of the sentence, it simply has to pick one of the two options because there is no such thing as meaning as far as the machine is concerned.

Machine translation is incredibly good at suggesting brilliant-sounding words and phrases worthy of William F. Buckley – especially if the database knows that it was in fact William F. Buckley who said these words in the first place. The problem is, these brilliant-sounding translations may in fact be saying the opposite of what the text means in the original language because the machine has no concept of the meaning of the word “meaning”.

This must be quite a conundrum for machine translation programmers. I suppose that’s why they get the big bucks. Just like the alchemists of old, they will never succeed, but just as the alchemists discovered a very useful new science called chemistry without ever reaching their goal, programmers are discovering other new concepts and techniques that may be more valuable than gold, because the journey is more important than the original goal.

It might be possible to eventually teach ants who are busy dragging dead insects to feast on them in the sanctity of their anthill the complex logistics of the global economy that have been perfected by humans in the last few decades, so that for example peaches and oranges are flown from Argentina to Oregon (because they may be a little cheaper in Argentina than the peaches and oranges grown a few miles away in California).

But since computers can only understand strings of zeroes and ones, the meaning of the word “meaning” will always be beyond their grasp. An ant is much smarter in this respect than even the most powerful supercomputer, no matter how much memory, speed and “corpus-based terminology” we may throw at the machine.

Even so, the post-processing scheme should work, up to a point, for translations between similar and relatively straightforward languages, such as English, German, or French, which share a similar grammatical structure and use similar grammatical concepts, such as subject, singular, plural, etc., none of which have direct equivalents for instance in Japanese. However, even in the case of languages that can be quite easily described in terms originally used by the grammar of classical Latin, much of the text will need to be completely retranslated if it is to accurately reflect the meaning in the original language.

Because I am an extremely lazy patent translator who enjoys modern conveniences, I consider machine translation just another modern convenience, a useful concept that saves time, like a microwave oven. I generally print out a machine translation of the patents that I am translating, as machine translations are now easily available on the internet for most patent applications in most languages, with the exception of relatively old documents.

But while machine translations from German into English often make sense in some sections of the documents, sometimes even in large sections of the patents that I translate, they almost never make much sense when I look at a machine translation from Japanese.

And sometimes the machine translation can be completely wrong because the wrong document is linked in the database, probably due to an error of a human operator. It happened to me last month with a Polish patent. I was looking at its machine translation into English on the WIPO site. It was the same patent with the same title, but it must have been a different version of the patent application because the machine-translated text simply did not correspond to the original document.

St. Francis Xavier, a sixteenth-century Jesuit missionary, said that the Japanese language is so difficult (by which he probably meant different from European languages) that it must have been invented by the Devil himself to frustrate Jesuits who were trying to learn it.

Even though machine translations of Japanese patents are very useful to me, for example because the chemical terminology is usually correct, every sentence that is even just slightly complicated is usually completely mistranslated. Post-processing of sentences translated from Japanese thus does not seem to make any sense if what we are trying to achieve is a real translation faster than when the same text is translated by a knowledgeable human. Machine translations can be used in this case basically only as a dictionary (albeit often a life-saving dictionary containing correct equivalents of incredibly obscure terms).

I believe that St. Francis Xavier got it wrong when he said that it was the Devil who put so much complexity into human languages, especially some of them, to make them next to impossible to learn for people who were not simply born into a language and so could not learn it the natural way.

It was God, not the Devil, who punished the hubris of humans who in their boundless arrogance attempted to build a tower reaching all the way to heaven by confusing them with dozens of different languages, so that eventually they had to stop building their stupid tower as they were no longer able to understand each other.

The quest for perfect machine translation, machine translation that would be not just a useful tool for translators and non-translators alike (although much more so for translators), is just another misguided attempt by arrogant humans to build another Tower of Babel that would reach all the way to heaven, thus making mortal and fallible humans God-like.

God in his infinite wisdom made sure that by the time “pretty good” machine translation became available, around 2015, there would still be hundreds of languages spoken on planet Earth, many of them incredibly difficult to learn, just in case some misguided humans tried to simply replace human translators with software and hardware. That is why the evil designs of machine translation programmers have been thwarted so that human translators could continue to be gainfully employed even at a technologically advanced stage of human civilization.


Responses

  1. Thank you for this extremely interesting article. I have been offered several times lately to dig my own grave.


  2. “I have been offered several times lately to dig my own grave”.

    If you refused to learn this valuable skill, you too must be “a Luddite”.


  3. I obviously agree with your dissection of “machine pseudo-translation” and the scam of “post-editing.” I am not sure, however, whether looking up technical terms by Google Translation is totally reliable, since if it screws up in other ways, it might not be trustworthy with individual terms. For those, I use regular Google search or some other search engine and spend some time making sure I have the correct term, shifting back and forth between English and Japanese until I’m pretty sure.

    But perhaps your method is reliable enough. I’ll give it a try.

    When I started out translating, in the typewriter age (AKA the Middle Ages), my first work was on academic papers on pesticide chemicals used on crops, and I spent many frustrating hours trying to find English names for those bugs, diseases, and plants in the printed reference works in the libraries I had access to. I eventually caught on to the fact that many pests, plant diseases, and plants themselves which exist in Japan just turn up missing in the U.S., and the only thing I could do was transliterate them. Ceiling Cat only knows what MT does with that kind of thing nowadays.


  4. I also think that it might be a little too daring to say that machines will never be able to deal with the meaning of language. The talking robots that appear in SF movies look pretty realistic to me, even if they might not be invented until the 24th century. 🙂 By then, of course, translators living today will not need to worry about being put out of work. Unless the “post-human” enthusiasts are right that we will soon be able to upload ourselves to computers and become immortal. (I always wonder, by the way, what would happen to these uploaded immortals when the machines they were shifted to crashed. I hope they get themselves backed up. But then who is the actual person: the original file or the backup?)


  5. “I am not sure, however, whether looking up technical terms by Google Translation is totally reliable, since if it screws up in other ways, it might not be trustworthy with individual terms.”

    Of course it is not reliable. I combine it with other databases and sometimes I have to spend a lot of time researching a few terms. But it is still very helpful for a quick check, especially when I start translating. MT also warns me when I am about to skip a part of the text, which can easily happen when you translate two very similar paragraphs and lose your concentration for a moment for some reason.

    “I hope they get themselves backed up. But then who is the actual person: the original file or the backup?”

    Good one. NSA would have to keep spying on both machines where the files are stored.


  6. A sharp tool in the hands of a trained and experienced tradesman can produce a better result. In the hands of those without training or experience, it could spell a trip to the hospital.


  7. Hi Steve,
    Many thanks for referring to my blog post on Google Translate. However, I’d just like to point out that the agency in this case was the good guy, since it was the client who was trying to pass off the text in English as human output. The PM supported me in rejecting the job as a revision. Unfortunately, a lot of skint or miserly clients think they can get away with using a free online MT tool (not necessarily GT these days) to then get the same quality for half the price. But this no longer works with this particular agency as the conditions for the client clearly state that we will not do PEMT and will halt all jobs if we discover that MT has been used.


  8. Hi Nikki

    So this client was trying to do basically the same thing that I am describing at the beginning of my post.

    I think that translators should try to keep their cool and give 2 options to clients like these:

    1. Learn the language and you will not need a translator (highly recommended).

    2. Pay a real translator for a real translation – asking a translator to fix and clean up MT is like asking a dentist to fix and clean up gum surgery that was done for free by a blind dental hygienist.


    • No. 2 is what I offer at the moment. Certainly don’t rule out suggesting no. 1 though.


  9. I’ve always liked the Tower of Babel story.

    It reminds me of a mate of mine who was working in the 1990s in Lancaster University’s linguistics department on a European Community machine translation project in conjunction with other European universities.

    The idea was to develop a core language, only understood by computers, into which all European Community documents would be translated. Then the document would be translated from this core language into all the other EC languages.

    The aim was to save money and time – today there are 24 official EU languages so in theory each document has to be translated 23 times.

    The project was abandoned in the late 1990s as it became clear that it wasn’t workable and that statistical, corpus-based machine translation was the way to go.

    Don’t know if God had much to do with it, but hubris, in the form of not thinking big ideas through, is definitely a bummer.


