Several years ago I received a cost estimate inquiry through my website from a private individual who wanted me to quote my price for translating a fairly long Japanese patent for him. I remember that the cost I quoted was about two thousand dollars, and that I thought to myself that since this was a private individual rather than a company, I would have to make him pay 50% of the cost before I started working and the rest just prior to delivery.
I was mightily peeved when I received his answer, because it said: "Although I already have a pretty good machine translation of the patent, I am willing to pay you four hundred dollars for editing and cleaning it up for me."
Obviously, I told him that I was not interested, and I was not being very polite either.
This guy had the same ingenious idea that is now promoted and heavily bet on by a certain segment of what is called the translation industry. This concept can be summarized as follows:
Let's reclassify translators as post-processors and pay them a fraction of what they used to make, so that we can keep most of the money for ourselves.
Translators are encouraged by this translation industry segment "to acquire new technical skills", which means to agree to be "trained" in order to become post-processing factotums, and those of us who speak scornfully of this concept are called disgruntled Luddites. I suppose post-processing of machine translations could be a useful skill to have, under some circumstances. Knowing how to dig your own grave is also a useful skill to have, under some circumstances.
Is this concept going to work? I believe that the answer has to be … it depends on several factors, including the language, and on what the words "to work" mean in this case.
The fact is, some translation agencies have been using this technique already for some time, without telling the editors that the text that they are “editing” is in fact what Kevin Lossner and his followers fittingly call MpT (machine pseudo-translation) and also without telling their clients that the pig in a poke that they are being sold is the result of post-edited MpT, as evidenced by this post on the Tranix blog from 2013.
There is no question that great progress has been achieved in the last decade or so in machine translation technology, which is about 60 years old now. In my opinion, the recent progress has been achieved largely thanks to the statistical approach to machine translation, as opposed to the rule-based approach. That is why Google Translate is so popular now and why zillions of pages are translated every minute by Google Translate, which is free to most users.
To my astonishment, a brainless, heartless, bloodless, and pulse-less machine even passed, a few years ago, a Portuguese-to-English translation test administered by the American Translators Association to people who want to be accredited by the ATA. However, this interesting fact may say more about this particular ATA accreditation test than about the capabilities of machine translation. I say this because a friend of mine who failed the test happens to have a PhD in Japanese studies from the University of California, Berkeley, and has been translating Japanese for decades.
Is this a case of man-versus-machine competition, where man lost and machine won because it was smarter, or a case of a few men and women who put together a ridiculous test? I think it is the latter.
In any case, many people, including the bait-and-switch potential customer mentioned in the introductory part of this post, or the one mentioned on the Tranix blog, are trying to figure out how to replace translators with post-processors who would simply be asked to feast on big chunks of text already "pre-translated" for them by hardware and software.
The post-processing scheme could work if the translators who agree to engage in this kind of mind-numbing activity are able and willing to completely retranslate the parts of the text that are beyond salvaging. How much of the machine-translated text falls into this category will depend on several factors.
The complexity of the translation is obviously one of them, although, depending on the language, the statistical model of machine translation can deal with very complex and highly specialized texts on obscure and arcane subjects – provided that it is something that has been already translated and included in the database accessible to the software.
How do you find, for example, the correct spelling of the Latin names of all kinds of seaweeds and various types of algae or mushrooms, which are written either in characters or transcribed into katakana, one of the Japanese alphabets? There are many Japanese patents about products and medications based on such mysterious ingredients, and it used to be a gargantuan task for me to try to figure out the correct spelling in English. All I have to do now is type the word in katakana or characters into Google Translate to identify the correct spelling instantaneously, because there is almost always only one equivalent for the Japanese word, usually in Latin, which is already contained in the machine translation database.
But not everything can possibly be contained in a database, and regardless of how many words are already there, machine translation still frequently results in mistranslation, for example when the positive form of a verb is used instead of the negative form and vice versa, such as when "nicht" in German or "nai" in Japanese (no) is missed by the machine translation software. Especially in Japanese, it is quite common to use not just two but three or four negatives to make a point in an erudite commentary, and even a human translator has to stop and think about whether the result in English will be positive or negative based on the meaning of the sentence. Since a machine has no idea about the meaning of the sentence, it simply has to pick one of the two options, because there is no such thing as meaning as far as the machine is concerned.
Machine translation is incredibly good at suggesting brilliant-sounding words and phrases worthy of William F. Buckley – especially if the database knows that it was in fact William F. Buckley who said these words first to begin with. The problem is, these brilliant-sounding translations may in fact be saying the opposite of what the text means in the original language, because the machine has no concept of the meaning of the word "meaning".
This must be quite a conundrum for machine translation programmers. I suppose that's why they get the big bucks. Just like the alchemists of old, they will never succeed, but just as the alchemists discovered a very useful new science called chemistry without ever reaching their goal, programmers are discovering other new concepts and techniques that may be more valuable than gold, because the journey is more important than the original goal.
It might eventually be possible to teach ants, busy dragging dead insects to their anthill to feast on them in its sanctity, the complex logistics of the global economy that humans have perfected in the last few decades – logistics under which, for example, peaches and oranges are flown from Argentina to Oregon (because they may be a little cheaper in Argentina than the peaches and oranges grown a few miles away in California).
But since computers can only understand strings of zeroes and ones, the meaning of the word “meaning” will always be beyond their grasp. An ant is much smarter in this respect than even the most powerful supercomputer, no matter how much memory, speed and “corpus-based terminology” we may throw at the machine.
Even so, the post-processing scheme should work, up to a point, for translations between similar and relatively straightforward languages, such as English, German, or French, which share a similar grammatical structure and use similar grammatical concepts, such as subject, singular, plural, etc., none of which have direct equivalents for instance in Japanese. However, even in the case of languages that can be quite easily described in terms originally used by the grammar of classical Latin, much of the text will need to be completely retranslated if it is to accurately reflect the meaning in the original language.
Because I am an extremely lazy patent translator who enjoys modern conveniences, I consider machine translation just another modern convenience, a useful tool that saves time, like a microwave oven. I generally print out a machine translation of the patents that I am translating, as machine translations are now easily available on the internet for most patent applications in most languages, with the exception of relatively old documents.
But while machine translations from German into English often make sense in some sections of the documents, sometimes even in large sections of the patents that I translate, they almost never make much sense when I look at a machine translation from Japanese.
And sometimes the machine translation can be completely wrong because the wrong document is linked in the database, probably due to an error by a human operator. It happened to me last month with a Polish patent. When I looked at its machine translation into English on the WIPO site, it was the same patent with the same title, but it must have been a different version of the patent application, because the machine-translated text simply did not correspond to the original document.
St. Francis Xavier, a sixteenth-century Jesuit missionary, said that the Japanese language is so difficult (by which he probably meant so different from European languages) that it must have been invented by the Devil himself to frustrate the Jesuits trying to learn it.
Even though machine translations of Japanese patents are very useful to me, for example because the chemical terminology is usually correct, every sentence that is even slightly complicated is usually completely mistranslated. Post-processing of sentences translated from Japanese thus does not seem to make any sense if what we are trying to achieve is a real translation produced faster than when the same text is translated by a knowledgeable human. Machine translations can be used in this case basically only as a dictionary (albeit often a life-saving dictionary containing correct equivalents of incredibly obscure terms).
I believe that St. Francis Xavier got it wrong when he said that it was the Devil who put so much complexity into human languages, especially some of them, making them next to impossible to learn for people who were not simply born into the language and thus could not learn it the natural way.
It was God, not the Devil, who punished the hubris of humans who in their boundless arrogance attempted to build a tower reaching all the way to heaven by confusing them with dozens of different languages, so that eventually they had to stop building their stupid tower as they were no longer able to understand each other.
The quest for perfect machine translation, machine translation that would be not just a useful tool for translators and non-translators alike (although much more so for translators) but a replacement for human translators, is just another misguided attempt by arrogant humans to build another Tower of Babel that would reach all the way to heaven, thus making mortal and fallible humans God-like.
God in his infinite wisdom made sure that by the time "pretty good" machine translation became available, around 2015, there would still be hundreds of languages spoken on planet Earth, many of them incredibly difficult to learn, just in case some misguided humans tried to simply replace human translators with software and hardware. That is why the evil designs of machine translation programmers have been thwarted, so that human translators can continue to be gainfully employed even at a technologically advanced stage of human civilization.