Posted by: patenttranslator | September 9, 2018

Merchants of MT “Equivalent” to Human Translation Rely on Artful Lies and Gullibility of Their Customers

“What translators are saying is that machine translation should be forbidden and only human translations should be allowed,” said a commenter on a Facebook group of translators recently.

It should be noted that when I took a look at who this person was, I saw that, purely coincidentally, this person, who had infiltrated a number of groups for translators although he himself is not a translator, was selling his own customized machine translation system.

What he said was of course a lie. That is not what translators are saying. At least not translators who still have a functioning brain. Only a total moron could believe that anybody has the power to “forbid machine translations,” especially since most machine translations are free. Probably not even translators are as stupid as the above-mentioned troll suggested.

What we are saying is that machine translation is only a tool and not translation.

We understand that machine translations are very useful, not only to translators, but also and especially to non-translators. The problem is that people are led to believe that machine translations are actual translations, because they are constantly bombarded by the insidious propaganda of the “translation industry,” whose aim is to erase in the minds of most people the difference between human translations and machine translations so that machine translation snake oil can then be sold to clients as real translations.

After all, machine translation is so much cheaper than actual human translation!

Machine translations may sometimes look like real, i.e. human, translations. This happens once in a while when the stars are aligned just right, which is to say when an algorithm finds a previous translation by a human translator that happens to be perfectly compatible with a document “translated” by the machine translation tool. But as anybody who has dealt with machine translations knows, looks can be deceiving.

“Human Parity of Machine Translation” Is a Myth

According to a recent article published on September 7, 2018, by Gino Diño in Slator, which criticizes, among other things, the transparent hype created around “neural machine translation,” the so-called human parity of machine translation is not what the words seem to suggest, as is often the case when we talk about misinformation launched into the world by the “translation industry.”

As the same Slator reporter wrote in a previous article titled “In Human vs. Machine Translation, Compare Documents, Not Sentences,” Microsoft authors claim in their paper that human parity is achieved “if there is no statistically significant difference between human quality scores for a test set of candidate translations from a machine translation system and the scores for the corresponding human translations. In other words, if a bilingual human evaluator judges the quality of human and machine translations as equal (difference in scores are statistically insignificant), then the machine has achieved human parity.”
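
To make the criterion concrete: it boils down to a significance test on two sets of quality scores. Below is a minimal sketch in Python of how such a comparison might look, assuming, purely for illustration, a two-sample t-test on hypothetical 0–100 adequacy ratings and a 0.05 threshold; the actual statistical procedure in Microsoft’s paper may differ.

```python
# A minimal sketch of the "parity" criterion as described above, assuming
# (purely for illustration) a two-sample t-test on hypothetical 0-100
# adequacy ratings; the paper's actual statistical procedure may differ.
from scipy import stats

human_scores = [88, 92, 85, 90, 87, 91, 89, 86]    # hypothetical ratings of human translations
machine_scores = [84, 91, 83, 89, 88, 90, 85, 87]  # hypothetical ratings of MT output

t_stat, p_value = stats.ttest_ind(human_scores, machine_scores)

# Under this criterion, a difference that is not statistically significant
# is enough to declare "human parity".
if p_value > 0.05:
    print(f"p = {p_value:.3f}: no significant difference -> 'human parity' claimed")
else:
    print(f"p = {p_value:.3f}: significant difference -> no parity")
```

Note that under such a criterion, parity is declared whenever the test fails to find a difference, which can happen simply because the sample is too small or because the raters are insensitive to the errors that matter.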

On the surface, such a claim appears reasonable. But as an English politician put it a long time ago, “there are lies, damned lies, and statistics.” When a group of independent researchers, Läubli, Sennrich and Volk (from Stanford University, the University of Edinburgh, and the University of Zurich, respectively), who were not paid for their research by Microsoft, looked more closely at Microsoft’s human parity claim, they found some very interesting facts:

“Microsoft followed current research standards in their methodology, where usually, ‘raters see single sentences – one by one, from any of the test documents, in random order – and rate their adequacy and fluency on a scale from 0 to 100.’”

However, in this process, Läubli said it would be “impossible” for evaluators to detect certain translation errors, and thus they were unable to properly take these into account.

He pointed out some of the major problems in Microsoft’s process, among them:

  1. Evaluators were bilingual crowd workers, not necessarily professional translators.
  2. Evaluators only assessed adequacy, not fluency.
  3. Evaluators “never directly compared human to machine translation.” They looked at them separately and assigned scores.

It’s no surprise to me that these evaluators reached the conclusion that Microsoft wanted them to reach. A translation that purposely ignores context is not a translation. Such a translation is bound to contain major, unforgivable, meaning-changing errors, because context is the oxygen without which the meaning of the words, the sentences and the entire document suffocates and dies.

Läubli, Sennrich and Volk further write that the NMT evaluation methods need to be changed, and even state that “Spreading rumours about human parity is dangerous for both research and practice: funding agencies may not want to fund MT research anymore if they think that the problem is ‘solved,’ and translation managers are not going to be willing anymore to have professionals revise MT output at all.”

Let’s say that a well-researched scientific paper of several thousand words, translated from Chinese into English and evaluated with the Microsoft-style statistical method, assesses the prospect of mankind surviving World War III. Even if the conclusion of the article in Chinese were that humans WOULD NOT survive such a catastrophic event, then based on a statistical evaluation method that compares only words in separate sentences, without paying attention to context (since the meaning of the context is something that cannot be programmed into an algorithm), the machine-translated version could easily end up saying that humanity WOULD survive such an event.
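
A toy sketch of why word-level statistics can miss exactly this kind of reversal: if a single negation is dropped, almost all of the words still match, so any naive word-overlap score stays high. (This is only an illustration of the general principle, not the metric used in Microsoft’s evaluation.)

```python
# A toy word-overlap score (not the metric from Microsoft's evaluation,
# just an illustration of the general principle).
def unigram_overlap(candidate: str, reference: str) -> float:
    """Fraction of reference words that also appear in the candidate."""
    cand_words = candidate.lower().split()
    ref_words = reference.lower().split()
    return sum(1 for w in ref_words if w in cand_words) / len(ref_words)

reference = "humanity would not survive such a catastrophic event"
mt_output = "humanity would survive such a catastrophic event"  # negation dropped

print(f"overlap: {unigram_overlap(mt_output, reference):.0%}")
# prints "overlap: 88%", a high score for a sentence that says the exact opposite
```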

A mistake like this can happen very easily in a machine translation. I have seen it dozens of times in machine translations of German patents, when the word “not” (nicht), which in German can be hidden at the end of a long sentence where the verb is sometimes found, was wrongly assigned or missed by the software, although such a mistake would not be missed by a human translator who understands and pays attention to the context.

Machine translation programs for Japanese suffer from a similar problem with verbs that can change the meaning of an entire paragraph and are hidden in places where algorithms fail to find them, as well as with continuous series of characters that are often interpreted erroneously, just as compound nouns are often misinterpreted by super-cool MT algorithms in German.

In fact, the more translators use computer tools, for example to ensure consistent terminology, the more their translations may be exposed to grave and really stupid errors caused by non-thinking algorithms. Just yesterday I was proofreading an excellent translation of a patent into German in which there were many “compound nouns” containing prepositions and articles that were run together and joined with nouns. After talking to the translator, I discovered that the problem, which was initially invisible to the translator in MS Word, was probably caused by the CAT tool that he was using (the much beloved and celebrated memoQ in this case).

IT that Helps People Work Better and IT that Replaces People

There is a difference between IT that helps people to work better and IT that replaces them. Computers and IT have been affecting the work of all of us for several decades now, regardless of what we do for a living. There have been many articles claiming that most doctors and lawyers will be replaced in the next few years by IT, as some professions already have been, and it is likely that, for example, an anesthesiologist could be replaced by a specialized computer, as described in this Washington Post article, which is already three years old.

After all, a state-of-the-art machine is much cheaper than a doctor!

But would you want to be diagnosed by a machine before an operation that could save your own life (or fail to save it, if the diagnosis is based on incorrect information)?

I would want the anesthesiologist to have the best equipment with the best software currently available for the job. But I would not want a machine to diagnose me before an operation instead of a doctor, because I happen to think that the chances that replacing an experienced and highly educated human called an anesthesiologist with state-of-the-art equipment running the best available software would kill me are unacceptably high.

The chances that replacing an experienced and highly educated human translator with a tool that most translators have been using for quite some time, called machine translation, will kill the meaning of the translation are equally high.

But that is not how the “translation industry” looks at machine translation and what it can and cannot do, because the industry is only interested in maximizing its profit.

So it is only natural that the industry would design an evaluation method according to which “bilingual crowd workers,” whatever that means, compare human translation to machine translation by looking only at separate sentences and assigning points to them, without looking at the context or the meaning of the entire article and without actually comparing the human translation to the MT result, so that they (the “bilingual crowd workers” who are not translators and who don’t consider the meaning of the entire document) would find MT to be “equivalent” to human translation.

First you design a method that is guaranteed to prove your point, and then you publish the result of your “impartial” test with great fanfare.

After all, that is how the “translation industry” has been selling its snake oil to gullible customers for quite some time now.


Responses

  1. Reblogged this on Translator Power.


  2. A friend in Tokyo told me that the new neural net machine translation system that his company has for J>E is around 90% accurate.


  3. That should be good enough for you and your friend, I think.


    • I think you may be missing the point. J>E was nowhere near 90% accurate a couple of years ago. That level will be available to low-rate Indian translators in a couple of years. Of course, 93% is just around the corner…


  4. The other day I came across a post by a translation/localisation company discussing their methodology for measuring translation quality. They categorised mistakes as “high” being a “showstopper”, “medium” as “may affect your reputation or someone’s ability to understand” and “low” as “probably won’t affect the above”.
    They then gave a (hypothetical) example of a 10,000-word translation that had 22 low errors, 4 medium and 1 high. Their “total weighted grading” for that translation was calculated as 22 + (4 x 10) + (1 x 100) = 162 points, divided by 10,000 words = 0.0162, i.e. a penalty of 1.62%. Or an overall grade of 98.38%.
    So this company could produce a translation that contained a “show-stopper” error, harmed their client’s reputation and caused readers to misunderstand the intended message, yet would give it a 98.38% quality evaluation.
    Stats and numbers can be used to paint any picture you want!
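
The commenter’s arithmetic checks out. As a minimal sketch, assuming the (hypothetical) weights implied by the example (1 point per low error, 10 per medium, 100 per high), the scheme looks like this in Python:

```python
# A minimal sketch of the weighted grading scheme described above, assuming
# the (hypothetical) weights implied by the example: a low error costs
# 1 point, a medium error 10 points, a high ("showstopper") error 100 points.
WEIGHTS = {"low": 1, "medium": 10, "high": 100}

def weighted_grade(word_count: int, errors: dict) -> float:
    """Return the overall quality grade as a percentage of the word count."""
    penalty = sum(WEIGHTS[severity] * count for severity, count in errors.items())
    return 100 * (1 - penalty / word_count)

# The example from the comment: 10,000 words, 22 low, 4 medium, 1 high error.
grade = weighted_grade(10_000, {"low": 22, "medium": 4, "high": 1})
print(f"{grade:.2f}%")  # prints "98.38%" despite the showstopper
```

A single catastrophic error costs just one percentage point in a 10,000-word document under this scheme, which is precisely the commenter’s point.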


  5. “So this company could produce a translation that contained a ‘show-stopper’ error, harmed their client’s reputation and caused readers to misunderstand the intended message, yet would give it a 98.38% quality evaluation.

    Stats and numbers can be used to paint any picture you want!”

    The statistics that the “translation industry” is using remind me of the famous question in medieval scholasticism, namely “How many angels can dance on the head of a pin?”

    The answer to this interesting question from the Middle Ages is not easy, because there are many different variables that must be appropriately taken into account, just like the many different variables for measuring the correctness and quality of machine translation in your fine example of “total weighted grading,” such as:

    “If angels have a corporeal body, then only one can dance on the head of a pin. If angels have non-corporeal bodies – spiritual bodies – then an infinite number can dance on the head of a pin. Are the heavens of Christian theology a material location or a spiritual location?”, etc., etc.

    Medieval scholars were passionately arguing about this weighty issue for decades, just like learned and incredibly brilliant machine translation experts are tirelessly arguing about the measurement and percentages of errors and correctness in machine translation now.

    I believe the scholastic method later became known as “reductio ad absurdum.”

    The learned disputations of medieval scholars (in Latin) never took into account the possibility that there might be no angels at all, in which case arguing over how many angels can dance on the head of a pin would be a total waste of time. That would be blasphemy!

    Similarly, the “impartial” tests paid for by MT giants are designed so that they never take into account the fact that if one word can change the meaning of the entire damn document, then statistics that use mathematical formulas to assign values to how correctly or incorrectly individual words are “translated” in a document containing thousands of words are a giant exercise in propagandistic PR aimed at selling stuff to stupid customers.

    That too would be blasphemy!


  6. “According to a recent article published on September 7, 2018, by Gino Diño in Slator,”

    Hang on, I haven’t had an email from Slator since the beginning of August. I’d assumed they were taking a break for the holidays. Are you saying they’ve been publishing all this time?


    • Yes, of course. You can go to their site and subscribe. Once in a while they do have something worth reading.


“After talking to the translator, I discovered that the problem, which was initially invisible to the translator in MS Word, was probably caused by the CAT tool that he was using (the much beloved and celebrated memoQ in this case).”

    I’m confused, but would be very interested to find out more 🙂


  8. Honestly, just use the Twitter translator and you’ll see that even the best machine learning doesn’t fully get the whole meaning.

    Slang and memetics remain big problems.


