Posted by: Steve Vitek | May 17, 2012

The World Is Still Waiting for the Magical Algorithm for Machine Translation

People who don’t know anything about machine translation (MT) think that editing MT must be a fairly straightforward task that should save a lot of money for people who need to have lots and lots of long documents translated. Quite a few translation enterprises are even betting the farm, so to speak, on this new business model.

A private individual recently e-mailed me a request for a cost estimate for translating a lengthy Japanese patent into English. My estimate was for one thousand eight hundred dollars.

His response was: “I already have a machine translation of this document. I am willing to pay you 400 dollars for editing it and making changes if any are required.” I Googled his name and he seemed to be a prominent consultant for the Democratic Party, unless I got the wrong person. Needless to say, I declined his generous offer.

Why am I not surprised that a prominent consultant for the Democratic Party is a cheap trickster whose favorite method is the bait and switch technique, which is the time-tested method favored by just about every politician?

Contrary to what commercial propaganda would want you to believe, even a very good MT program such as Google Translate will mistranslate so much in just about any text that the result will be in most cases useful only to translators who understand both the source and the target language, which is to say people who don’t really need it.

On the other hand, advances in machine translation, both real and imaginary, have already displaced human translators: some entities that were spending a lot of money on human translators only a few months ago have switched to MT. Not surprisingly, the government is leading the way when it comes to being penny wise and pound foolish.

I was recently translating a long Japanese patent for an inventor who needed accurate information about a Japanese patent application representing prior art (existing technology) in order to file his new patent application. He told me that he was very happy that the examiner who works for the US Patent Office and who would be examining his application was using a machine translation for examination purposes. I printed out the machine translation in question, and although it was very useful to me, I really don’t know how a monolingual patent examiner could possibly make sense of the MT product.

It should be quite easy for a patent agent to discredit machine-translated evidence of prior art in a Japanese patent application, which was the only obstacle that this inventor had to overcome.

When I was a budding patent translator 25 years ago, I cut my teeth on translating Japanese patents for the US government through several translation agencies specializing in this kind of work. I did not really know what I was doing at first, but as my rates were low, new work kept arriving by Federal Express (there was no Internet back then) whenever I finished one batch of patents.

After a year or so, I had to find new customers as I started raising my rates. It is likely that this kind of work at very low rates has mostly disappeared as a result of free machine translation and competition from third world countries. It is probably much easier now for people who don’t really know the source language or the technical field to translate patents and other complicated documents that a novice in the profession would not have dared to touch a few years ago, before machine translation became available.

On the one hand, barriers to entry into our profession have been all but eliminated. You don’t even need to have specialized technical dictionaries when most technical terms can be easily translated using MT. The low end of the market thus will be “serviced” by MT, and probably also by translators in low cost countries who don’t really know the source language or English that well.

But as I have not seen a drop in my business, machine translation so far has not been able to make a dent in the demand for patent translators with decades of relevant experience.

There is still this one little detail that MT developers need to figure out first before they can make people like me redundant: they have to design an algorithm that will obviate the need to understand the meaning of the text in one language in order to translate it correctly into another language.

It should be as simple and yet as ingenious as Occam’s razor, a principle attributed to William of Occam, a monk and logician in 14th century England, which says that “Pluralitas non est ponenda sine necessitate” (multiple entities should not be posited unnecessarily), i.e. when you have many competing hypotheses, the simplest one among them is usually the correct one.

Occam’s razor usually works very well when it is applied to sciences such as physics, or to finding the murderer in the mystery that you are reading (it will eliminate red herrings).

But as it does not seem to work when it is applied to the devious invention of human languages, we are still waiting for the magical algorithm in machine translation that will finally eliminate the need to understand the meaning of what is being said.


Responses

  1. Recently, I read the story of Nikola Tesla (theoatmeal.com/comics/tesla) and listened to the story of Beethoven told by José Bowen (ted.com), and I find that these stories confirmed my suspicion: It is usually not the geeks who win, but rather the trickiest douchebags.

    Occam’s razor is a powerful tool for eliminating unnecessary hypotheses, even if the simplest theory is not necessarily accurate. Wikipedia says: “In the scientific method, Occam’s razor is not considered an irrefutable principle of logic, and certainly not a scientific result. Solomonoff’s inductive inference is a mathematical proof of Occam’s razor, under the assumption that the environment follows some unknown but computable probability distribution.”

    Now, since MT nowadays is a “statistical” model (a probabilistic/stochastic model), it seems to me that MT could eventually be proven the “simplest” solution for the translation industry. The douchebags are redefining the product, introducing a division of labor (MT, HT and PE), changing expectations of quality (“Good enough is good enough.”) and offering more consumer choice.

    This is not bad news for us at all, as you pointed out in a past blog post, Advances of Google Translate Put a Premium on Human Translation. While we are still waiting for the magical algorithm, we can make use of the powerful tool because it is always humans who decide on the accuracy of meaning, as you put it: “The words and sentences magically appearing on the screen in English when I type something in Japanese, French, German, or Russian or Czech are accurate only when I say that they are accurate.”

    BTW, a question for you: Do you think the hamsterizing post-editing can do any good (with respect to quality and productivity) or any harm (with respect to human thinking [intellectual development]) to the translation industry (translators as well as agencies)?


    • Wenjer:

      You pack a lot in your sentences, almost as if you were writing them with Chinese characters.

      The trickiest douchebags rule the world, that’s for sure. That has always been the case.

      I think that MT and the consequent hamsterization of translators via MT and post-editing is both good and bad for the translation industry. Some translators will be turned into pitiful human hamsters, while the high end of the market (which is where I would like to think I am residing) will be largely unaffected.

      But the Achilles heel of the whole business concept is that since anybody has easy access to cheap or free MT, the hamsters will inevitably in the end revolt against their masters and start using MT to do business with their clients without them.

      That is the main reason why the business model is not viable, but there is also another one: translation obtained as a result of post-edited MT will almost always be inferior to real translation because MT lacks the creative spark, even MT that is based on the statistical approach.

      Once hamsters find the courage to jump out of their hamster wheel, they will rediscover their creative spark.

      Hamsters Of The World, Unite!


  2. Will never happen! For the pure hell of it, I entered “red herring” into Google’s site and got (Eureka!) Red Herring. Even more perversely, I entered (from a recent editorial on the DR Congo) “du fil à tordre,” rendered as “yarn twist,” and “panier à crabes,” given back as “basket crabs.” Technical terms might be easier (?) for MT than, say, political stuff, but what does it do with such things as “female” and “male” ends? Ah, yes! Let’s just edit for no money.

    I was happy to enter “Jean Yves” into Google’s pronunciation guide and heard no liaison of the “n” (putting an end to an argument), but “Saint-Saens” did not produce the “s” required at the end, according to the Petit Larousse. Now then, whom to believe in the absence of Thibaudet, a French person in the know, or the deceased Saint-Saens?

    To “discuss” this with non-linguists is useless. Both sides risk a heart attack, stroke or worse.


  3. Hi Ricky:

    I know, Google Translate is a really interesting toy.

    Today I was amusing myself with a few foreign proverbs to see English translations and vice versa. I really needed some distraction as I was translating a long patent from 6 AM to 6 PM, with long breaks, of course.

    Even proverbs that have identical or close equivalents were mistranslated. For example, “Laska prochazi zaludkem” was translated as “Love goes through the stomach”. This is a fairly good translation of the Czech words, but they actually mean “The way to a man’s heart is through his stomach”. It did not work in the other direction either: the English proverb “What goes around comes around” was translated into Czech as “To co odejde, vrati se” (That which leaves comes back).

    And when I tried to have “What goes around comes around” translated into Japanese, I got “ワット·ゴーズ·アラウンド·カムズ·アラウンド” [watto gohzu araundo kamuzu araundo]. GT refused to translate it and instead transliterated the English words into katakana, one of the two Japanese syllabaries used in conjunction with Japanese characters.

    So much for the statistical approach to MT as of May 2012. You would think that inputting proverbs should not be a major problem since there are not that many of them in any language.


  4. Sorry about the heavily packed sentences. I hope that they still make sense, somehow.

    When I asked the question concerning the good and evil of post-editing, I was thinking of an essay, Translation Skill‐Sets in a Machine-Translation Age, written by Anthony Pym (http://usuaris.tinet.cat/apym/on-line/training/2012_competence_pym.pdf). This essay was quite a surprise to me and to some other translation colleagues who have been closely following Pym’s translation theory. However, he has some valid points, too. (I don’t think we are going to explore the points here. There is a summary of his points and an assessment made by our colleague John Bunch – http://www.bunch-translate.com/2012/01/translator-vs-post-editor-vs-technical.html – which gives a pretty good overview of Pym’s model.)

    Somehow I feel that the development of TM and MT tools does not affect my business at all. Instead, it helps me quite a bit. I wouldn’t have been able to translate 27 manuals of over 154 pages each in the last 3 weeks without a combination of such tools. Well, the portion that had to be translated became, in fact, less than 12% of the whole. That is, I was translating about 2000 words a day instead of 16000 to have the 27 manuals done in 3 weeks. The tools helped to keep the terminology and the style consistent, too.

    The best thing was that I didn’t feel that I was hamsterizing myself in the last 3 weeks by post-editing something. I had to translate the 10~12%, anyway. And the money was good, too.

    In Pym’s essay, he argues for (a set of) new skills (cf. pages 7~12 of the essay). I have noticed this shift of emphasis for quite a while. However, my concern is about who is going to pay for the development of new skills of the “re-texters” (as Pym prefers to call translators in the brave new world of translation technology). In my case, I am just lucky enough to have the right educational background to acquire the needed skills, and the training as well as the tools are paid for by clients. I doubt that any agency would pay for all this for its freelance translators.

    In Miguel’s recent blog post, My Dinner with Renato, he explained that the difference between him and Renato Beninatto lies not so much in the facts on which they are divided, but rather in the interpretation of those facts. You see, Steve, Miguel is one of those translation colleagues I admire. He is a sharp thinker. I totally agree with his first example of their difference (concerning the thesis that “a call for all translators to try to get into the high-rate sector is self-defeating”). “No,” he wrote, “I would welcome more translators emigrating from the middle of the bell curve, because I think a rising tide could well lift all boats.”

    Now, I wouldn’t believe that any entrepreneur of a translation agency would like to see the emigration of more translators from the middle of the bell curve. You see, Steve, I am afraid that “Hamsters of The World, Unite!” could easily become just another slogan like what they sang in The Internationale. When hamsters are kept busy treading the wheel, they won’t find the courage to jump out of it to rediscover their creative spark.

    You know how they, some of our colleagues, ridicule Wendell Ricketts for his “No Peanuts!” and how they suppress voices like those IAPTI has been raising. So long as the human hamsters keep on treading the wheel, they keep on believing that their only chance for a better life is to keep quiet with their profiles on one of those workhouses and to get into the translator pools of some larger agencies, instead of jumping out of the wheel for a better view of this industry.

    I sincerely hope that all translators will find the courage to jump out of the wheel and find a way to migrate from the middle of the bell curve Renato Beninatto drew on a napkin at lunch with Miguel Llorens. But this could take more than a lifetime to happen, if it happens at all.

    As to whether people will find the magical algorithm, I don’t think there is such an algorithm. Nevertheless, I appreciate the nice tool, which does not destroy our living. “There are tools. There are people who use the tools. And then there are people who are tools. Know the difference.” MT will never be perfect, and neither will any translator. It is after all the translator who is responsible for the quality of a translation, not the tools he uses to achieve the translation.

    P.S. I tried the Chinese proverb “知人知面不知心” (which some people translate inaccurately as 顔を見て人の心のありようを知るすべはない) with Google Translate. It was translated as “clothes make the man”, which is a part of a totally different proverb, “佛要金装,人要衣装” (The gold coating makes the icon of Buddha; the man is likewise made of clothes). Since many proverbs have already been translated, these improper translations indicate that GT is not really based on a statistical approach. Otherwise, it should have offered one of the existing human translations of the proverb, because the probability of any existing one must be 100%.


  5. Everything depends on your field.

    My business has been influenced by MT because my clients are using it and I am using it too, but not, for example, by CATs, although most translators seem to think that Trados is de rigueur these days. Well, it is, but only if you work in a certain field and for a certain type of agency.

    I don’t use translation memory tools because my clients don’t ask about them, probably don’t know about them, and if they did, they would probably not want me to use them.

    The thing is, everybody assumes that every other translator is doing exactly the same thing in exactly the same way. But the opposite is true. The translation business is as fragmented as, say, the music business.

    My business is very different from yours because I specialize in patents, and the way you work is very different from the work of a financial translator or a translator of novels.

    Which makes it difficult to make any comparisons that would be meaningful across the whole spectrum of different translation fields, including a meaningful comparison of rates, because different translators work for very different segments of the translation market.


    • Steve, as a Chinese saying goes, there are a zillion ways to become a buddha, which means there are a zillion ways to achieve the same or similar purposes. I am quite aware of the fact that people are doing different translations and that they do translation with different tools and in different ways, even though their purposes are the same or similar, that is, an economic purpose.

      I do agree that it is difficult to make any meaningful comparisons across the whole spectrum of different translation fields. However, CAT tools with TM and MT are there and they influence nearly the whole spectrum of the industry, except translation of propaganda and advertisements. In my field of technical translation, TM and MT are also powerful tools. The influence is there. But as in the field of patent translation, tools can never replace translators.

      While patent translators make use of TM/MT in different ways, technical translators also make use of TM/MT in many ways. When I said that MT does not affect my business, I meant that MT cannot replace any technical translator. TM/MT are tools and people use tools. That’s why I wrote, “There are tools. There are people using tools. And then some people are tools.”

      The tools people use, like the games people play, are different. Even the ways people use tools are different. We may not rely on Google Translate and use it only as a dictionary. We let it translate some words, phrases or sentences, but we are aware of the nature of such a tool and decide by ourselves on the accuracy of its translations. Nevertheless, it helps, sometimes. It helps immensely, sometimes, depending on what we are translating.

      What I am saying is that we should always be aware of the nature of TM/MT tools as tools and that we should use tools of our choice and never blindly rely on them. An MT algorithm that performs like a human translator is not to be expected anytime in the future, no matter how much the tools improve.

      As to the question of rates, I don’t see a problem in it. If a TM/MT tool could help us “translate” 10,000 words an hour, I wouldn’t mind getting paid US$0.01 per word, including the machine translated words or what they call fuzzy matches. But I am afraid there aren’t any CAT tools that can help us achieve the desired economic purpose. Besides, there is another question: Who is going to pay for such tools?


  6. I wouldn’t discredit MT so quickly–there is trusted software available that is successfully used on a daily basis by many reputable, global organizations. SDL just recently launched SDL BeGlobal Translator for example: http://bit.ly/K0OpHw.


  7. Brittany:

    MT is frequently discredited by dishonest claims made about it by greedy vendors, not by posts on translators’ blogs.

    We are only describing our experience with the wonders of MT. Most of us like MT because we can use it as a tool, although some translators are scared that it could one day make them redundant.

    But to me, MT is like a huge and free dictionary with plenty of context – what is there not to like?


    • Ah, Steve, now I see it: We are actually of the same opinion.


  8. 成程、ね。 [I see.]



