Posted by: patenttranslator | August 29, 2010

A Short Test of the Google Translate Function on a PCT Patent Application Published in Japanese on the WIPO Website

 

Machine translations of Japanese patent applications have been available on the website of the Japan Patent Office (JPO) for about 10 years now. Recently, the World Intellectual Property Organization (WIPO) website added a machine translation (MT) function by incorporating Google Translate in the search function. Entire texts of applications can now be almost instantaneously translated between English, Spanish, Vietnamese, Hebrew, Portuguese, French, German, Japanese, Russian, Korean, Chinese and other languages, such as Czech. Ten years ago, I wrote an article about the MT function on the JPO website for Translation Journal. A post that I wrote a few months ago about the apparent threat of machine translation to human translators generated a lively discussion on this blog. I also wrote a post explaining how to use the machine translation tool available on the Japan Patent Office website, as well as a rather long post for this blog about Internet Resources for patent translators (such as the EPO, JPO, WIPO and DepatisNet websites), which is based on a chapter that I wrote for The Patent Translator’s Handbook published by the American Translators Association. I decided to test the new Google Translate function on a Patent Cooperation Treaty (PCT) patent application published in Japanese and write a post about it for my blog.

I started by searching the WIPO website for 記録制御手段 (kiroku seigyo shudan = recording control means), a common technical term that I selected at random from a patent I had finished translating the day before. Out of the 26 patent applications displayed, I selected the first one, WO/2009/084116 (Recording Device, Portable Device, Recording Program, and Recording Method, filed by Fujitsu, Ltd.). I then translated paragraph 6 from Japanese to English, both myself and with the Google Translate function, because it was the first paragraph with a reasonably long, meaningful description. Item A below is the original Japanese text, Item B is my translation, and Item C is the machine translation obtained with the Google Translate function.

Item A – Original Text in Japanese

[0006]

携帯装置に搭載されているTV受信機能及びその記録機能を用いて放送番組を受信し、記録する場合には、移動中に放送受信や記録ができる利点があるものの、場所や時間によっては、電界状態や受信感度の影響を受け、記録画像の不鮮明や、録音劣化等の不都合がある。画像品質が悪い場合、その状況を表すメッセージを画面上に表示(特許文献1)しても、ユーザが記録を切望している場合には斯かるメッセージは無意味であるし、電界強度が低下した場合に記録を停止することは(特許文献2、特許文献3)、この場合もユーザが記録を切望している場合には、ユーザの期待を裏切ることになる。画像品質の良否、記録の要否は番組内容やユーザによって異なるものである。このため、記録停止や画像劣化を表すメッセージを自動表示したり、それを記録することは、記録の有効利用を損う等、ユーザの要請に沿っているとは言えない。

Item B – My Translation

[0006]

Portable devices equipped with a TV reception function use this function and a recording function to record broadcast programs. Although the advantage of these devices is that programs can be received and recorded while the devices are being moved, the status of the electromagnetic field and the reception signal sensitivity can be influenced by the location or by the time, and the disadvantage is that the recorded image is in some cases not clear, or that the sound quality can deteriorate, etc. If the user really wants to record something when the image quality is poor even though messages indicating this status are displayed on the screen (Patent Reference 1), the messages may be meaningless. However, when recording is stopped if the intensity of the electric field is decreased (Patent Reference 2, Patent Reference 3), the user will be greatly disappointed if there is no recording. When the image quality is poor, whether the recording will or will not be required will depend on the content and on the user. Therefore, when recording is performed while messages are automatically displayed prompting to stop recording or indicating a poor image, the recording may no longer be useful since the recording is not necessarily performed according to what the user really wants.

Item C – MT Text Obtained with Google Translate

[0006]

Is the machine for Mobile TV to receive broadcast program for recording function and its receive function, if recorded, despite the advantage of recording and broadcast reception during the move, the place and time, the field affected states and receive sensitivity, and smearing the image recording, the recording deterioration disadvantages. If image quality is poor, a message on the screen represent the situation (Patent Document 1) but then if you are keen to record the user message is meaningless Magical biggest cotton textile, electric field strength decreases If you stop the recording (Patent Document 2, Patent Document 3), if you are keen to record the user in this case, users will be disappointed. Quality of image quality, the necessity of recorded programs which vary depending on the content and user. Therefore, you can view messages automatically stops recording and representing the image degradation, to record it, and harm the effective use of records, along with the request that the user can not say.

If you read my translation first and then the Google translation, you can more or less understand and follow the meaning of the Google MT product. Even if you don’t read my translation first, you would probably be able to understand most of what the Google software produced. The MT function is very useful, considering that the alternative for people who don’t know Japanese (which means most people on this planet, except the Japanese and foreigners who have spent decades trying to learn the language) would be no information about the Japanese text at all. But I must say that the quality of the MT product is not very different from the output of the MT software that I used for a similar test on the JPO website 10 years ago (see my article in Translation Journal from July 2000). Although Google Translate uses a radically different, statistical approach to machine translation, in my opinion the result is not very different from that of other types of MT software, and in some cases it may be even worse than what one would expect from Systran-based machine translation tools such as Yahoo Babel Fish, as the next section shows.

The Magical Biggest Cotton Textile Mystery

I have no idea how this “magical biggest cotton textile” ended up in the Google translation. There is nothing even remotely similar to this wording in the Japanese text. I sometimes look at machine translations from the JPO website, for example to make sure that I did not skip anything, which is a mistake that human translators often make. I sometimes see hilarious bloopers in the MT product on the JPO website, but if I read the Japanese text carefully, I can always trace the nonsensical English wording back to an unfortunate (or fortunate, if you appreciate the entertainment value) combination of Japanese characters. But not in this case. Is this “magical biggest cotton textile” a contamination that is specific to the statistical model? Can somebody enlighten me as to what might have happened here? I would really appreciate it.

Disclaimer – There Is No Such Thing As A Perfect Translation

My translation is only my interpretation of the original Japanese text. Other translators could translate the same text somewhat differently, and I could have translated it differently under different circumstances. For example, so far this morning I have had two large cups of coffee (French Roast purchased from my friendly local Food Lion supermarket, which is in fact a Belgian company, although I have yet to meet a Virginian or North Carolinian who actually knows that). With three cups, or with a different brand of coffee, the resulting translation could be a little different. But unlike Google’s MT product, frequently celebrated in newspapers as “the new tool that will eliminate the language barrier”, I do believe that my translation expresses what the author of the patent application wanted to say in Japanese.

Just about every article about machine translation ends with words of caution along the lines of “this product still needs improvement, some tweaking, more work”, etc. The companies selling MT can obviously never admit that machine translation will never break the ultimate barrier – the barrier of meaning. Unless you understand the meaning of words, you are merely replacing words by other words in another language according to some algorithm, not translating. I think that the statistical approach to machine translation, pioneered by Google, is just another dead end. It may work very well for some applications but as my simple test seems to indicate, it is not likely to put human translators out of business. In addition to machine translation, Google is also working on other new applications for artificial intelligence that humans have been dreaming about for a long time, such as a self-driving car. Based on this New York Times article, they have been quite successful in this area, although truth be told, I’ll believe it when I see it. I imagine that taxi drivers reading the article linked above experience feelings similar to those experienced by professional translators when they read enthusiastic descriptions of breakthroughs in machine translation.

I would love to be driven by my car instead of having to drive it. And I think it is likely to happen some day, perhaps even soon. But unlike self-driving cars, I think that machine translation that is just as good as what a good human translator can do will not be available to us. And I don’t mean any time soon. I mean ever. Or at least until somebody figures out how to teach computers the meaning of meaning, or what in fancy MT speak is sometimes referred to as disambiguation. If that ever happens and computers start understanding that they are merely our slaves, the computers just might decide at that point to get rid of humans. After all, who needs humans when computers understand the meaning of everything just fine.

http://www.youtube.com/watch?v=G-_-l_NaDcw&feature=related


Responses

  1. “The Magical Biggest Cotton Textile Mystery!” LOL, I cannot stop laughing at this. Doesn’t sound very patentable.

    I’m a professional Japanese to English translator myself, and I am often peeved when people ask me how it is I make money when you can just translate things online.

    I actually read the Google translation in your article first, and it makes absolutely no sense to me, whereas your translation makes perfect sense.

    This is my first visit to your blog but not my last! Keep the articles coming!

    P.S. if you have any projects you need help on just drop me a line 😉


  2. Thank you for your comment.

    The Magical Biggest Cotton Textile Mystery remains unsolved.

    I recently talked to a German translator who said that she was really impressed by a machine translation of Le Petit Prince by de Saint-Exupéry produced with Google Translate. Maybe they just retrieve an existing translation, since this story must have been translated into English many times.

    Software that can simply locate an existing translation will of course have excellent results, because the output will be based on the original translation done by a human being.

    But I am not sure how they really do it with Google Translate.


  3. […] a short section of a Japanese patent and the results were pretty horrible. You can read about it in my post here. Incidentally, it is really hard for some reason to find this post on Google. If you Google the […]


  4. […] you are interested in what I think about Google Translate, you can read my post “A Short Test of the Google Translate Machine Translation Function” here on my […]


  5. Thank you for sending me to this article. It was a great read. Your comparison to the self-driving car is an interesting one, as both machine translation and Google’s self-driving cars have seemed like something out of the “future” since the science fiction of the 1950s. However, I think functional machine translation will beat the self-driving car to the mass market only because a single accident on the road can doom a self-driving car, while people seem to put up with countless mistakes in translation before batting an eyelash. That is, of course, unless they’ve paid money for it!


  6. Thank you for your comment.

    Machine translation is functional already and if you keep tweaking it, it will be slightly further improved with time.

    But I don’t think that it will ever be as good as human translation done by qualified human translators because the category of meaning is something that only a human brain can understand.

    You would have to program every possible alternative into the MT software for any type of meaning of any word in any context to create MT that really works well, which is impossible.

    Incidentally, after I wrote the post about my test of the Google translate function, my website’s ranking in Google dropped significantly.

    I wonder if there is a connection there.


    • “Magical biggest cotton textile”, partially with Kanji, appears in the following article:

      Soup, Hiroshi Ito (Ito stands) and the ordinary becomes a cloud of mediocrity is not a matter of great importance for his death or loss of the world’s largest Dokoro is a small loss to Japan and not you. (Omitted) 43 days of Meiji, Tokyo, and poured all the land of their uprising in the land would meet their aspirations,聴Kazun happen if a riot extreme若Shi Hiroshi public opinion will be faced with a lot of, not decades later餘義which he could have the confidence to 期Shi 達Seshi skillful, and Hiroshi Ito 在朝 fellows, is the only free 其 gave in to demands of the times. Ga如Ki to extol the head and pointing to the big fellow Constitutional Magical biggest cotton textile is the fear of persecution past events Freedom and Human Rights who is to ignore the patriotic甚Dashiki Seshi argued that the private sector. (Omitted) 非命 death of compassion, sorrow to the dead from the normal of humanity, but also so do not blame 其事 Soup, and a crazy sense of sorrow about the deaths 其 過Goseshi is rather strong opposition be.

      – Gaikotsu Miyatake, funny newspaper in Osaka [11, No. 25, No. 26通巻


  7. When Google translates Japanese into English, its translation of certain hard-to-translate words may have been based on Lullar Data (Japanese Wiki translated into English), which is totally nonsensical machine translation. An example of this nonsensical translation is shown in my first comment above.

    As can be seen from identical articles appearing in Japanese Wiki and Lullar Data, “斯” is always mistranslated as “biggest cotton textile”, for a reason that I do not know. And when “斯” is followed by “かる”, which sounds like “cal”, “Magical” is prefixed or suffixed to “biggest cotton textile”.

    斯かる状況の下
    Magical biggest cotton textile circumstances,

    斯かる状態に対して
    The biggest cotton textile Magical State,

    斯かる人と交ることなく
    people without Magical交Ru biggest cotton textile

    斯(か)う答へるんです
    biggest cotton textile

    斯ては武家にて如何ほど勇決するも其詮なく到底大事は行はれがたし。
    Isami as shall be determined by whether it is in the biggest cotton textile samurai Gatashi 到底 is important even without swelling 其詮 line.

    Google now translates “斯” differently, but still in a wrong, nonsensical way:

    斯界 view
    “斯界” (this society or this field of study) is a homonym of “視界” (view).

    斯かる Si hunt
    “hunt” corresponds to “狩る”, which is also read “karu”.

    斯業 opening
    “斯業” (this business or industrial field) is a homonym of “始業” (time for opening a shop or for starting the day’s work).


  8. Thank you very much for helping to solve the mystery. It does not make a lot of sense to me to base a new machine translation on an existing machine translation. I thought the Google model was based on trying to find a match with existing human translations, which would make sense to me.

    I often use machine translations of patents in French, German and Japanese available on the EPO, WIPO and JPO websites. The MT product from French and German is often quite good. Completely nonsensical translations are generated occasionally in machine translations from these languages, but the translations are generally usable for a basic understanding of what is in the patent.

    With Japanese, however, it is a different story. The resulting MT product is mostly incomprehensible.


  9. While trying to find the source of the silly translation of “斯かる” as “Magical biggest cotton textile”, I seem to have stumbled upon something by serendipity.

    斯かる人物を推薦することはできない。

    Livedoor 翻訳
    I cannot recommend such a person.

    Livedoor 翻訳 (the English translation above translated back into Japanese)
    私は、そのような人を推薦することができません。

    As far as “斯かる” is concerned, both translations above are perfect.

    You may further like to scan the following examples:

    1つの風力タービン40だけが示されているが、必要であれば、所望の最終目的に適合するよう任意の数の小型風力タービンと関連タワー構造とが使用されうることが意図されている。

    Google Translate: out of the question
    Have shown that only the turbine 40 wind one, if necessary, it is contemplated that may be used as structural tower and associated small wind turbines in any number to meet the goal desired.

    Livedoor 翻訳 (http://livedoor-translate.naver.jp/): narrowly acceptable
    Only one wind turbine 40 is shown, but it is intended that an arbitrary numerical small size wind turbine and associated tower structure can be used to adapt to a desired last purpose, if necessary.

    a native English speaker (USP 7,105,940)
    Although only one wind turbine 40 is shown, it is contemplated that any number of small wind turbines and associated tower structures can be employed suitable to the desired end purpose, if necessary.

    Let me add that in many other trials I made, the Livedoor translation was generally still laughable.


  10. Interesting examples.

    However, most people believe that most human translators will be replaced by MT in a few years, regardless of how poor the real results of MT are.

    Somehow, once the bugs have been dealt with, the software will start working the way it is supposed to work about 5 years from now, which is what they were saying 5 years ago, 10 years ago, 15 years ago, 20 years ago ……


  11. As you have remarked repeatedly, translation software is simply replacing words, not translating.

    That such software is “人工無脳 (artificial no-brain), not 人工知能 (artificial intelligence)” is clearly evident even in a short sentence such as the following:

    Wind striking against the large surface area of the deployed solar cell array can create large tipping forces.

    Livedoor 翻訳
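    配備された太陽電池配列のかなりの表面積に反対してストをしている風は、大規模なチップ軍隊をつくることができます。
    (Literally: “The wind that is going on strike against the considerable surface area of the deployed solar cell array can create a large-scale chip army.”)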

    Google Translate
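    展開太陽電池アレイの大きな表面積に対して打つ風が大転倒力を作成することができます。
    (Literally: “Wind striking against the large surface area of the deployed solar cell array can create a large overturning force.”)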

    I will step away from this tedious topic for now.



