Posted by: patenttranslator | September 1, 2017

Post-Editing of Machine-Translated Patents by Human Translators Would Be Tantamount to Committing Professional Suicide


Many people who have never translated a patent have the wrong idea about what patent translation involves.

They often think that every patent is pretty much the same kind of very technical, dry and boring document: relatively simple from a language standpoint once the technical terms are known, and therefore ideal for processing with computer-assisted translation (CAT) tools, or even for “translating” with machine translation, with the output then “post-edited” by a human translator.

Put your CAT to work, or “translate” the dry, boring, highly repetitive and uninspiring piece of technical writing with machine translation, and then you should be able to edit it relatively easily with the CAT tool or the machine translation product.

That is what some people think, including some translators, although only those who do not translate patents, which is most of them.

I am not surprised that project managers working for translation agencies in translation industry 2.0 misunderstand what patents are and how they can be translated, because let’s face it, most of these PMs don’t know anything about translation, especially those who work for large predatory translation agencies intent on maximizing their profits ad infinitum at the expense of translators.

What I find surprising is that even translators who translate other fields seem to think that patents are eminently suitable for processing by machines and software. There are some patent translators who use CATs, and some swear by their darling CATs, but I am not one of them.

I don’t trust CATs not to introduce major errors that I may or may not be able to catch if I try to “produce more words per day”, and I happen to know that my customers don’t want me to use them. In fact, although many translation agencies love it when their translators use CATs because they are such convenient tools for controlling translators, most direct customers would stop sending me work if they thought I trusted a software tool not to mess up a patent translation.

Although I use machine translations of patents frequently, more or less whenever they are available, I am aware of their many pitfalls.

Machine translations of patents have been available on the websites of organizations such as the EPO (European Patent Office), WIPO (World Intellectual Property Organization) and the JPO (Japan Patent Office) for about two decades now.

I usually try to locate a machine translation of a patent document and when it is available, I print it out and look at it, especially as I start translating the document.

But I happen to know that trying to edit machine translations of patents, namely the kind of “post-editing” the translation industry 2.0 recommends and believes in with all of its greedy heart, would be tantamount to committing professional suicide for human translators.

Although a machine translation of a patent may look like a real translation that has been done by an experienced human patent translator, it is nothing of the sort.

It is important to keep in mind that machine translations are simply matched segments of translations originally produced by human translators, which is why they look like real translations. These segments are matched based on a text produced by the machine translation software, but the result is not an actual translation by a real human translator.

And although large segments of the original patent document may be translated seemingly well by the software, every single word that was processed by a software package would still need to be validated by a human translator to avoid the many pitfalls of machine translations.

Because this is extremely time-consuming work, the “time savings” loudly celebrated in the propaganda of what I call translation industry 2.0 are only an illusion.

Some errors caused by mismatched segments, or by other glitches in machine-translated segments, would be easily identified by a human translator during “post-processing” because they are simply too ridiculous given the context of the document.

But mistranslations that seem perfectly or somewhat plausible in the context of the document may be very difficult to detect, even for an experienced and qualified human translator, although they may be completely wrong.

For example, I have noticed on many occasions that the machine translation software used on the European Patent Office and WIPO websites sometimes mismatches entire segments of documents, or that the machine translation may be based on the wrong version of a document when there are several versions of “the same document.”

This does not happen very often in the main text of the patent application, called the “Description” in English. Machine translations of the “Description” section usually contain only the mistakes that we generally associate with typical machine translation software glitches. But it happens quite frequently with the section of the patent application called the “Claims”, which is the section that basically identifies what is new in the invention compared with existing technology, known as prior art.

The reason for this is simple. When a new patent application is filed, it is first examined by patent examiners, who may accept the new invention as “patentable”, but only on condition that the claims be amended and the application resubmitted to the patent office, usually because the claims are overly broad or not clear enough.

This means that some claims are deleted, some may have to be rewritten, and new, more restrictive claims are added.

And because the patent application may be returned for a new formulation of the claims several times, the software used by the patent office may end up working from an older version of the document, one that still contains claims that were rejected by the examiners and are no longer relevant.

The content of the new claims will then be completely unrelated to what the machine translation spits out and even the number of claims is usually different.

Translation industry 2.0 likes to celebrate new tools such as CATs and machine translation as revolutionary, “disruptive technology” that will completely change the way translations are produced and delivered so that eventually, these tools may for the most part replace human translators, the way video streaming replaced DVD rentals, or the way Uber is slowly replacing traditional taxi services (and self-driving cars may eventually replace Uber and Lyft).

But they either don’t understand or pretend not to understand that translation tools—wonderfully helpful and exciting as they are, especially for us, translators—will never be able to replace the minds of human translators.

The only “tool” that, unlike software, is able to be actively engaged in intellectual activity that can create real translation is the human mind. And this is true not only about literary translation or financial documents, but also about translations of the dry, highly technical and sometimes boring documents called “patents”.

It so happens that tools, including software tools, are and always will be only tools.

And no matter how helpful, useful and innovative these tools are in the hands of translators, despite what the many smarmy vendors of snake oil in translation industry 2.0 with big dollar and Euro signs in their eyes try to make us believe, these tools will never amount to a suitable replacement for intellectual activity that is required for a real translation, and especially for translation of patents, which are also referred to as intellectual property.

Responses

  1. Sure, patent prosecution means that the claims of an issued patent, or even the claims of a patent application late on in prosecution, may well be different from those in the published application (which is, except infrequently, the application as filed); but as long as you realize that what the MT at esp@cenet hands you is MT of the published application, you should be OK as a starting point. It all depends on what you’re asked to translate.

  2. Of course I realize it; I can read both languages.

    But do the patent lawyers that I work for realize it?

    That I don’t know.

    • I would hope – I’m a patent attorney, as well as being a former translator.
      But the “you” of my original message wasn’t aimed at Steve personally; it was addressed to users of esp@cenet MT. I should rephrase it as: “but as long as the reader realizes that what the MT at esp@cenet provides is MT of the published application, that should be OK as a starting point.”

  3. You are absolutely right. For one thing, few languages have as many words in their vocabulary as English, so they may use one word where English has a variety of possible equivalents. For instance, in a patent I am currently translating, the words “volet” and “obturation” are both used, and both could be translated as “shutter”, but you can’t “shutter” a “shutter”! CAT translations are only successful for totally routine texts such as weather forecasts (“winds light to variable”). I bet you could write a book of hysterical mistranslations that were the result of using CAT or MT. Ditto for menus. I have just returned from interpreting abroad and EVERY SINGLE restaurant or hotel menu contained errors, some of them really amusing!

  4. CAT and MT software developers like to add the adjective “neural” to the latest versions of their software precisely because by now they probably understand that what the software lacks is “nerves and brain.”

    Because the software has no brain capable of actual logical thinking, despite the creative but for the most part fictional “neural” label, it is not even able to detect absurdity, and the result of processing with such a tool is sometimes hilariously funny.

    A tool is just a tool, and it so happens that tools don’t have a brain, even if you put the creative label “neural” on these tools.

    But there is no question that the label “neural” is a great marketing tool, just like the “ISO certification” label is a great marketing tool for translation agencies.

    • 1) “CAT and MT software developers like to add the adjective “neural” to the latest versions of their software, precisely because by now they probably understand that what the software lacks is “nerves and brain.””:

      exactly!! 😀
      This must be re-posted, tweeted, etc etc! 🙂

      2) “the creative, but for the most part fictional “neural” label” : 🙂

      Yeah, those crooks have a lot of marketing creativity. As to the rest… 😦

      I find your latest 2 blog posts particularly well written:
      not too long (to the point), well-structured (demonstration),
      with well-chosen terminology, interesting – and fun!

  5. Thank you. Music to my ears.

  6. I love your blogs, Steve. As a patent translator myself, I find it quite extraordinary (preposterous?) that the EPO let Google persuade them, to the tune of millions of dollars, to develop Google Translate for patents in the expectation that it would solve their problem of translating new patents or patent applications (it is helpful for human translators, of course, as you note). As you also mentioned, the massive amount of data that GoogleT holds was originally produced by human translators. But what does a patent need to be first of all? Novel, of course! So no new patent would precisely match what is in the database. GoogleT works for patents just like it works for everything else: gisting (with its ensuing hilarity). So why did anyone at the EPO originally think that this effort would actually eliminate the need for human translation? And you and I (as well as other patent translators, I am sure) have not really lost much business because of it.

    Alice

    • *Did* they think this, or is it just an aid, accompanied by numerous caveats? I’ve never used it.

      • I think that the EPO certainly hoped this would happen. Here is an excerpt from the EPO’s announcement in 2012 after it launched Patent Translate:

        “The cooperation with Google launched less than one year ago has already led to a significant improvement in the quality of the machine translation of patents. This was achieved by the introduction of several hundred thousand high quality translations of patents in the seven languages provided by the EPO, which Google used to ‘train’ its Google translate system. Further gains will be achieved as more language corpora are added over time.”

  7. Thank you for your comment, Alice.

    I think patent translators did lose some work as a result of machine translation, but mostly just translations that were not really necessary, especially for prior art. It is now easy to determine which patents are not relevant enough to warrant a translation, even with languages like Japanese, while 20 years ago this was not possible.

    I used to translate mostly older patents for information about prior art, now about 80% of patents that I translate are for filing purposes rather than prior art research, and I think that this is partially the result of the information that is easily available on EPO and other websites, including machine translation.

    Did the EPO originally think that Google Translate would eliminate the need for human translators? I did not know that.

    That’s so funny!

    • Yes, Steve. That is generally true–we do many more translations for filing than we did 20 years ago. But I think at least part of the Japanese decrease is due to their economic issues. We are doing a lot more Chinese patents these days instead (but I do find the Chinese clients are a bit more difficult to work with than our former Japanese clients…)

      I do think that there is a lot of hype out there that MT will replace human translators very soon (now that it has gone “neural”). As a colleague, Ken Kronenberg, reminded me today, RWS intends to train its 100 in-house translators as post-editors of MT… its main fields are patent and medical translation. Check out Slator’s recent article on RWS:
      https://slator.com/financial-results/never-buy-transperfect-says-rws-chairman-andrew-brode/

      • I usually read or at least scan articles in Slator, but I missed this one. Thanks for reminding me about it, Alice.

        If RWS is not interested in buying Transperfect, the most likely reason is that they don’t have the money.

        They have been buying translation agencies for decades. So instead of buying it, they’ll just throw a little mud on their main competitor.

        Ha, ha, ha – so what else is new?

        And if they turn their 100 internal translators into “post-processors” of the MT detritus, which would indeed save them a lot of money because they would no doubt be paying a pittance to the newly minted low-level employees, about as much as they pay the cleaning staff, all of the good ones will leave, because even slaves have their breaking point.

        But you and I, we both know that no matter how good the “post-processors” who may stay would be, the post-processed texts of the patents will be pretty bad to really horrible, and their clients will eventually catch on.

        I hope the leaders of RWS will go for the post-processing strategy.

        It would be very good for actual human translators of patents.

        Because I have been making a living as a patent translator for the last three decades, I remember well how the goalposts of magical thinking about machine translation have kept shifting during that time.

        Thirty years ago, the reason why human translators were not replaced by machine translation was the slow speed of computers. Once the speed was sufficiently high, translators would be eliminated, the wise heads were saying.

        The computers got terrifyingly fast, which was a development that was enthusiastically welcomed by translators.

        Ten years ago, the magic word was not speed, but “corpora”. Once software like Google Translate had enough nutrition called “corpora”, meaning billions of words, translators would be history. GT was fed billions upon billions of “corpora”, which made the MT machine a little better, but still not good enough to eliminate the need for lowly human translators.

        These days, the magical-thinking goalpost has been moved to the concept called “neural systems”. This is very clever terminology because it makes people who don’t understand the main problem of machine translation, which is 99% of the population, believe that the software has somehow grown a brain, or something like a brain, or at least something close enough to it.

        Every time the goalposts were moved, machine translation got a little bit better, but only incrementally, because the damn machines still don’t understand anything about the meaning of the words that they are processing at such breakneck speed.

        And every time, the major beneficiaries of the incrementally improved machine translation software were translators, who were supposed to be eliminated by it.

        This interesting development reminds me of the quest of the bearded sages of the 17th and 18th centuries called alchemists, who tried very hard for a couple of centuries to discover the best “transmutation” method for turning base metals, like iron, into noble metals, mostly gold.

        The whole thing was impossible, but smart alchemists never let on what was really going on because if they did, they would lose the financing from their rich, greedy and stupid benefactors at various courts in Europe.

        And although the alchemists never discovered how to turn iron into gold, in the end they discovered a new science called “chemistry”.

        When it comes to magical thinking, the magic word is not “Rumpelstiltskin”.

        The magic word is “Financing”.

  8. “What I find surprising is that even translators who translate other fields seem to think that patents are eminently suitable for processing by machines and software. There are some patent translators who use CATs, and some swear by their darling CATs, but I am not one of them.”

    Once again, Steve, you appear to be confusing MT and CAT tools. Translation memory tools do *not* process patents: I *translate* patents using TM tools *as an aid*, just as you use MT as an aid.

    Incidentally, I processed an amended patent claim from a patent opposition in DeepL last week: the result was rather scarily good, and I was a bit worried until I put in another sentence which it tripped up on. If you’re still stuck on GoogleTranslate, you might want to investigate DeepL.

  9. It warms my heart that you love your CAT so much, Alison. Confused as I am, I am often not sure how to find my way out of my bed when I wake up in the morning. One of these days I may just stay in it given how complicated this task is for me.

    But I prefer dogs, pit bulls in particular.

    I have been using DeepL in addition to Google Translate and Microsoft Translator, and my experience has been similar to yours. DeepL is sometimes better than Google Translate, and sometimes it mistranslates the whole sentence.

    But then again, Microsoft Translator is sometimes better than Google Translate too.

    I am thinking of writing a post comparing DeepL (I dislike the name, which says nothing to anyone who, like me, does not already know what it means) to Google Translate and Microsoft Translator.

    But maybe I am writing about machine translation too much when I should be writing about more interesting subjects, like for example Lady Gaga and her apparent uncompromising bisexuality.

    • The MT providers produce similar output because they all follow more or less the same cookbook, namely LSTM (long short-term memory) neural networks trained with colossal datasets.

      LSTM is the latest iteration of recurrent neural networks, and it produced an incremental improvement in MT and other applications. You can find a technical summary at http://colah.github.io/posts/2015-08-Understanding-LSTMs/

        But no matter how long the short-term memory is and how colossal the datasets may be, the problem is that a tool is still just a tool.

        We can keep improving tools. But only incrementally. Tools do not have a brain and therefore cannot make decisions about the meaning of anything.

        And you and I know that you can’t translate anything and know that your translation is correct unless you understand the meaning of the original text and of the translation.

        Only the human brain can do that.

        Tools, no matter how sophisticated they may be, don’t understand anything and never will.

        A dog or a cat is a million times more sophisticated than the most sophisticated computer tool because animals are living beings, capable of understanding, although on a different level than humans.

      • Sure. My point is rather that MT quality should be similar across providers–just as you pointed out–because they follow the same state of the art. Barriers to entry are low (you can build a basic LSTM on a desktop machine) and substantial innovations to the actual algorithms are a lot less plentiful than the hype around them.

        By the way, LSTMs were initially developed in Germany. Maybe that explains the clumsy name.

  10. You are right on about the moving goalposts in MT over the years, Steve. Why do people continue to buy the hype? And then there is the totally insane amount of money people will invest in the translation businesses that build on this hype. Is Transperfect worth anywhere near the purported $600 million to $1 billion? (Well, maybe it is, if it is thought of as the next Facebook, but how long can the bubble last?)

  11. They buy the hype because they are ignorant.

    “No one ever went broke by underestimating the intelligence of the American public.” (A popular paraphrase of a line from a column by H.L. Mencken published in the Chicago Tribune in 1926.)

  12. “Barriers to entry are low (you can build a basic LSTM on a desktop machine) and substantial innovations to the actual algorithms are a lot less plentiful than the hype around them.”

    Thanks, I did not know that. As a lowly translator, I thought that the barriers to entry into this business are relatively high.

    This explains a lot, such as why there are so many alchemists out there with their own incredibly brilliant method for transmuting words into translations, instead of transmuting base metals into gold.

