Posted by: patenttranslator | December 23, 2019

The Degree of Confidence in the Reliability of Machine Translation Remains Unchanged

This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

(A warning notice displayed prominently above the machine translation function on the recently redesigned website of the European Patent Office).

The European Patent Office recently updated the design of its search pages.

I remember two contradictory feelings that I had when I saw that the machine translation function improved quite dramatically on the European Patent Office (EPO) and Japan Patent Office (JPO) websites about 15 years ago. The first feeling was one of elation, when I realized how useful much better machine translations would be to this human patent translator.

The other feeling, however, was a feeling of dismay mixed with trepidation. Will my services still be needed if my clients can figure out the meaning of complicated sentences in patent descriptions from much more accurate machine translations? This second type of feeling was further aggravated by defeatist comments of some readers of my frequent posts about machine translations when I started my new blog, the one that you are reading now, about 10 years ago.

It actually annoys me when the European Patent Office, the Japan Patent Office or the German Patent Office update their web pages, because I have to learn yet again new tricks to quickly achieve the same search results that I have been used to getting in a few seconds for a long time. Sometimes it feels to me like they are doing it to me on purpose out of Schadenfreude.

Since I also keep links to the search functions of several major patent offices on my website, to make patent searching easier for legal secretaries and inventors, for instance, this means that I need to update the links again.

Having machine translations available for translation of patents is very useful to human translators for a number of reasons. The most obvious advantage is that we save time, because translators can see from the machine translations how certain technical terms can be translated. Incidentally, it should be said that translators are nowadays more or less forced to use terms listed on the website, because these may be the only terms available to our clients, who may be sharing them with their clients.

Another important advantage of machine translations is that while they always contain mistakes, they usually do not contain the mistakes that are typically made by human translators … precisely because they are created in a mechanical manner by machines using algorithms, without ever getting tired as a human translator would.

Mistakes that are typically made by human translators, including by this patent translator, are for example these:

1. Misreading a number, for instance misreading the number 3 as number 9 or vice versa.

2. Skipping a number, a word, or for instance an entire line with 5 to 10 words on it depending on the format of the patent publication, when the translator must continue translating despite being tired and then skips a part of the text and continues on the next (wrong) line.

3. Mistyping a word without realizing that the wrong word has been used.

There are quite a few mistakes human translators sometimes make, especially when they are very tired, for example when a client insists on a rush delivery. But despite the fatigue caused by rush work that is often inevitable, a good human translator should be able to catch all of his or her mistakes later during the proofreading phase, preferably after a good night’s sleep.

Machines don’t make these mistakes because unlike humans, they never get tired. Machines keep mechanically processing the texts for as long as they’re turned on. Humans always get tired after working for a certain period of time, and that’s when mistakes like these start creeping in.

But even after more than half a century of constant and clever improvements in the development of machine translation, the designers of machine translation packages seem unable to fix the same dumb mistakes that machines make, obvious mistakes that from the perspective of a human translator would seem inexplicable.

I will point out only one particular problem here, a problem that is often overlooked in patent translations, to keep my post short and sweet today.

Apart from obvious errors, such as when a completely nonsensical word is used for a particular technical term when an algorithm goes haywire for some reason, it often happens, especially in long patents, that the same parts or elements of the invention are translated with different terms in different places. This must be very confusing, because it is then not clear to the reader of the patent specification whether these are the same or different parts and elements, which is one reason why such a patent translation is unreliable, even if it may be understandable.
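To make this kind of inconsistency concrete, here is a minimal Python sketch of how it could be flagged in an English translation. It rests on assumptions that are mine and not part of any patent office's software: that each part is followed by its reference numeral, and that the single word right before the numeral names the part. The function name `inconsistent_terms`, the `NOT_PARTS` stoplist and the sample sentences are all made up for illustration.

```python
import re
from collections import defaultdict

# Words that can precede a number without naming a part
# (an illustrative stoplist, not an exhaustive one).
NOT_PARTS = {"figure", "claim", "step", "page"}

def inconsistent_terms(text):
    """Map each reference numeral to the nouns used with it,
    keeping only numerals that appear with more than one noun."""
    nouns_by_numeral = defaultdict(set)
    # Heuristic assumption: the word right before a numeral names the part.
    for noun, numeral in re.findall(r"\b([A-Za-z]+)\s+(\d+)\b", text):
        noun = noun.lower()
        if noun not in NOT_PARTS:
            nouns_by_numeral[numeral].add(noun)
    return {n: sorted(t) for n, t in nouns_by_numeral.items() if len(t) > 1}

# Hypothetical machine translation output in which parts 12 and 10
# are each rendered with two different English terms.
sample = (
    "The fastening element 12 is attached to the housing 10. "
    "The fixing member 12 engages the casing 10, as shown in Figure 2. "
    "A spring 14 presses against the fastening element 12."
)
print(inconsistent_terms(sample))
```

On this sample the sketch flags numerals 12 and 10, each of which appears with two different nouns. A check like this can only point at a suspicious spot; deciding which term is right, and whether two terms really refer to the same part, still takes a human who understands the text.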

While obvious mistakes can be easily fixed by a human ‘post-processor’ with so-called ‘post-processing of machine translations’, a technique enthusiastically embraced by some translation agencies as a time- and money-saving technique, the truth is that it is an extremely time-consuming process, in fact so much so that proper ‘post-processing’ would take longer, often much longer, than a complete retranslation from scratch.

The problem with ‘post-processing’ is that it can only be done by a human translator, as we cannot expect a faster computer equipped with better software and better algorithms to fix the mistakes that often occur in the machine-translated text. To do something like that properly, the fixing would need to be done by a qualified and knowledgeable human who, unlike a computer, understands the meaning of the translation.

Machine translation software can only find similar texts that have already been translated by a good human translator and insert them into the machine translation output at lightning speed. That is why the translation may look very good. But unfortunately, or in fact fortunately for translators like me, even a very similar text will not be exactly the same as one that was previously translated by a real (i.e., human) translator; in fact, the very opposite might be meant by a very similar text. That is why ‘It cannot be guaranteed that [a machine translation will be] intelligible, accurate, complete, reliable or fit for specific purposes [and why] … Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.’

In other words, machine translation is an excellent time- and money-saving solution … but this solution can be used only if it does not matter that the machine translation may have the opposite meaning to that of the original text.

Although machine translations now often look like real translations thanks to much improved software design, their degree of reliability has hardly improved at all over the last half century, so they should only be used when the reliability of the translation is not really an issue.


Responses

  1. Current approaches to machine translation are based on machine learning techniques which are more accurately described as “statistical inference based on large data sets.” Feed a million pictures of dogs into the neural net and it will extract features (colors and shapes) that represent “dog-ness” that it can use to recognize new dog photos it has never seen before. It is basic pattern recognition. However, as expert linguists well know, recognition is merely the first step of the translation process. As you rightly point out, these systems can work well if the new material is very similar to the data the system was trained on. But we already have CAT tools that do the same thing for far less cost. One aspect that is often ignored is that ML systems require a lot of energy to train and update. I think that if you factor these types of costs into the equation, it is probably a net negative. Thus, you have agencies and big companies trying to use MT as justification to fix sub-standard output for lower fees. Post-editing by human translators also helps to improve these systems over time, which is value that will accrue to the companies that own the systems.

    https://www.technologyreview.com/s/613630/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/

  2. Thank you for your comment, TC, and the link to the article.

    It made me feel much better about my own carbon footprint.

  3. And yet more and more translators feel the quake of MT.

    One translator recently reported at Honyaku group:

    “I’ve been in this industry [Japanese>English translation] as a full-time professional for a little less than a decade, but my workload has definitely dropped off in the last year, for the first time ever. Every other year was quite fat and satisfying — but not this year. I’ve averaged about $1000 USD less per month than I’m used to.”

  4. That was probably due to a number of causes. It happens sooner or later to everyone in any type of business. He needs to figure out what these causes are and come up with a good plan.

  5. I do J-E translation as well and noticed a drop in work during the first half of 2019. However, as most of my customers are in technical, industrial fields, I believe that the major culprit is not MT but the trade war between the US and China. Industrial and technology exports from Japan to China are down around 15% annually, and those exporters are hitting the brakes on spending as a result. I have heard this from both direct and agency customers. To compensate, I had to pivot and market to customers in other non export-oriented sectors. It was a lot of work, but I was able to close the gap.

    In my experience, companies that are dabbling with MT tend to be price-sensitive (cheap) and don’t care much about quality. They are not very good clients to begin with. Having seen the actual output of some of the latest MT products, I predict that end clients will not be happy with the results. JAT recently surveyed their members on how they use MT in their work. As one person put it, “Like porcupines having sex, slowly and carefully.”

  6. Ha, ha, ha.

    I love it

  7. Just tracked down the Honyaku thread (Google groups) mentioned above. It gets into the weeds a bit, but another poster also mentioned the global economic slowdown as a key factor. They also discuss the Japanese company Rozetta and their MT services (https://www.rozetta.jp/department/). One poster mentioned the use of MT for product data sheets in Europe. Certainly, MT will likely be a useful application for material that has constrained, predictable input. However, you can already do that with CAT tools by leveraging past work without incurring all of the energy costs of a machine learning system.

    Speak of the devil, just got a random inquiry from an occasional client that has started offering an MT service. They are short-staffed over the holidays and looking for people to do “post-editing” at basically arubaito level hourly rates. I politely told them I don’t provide services under those conditions.

  8. I think that another factor in J to E technical translation is a more competitive environment. There are many agencies advertising very cheap rates on the internet, something like 10 cents per English word. I imagine that the actual translators working for them must be living in countries with a low cost of living, not in the US, Canada or Japan, and that the quality of the translations is probably not very good.

    Last week I was asked by an old client to bid on translating two patents to English, one in German, about 6,000 words, and one in Japanese, about 27,000 words in English.

    I got the German patent, I am just finishing it, but I expect that the Japanese job will go to a cheaper bid because the client let me know that they are seeking a second bid.

    It’s fine with me, I want some work, but not too much of it.

  9. “It actually annoys me when the European Patent Office, the Japan Patent Office or the German Patent Office update their web pages, because I have to learn yet again new tricks to quickly achieve the same search results that I have been used to getting in a few seconds for a long time.”

    Exactly. I swear the new version takes longer, and more clicks, to navigate than the old version, and I have no real idea what I’m doing with it yet. I regularly download English-language versions of a patent – if such exist – for use off-line in case I need a “second opinion” on what a term should be in English, or to know how another translator interpreted something (with the usual proviso that the US or UK version, at least, may well have been reworked out of all recognition and may not therefore contain the troublesome passage), but sometimes now when I try and open the saved version all I get is a blank page. I haven’t as yet worked out what I’m doing wrong.

  10. “Machine translation software can only find similar texts that have already been translated by a good human translator and insert them into the machine translation output at lightning speed. That is why the translation may look very good.”

    I suspect it may do rather more than that, but selecting texts by more than one human translator might well explain why terminology is inconsistent throughout the text, if the translators have translated a term differently.

  11. Probably. But why can’t they program the software so that the same part would always be translated with the same term, especially if it is numbered?

    • Perhaps there isn’t any “they”? Perhaps it would cost too much to have human intervention at that point rather than at the post-editing stage? I mean, the problem (insofar as it is a problem) with using translation memory is that you have to record every possible variant with its translation separately. So if you have an “XXX-element” you have to make separate entries for “XXX-element”, “XXX-elemente”, “XXX-elements”, “XXX-elementes” …

  12. I agree completely with what you say about the reliability of machine translations now but I am also convinced that a point will come – a tipping point? – when human and machine translations will become indistinguishable from one another. Hard to say exactly when that will be, though. As usual, great music choices!

  13. Yes, of course, nothing lasts forever, it may happen one day … probably before the Sun explodes when it has depleted its hydrogen fuel and the final supernova extinguishes all life in our galaxy and beyond.

    Scientists say it may happen in about 50 billion years, but they don’t know, they just picked a number that looks plausible enough based on their necessarily uninformed, blind calculations, not unlike prophets of MT technology who have been saying that in a decade or two at the most there will be no need for human translators for more than half a century now.

  14. “… not unlike prophets of MT technology who have been saying that in a decade or two at the most there will be no need for human translators for more than half a century now.”

    You had been saying translators will be needed for centuries until around 2014 when you said about 20 more years.

    Also, less work for JE translators has nothing to do with the U.S. – China trade squabble. China is still growing at 6% a year and the U.S. at a healthy 2.3%.

  15. My ego rejoices that somebody remembers what I may have said 6 or 16 years ago … although I think you mostly misremember and misquote what I said, on purpose.

    The modern term for this is ghosting, and I thought that only my ex-wife did that.

    America used to have a very healthy economy. But if it is true that more than 50 percent of Americans now can’t come up with 500 bucks to cover an emergency and must put something like that on a credit card, it’s not a healthy economy.

    And I’m pretty sure it is true.

  16. No, I quoted you correctly:

    2000: “All I can say is, good luck, Mr. Kurzweil, and more power to you! Thanks, among other things, to your superior machines whose intelligence will presumably soon exceed yours and mine, we human translators can look forward to a booming business in the exciting field of human translation for a few more centuries!”

    In 2013 you wrote that translation would become machine translation and editing in 20 years but that you would be retired so weren’t worried. That is too difficult to find, but I remember it.

  17. No, I did not say that. I said that even if translation would be mostly about editing of MT, I would be retired by then, so it does not really matter to me.

    You are twisting my words again.

  18. «While obvious mistakes can be easily fixed by a human ‘post-processor’ with so-called ‘post-processing of machine translations’, a technique enthusiastically embraced by some translation agencies as a time- and money-saving technique, the truth is that it is an extremely time-consuming process, in fact so much so that proper ‘post-processing’ would take longer, often much longer, than a complete retranslation from scratch.»

    I have to concur after what I experienced the other day:
    I was able to pull 1600 words per hour in a very familiar field the other day when translating from scratch. The number surprised me, because 1200-1600 words per hour is roughly my speed when post-processing nearly perfect MT provided by the same client!

    Sure, in some cases super-custom and optimized neural MT engines help me because the MT is informed by millions upon millions of TM segments. Such an engine also helps improve consistency across a team. It’s not bad when done right. And it helps the poor and inexperienced translators deliver slightly more predictable quality.

    On the whole though, large corporations would have been better off securing the best translators and paying them well enough to keep them on board for the rest of their lives. With such an arrangement, they could have achieved a truly high level of translation quality. But instead of pursuing this path, they tend to outsource to the lowest bidders, which results in very unstable quality and a long-term reduction in quality expectations across the board.

    • If the “Like” option were open to me, I’d Like this post, but as it is, I’ll have to actually post my Liking of it.
