Posted by: patenttranslator | January 22, 2017

Seen from the Trenches of Patent Translation Warfare, Is Machine Translation a Translator’s Friend, or Enemy?

A lot has been written about machine translation by “globalization, internationalization, localization and translation research and consulting firms” such as the Common Sense Advisory, which is often referred to by translators I know as the “Common Nonsense Advisory”.

Corporate translation agency blogs, if you’d like to call them that, are also full of bold prognostication and prophesying about the future of the “translation industry”. This information is skillfully or not so skillfully woven into blog posts that are for the most part just advertisements for totally awesome and incredibly cool yet moderately priced translation services that are allegedly provided by these resourceful translation agencies.

Some of the things these consulting firms and public relations specialists of translation agencies say do make sense, up to a point, but many of their conclusions, while they may be ingenious from an advertising standpoint, make no sense from the viewpoint of a translator – namely, a person who does know a thing or two about translation, unlike people who talk and write about translation for a living but can’t actually translate themselves.

As Yogi Berra (or possibly somebody else before him) put it: “It is difficult to make predictions, especially about the future”. So instead of predicting the future, I will briefly describe how machine translation has influenced the work of this patent translator over the last decade or so, including the good, the bad, and the ugly.

The Good

I’ll start with the good. Like many or probably most translators at this point, I am now using machine translation and multilingual online databases instead of my old dictionaries, which I used to love so much for so many years.

Because I translate from seven languages myself (although only into English at this point), I’ve amassed a great many general and specialized dictionaries, which are now gathering dust in bookcases in my office and in more bookcases lining the hallway walls. Sadly, probably nobody would want them anymore.

When translating a patent, I automatically print out a machine translation in English from the Japan Patent Office, European Patent Office or World Intellectual Property Organization website.

It definitely speeds up my work if I don’t have to look for words in my dictionaries, and instead just take a quick look at the machine translation printout, or run a quick search on one of the patent office websites, or Linguee, etc.

I also use Google Translate and Microsoft Translator on my computer as I would a dictionary. I do not believe that Google Translate is better than Microsoft Translator, which is something that many translators seem to believe, just like most people believe that Google is superior to all other search engines, which is probably not always true either depending on what one is looking for.

The Bad

If I look for a technical term on Google Translate, including the context, I am often presented with a highly specialized term that clearly belongs to a completely different field, and that is sometimes completely ridiculous. If I then try Microsoft Translator, I sometimes effortlessly find exactly the word that I am looking for. So I generally have both machine translation programs open on my computer when I translate.

Google Translate is much better in one respect: it tries to locate the closest patent to the one that I am translating. The translation often sounds like a very good translation done by a very good human translator … because it is a very good translation that was originally done by a very good human translator and then found and matched with a request for machine translation by Google Translate.

But if the existing closest patent translation says something that is not in the version of the patent that I am translating, Google Translate will sometimes miss that part and instead will insist that black is white, up is down, and good is bad.

Just because a sentence in a machine translation reads like a very good translation does not necessarily mean that it is a correct translation.

Changes are frequently made in patent applications, for example when a patent office in Europe, the United States, or Japan issues an opinion requesting amendments (modifications) of submitted patent applications because some features of the patent claims are too obvious (lack of inventive step), or too similar to prior art (lack of novelty), or for another reason.

However, these and other changes may not be reflected in the machine translation picked by Google Translate, because Google Translate can only find an existing translation, which may be obsolete, and match it with the request for a machine translation.

Statistically based machine translation clearly has its limitations because it can only match existing human translations with a request for a new machine translation if these human translations exist.

For example, when I tried to translate some of my blog posts into a different language, especially a complicated language like Japanese or Czech, they were for the most part incomprehensible.

(Does that mean that nobody writes like I do?)

If no closest translation of a patent publication is available, the method that is based on statistical probability is not very useful. Also, if I am working with old PDF copies that are not clearly legible, for example of old Japanese Utility Models famous for their terrible legibility, any machine translation engine is completely useless because after conversion from PDF, the characters are misinterpreted and translated in such a haphazard way by the software that the description of a technical design will be turned into a crazy monologue of a clearly retarded child.

The Ugly

The very ugly aspect of machine translation is mostly connected with the way the “translation industry” has been using machine translation to intimidate and put down translators by telling them that their jobs will simply be erased from existence by computers, which is what has already happened to many other professions.

Because so many important people in the “translation industry”, for example, the movers and shakers in the industry, have little or no understanding of what translation is and how it works, the industry originally planned to hire monolinguals who would be quite cheap and whose job it would be to “clean up” and “fix” machine translations. Those plans had to be scrapped very quickly.

The problem is, it is not possible to “clean up” and “fix” machine pseudo-translations without taking a good look at the original text and comparing it to the machine-produced output to see where the problems are. To fix the problems, you have to be able to read the text in the original language.

So a new concept was born in the “translation industry” about a decade ago: translators’ jobs would be eliminated and instead, people who used to translate for a living would be offered new jobs as “post-processors” of machine translations.

This concept could theoretically work. It certainly makes much more sense than using monolinguals to write complete nonsense. But even this process is based on false premises.

If customers know that the translations that they are paying for are only “post-processed” machine translations, they are likely to demand much lower prices for these types of translations because they understand that the quality of these post-processed translations, miracles of “language technology” coupled with strange processes occurring in the brain of human post-processors, is substandard.

Greedy as the industry movers and shakers are, the “translation industry” is willing to pay only ridiculously low rates for the “post-processing” bit – so far I have twice been offered 1 cent per word for doing this dirty, mind-numbing work for the industry. The dirty post-processing work is probably not going to be done by competent translators.

Most likely, the industry will employ people living in developing countries … because who else is willing and able to work for next to nothing?

The “translation industry” is salivating at the prospect of the billions of words it thinks could be translated by combining machine translation and cheap human brains. Post-processing is such a tempting concept for the industry … if only it could work!

But I think it is highly unlikely that this concept will work, at least not in my field of patent translation. Machine translations post-processed by poor, quasi-human creatures willing to do the post-processing drudgery for the industry will not be very different from non-post-processed machine translations.

In particular, they will be riddled with mistakes and mistranslations because no matter what the PR machine of the “translation industry” says, the only way to do “post-processing” the right way is to retranslate the whole thing.

And if post-processed machine translation is unreliable, why pay anything at all for this kind of translation service when machine translations are already available mostly for free?

Impact of Machine Translation on My Work

Because I have been working as an independent patent translator for 30 years, I have seen quite a few changes in my line of work during those three decades.

Some languages that were in high demand for a very long time, for example Japanese, are less in demand now in the field of patent translation. And some languages that were not very useful a couple of decades ago, such as Chinese and Korean, are very much in demand now.

I translate many more German patents than Japanese patents these days. Nothing stays constant forever.

Most patent applications that were available only in a foreign language can now be “translated” with a few mouse clicks with machine translation. This obviously had an impact on the number of patents translated by human translators, but mostly to the extent that translations that are not really needed are no longer being ordered.

Before machine translation became a tool that could be used by my clients to find out what is in a patent in a foreign language, they had no choice but to order a translation of an entire document to find out what was in it.

Machine translation is now good enough not only to determine which documents are and are not relevant, but also which parts of documents need to be translated. Next week I will be translating only portions of several Japanese patents, as opposed to the entire documents, because a client used machine translation to identify the relevant portions for translation to save his client money.

Would this patent law firm have ordered more translations in the absence of machine translation tools?

It is certainly possible. But it is also possible that none of the documents that I will be translating, albeit only partially (about 60% of them), would have been discovered without machine translation.

That is why I believe that the impact of machine translation on the work of this patent translator has been mostly positive over the last three decades. I predict that it will continue to have a mostly positive impact on my work for a long time to come, namely until machine translation is so good that human translators will no longer be needed.

I predict that this will happen around the year 3,754, give or take a century or two, if our civilization is still around at that point, which, frankly, does not appear to be very likely.


Responses

  1. I think one important variable is what the client thinks of machine translation. People who are not linguistically aware will not in general understand much about it. But they do have the not unreasonable view that if they are paying for a translation they don’t want the translator to use Google Translate. However, as you describe with your patent translation, sometimes it happens – bingo – that you get a legal or technical text that is practically identical to one in the database, in which case you can get a near-perfect translation. This happened to me recently with a translation of about 120 pages. I am sure that agencies would like to find some way of not paying you in that kind of case, but fortunately they haven’t worked one out yet.

    • But what does “to use” mean? If it means to “post-edit”, I agree completely. That would be cheating. But if it means taking a look at the machine translation, I don’t agree.

      It’s like saying that a translator should not have access to a dictionary. Translators should be free to use anything that is available, but only for their own information.

    • “But they do have the not unreasonable view that if they are paying for a translation they don’t want the translator to use Google Translate.”

      That is precisely my way of looking at things (plus the confidentiality angle, too – read the T&Cs and you’ll see that by entering a text into GT you give Google the right to do all sorts of things with it which could breach confidentiality). Ditto that if my client wants me to amend the US text rather than produce my own translation they will specifically tell me so (it does happen. As with MT, it can be that so many changes need to be made that it would be quicker to start from scratch). OTOH, if I can – from my own translation memory databases – reconstruct 70% of a new patent application in a few minutes, that’s fine, because I did the work in the first place and it’s my intellectual property.

        When I download machine translations of patents from the WIPO, EPO or JPO website, I don’t enter them into Google Translate. There must be copyright agreements between these patent offices and GT and other machine translation services (WIPO offers three different services) already in place.

        The thing is, assuming that my clients have the same machine translations available to them, which they probably do, they probably expect me to use the same technical terms if these terms are correct.

        So I am in fact more or less forced to look at the machine translations now that they are available.

  2. Hi Steve:
    I think that the widespread public availability of free patent databases (I’m thinking particularly of esp@cenet and Depatisnet) has also had a major influence on patent translation – perhaps as much as MT.
    When I started in the business in the 1970s, Chemical Abstracts generated patent families, but they were only accessible through searching masses of paper volumes of CA, and only covered chemically-related patents; and, though Derwent was not limited to chemically-related patents, it was more expensive and much less easily searchable. And even if you found an English-language equivalent to your foreign patent, it was often difficult to purchase. Experts were/expertise was needed.
    Now, anyone can go on-line with information on a patent and a list of equivalents is instantly available, with pdf images and sometimes even text, all free.
    Now if your patent was filed only in Japan, say, you are back to either MT or reading the original in Japanese; but for anything foreign filed in a reasonable number of countries, there will be an English-language version floating around somewhere, and it’s both findable and retrievable. And, the translation into English was probably done by a real person with some skill – I know that as a patent attorney I am fussy about who translates my applications for filing in Japan, or China, or wherever, and don’t buy from Translations’r’Us, since my client may want to enforce that patent someday.
    But I agree 100% that the availability of MT and MT abstracts (as the JPO used to put out) allows potential clients to make a judgment about whether a patent is worth translating, or excerpting, whereas pre-MT it was an all-or-nothing choice.

    • “but for anything foreign filed in a reasonable number of countries, there will be an English-language version floating around somewhere, and it’s both findable and retrievable.”

      Except you’d be surprised how often that *isn’t* the case :-). For example, I’ve just OCR’d a set of claims and found that one of the (rather large) independent claims was missed entirely, so went over to Espacenet to see if I could get at a decent electronically-readable copy, rather than retype several hundred words. As this is an applicant whose claims I translate regularly, I remembered that they tend to have a US or other version, so thought I’d look it out while I was there, as I frequently do for reference purposes, only to find that there wasn’t one this time.

      Also, if the English-language text is a US patent (application) (and possibly also a GB one?), in my experience you have to be careful, because they frequently differ quite considerably from an EP one. I believe this is because the texts tend to be significantly reworked by the attorney before they are filed, so they are frequently not an exact equivalent anyway. Plus (and I hate to say it), I think that occasionally the patent attorney, who may not have any or much knowledge of the source language, rewrites the text the wrong way, because there have been a number of occasions on which I’ve thought “Oops, sloppy translator – how could s/he have made that mistake?”, and then realise that it probably wasn’t down to the translator, but someone who worked on the text afterwards.

  3. But even if a very similar or almost identical translation is floating around somewhere, I would think that something like that can be used “as is” only for information purposes, but not for filing purposes, because one has to know exactly what is contained in a foreign patent that is to be filed in English, and the machine translation only matches the closest equivalent, while differences are invisible.

  4. Steve:
    If your comment is referring to mine immediately above, the equivalent can be used only for information purposes because any filing deadlines would be long past – that’s how you got the translations. But in terms of accuracy of translation, if you consider PCT applications, the general requirement is that the national phase application must be an exact translation of the PCT application – though some countries (Chile and Vietnam come to mind) let you file a translation with amendments, but these aren’t countries one would usually worry about. This wouldn’t tell you what the content of a non-PCT “equivalent”, or the priority filing itself if any, would be – those could be different.

  5. “It definitely speeds up my work if I don’t have to look for words in my dictionaries, and instead just take a quick look at the machine translation printout, or run a quick search on one of the patent office websites, or Linguee, etc.

    I also use Google Translate and Microsoft Translator on my computer as I would a dictionary.”

    Even quicker when you’re using a CAT tool you’ve already entered the translations into, of course 🙂

  6. The “glossaries” that some agencies send me for use with (or in my case, without) CAT tools are full of spelling errors, inaccuracies or, most frequently, terms that have absolutely nothing to do with what I am translating. Having to use their silly “glossaries” just slows me down. As for the agencies who will not pay for 100% matches or “fuzzy matches” (whatever they are, no doubt the agency decides what they are) that is appalling! For one thing, in order to produce a good translation one often needs to change the “100% match” into a different word, depending on the context. I LOVE it when agencies tell me they will not use my services because they only work with translators who use CAT tools because “our clients insist on it”. 99% of direct clients wouldn’t know a CAT from a pussy cat, any “client” of the agency who “insists” on them using CAT tools (if any do at all) are the big agencies, who have sub-contracted the work to them. A friend of mine who runs a small agency has just agreed to take on a big website translation from one of the mega-agencies, no doubt they will insist on the use of CAT tools which have ONLY one purpose — to reduce the amount paid to the translator so the agency can earn more money since the saving is NEVER passed on to the client!

  7. “99% of direct clients wouldn’t know a CAT from a pussy cat, any “client” of the agency who “insists” on them using CAT tools (if any do at all) are the big agencies, who have sub-contracted the work to them.”

    Ha, ha, ha.

    So true.

    I’ve never had a single direct client ask me whether I use CATs, not once.
