Every time I read another article about machine translation, I have the same feeling of futility that I also experience when I listen to John Lennon sing “A Day in the Life” (I read the news today, oh, boy, the English army had just won the war …).
All of these articles have interesting anecdotes about unlucky users of MT who perhaps expected too much, and many of them start with one such anecdote, as did this article by Konstantin Kakaes in the Washington Post. His blog has the following introduction: “I’m a freelance journalist. I have many interests, though these days my focus is writing about nuclear proliferation, science, technology and the world. I also write about Latin America.” It does not say anything about what he knows about foreign languages and linguistics. Perhaps he knows some Spanish. That would be an improvement.
All of these articles in what I call corporate media, because I think that is the proper term for it, are written by journalists who have various backgrounds, often impressive ones. But none of them has a background in languages and linguistics. These journalists then interview users and developers of machine translation, but they are not interested in talking to people who translate for a living. About eight years ago, a journalist in Canada interviewed me by phone on this subject, but that is the only exception to the rule that I can think of. Apparently, human translators have nothing of importance to say about machine translation.
The recent crop of articles tells the readers that everything changed with the advent of Google Translate, which is now here, as Kakaes puts it, “to remove humans from equation”. Right. Let’s use mathematics to find the magic algorithm that will eliminate the need for the human brain. Kakaes says among other things in his article that the late Frederick Jelinek (which is a Czech name), who pioneered work on speech recognition at IBM in the 1970s, is widely quoted as saying: “Every time I fire a linguist, my translation improves.” I suspect the late Frederick Jelinek was firing linguists because they were telling him 40 years ago something that he did not want to hear, namely the same thing that I am saying in this blog.
“I am sorry, but machine translation will never work, Herr Jelinek.”
“Vhat did you say? You are fired, you damn ingrate.”
It does not take a genius to figure out that the statistical approach popularized by Google will not really work either. The way the commercial propaganda machine writes and talks about it, real machine translation that will get rid of people like me and the readers of this blog is just around the corner. It has been just around the corner for quite a few decades now.
But machine translation is not really translation at all, although it may look like it, and it never will be, because no algorithm will obviate the need for the concept of meaning in translation. Trillions of words in a database are really nothing more than a huge haystack hiding what is missing and always will be missing in machine translation – the meaning of the words. You need a human to make sense of things. Google is such a great engine because thousands of very smart human programmers analyze and update links to information every second of every hour of every day. If Google stopped doing that, it would be out of business within a few weeks.
But you can’t really put everything that humans say now and will say in the future into a huge database that could be used for machine translation by Google Translate in the same way that Google the search engine can be used. People who got used to the miracle of Google the search engine naturally expect this to happen one day soon with Google Translate. Only it never will. We are all unique. We all say things that nobody else has ever said and possibly never will … We don’t do it all that often, but we all do it. The human brain is not a database. I don’t know what it is that makes it work the way it does, nobody really knows, but I do know that it’s not a database that can be updated just like a search engine. Because language is what it is, the most likely equivalent to a sentence in another language can be correct … or completely incorrect. Probability is not a replacement for meaning. And machine translation will never break the barrier of meaning.
Thirty years ago, you had to pre-edit and post-edit every machine-translated sentence, or it would make no sense. In the second decade of the 21st century, you still have to pre-edit and post-edit everything except for really simple sentences; see the example of an international lawyer who uses MT for translation from English to Chinese in the Washington Post article. But when you make a conscious effort to use only short and simple sentences, this is really pre-editing too.
Machine translation is getting better at simple tasks like this, and the statistical approach may be more instrumental than linguistic analysis. I don’t really know that much about it, as I prefer to spend most of my time doing the real thing … translating.
This blog post is too long, I am tired, and I am going to finish it now. But I would like to pose a question here.
Do you think that it is possible to create software, similar to machine translation software, that would write steamy romance novels that women would actually be buying and reading?
And if not, why not?
It’s just stupid words on a page. Just like a translation.