Machine translation is incredibly difficult. And to prove that, I will now read this introduction again, after it’s been sent through Google’s translator (currently one of the best in the world) and then translated back into English.
“Machine translation is very difficult. Back then translated into English – is one of the best in the world right now – it is to prove that, after being sent through Google’s translator, I’ll read this again introduced.”
Okay, I chose a difficult language, but each one I tried introduced subtle errors in diverse ways. Via Chinese, it had been translated by “Google hair”. Via French, the introduction became a “he”, not an “it”. And those sentences were incredibly simple.
Folks who only speak one language (and I am embarrassed to say that’s a group that includes me, I’m sorry) often assume that you can open a translation dictionary, pick an appropriate word, faff around with the grammar a bit, and have a functional sentence in another language. For simple sentences, yes, that’s true: but very few sentences in the real world are that simple.
Google recently released a paper about how they’d reduced machine translation to a problem in vector space mathematics, representations of concepts in an abstract language space. Which is great for mapping concepts to words, and it’ll even deal well with homographs, identical words that mean completely different things. You can deal with those through context: the days of “hydraulic ram” being translated as “water sheep” are pretty much in the past. Spot the engineer. For formal, technical documents, it might even start to work well. But for more casual communication, it’s not so easy.
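To make the “vector space” idea mentioned earlier a little more concrete, here is a minimal sketch of one way it can work: each word is a point in space, a linear map between the two languages is fitted from a small seed dictionary, and a new word is translated by mapping it across and picking the nearest neighbour. Everything here is an assumption for illustration: the 2-D vectors are invented, the names `EN`, `ES`, `SEED`, `fit_linear_map` and `translate` are hypothetical, and this is not claimed to be the method in the paper itself.

```python
import math

# Toy 2-D "embeddings", invented for illustration only (real ones have
# hundreds of dimensions and are learned from text).
EN = {"one": (1.0, 0.0), "two": (0.0, 1.0), "three": (1.0, 1.0), "four": (2.0, 1.0)}
ES = {"uno": (0.0, 1.0), "dos": (-1.0, 0.0), "tres": (-1.0, 1.0), "cuatro": (-1.0, 2.0)}

# Small seed dictionary of known translations; "four" is deliberately left out.
SEED = [("one", "uno"), ("two", "dos"), ("three", "tres")]


def fit_linear_map(pairs):
    """Least-squares 2x2 matrix W minimising sum |W x - y|^2 over the seed pairs."""
    a = [[0.0, 0.0], [0.0, 0.0]]  # accumulates sum of y x^T
    b = [[0.0, 0.0], [0.0, 0.0]]  # accumulates sum of x x^T
    for src, tgt in pairs:
        x, y = EN[src], ES[tgt]
        for i in range(2):
            for j in range(2):
                a[i][j] += y[i] * x[j]
                b[i][j] += x[i] * x[j]
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    b_inv = [[b[1][1] / det, -b[0][1] / det],
             [-b[1][0] / det, b[0][0] / det]]
    # Closed-form solution: W = A * B^-1
    return [[sum(a[i][k] * b_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def cosine(u, v):
    """Cosine similarity between two non-zero 2-D vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def translate(word, w):
    """Map an English vector into the Spanish space and return the closest word."""
    x = EN[word]
    mapped = tuple(w[i][0] * x[0] + w[i][1] * x[1] for i in range(2))
    return max(ES, key=lambda cand: cosine(mapped, ES[cand]))


W = fit_linear_map(SEED)
print(translate("four", W))  # prints "cuatro"
```

The word “four” never appears in the seed dictionary, yet it lands next to “cuatro” because the two toy spaces share the same geometry. That is the appeal of the approach, and also its limit: it maps concepts to words, and nothing more.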
Why Computers Suck At Translation?
Heck, translating between British English and American English isn’t always easy. Not because your car’s “hood” is our “bonnet”, but because “that’s a brave idea” isn’t a compliment in British English, it means you’re a prat and your idea is impossible.
There are concepts which don’t quite match between languages. “Bonne nuit” might literally mean the same as “buenas noches” (I’m sorry about my pronunciation there), but one is meant for saying goodnight at bedtime and the other’s for saying hello or goodbye at any point after dark.
Then you have the concepts that don’t translate between languages at all. In French, “you” translates as “vous” if it’s someone you should be respectful towards, and “tu” if it’s a more casual conversation. Or if you’re talking to God. No, really. God is “tu”. A computer will crush both of those to “you” when translating to other languages, and it won’t have any idea which of them to use when translating into French.
And that is just a simple “honorifics” system. Korean has a much more complicated set of pronouns for all sorts of situations. Remember this? That repeated line: oppan Gangnam style. The English translation of “oppa” is usually “a woman’s older brother”: but in everyday speech, “oppa” is used to refer to someone based on a series of complicated and fuzzy rules that make instinctive sense to native speakers. To make it worse, PSY is referring to himself in the third person there, which sounds weird when translated out of Korean.
There is no way to translate all the meaning in those words into one English sentence. Then you have the problem of shared expectations. English-speaking cultures tend to be monochronic: if you make an appointment to meet someone at 11am, you are expected to be there at about 11am. I mean, groups of friends can often get around this — “the party starts at 6” often means people will turn up anywhere from 6:30 to 9. But imagine if that lack of punctuality, and that acceptance of a lack of punctuality, expanded to all aspects of everyday life. Welcome to the rest of the world.
Massive parts of this planet run on what is called polychronic time. Two appointments at the same time? That’s fine, they’ll understand. And they will understand. Needless to say, there is often quite significant culture clash when monochronic and polychronic people meet. But a machine translation isn’t going to see an English sentence like “I’ll meet you at 7pm” and add a note for someone in a polychronic culture that, no, they really do mean 7pm, and they’re going to be annoyed if you’re late.
Ultimately, to accurately translate something, you don’t just need to know how words map to concepts: you need to understand social structures, subtext, nuance, innuendo. You need at least a basic theory of mind: the idea that the speaker and the listener both have beliefs and desires expressed by the particular words they’ve chosen. Translators need to be able to ask questions of the original author, so they can check that the subtleties they have to add to the work reflect the author’s intention. The problem isn’t that language is messy: computers can cope with messy. Heck, they can pretty much solve CAPTCHAs better than humans these days. The problem is that language relies on intent, on shared secrets, on group identity, and on hidden knowledge. Machine translation is a useful tool, don’t get me wrong, but trying to get a machine to translate better than a human is… a brave idea.