In my last post, I introduced the topic of natural language processing and discussed how the context of a piece of language has an enormous impact on its translation into another language. In this post, I want to address another issue with translation. Specifically, I want to talk about how language is really an integrated function of the way the human brain models the world, and why this might make it difficult to create a machine translator that is isolated from the rest of an artificial intelligence.
When humans use language, they express things based upon an integrated model of the universe in which they live. There is a linguistic model in their brains that divides their concept of the world into ideas representable by words. For example, let’s look at the word “pit bull”. (It’s written as two words, but as a compound word it functions as a single noun.) “Pit bull” is a generic term for a group of terrier dog breeds. Terriers are dogs. Dogs are mammals. Mammals are animals. Each of these links is called a hypernym/hyponym relationship: “dog” is a hypernym of “terrier”, and “terrier” is a hyponym of “dog”. All content words (nouns, verbs, and adjectives) are part of a hierarchical tree of hypernym/hyponym relationships.
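To make the structure concrete, here’s a minimal Python sketch of such a tree. The vocabulary and the `HYPERNYMS` mapping are toy examples I’ve made up for illustration, not a real lexicon (real lexical databases like WordNet store exactly this kind of relationship at scale):

```python
# A minimal sketch of a hypernym tree, using an invented toy vocabulary.
# Each word maps to its immediate hypernym (its parent in the tree).
HYPERNYMS = {
    "pit bull": "terrier",
    "terrier": "dog",
    "dog": "mammal",
    "horse": "mammal",
    "mammal": "animal",
    "animal": "organism",
}

def hypernym_chain(word):
    """Walk upward from a word to the root, collecting every hypernym."""
    chain = []
    while word in HYPERNYMS:
        word = HYPERNYMS[word]
        chain.append(word)
    return chain

def is_a(word, category):
    """True if `category` appears anywhere above `word` in the tree."""
    return category in hypernym_chain(word)

print(hypernym_chain("pit bull"))   # ['terrier', 'dog', 'mammal', 'animal', 'organism']
print(is_a("pit bull", "animal"))   # True
```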
So when you talk about a pit bull, you’re invoking the tree to which it belongs, and anything you say about a pit bull will trigger the conversational participants’ knowledge and feelings about not only pit bulls but all the other members of that tree. It would be fairly trivial programming-wise, although possibly quite tedious data-entry-wise, to create a hypernym/hyponym tree for the couple hundred thousand or so words that make up the core vocabulary of English. But to codify the various associations to all those words would be a lot more difficult (a crude sketch of what that might look like follows below). Such a tree would be a step towards creating both a world-model and a knowledge base, aspects of artificial intelligence not explicitly related to the problem of machine translation. That’s because humans use their whole brain when they use language, and so by default, they use more than just a bare set of grammar rules when parsing language and translating from one language to another.
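Continuing the sketch above, here’s one way those associations might be layered onto the tree, so that a word inherits the associations of everything above it. The association sets are invented placeholders for exactly the kind of data that would be tedious to compile:

```python
# Continuing the previous sketch: associations attached to nodes in the tree.
# These sets are invented placeholders, not real lexical data.
ASSOCIATIONS = {
    "pit bull": {"bite", "strength", "loyalty"},
    "dog": {"bark", "leash", "pet"},
    "animal": {"fur", "wild", "eats"},
}

def associations(word):
    """Gather a word's own associations plus those of every hypernym,
    mimicking how mentioning a pit bull also evokes dogs and animals."""
    result = set(ASSOCIATIONS.get(word, set()))
    for hyper in hypernym_chain(word):  # from the previous sketch
        result |= ASSOCIATIONS.get(hyper, set())
    return result

print(associations("pit bull"))
# {'bite', 'strength', 'loyalty', 'bark', 'leash', 'pet', 'fur', 'wild', 'eats'}
```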
One use of such a tree and its associations would be to distinguish between homographs or homonyms. For example, if the computer sees a word it knows is associated with animals, it could work through the hypernym tree to see whether “animal” is a hypernym of, or associated with, say, the word “horse”. Or, if it sees the word “grain”, it could run through the trees of the other words in the sentence to see whether they are farming/crop-related or wood-related. Or, crossing language boundaries, if one language has a single word that covers all senses of “ride” while the other language distinguishes between riding in a car and riding a horse, the program could use the trees to search for horse- or car-related words that might let it make a best guess on which verb is appropriate in a given context.
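As a rough illustration of that last case, here’s how the same toy tree might be used to pick between two verbs. I’m using German, which really does split “ride” into reiten (animals) and fahren (vehicles); everything else is an invented example building on the sketches above:

```python
# Continuing the sketch: a crude guess at which target-language verb to use
# for "ride", based on whether the surrounding words are animal- or
# vehicle-related. The extra vocabulary below is invented for illustration.
HYPERNYMS.update({
    "car": "vehicle",
    "truck": "vehicle",
    "saddle": "tack",
})

RIDE_VERBS = {"animal": "reiten", "vehicle": "fahren"}

def guess_ride_verb(context_words, default="reiten"):
    """Score each sense by counting context words whose hypernym chains
    reach that sense's category, then pick the best-supported sense."""
    scores = {category: 0 for category in RIDE_VERBS}
    for word in context_words:
        for category in RIDE_VERBS:
            if word == category or is_a(word, category):
                scores[category] += 1
    best = max(scores, key=scores.get)
    return RIDE_VERBS[best] if scores[best] > 0 else default

print(guess_ride_verb(["horse", "saddle"]))   # reiten
print(guess_ride_verb(["car", "highway"]))    # fahren
```

This is obviously simplistic: real sentences would need the associations as well as the tree, and some way of weighing conflicting evidence. But it shows how even a shallow world-model starts doing work that grammar rules alone cannot.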
The long and short of the case I intend to make is that a true and accurate translation program cannot be written without taking enormous steps down the path of artificial intelligence. A purely rule-based system, no matter how many epicycles are added to it, cannot be entirely accurate, because even a human being with native fluency in both languages and extensive knowledge and experience of translating cannot be entirely accurate. Language is too malleable and permits too many equivalent forms for there to always be a single definitive translation of anything reasonably complex. That is why a translator has to make value judgements based on extra-linguistic data, and that data can only be comprehensively modeled using techniques beyond pure grammatical rules.
In the next post, I’ll talk about statistical methods of machine translation, and hopefully I’ll be following that up with a critique and analysis of the spec fic concept of a universal translator.