Machine “Translation” and What Words Mean in Context

One of the biggest commonly known flaws of machine translation is a computer’s inability to understand how meaning changes with context.  After all, a machine doesn’t know what a “horse” is.  It knows that “caballo” has (roughly) the same meaning in Spanish as “horse” does in English.  But it doesn’t know what that meaning is.

And it certainly doesn’t know what it means when we say that someone has a “horse-face” (or a “face like a horse”).

 

But humans can misunderstand meaning in context, too.  For example, if you don’t know how “machine translation” works, you’d think that machines could actually translate or produce translations.  You would be wrong.  What a human does to produce a translation is not the same as what a machine does to produce a “translation”.  That’s why machine and human translators make different mistakes when trying to render the original meaning in the new language.

 

A human brain converts words from the source language into meaning and the meaning back into words in the target language.  A computer converts words from the source language directly to words in the target language, creating a so-called “literal” translation.  A computer would suck at translating a novel, because the figures of speech that make prose (or poetry) what it is are incomprehensible to a machine.  Machine translation programs lack the deeply associated (interconnected) knowledge base that humans use when producing and interpreting language.
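To make the “literal” translation idea concrete, here is a minimal sketch of a word-for-word translator.  The tiny dictionary and the function name are made up for illustration; no real MT system is this small, but the token-swapping principle is the same.

```python
# Toy word-for-word "translator". The dictionary entries are invented for
# illustration; nothing in here models what any of the words mean.
TOY_DICTIONARY = {
    "she": "ella",
    "has": "tiene",
    "a": "un",
    "horse": "caballo",
    "face": "cara",
}


def translate_word_for_word(sentence, dictionary):
    """Replace each word with its dictionary equivalent, keeping word order."""
    words = sentence.lower().split()
    # Unknown words pass through unchanged.
    return " ".join(dictionary.get(word, word) for word in words)


print(translate_word_for_word("She has a horse face", TOY_DICTIONARY))
# prints: ella tiene un caballo cara
```

A Spanish speaker would say “cara de caballo”, but the program has no way to know that.  It only swaps one string for another.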

 

A more realistic machine translation (MT) program would require an information web with connections between concepts, rather than words, such that the concept of horse would be related to the concepts of leg, mane, tail, rider, etc., without any intervening linguistic connection.

Imagine a net of concepts represented as data objects.  These are connected to each other in an enormously complex web.  Then, separately, you have a net of linguistic objects, such as words and grammatical patterns, which is overlaid on the concept net and connected to it.  The objects representing the words “horse” and “mane” would not have a direct connection, but the objects representing the concepts underlying those words would have, perhaps, a “has-a” connection, itself represented by a connection or “association” object.
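The two-layer idea can be sketched in a few lines of Python.  This is only one possible reading of the description: the class names (`Concept`, `Association`, `Word`) and the helper functions are my own assumptions, and a real system would need vastly more structure.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A language-independent node in the concept net."""
    name: str  # internal label only, not a word in any language
    associations: list = field(default_factory=list)


@dataclass
class Association:
    """A typed link between two concepts, e.g. HORSE has-a MANE."""
    kind: str
    source: Concept
    target: Concept


@dataclass
class Word:
    """A linguistic object overlaid on the concept net."""
    text: str
    language: str
    concept: Concept


def associate(source, kind, target):
    """Create an association object and attach it to the source concept."""
    link = Association(kind, source, target)
    source.associations.append(link)
    return link


def related_words(word, lexicon, language):
    """Find words in `language` whose concepts are linked to this word's concept."""
    targets = [link.target for link in word.concept.associations]
    return [w.text for w in lexicon
            if w.language == language and any(w.concept is t for t in targets)]


horse, mane = Concept("HORSE"), Concept("MANE")
associate(horse, "has-a", mane)

lexicon = [
    Word("horse", "en", horse), Word("caballo", "es", horse),
    Word("mane", "en", mane), Word("crin", "es", mane),
]

print(related_words(lexicon[0], lexicon, "es"))  # prints: ['crin']
```

Notice that the word objects never point at each other.  The only route from “horse” to “crin” runs through the concept layer.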

In order to translate between languages like a human would, you need your program to have an approximation of human understanding.  A famous study suggested that a single neuron in the human brain can respond to one specific celebrity (the best-known case involved Jennifer Aniston).  So in the brain of a human who knows about Lindsay Lohan, there may be an actual “Lindsay” neuron, which lights up whenever you think about Lindsay Lohan.  It’s probably lighting up right now as you read this post.  Similarly, in our theoretical machine translation program’s information “database”, you have a “horse” “neuron”, represented by the concept object that I described above.  It’s separate from the linguistic object “neuron” that holds the word “horse” (or the word group “Lindsay Lohan”), though the two are probably connected.

Whenever you dig the concept of horse or Lindsay Lohan out of your long-term memory, your brain sort of primes the concept by loading it and related concepts into short-term memory, so your “rehab” neuron probably fires pretty soon after your Lindsay neuron.  Similarly, our translation program doesn’t keep its whole data set in RAM constantly, but loads it from whatever our storage medium is, based on what’s connected to our currently loaded portion of the web.
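Here is a rough sketch of that “priming” behaviour.  The `STORAGE` dict is a stand-in for whatever the real storage medium would be, and `ConceptCache`, `prime`, and the `depth` parameter are hypothetical names I picked for the example.

```python
# Stand-in for on-disk storage: concept name -> names of connected concepts.
# The contents are invented for illustration.
STORAGE = {
    "HORSE": ["MANE", "TAIL", "RIDER"],
    "MANE": ["HAIR"],
    "TAIL": [],
    "RIDER": ["SADDLE"],
    "HAIR": [],
    "SADDLE": [],
}


class ConceptCache:
    """Keeps only the recently "primed" part of the web in memory."""

    def __init__(self, storage):
        self.storage = storage
        self.loaded = {}  # the portion of the web currently "in RAM"

    def prime(self, name, depth=1):
        """Load a concept and its neighbours up to `depth` links away."""
        if name not in self.loaded:
            self.loaded[name] = self.storage[name]
        if depth > 0:
            for neighbour in self.loaded[name]:
                self.prime(neighbour, depth - 1)


cache = ConceptCache(STORAGE)
cache.prime("HORSE")
print(sorted(cache.loaded))  # prints: ['HORSE', 'MANE', 'RIDER', 'TAIL']
```

Thinking about HORSE pulls in its direct neighbours, while HAIR and SADDLE stay in storage until something closer to them gets primed.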

Current MT programs don’t translate like humans do.  No matter what tricks or algorithms they use, it’s all based on manipulating sequences of letters and basically doing math based on a set of equivalences such as “caballo” = “horse”.  Whether they do statistical analysis on corpora of previously translated phrases and sentences to find the most likely translation, like Google Translate, or a straightforward dictionary look-up one word at a time, they don’t understand what the text they are matching means in either language, and that’s why current approaches will never be able to compare to a reasonably competent human translator.
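The statistical flavour can be caricatured as a frequency look-up.  To be clear, this is a toy and not how Google Translate is actually implemented; the phrase table, its counts, and the function name are all invented for the example.

```python
from collections import Counter

# Hypothetical counts of how often each Spanish rendering appeared for an
# English phrase in a corpus of human translations (numbers invented).
PHRASE_TABLE = {
    "horse": Counter({"caballo": 950, "potro": 50}),
    "horse face": Counter({"cara de caballo": 40, "caballo cara": 1}),
}


def most_likely_translation(phrase, table):
    """Return the most frequent rendering seen for this exact phrase."""
    candidates = table.get(phrase)
    if not candidates:
        return None  # never seen before, so there is nothing to count
    return candidates.most_common(1)[0][0]


print(most_likely_translation("horse face", PHRASE_TABLE))
# prints: cara de caballo
print(most_likely_translation("face like a horse", PHRASE_TABLE))
# prints: None
```

The look-up gets “horse face” right only because some human already translated it.  Rephrase the same idea in words the corpus hasn’t seen and the counts have nothing to say.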

It’s also why current “artificial intelligence” programs will never achieve true human-like general intelligence.  So, even your best current chatbot has to use tricks like pretending to be a Ukrainian teenager with bad English skills on AIM to pass the so-called Turing test.  A sidewalk artist might draw a picture-perfect crevasse that seems to plunge deep into the Earth below your feet.  But no matter how real it looks, your elevation isn’t going to change.  A bird can’t nest in a picture of a tree, no matter how realistically depicted.

Calling what Google Translate, or any other machine “translation” program, does “translation” has to be viewed in context, or else it’s quite misleading.  Language functions properly only in the proper context, and that’s something statistical approaches to machine translation will never be able to imitate, no matter how many billions they spend on hardware or algorithm development.  Could you eventually get them to where they can usually communicate the gist of a short newspaper article?  Sure.  Will you be able to engage live in witty repartee over Skype with an acquaintance who shares no language with you?  Probably not.  Not with the kind of system we have now.

Though crude, our theoretical program with the knowledge web described above might take us a step closer, but even if we could perfect and polish it, we’re still a long way from truly useful translation or AI software.  After all, we don’t even understand how we do these things ourselves.  How could we create an artificial version when the natural one still eludes our grasp?
