
The Translation Problem: People vs. Computers

In my last post, I introduced the topic of natural language processing and discussed how the context of a piece of language has an enormous impact on its translation into another language.  In this post, I want to address another issue with translation.  Specifically, I want to talk about how language is really an integrated function of the way the human brain models the world, and why this might make it difficult to create a machine translator isolated from the rest of an artificial intelligence.

When a human uses language, they are expressing things that are based upon an integrated model of the universe in which they live.  There is a linguistic model in their brain that divides up their concept of the world into ideas representable by words.  For example, let’s look at the term “pit bull”.  (It’s written as two words, but as a compound, it functions as a single noun.)  Pit bull is a generic term for a group of terrier dog breeds.  Terriers are dogs.  Dogs are mammals.  Mammals are animals.  This is called a hypernym/hyponym relationship: the more general term (“dog”) is the hypernym, and the more specific term (“terrier”) is the hyponym.  All content words (nouns/verbs/adjectives) are part of a hierarchical tree of hypo-/hyper-nym relationships.

So when you talk about a pit bull, you’re invoking the tree to which it belongs, and anything you say about a pit bull will trigger the conversational participants’ knowledge and feelings about not only pit bulls, but all the other members of the tree to which it belongs.  It would be fairly trivial programming-wise, although possibly quite tedious data-entry-wise, to create a hypo-/hyper-nym tree for the couple-hundred-thousand or so words that make up the core vocabulary of English.  But to codify the various associations to all those words would be a lot more difficult.  Such a tree would be a step towards creating both a world-model and a knowledge-base, aspects of artificial intelligence not explicitly related to the problem of machine translation.  That’s because humans use their whole brain when they use language, and so by default, they use more than just a bare set of grammar rules when parsing language and translating between one language and another.
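To give a sense of how little programming the bare tree needs, here is a minimal sketch in Python.  The word list is a toy I made up for illustration, and the function names are my own; a real vocabulary would have hundreds of thousands of entries, and many words would need more than one parent.

```python
# Hypothetical toy data: each word maps to its immediate hypernym (its parent).
HYPERNYM = {
    "pit bull": "terrier",
    "terrier": "dog",
    "dog": "mammal",
    "mammal": "animal",
    "horse": "mammal",
    "oak": "tree",
    "tree": "plant",
}


def hypernym_chain(word):
    """Return the hypernyms of a word, from its parent up to the root."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_a(word, candidate):
    """True if candidate is the word itself or one of its hypernyms."""
    return candidate == word or candidate in hypernym_chain(word)


print(hypernym_chain("pit bull"))  # ['terrier', 'dog', 'mammal', 'animal']
print(is_a("horse", "animal"))     # True
print(is_a("oak", "animal"))       # False
```

The lookup really is the easy part.  Nothing in this sketch captures the associations (what people know or feel about pit bulls), which is where the difficulty lies.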

One use of such a tree and its associations would be to distinguish between homographs or homonyms.  For example, if the computer sees a word it knows is associated with animals, it could work through the hypernym tree to see if “animal” is a hypernym or association of, say, the word “horse”.  Or, if it sees the word “grain”, it could run through the trees of the surrounding words to see if they are farming/crop-related or wood-related.  Or, perhaps, crossing language boundaries, if one language has a single word that covers all senses of “ride”, and the other language distinguishes between riding in a car and riding a horse, the program could use the trees to search for horse- or car-related words that might let it make a best guess as to which verb is appropriate in a given context.
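The “ride” case can be sketched in a few lines of Python.  Everything here is an assumption for the sake of illustration: the tiny word list is made up, the function names are my own, and German is simply one example of a language that splits the verb (“reiten” for riding an animal, “fahren” for riding in a vehicle).  It is a crude vote count, not a description of how any real translation system works.

```python
# Hypothetical toy data: each word maps to its immediate hypernym (its parent).
HYPERNYM = {
    "horse": "animal",
    "pony": "horse",
    "car": "vehicle",
    "bus": "vehicle",
    "taxi": "car",
}

# Candidate translations of English "ride", keyed by the concept each implies.
RIDE_TRANSLATIONS = {
    "animal": "reiten",   # ride on an animal
    "vehicle": "fahren",  # ride in a vehicle
}


def ancestors(word):
    """Return the word plus all of its hypernyms, as a set."""
    seen = {word}
    while word in HYPERNYM:
        word = HYPERNYM[word]
        seen.add(word)
    return seen


def choose_ride_verb(context_words):
    """Pick the verb whose concept is supported by the most context words.

    Returns None when the context gives no evidence, or when it is a tie.
    """
    votes = {concept: 0 for concept in RIDE_TRANSLATIONS}
    for word in context_words:
        for concept in ancestors(word):
            if concept in votes:
                votes[concept] += 1
    top = max(votes.values())
    winners = [concept for concept, count in votes.items() if count == top]
    if top == 0 or len(winners) != 1:
        return None
    return RIDE_TRANSLATIONS[winners[0]]


print(choose_ride_verb(["she", "ride", "pony", "field"]))    # reiten
print(choose_ride_verb(["we", "ride", "taxi", "airport"]))   # fahren
print(choose_ride_verb(["they", "ride", "home"]))            # None
```

Note the third case: with no horse- or car-related word nearby, the program has nothing to go on, which is exactly the situation where a human translator falls back on wider knowledge of the world.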

The long and short of the case I intend to make is that a true and accurate translation program cannot be written without taking enormous steps down the path of artificial intelligence.  A purely rule-based system, no matter how many epicycles are added to it, cannot be entirely accurate, because even a human being with native fluency in both languages and extensive knowledge and experience of translating cannot be entirely accurate.  Language is too malleable and allows too many equivalent forms to always allow for a single definitive translation of anything reasonably complex, and this is why it is necessary to make value judgements based on extra-linguistic data, which can only be comprehensively modeled by using techniques beyond pure grammatical rules.


In the next post, I’ll talk about statistical methods of machine translation, and hopefully I’ll be following that up with a critique and analysis of the spec fic concept of a universal translator.



Linguistics and SFF: Connotations and the Failures of Dictionary Definitions

Last time on Linguistics and SFF: Shadow and Bone and the Russian Language

If you’ve been reading my posts on linguistic appropriation and foreign languages in SFF, you might have noticed that a great deal of the problem comes from the misuse or misunderstanding of foreign words and their meanings.  There’s a fairly simple reason for this, and it’s something that online machine translation efforts have greatly contributed to:

Words have a denotation, the literal meaning of the word: “cat” is a four-legged animal of the genus Felis.  Further, words have a set of connotations, the set of cultural or emotional associations that are connected to the word.  Cats are often considered solitary, imperious, curious, etc.  Connotations are what lend words to the various metaphors (and more generally, all forms of analogy) found in human language.  Then, many words have cultural or historical baggage.  Finally, there is idiomatic language, which provides another layer of meaning to a word.

How does this relate to linguistic appropriation and translation failure?  Dictionaries, the most commonly available resource for learning the meaning of foreign words, most often include only the denotation of a word, and very occasionally limited historical, idiomatic, or connotational information: for example, the entry for an ethnic slur may contain a note that the word is a pejorative.

And, if that isn’t enough, there’s etymology.  This could be construed as a historical association, but I think it has enough relevance to be treated as its own category.  Etymology is the linguistic history of a word: when it first entered the language, the language it came from, changes it has undergone even while part of the current language.  All of these things are relevant.  Consider English.  Words of a Latin or Greek source are often considered more sophisticated than words of Germanic origin.  People who speak mostly in Latin or Greek roots are often considered elitist or snobbish, as opposed to those who speak with Germanic roots and grammar, who can be considered unintelligent, or homey.  This is an association that all native speakers of the language make.  But it may not be immediately obvious to a non-native speaker.

Similarly, Japanese has a strong Chinese influence, especially in literature and religion.  Japanese words with Chinese origins are used and perceived differently than those of native Japanese origin.  Many languages have similar dichotomies.  Loan words are perceived differently, and a non-native speaker may not know which words are loan words, and whether their source language gives them negative or positive associations.

In order to use a language that is not your own effectively and respectfully, you have to be aware of all of these things.  What may seem like a perfectly reasonable translation may shock or offend a native speaker.  One of the things a writer has to accept when using a language that is not their own is that they will mess up.  They’ll miss something.  It’s inevitable when you consider everything that goes into choosing even a single word in the mind of a native speaker.  But you can do things to lessen the chance of such an occurrence, even before you consult a native speaker.

What you cannot do is attempt to include a real foreign language in your story just by consulting an online bilingual dictionary.

And of course the same issues apply with any historical figure, or pop culture icon, or myth, or setting.

Next time on Linguistics and SFF: Why Non-English Words?



Linguistics and SFF: Stormdancer and the Translation Convention

Back for Part 2 of my linguistic analysis of Stormdancer by Jay Kristoff.  Last time, I talked about Stormdancer and the Japanese Language.  In this post, I’m going to address the issues Kristoff had with something called the translation convention.  The translation convention is an idea in SFF that when a story is set somewhere other than an English-speaking country on Earth, the characters in their own reality are not speaking English, and the story is merely being “translated” into English from whatever language it’s really written in.

There are two main versions of the convention:

1.  The book itself is just an English translation of the original.

2.  The dialogue in the book and the characters’ thoughts are in translation; the general prose is not.

For the purposes of this post, we will make the second assumption, since the use of scattered Japanese makes a strong argument that the characters in the novel have Japanese as a native language.

Now, in order to adhere to this convention, the author must treat dialogue and interior monologue as if they were written in that language.  For example, if a character thinks about the words they are speaking–or would like to speak, in many cases–the description of those words must treat them as they exist in the native language of the characters, not in English or whatever language the novel is actually written in.  No language is a complete word-for-word cypher for any other, so things such as syllable counts or word counts are going to be different, as is the shaping of the sounds of the word in the mouth.

Kristoff provides us with several examples–or mistakes–of this type to analyze:

1.  Goodreads user Cyna points out several of these mistakes.  First, the word “impure” in this sentence:

“‘Impure.’ Yukiko whispered the word […] It was such a simple thing; two syllables, the press of her lips together, one on another, tongue rolling over her teeth.”

As Cyna notes, the Japanese for “impure” is “fuketsu”.  Now, that’s almost irrelevant here, since in neither word is there any sort of “tongue rolling”.  But if we do consider it, then we realize that the Japanese word would have three syllables, not two, and there is no “press of her lips together”.  A sound made with both lips is called a “bilabial”.  The English word “impure” has two of them: the bilabial nasal /m/ and the bilabial stop /p/.  Neither of these occurs in “fuketsu”.  The closest we come is the “f”, which in English is a labiodental fricative, meaning that the lower lip approaches the upper teeth.  In Japanese it is a bilabial fricative, in which the lips come close together but never actually press.

The prime reason I’m citing this is that it violates our translation convention: it discusses (and inaccurately) the English word “impure”, which is not the word Yukiko would be using.  She should be using the Japanese one.

2.  The second example cited by Cyna is “arashi no ko”, literally translatable as “storm child” or “child of storm(s)”.  The griffin “Buruu” asks her what it means.  Now, since it’s pure Japanese, he ought to know, or how else have they been understanding each other all this time?  It’s nice of Kristoff to give us a translation, but it could have been done much more simply, and without violating the translation convention.

3.  Cyna’s third example is this passage:

‘”I lo-”
She kissed him, stood on her tiptoes and threw her arms around his neck and crushed her lips to his before he could finish the sentence. She didn’t want to listen to those three awful words, feel them open her up to the bone and see what the lies had done to her insides.’

The words obviously being “I love you.”  However, that’s English, not the Japanese the characters should be speaking.  Japanese has several ways to say “I love you.”  There is “aishiteru”, which is considered very strong and is rarely used.  There is “suki desu”, or more commonly just “suki”, which literally translates to “like [something]”.  Or, if you want to push it to three words, “kimi ga suki [desu]”.  Kristoff gets lucky here, since you could argue he meant the last example, although his previous use of Japanese suggests it was unintentional.  Here we consider something interesting: the set phrase.  This is a phrase with a culturally fixed and rigid form, generally with a ritual meaning.  Set phrases are generally used for greetings or apologies or any other very common act of speech.  In English, the culturally weighted phrase is the three-word “I love you.”  You can see it in the extremely common mention of “those three little words”, which even Kristoff references in the scene.  In Japanese, especially the anime-style register that Kristoff is drawing his material from, the set phrase is not “kimi ga suki” or “anata ga suki”.  Kristoff is clearly referring to the English set phrase.  So again, he breaks the convention.

There are some other examples, I’m sure.

The reason I wrote a whole post focusing on the translation convention of SFF is that it highlights the mindset behind this misuse of language.  It’s not intentional, or aimed at helping readers.  It’s just ignorance, and it makes you wonder what the point of the other language is.  Is it just for cheap exoticism?  Is the author just ignorant?  They’re certainly not achieving the goal of immersing the reader in another culture, or another mindset.  There are so many wonderful ways we can use language in fiction, and I include ever-popular entertainment fiction in this.  When you write a book about another culture and another language, there needs to be more there than just making things seem foreign, especially considering that anyone with a basic idea of how Japanese language and culture work (and the number of those people is growing, due to the spread of Japanese soft power propagated by manga and JDramas and all of that) isn’t going to find this book foreign but rather fake.

Next time on Linguistics and SFF: Shadow and Bone and the Russian Language


Posted on July 13, 2013 in Cultural Appropriation, Linguistics

