Rob and Ted,
People seem to expect more from NLP systems than humans could ever
deliver. To illustrate, anything greater than a 1% error is disastrous
when translating technical papers, and even humans have several times
that rate on isolated sentences. Hence, technical papers must generally
be translated by someone who understands the material, rather than by
someone who only knows both languages.
Of course, the problem lies in demanding more than can be done, not in
doing a poor job. Still, people are quick to accept the blame and try to
improve their software because of the imperfections they see, when no
amount of improvement could ever make it better than is theoretically
possible.
For a simple exercise, try to restate a given sentence while exactly
preserving ALL of its meaning and ambiguity. This usually isn't
possible. How, then, can we expect to restate a sentence in a foreign
language and preserve all of its meaning and ambiguity when we must
work with words whose meanings differ? In short, in any given
language, only a tiny fraction of all possible meanings can even be
stated! English, like all other languages, has HUGE gaping lexical
holes, e.g., the various forms of "you" that exist in so many other
languages, such as German and Russian, but not in English. When
translating into these languages, you usually MUST indicate the level of
familiarity between the speaker and the listener, because this is part
of the basic grammar of those languages, yet English provides no way to
convey this information. This alone makes exact translation of
many/most statements between English and these languages impossible.
I have had long discussions about this with various linguists. The ONLY
potential approach that has emerged is to include foreign words as
necessary and footnote their precise meanings. These footnotes can get
pretty long, especially for things like Yiddish and Japanese terms that
refer to personalities or characters in fables. This technique is pretty
common in translations.
The more precise the translation, the more footnoted words would be
needed. Of course, much/most of the subtleties of meaning and ambiguity
that would be preserved are either unintentional or unavoidable (due to
limitations in the source lexicon), and footnoting them would only draw
attention to the flaws rather than to the meaning.
For example, consider translating from German or Russian into English.
These source languages necessarily indicate familiarity, and their
pronouns could be translated as the English "you", which leaves
familiarity unstated, but meaning would be lost in the process.
Sometimes this wouldn't matter, and sometimes it would.