Is there any English grammar for an English parser?


Post by noesi » Mon, 05 Sep 2005 22:53:15


Hi all,

I am a newcomer to NLP. I am doing research in language translation
(from English to the Myanmar language). The first step of my thesis is
to implement an English-language parser that parses input English text
into its AST form. Are there any English grammar rules I can use in an
English-language parser (compiler)?

thanks in advance.

sincerely,

noesis
 
 
 


Post by Ted Dunning » Wed, 07 Sep 2005 09:14:46


AST (augmented syntax trees) sounds like you are working from some
/very/ old literature. I haven't heard of them since the '80s.

Your thesis is going to be much harder than you seem to expect.
Building a parser for any significant fragment of English that produces
output suitable for translation into a very different language like
Myanmar is a very large task.
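
To see why coverage is the hard part, here is a toy sketch (assuming a
modern NLTK install; the grammar below is illustrative, not something
from this thread): a context-free grammar that handles a handful of
sentences takes minutes to write, but it says nothing about the rest of
English.

    # Toy sketch only: a tiny hand-written CFG, not a full English grammar.
    # Assumes NLTK is installed (pip install nltk).
    import nltk

    toy_grammar = nltk.CFG.fromstring("""
        S -> NP VP
        NP -> Det N | 'I'
        VP -> V NP
        Det -> 'a' | 'the'
        N -> 'parser' | 'sentence'
        V -> 'need' | 'parse'
    """)

    parser = nltk.ChartParser(toy_grammar)
    for tree in parser.parse("I need a parser".split()):
        print(tree)  # bracketed phrase-structure tree for the sentence

Wide-coverage parsing instead relies on grammars induced from
treebanks, which is part of what makes the task so large.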

Most modern language translation efforts combine relatively simple
parsing with various statistical insights obtained by examining
corpora, preferably parallel ones.
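
To make "statistical insights" concrete, a deliberately tiny sketch
(the aligned sentence pairs below are invented, and real models are far
more sophisticated): even plain co-occurrence counts over a
sentence-aligned parallel corpus yield crude translation candidates.

    # Illustrative sketch only; the aligned sentence pairs are made up.
    from collections import Counter, defaultdict

    parallel_corpus = [
        ("the house", "das haus"),
        ("the book", "das buch"),
        ("a book", "ein buch"),
    ]

    cooccurrence = defaultdict(Counter)
    for english, foreign in parallel_corpus:
        for e in english.split():
            for f in foreign.split():
                cooccurrence[e][f] += 1  # e and f share an aligned pair

    # The most frequent co-occurring foreign word is a crude guess.
    print(cooccurrence["book"].most_common(1))  # [('buch', 2)]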

It is also important to decide what the goals of your translation
system are. If you are trying to build a system that helps human
translators, then you would probably be best off just building a good
bilingual text retrieval system that can serve as a translation
memory.
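
A translation memory in that sense can start as nothing more than fuzzy
retrieval over previously translated sentence pairs. A minimal sketch
(the stored pairs are placeholders, and Python's difflib is just one
convenient way to do the matching):

    # Minimal sketch of a translation memory as bilingual text retrieval.
    import difflib

    memory = {
        "Press the red button twice.": "<stored Myanmar translation 1>",
        "The red button starts the engine.": "<stored Myanmar translation 2>",
    }

    def lookup(sentence, cutoff=0.6):
        """Return stored translations whose source sentence is closest."""
        match = difflib.get_close_matches(sentence, memory.keys(), n=1, cutoff=cutoff)
        return [(m, memory[m]) for m in match]

    # A near-duplicate of a stored sentence retrieves its translation.
    print(lookup("Press the red button once."))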

Otherwise, I recommend you read more of the literature on recent
translation systems.

Good luck!

 
 
 


Post by areza » Sun, 18 Sep 2005 20:45:10

Dear Ted Dunning,
Would you please help me find some articles about new methods in
machine translation? As you mentioned, SMT is the latest method. I
found many articles on Google, but I don't know which project is the
best. I am going to try to prepare a proposal for machine translation
using new techniques.

Regards
Ebadat A.R.
 
 
 


Post by Ted Dunning » Tue, 20 Sep 2005 11:57:30


I have been out of the field for several years, so I can't really
point you to the latest work, other than to say that pure techniques
(either statistical or traditional symbolic) will not give you the best
performance.

As far as I can tell from a distance, the real state of the art is in
the nascent techniques for producing statistical translators using
large amounts of non-parallel text and very limited amounts (possibly
none) of parallel text.

That said, you should be aware that building any kind of translation
system is an enormous amount of work unless you are willing to accept
really cheesy (as in unusable) results.

Sorry I can't be of more assistance. Perhaps there is somebody who is
currently engaged in the field who could help with a pointer to a
survey article or two.
 
 
 


Post by Ted Dunning » Tue, 20 Sep 2005 12:01:09


As an additional note, if you are trying to find a good project for a
class, then you would be hard-pressed to find a better topic than
trying to replicate some of the Mercer and Brown results from the
early-to-mid '90s. Their work is the basis for (or at least the
inspiration behind) almost all statistical machine translation that has
been done since then.

If you could replicate even a little of what they did, it would be very
impressive as a class project. What would make it possible at all is
the fact that computers are soooo much easier to work with now than 15
years ago.
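
For anyone taking up that suggestion, the heart of the Brown/Mercer
line of work (IBM Model 1) is a short EM loop over word-translation
probabilities. A toy sketch with a made-up three-sentence corpus (an
illustration of the idea, not their code or data):

    # Toy sketch of IBM Model 1 EM training; the corpus is invented.
    from collections import defaultdict
    from itertools import product

    # Sentence-aligned corpus: (foreign words, English words).
    corpus = [(["das", "haus"], ["the", "house"]),
              (["das", "buch"], ["the", "book"]),
              (["ein", "buch"], ["a", "book"])]

    f_vocab = {f for fs, _ in corpus for f in fs}
    e_vocab = {e for _, es in corpus for e in es}

    # t[(f, e)] approximates P(f | e); start from a uniform distribution.
    t = {(f, e): 1.0 / len(f_vocab) for f, e in product(f_vocab, e_vocab)}

    for _ in range(10):  # a few EM iterations suffice on this toy corpus
        count = defaultdict(float)  # expected alignment counts for (f, e)
        total = defaultdict(float)  # expected counts for e
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    count[(f, e)] += t[(f, e)] / norm
                    total[e] += t[(f, e)] / norm
        for f, e in t:
            t[(f, e)] = count[(f, e)] / total[e]

    print(round(t[("haus", "house")], 3))  # climbs well above the uniform 0.25

Replicating this at the scale of the original experiments is the part
that was hard in the early '90s and is much easier now.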

Good luck!
 
 
 


Post by Shlomo Arg » Tue, 20 Sep 2005 12:15:56


There are several good papers in this past year's (2005) Proceedings of
the Association for Computational Linguistics (ACL). I believe you can
order a copy on-line (just google for "ACL").

Good luck!
-Shlomo-
 
 
 


Post by Rob Freeman » Fri, 23 Sep 2005 20:31:26


Shlomo,

Do you know of any papers which consider my particular hobby horse,
the "lack of generality" of statistical analyses of language?

That is to say, any studies which suggest that statistical
generalizations about language are necessarily multiple and
inconsistent.

Best,

Rob Freeman
 
 
 


Post by Ted Dunning » Sun, 25 Sep 2005 01:21:22



Rob,

Care to clarify this point?

You can definitely build statistical systems with limited
generalization capability. You also have to deal with some level of
doubt and ambiguity in any NLP program (this might be the "multiple"
part of your complaint).

But nothing in the method implies that statistical methods cannot
generalize. In fact, all of the useful ones have mechanisms for
generalization. In many cases, performance (and lack of it) is similar
to human performance. For instance, if you have a language model for
text, it is possible to produce an estimate of the characteristics of a
novel English word based on what you have seen previously and the
morphological characteristics of the word. In particular, if you
already have a large language model, then you know that any word you
see that is not in the language model has the following characteristics
with reasonably high probability:

a) it is relatively rare;
b) it is not a closed-class word;
c) if it ends in certain patterns, you can guess a class (-ation =>
noun, -ize => verb, -ly => adverb, initial capital => proper noun if
not at a sentence beginning);
d) otherwise it is likely to be a noun.

Of course a real system is rather more involved than this, but these
are reasonable examples of how effective generalization can be built
into a statistical NLP system.
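
As a concrete (if simplistic) rendering of points (c) and (d) above, an
unknown-word guesser can be a few lines of code; the rules follow the
list, while the example words and the handling of sentence position are
illustrative only.

    # Sketch of the suffix heuristics above; the example words are invented.
    def guess_class(word, sentence_initial=False):
        """Guess a word class for a word not found in the language model."""
        if word[0].isupper() and not sentence_initial:
            return "proper noun"
        if word.endswith("ation"):
            return "noun"
        if word.endswith("ize"):
            return "verb"
        if word.endswith("ly"):
            return "adverb"
        return "noun"  # point (d): otherwise a noun is the best single bet

    for w in ["tokenization", "lemmatize", "stochastically", "Zipf"]:
        print(w, "->", guess_class(w))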

So, can you clarify what you are saying? In particular, can you say
why inconsistency and ambiguity in a statistical system are any
different from the practically inevitable inconsistency and ambiguity
in a non-statistical system? Or even in human interpretation of
language?
 
 
 


Post by Steve Rich » Sun, 25 Sep 2005 14:16:24

Rob and Ted,


People seem to expect more from NLP systems than humans could ever
deliver. To illustrate, anything greater than a 1% error rate is
disastrous when translating technical papers, and even humans have
several times that rate on isolated sentences. Hence, technical papers
must generally be translated by someone who understands the material,
rather than by someone who merely knows both languages.

Of course, the problem lies in demanding more than can be done, not in
failing to do a good enough job. Even so, people are quick to accept
responsibility and to keep improving their software because of the
imperfections they see, when no amount of improvement would ever make
it better than is theoretically possible.

For a simple exercise, try to restate a given sentence while exactly
preserving ALL of its meaning and ambiguity. This usually isn't
possible. How then can we expect to restate a sentence in a foreign
language and preserve all of its meaning and ambiguity when we must
work with words having different meanings? In short, in any given
language, only a tiny fraction of all possible meanings can even be
stated! English, like all other languages, has HUGE gaping lexical
holes, e.g. the various forms of "you" that exist in so many other
languages like German and Russian but not in English. When translating
to these languages, you usually MUST indicate the level of familiarity
between the speaker and the listener as this is part of the basic syntax
of the languages, yet English provides no way to convey this
information. This alone makes exactly translating many/most statements
between English and other languages quite impossible.

I have had long discussions about this with various linguists. The
ONLY potential approach that has emerged is to include various foreign
words as necessary and footnote their precise meanings. Those footnotes
can sometimes run quite long, especially for things like Yiddish and
Japanese terms that refer to personalities or characters in fables.
This technique is pretty common in translations.

The more precise the translation, the more footnoted words would be
needed. Of course, much/most of the subtleties of meaning and ambiguity
that would be preserved are either unintentional or unavoidable (due to
limitations in the source lexicon), and footnoting would only draw
attention to the flaws rather than the meaning.

For example, when going from German or Russian to English, these
source languages necessarily indicate familiarity and could be
translated to the familiarity-neutral "you", but meaning would be lost
in the process. Sometimes this wouldn't matter, and sometimes it
would.

Steve Richfie1d
 
 
 


Post by Rob Freeman » Mon, 26 Sep 2005 01:10:43

Hi Ted,



It is different because in a statistical system that inconsistency can
be understood as natural and necessary.

My favorite example of this "natural inconsistency" is the simple
"statistical" analysis of ordering a group of people. You can order
them according to height, or according to IQ, but not, in general, by
both at the same time, not past a constant.
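
A throwaway numerical illustration of that point (the data are
invented): when height and IQ are independent, the two orderings
disagree on roughly half of all pairs, so no single ordering can
respect both.

    # Invented data: two orderings of the same people largely disagree.
    import random
    from itertools import combinations

    random.seed(0)
    n = 20
    height = [random.gauss(170, 10) for _ in range(n)]
    iq = [random.gauss(100, 15) for _ in range(n)]

    rank_by_height = {p: r for r, p in
                      enumerate(sorted(range(n), key=lambda p: height[p]))}
    rank_by_iq = {p: r for r, p in
                  enumerate(sorted(range(n), key=lambda p: iq[p]))}

    # Count pairs that the two orderings place in opposite order.
    discordant = sum(
        1 for a, b in combinations(range(n), 2)
        if (rank_by_height[a] - rank_by_height[b])
           * (rank_by_iq[a] - rank_by_iq[b]) < 0
    )
    print(f"{discordant} of {n * (n - 1) // 2} pairs are ordered differently")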

This can be seen as support for a statistical system (one which
models language as generalizations of large numbers of examples),
but I don't know of any current statistical NLP models which take
advantage of it.

-Rob