understanding the intuition behind LL(k) parsers and LR(k) parsers

Post by Mark F » Sun, 23 Apr 2006 12:47:48


Hey guys,

Maybe you can help me visualize the basic idea behind the two
approaches to AST tree generation.

As I understand it, a top-down parser will start with the first input
symbol of the program file, check whether it's a terminal/nonterminal,
and then expand it, looking for the next predicted symbol. A bottom-up
parser has to start with some kind of "last terminal" and then build
the tree in the upward direction. Intuitively, where would the
bottom-up parser start if presented with an input file? How does it
find this starting point of the parse? It has to be the deepest part
of the tree. But how do you detect this without expanding the tree
first from the top down? Is my thinking faulty?


Thank You.
Mark F.
[Bottom-up parsers use a state machine, like the one in a DFA regular
expression matcher. The states implicitly encode all of the possible
matches, and when the parser gets to a state where there's one complete
match, you reduce the rule. -John]
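
To make that concrete, here is a minimal table-driven sketch in Python
for the toy grammar S -> ( S ) | x. The grammar, the state numbering,
and the ACTION/GOTO tables are hand-built for this illustration (they
are not from the post); a generator such as yacc derives these tables
from the grammar automatically.

# Toy table-driven LR sketch (illustrative only) for:
#   S -> ( S )  |  x
# Each numbered state encodes which right-hand sides could still be
# completed; "reduce" fires once one of them is complete.

ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept",),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", "S", 1), (3, "$"): ("reduce", "S", 1),  # S -> x
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", "S", 3), (5, "$"): ("reduce", "S", 3),  # S -> ( S )
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def parse(tokens):
    tokens = tokens + ["$"]
    states, pos = [0], 0
    while True:
        # A missing table entry here would signal a syntax error.
        act = ACTION[(states[-1], tokens[pos])]
        if act[0] == "shift":
            states.append(act[1])
            pos += 1
        elif act[0] == "reduce":
            _, lhs, n = act
            del states[-n:]                      # pop one state per RHS symbol
            states.append(GOTO[(states[-1], lhs)])
            print("reduced", lhs, "state stack now", states)
        else:
            return True                          # accept

parse(["(", "(", "x", ")", ")"])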
 
 
 

understanding the intuition behind LL(k) parsers and LR(k) parsers

Post by Tom Copela » Sun, 23 Apr 2006 23:01:10


For notes on a particular implementation of a bottom up tree builder,
check out the JJTree documentation:

https://javacc.dev.java.net/doc/JJTree.html

It talks about ways to customize the node tree as it's constructed; lots
of good stuff there.

Yours,

tom

 
 
 

understanding the intuition behind LL(k) parsers and LR(k) parsers

Post by Chris F Clark » Mon, 24 Apr 2006 23:00:02

Your intuition for bottom-up parsers is interesting but flawed. It is
not the exact reverse of LL; one doesn't start with the last terminal,
but rather the first.

The best "intuition" of a bottom-up parser is to imagine the finished
parse (derivation) tree, like I show in ASCII art below. Now, start
at the bottom-left corner and walk acroos the leaves (terminals).
Each time, you have walked across enough leaves, that you have found
all the leaves, for some non-terminal, you build that non-terminal,
working your way up the tree.

    program
       |
       +------...
       |
     stmt
       |
  +----+----+----+
  |    |    |    |
  A    =  expr   ;
            |
       +----+----+
       |    |    |
       B    *    C

With the example, you start with A, then =, then B, *, and C. Once
you have reached C, you can build an expr. Then at ;, you can build a
stmt. When you have seen everything, you can build a program
non-terminal and you are done.

Note, this is vastly simplified and details have been swept under the
rug, but it does give the right intuitive feel for LR parsing.
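
If it helps to see the same leaf walk as code, here is a deliberately
naive sketch (mine, not an actual LR algorithm) for the two productions
that can be read off the tree, stmt -> A = expr ; and expr -> B * C
(the program production is left out, since the tree elides it). It
scans the top of the stack for a complete right-hand side after each
shift, so it reduces in exactly the order just described: expr once C
has been shifted, stmt at the semicolon. A real LR parser makes this
decision with a state machine, as the moderator noted, instead of
re-matching the stack.

# Naive shift/reduce sketch for the grammar read off the tree above:
#   stmt -> A = expr ;
#   expr -> B * C
# After each shift it looks for a complete right-hand side on top of
# the stack and, if found, replaces it with its non-terminal.

RULES = [
    ("expr", ["B", "*", "C"]),
    ("stmt", ["A", "=", "expr", ";"]),
]

def parse(tokens):
    stack = []
    for tok in tokens:
        stack.append(tok)                    # shift the next leaf
        matched = True
        while matched:                       # reduce while something matches
            matched = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)        # build the non-terminal
                    print("built", lhs, "-> stack:", stack)
                    matched = True
    return stack

parse(["A", "=", "B", "*", "C", ";"])        # ends with stack == ["stmt"]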

Hope this helps,
-Chris

*****************************************************************************
Chris Clark                     Internet : XXXX@XXXXX.COM
Compiler Resources, Inc.        Web Site : http://www.yqcomputer.com/~compres
23 Bailey Rd                    voice    : (508) 435-5016
Berlin, MA 01503 USA            fax      : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
 
 
 

understanding the intuition behind LL(k) parsers and LR(k) parsers

Post by Max Hailpe » Mon, 24 Apr 2006 23:01:04

Mark F." < XXXX@XXXXX.COM > writes:


Conceptually, both an LL parser and an LR parser maintain a stack of
grammar symbols, that is, a stack where each element is either a
nonterminal or a terminal. However, the meaning of that stack is
completely opposite in the two styles of parser, and so is the
combination of two operations used to update the stack.

In an LR parser, if you read the stack from bottom to top, you get a
summary of the input that has already been read. In an LL parser, if
you read the stack from top to bottom, you get a prediction of the
input that remains to be read. In both cases, when I say the stack is
a "summary," I mean that it may be at a relatively abstract level
because it can contain nonterminals rather than only terminals. For
example, if an LR stack contains

+
Expr
(

it means that the input that was already read consists of a left
parenthesis, some sort of Expr, and then a plus sign. This is an
abstract summary, because it doesn't specify which particular terminal
symbols were read that constituted the Expr. Similarly, if an LL
stack contains

+
Expr
)

it means that the remaining input, which has not yet been read, is
expected to contain a plus sign, some sort of Expr, and then a right
parenthesis. Again, there is no specificity regarding what form the
Expr will take.

This is just the conceptual view; LR stacks in particular actually
contain numeric "states", which are critical for understanding how the
parser efficiently decides on the next action to take, but which are
not critical for understanding the fundamental contrast between LL and
LR. Instead, a better next step would be to move from considering how
to read the stack to considering what the two actions available to the
parser are. For appreciating the LL/LR contrast, you can just assume
the appropriate action is magically chosen.

In an LR parser, the two possible actions are:

(1) Read in one more terminal symbol from the input; to maintain the
stack as a summary of the input that has been read, the terminal
symbol needs to be pushed onto the stack. This is called
"shifting" a symbol onto the stack.

(2) Reduce the summary that is on the stack to a more abstract one by
replacing the right-hand side of a production by the left-hand
side. For example, if there is a production Factor -> ( Expr )
and the stack is
)
Expr
(
(
then a reduction would summarize the "( Expr )" part more
concisely as being a Factor; this entails popping the top three
symbols off the stack and pushing the Factor on, yielding
Factor
(
which is still a summary of the same already-consumed input, just
a more abstract one.

Because of these two actions, the LR parser is also called a
shift/reduce parser.
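
In code, the reduction described in (2) is just a pop-and-push on that
stack. A minimal sketch in Python (writing the stack as a list from
bottom to top, so the post's top of stack is the end of the list):

# Reduce step for the production  Factor -> ( Expr )
stack = ["(", "(", "Expr", ")"]     # summary of the input read so far
rhs = ["(", "Expr", ")"]
assert stack[-len(rhs):] == rhs     # the right-hand side is on top of the stack
del stack[-len(rhs):]               # pop the three symbols
stack.append("Factor")              # push the left-hand side
print(stack)                        # ['(', 'Factor']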

In an LL parser, on the other hand, the actions are the opposite,
reflecting the oppositeness of what its stack means:

(1) If a terminal symbol is read in from the input, then the same
terminal symbol must be popped off from the top of the stack,
because the stack is a prediction of the remaining input. The
action here consists of confirming one symbol worth of that
prediction.

(2) If instead the prediction is changed without reading input, this
happens by expanding the prediction out into a less abstract one: the
nonterminal on top of the stack is popped and replaced by the symbols
of the right-hand side of one of its productions, pushed so that the
leftmost symbol ends up on top. This is called "expanding" (or
"predicting") a production.

Because of these two actions, the LL parser is also called a
predictive parser.
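
A rough sketch of those two actions (confirm a predicted terminal, or
expand the nonterminal on top), using a made-up LL(1) grammar
Expr -> x | ( Expr + Expr ); both the grammar and the code are mine,
purely for illustration:

# Minimal predictive (LL) sketch.  The stack, read top to bottom, is
# the prediction of the remaining input.

def parse(tokens):
    stack = ["Expr"]                      # predict: the whole input is an Expr
    pos = 0
    tokens = tokens + ["$"]
    while stack:
        top = stack.pop()
        if top == "Expr":                 # expand, choosing by one-symbol lookahead
            if tokens[pos] == "(":
                stack.extend(reversed(["(", "Expr", "+", "Expr", ")"]))
            else:
                stack.append("x")
        elif top == tokens[pos]:          # confirm one predicted terminal
            pos += 1
        else:
            raise SyntaxError(f"expected {top}, saw {tokens[pos]}")
    return tokens[pos] == "$"

print(parse(["(", "x", "+", "(", "x", "+", "x", ")", ")"]))   # True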
 
 
 

understanding the intuition behind LL(k) parsers and LR(k) parsers

Post by Hans-Peter » Mon, 24 Apr 2006 23:05:53


Just an idea:

A top-down parser starts thinking about the possible alternatives
*before* inspecting an input symbol. He will know immediately when only
one alternative (or none) remains, whereupon all of the symbols read so
far can be discarded.

As a scout, he'll know where he is, but not where to go.


A bottom-up parser reads input symbols and determines the possible
alternatives *afterwards*, from what he has already read. If he cannot
yet settle on a single alternative, he continues reading, hoping that
further symbols will make sense of it later.

As a scout, he'll know where to go, but not which way.

DoDi
 
 
 

understanding the intuition behind LL(k) parsers and LR(k) parsers

Post by pbman » Sun, 30 Apr 2006 12:57:08


In the previous replies I saw a lot of information about parsing, how
the parser performs recognition of the input (bottom up and top down),
but nothing about AST generation. It seems the original poster meant
to say parse tree instead of AST tree. Let's try to use the correct
terms for these things so there is hope for those who are trying to
learn from this discussion group.

AST generation is easily done with the LRgen product at
http://www.yqcomputer.com/

Paul Mann
 
 
 

understanding the intuition behind LL(k) parsers and LR(k) parsers

Post by pbman » Wed, 03 May 2006 04:06:13

> It seems the original poster meant to say parse tree instead of AST tree.

I made a mistake. I should have said "parse stack" instead of "parse
tree."

Paul Mann
http://www.yqcomputer.com/