Parser 2

Ambiguous Grammar

A grammar is ambiguous if it can derive a sentence with two different parse trees. For example, using the grammar below we can derive two different parse tree with the sentence of 1+2*3.


ambiguous grammar
1+2*3 or (1+2)*3?

Ambiguous grammars are problematic for compiling since it can derive different meanings. Actually, we can usually eliminate ambiguity by transforming the grammar as shown below.


Grammar 1

Predictive Parsing

Some grammar are easy to use a simple algorithm as recursive-descent.
A recursive-descent parser for this language has one function for each non-
terminal and one clause for each production. We illustrate this by writing a recursive-descent parser for grammar below.


enum token {IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ};
extern enum token getToken(void);
enum token tok;
void advance() {tok=getToken();}
void eat(enum token t) {if (tok==t) advance(); else error();}
void S(void) {switch(tok) {
case IF: eat(IF); E(); eat(THEN); S();
eat(ELSE); S(); break;
case BEGIN: eat(BEGIN); S(); L(); break;
case PRINT: eat(PRINT); E(); break;
default: error();
}}
void L(void) {switch(tok) {
case END: eat(END); break;
case SEMI: eat(SEMI); S(); L(); break;
default: error();
}}
void E(void) { eat(NUM); eat(EQ); eat(NUM); }

However, this method will fail if using the umbiguous grammar above.

 void S(void) { E(); eat(EOF); }
void E(void) {switch (tok) {
case ?: E(); eat(PLUS); T(); break;
case ?: E(); eat(MINUS); T(); break;
case ?: T(); break;
default: error();
}}
void T(void) {switch (tok) {
case ?: T(); eat(TIMES); F(); break;
case ?: T(); eat(DIV); F(); break;
case ?: F(); break;
default: error();
}}

There is a conflict here: the E function has no way to know which production to use. That means the recursive-descent parsing works only on grammars where the first terminal symbol of each subexpression provides enough information to choose which production to use. So how can we use predictive parsing working on more complicated grammar?
This is how we do to parse the grammar below.


image.png

1. Calculate the First and Follow sets of all terminals in grammar.

FIRST(r) is the set of all terminals that can begin any string derived from r. For example, let r= T*F, thus FIRST(r) = { id , num , ( }.
Notice: If two different production X->r1 and X->r2 have the same left-hand-side symbol(X) and their right hand side have overlapping FIRST sets, then the grammar cannot be parsed using the predictive parsing. if FIRST(r1)=FIRST(r2)=I, then the X function in parser will not know what to do if the input token = I.
FOLLOW(X) is the set of terminals that can immediately follow X.
Then use the algorithm below to calculate.

for each terminal symbol Z
  FIRST[Z]←{Z}
repeat
  for each production X → Y1Y2 ··· Yk
    for each i from 1 to k, each j from i + 1 to k,
      if all the Yi are nullable
        then nullable[X] ← true
      if Y1 ··· Yi−1 are all nullable
        then FIRST[X] ← FIRST[X] ∪ FIRST[Yi]
      if Yi+1 ··· Yk are all nullable
        then FOLLOW[Yi] ← FOLLOW[Yi] ∪ FOLLOW[X]
      if Yi+1 ··· Yj−1 are all nullable
        then FOLLOW[Yi] ← FOLLOW[Yi] ∪ FIRST[Yj]
until FIRST, FOLLOW, and nullable did not change in this iteration.
image.png

2. Build the Parsing table.

To construct this table, enter production X → γ in row X, column T of the table for each T ∈ FIRST(γ ). Also, if γ is nullable, enter the production
in row X, column T for each T ∈ FOLLOW[X].


image.png

The presence of duplicate entries means that predictive parsing will not work.
Also this is grammar is ambiguous and ambiguous grammar will always lead to duplicate entries in the predictive parsing table.

3. Use recursive-decent parser to implement it based on the production for each (X,T) in parsing table.

LL(1)

What we talk about above actually is the LL(1), standing for Left-to-right parse, Leftmost-derivation,1-symbol lookahead.
We can generalize the notion of FIRST sets to describe the first k tokens of a string, and to make an LL(k) parsing table whose rows columns are every sequence of k terminals. This is not efficient since the table are so large.

Eliminating Left Recursion

Suppose we build a predictive parser for that ambiguous grammar (Grammar 1)



These two productions are certain to cause duplicate entries in the LL(1) parsing table, since any token in FIRST(T ) will also be in FIRST(E + T ). The problem is that E appears as the first right-hand-side symbol in an E-production; this is called left recursion. Grammars with left recursion cannot be LL(1).
To eliminate left recursion, we will rewrite using right recursion. We introduce
a new nonterminal E', and write


image.png

This derives the same set of strings (on T and +) as the original two productions, but now there is no left recursion. So we can reconstruct the grammar and build the LL(1) parser.


Left Factoring

We have solved Left Recursive problem, but there is one another problem when two production for the same nonterminal start with the same symbols.Like
S → if E then S else S
S → if E then S

We can solve that by left factoring the grammar, like below.
S → if E then S X
X →
X → else S

©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容

  • rljs by sennchi Timeline of History Part One The Cognitiv...
    sennchi阅读 7,424评论 0 10
  • 男孩:丫头 如果我要离开 你会等我吗? 女孩:会啊,你要去哪 男孩:我要去英国了 这是一个难得的学习机会 女孩:...
    候鸟飞鱼阅读 335评论 0 0
  • 深夜我还不能睡,我擦完泪水,趴在床上,决定用文字记录,不然我会疯。 我们有三天没有说话了,我去你的宿舍五六次,都不...
    你打伞了吗阅读 351评论 1 0
  • 冬韵冬阳入画中,风轻云淡古今同。 碧池倒影更柔美,咏物抒情诗意浓。
    荷塘月色131419阅读 191评论 0 3
  • 首先感谢我的家人对我的支持,今天来到现场的三位都是我挚爱的家人,老公,儿子,婆婆,感谢他们对我工作的支持,一开始我...
    连夫人阅读 837评论 0 0