a b a b a b c a b a a a b c b a a b a b c b a a a a b c
b) The problem:
There is a FIRST() conflict between the two productions for non-terminal T:
T -> a b U T -> a a UWhen the parser expects to find input matching a T, and finds the terminal a, it can't decide which branch of the if statement to choose:
if (lookahead == 'a') {
match('a'); match('b'); U();
}
else if (lookahead == 'a') {
match('a'); match('a'); U();
}
else {
error();
}
c) Transformation:
Left Factoring (in Swedish: vänsterfaktorisering)
d) The old grammar for the non-terminal T:
T -> a b U T -> a a Uis transformed to:
T -> a Rest Rest -> b U | a U
e) New parser:
// Correct parser: goodparser-1.c
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
int lookahead;
void error(void) {
printf("Syntax error: unexpected token '%c'\n", lookahead);
exit(EXIT_FAILURE);
}
// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int scan(void) {
int input;
while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
;
return input;
}
void match(int expected) {
if (lookahead == expected)
lookahead = scan();
else
error();
}
// These are the non-terminals, calling each other recursively
void S(void), T(void), U(void), Rest(void);
void S(void) {
if (lookahead == 'a') {
match('a'); match('b'); T();
}
else if (lookahead == 'b') {
match('b'); match('a'); T();
}
else {
error();
}
}
void T(void) {
match('a'); Rest();
}
void Rest(void) {
if (lookahead == 'b') {
match('b'); U();
}
else if (lookahead == 'a') {
match('a'); U();
}
else {
error();
}
}
void U(void) {
match('a'); match('b'); match('c');
}
int main(void) {
printf("Badparser 1. Enter input. End with EOF (CTRL-D on Linux).\n");
lookahead = scan(); // To get started, so we have something to compare to
S();
if (lookahead != EOF)
error();
printf("Done!\n");
return EXIT_SUCCESS;
}
a b b c a b b a c a b b a a c a b b a a a c ...Or, with a regular expression: abba*c
b) The problem:
There is left recursion (in Swedish: vänsterrekursion) in this production for the non-terminal T:
T -> T aWhen the parser finds input matching a T, it will then try to parse the right side of the production (T a), which starts with a T, so it will call the function T, which will call itself, and so on. The program will be stuck in an infinite recursion.
void T(void) {
if (lookahead == 'b') {
T(); match('a');
}
...
c) Transformation:
Elimination of left recursion
d) The old grammar for the non-terminal T:
T -> T a T -> bis transformed to:
T -> b Rest Rest -> a Rest | empty
e) New parser:
// Correct parser: goodparser-2.c
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
int lookahead;
void error(void) {
printf("Syntax error: unexpected token '%c'\n", lookahead);
exit(EXIT_FAILURE);
}
// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int scan(void) {
int input;
while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
;
return input;
}
void match(int expected) {
if (lookahead == expected)
lookahead = scan();
else
error();
}
// These are the non-terminals, calling each other recursively
void S(void), T(void), Rest(void);
void S(void) {
match('a'); match('b'); T(); match('c');
}
void T(void) {
match('b'); Rest();
}
void Rest(void) {
if (lookahead == 'a') {
match('a'); Rest();
}
else {
// Any input is OK, since we can fit an empty sequence anywhere
}
}
int main(void) {
printf("Badparser 2. Enter input. End with EOF (CTRL-D on Linux).\n");
lookahead = scan(); // To get started, so we have something to compare to
S();
if (lookahead != EOF)
error();
printf("Done!\n");
return EXIT_SUCCESS;
}
a b a b c c
b) The problem:
The grammar is ambiguous (in Swedish: tvetydig). The input a b a b c c can be parsed in two different ways:
c) Yes, the parser below parses the language correctly. There is no problem with the language, which just consists of the only valid input a b a b c c. The problem is with how it is to be matched using the productions in the grammar. Is a b c a T, or is it a U that is, in turn, a T?
d) A new, non-ambiguous, grammar that describes the same language:
S -> a b T c T -> a b cAnother non-ambiguous grammar that also describes the same language:
S -> a b T c T -> U U -> a b cYet another non-ambiguous grammar that describes the same language is this, which is the simplest grammar for the language:
S -> a b a b c c
Follow-up question: With these three different grammars, the same input a b a b c c can be parsed in three different ways. Doesn't that mean that it is ambiguous? Do we still have the same problem that we tried to solve?
e) A new parser that implements the first new grammar given above, but still the same language as before:
// A simple parser: goodparser-3.c
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
int lookahead;
void error(void) {
printf("Syntax error: unexpected token '%c'\n", lookahead);
exit(EXIT_FAILURE);
}
// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int scan(void) {
int input;
while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
;
return input;
}
void match(int expected) {
if (lookahead == expected)
lookahead = scan();
else
error();
}
// These are the non-terminals, calling each other recursively
void S(void), T(void);
void S(void) {
match('a'); match('b'); T(); match('c');
}
void T(void) {
match('a'); match('b'); match('c');
}
int main(void) {
printf("Goodparser 3. Enter input. End with EOF (CTRL-D on Linux).\n");
lookahead = scan(); // To get started, so we have something to compare to
S();
if (lookahead != EOF)
error();
printf("Done!\n");
return EXIT_SUCCESS;
}