Entry
Compiler: Syntax: Backus-Naur form: Yacc: How to use Yacc to generate bottom up parser?
Oct 24th, 2003 01:19
Knud van Eeden,
----------------------------------------------------------------------
--- Knud van Eeden --- 22 October 2003 - 02:09 am --------------------
Compiler: Syntax: Backus-Naur form: Yacc: How to use Yacc to generate
bottom up parser?
---
---
Note:
In Windows, you can use bison.exe instead of yacc.
In Linux, use yacc from the command line.
---
---
In analyzing a program, there are 3 modules necessary:
1. a scanner or lexical analyzer.
That is the role of LEX.
2. a parser or syntax analyzer.
That is the role of YACC
3. a semantic analyzer
That is to be supplied by you
---
You can further analyze a program
1. top down
2. bottom up
---
---
A 'top down' analyzer is usually written by looking at the
Backus Naur Form for your language, and then replacing each
non-terminal on the left side by a corresponding procedure
or function.
So you will also have to inform about the Backus Naur Form of
your language in order to be able to do this coding in a
bit systematic way.
Creating a top down program is usually rather easy to do by hand, but
tends to be slow, when there are a lot of rules to handle.
So in practice, a bottom up method is the preferred way.
---
---
A 'bottom up' analyzer is usually generated automatically by a
compiler compiler program, because it is usually rather
complex to code by hand.
You will also have to inform about the Backus Naur Form of
your language in order to have this automatic generation
taken place.
---
---
A well known compiler compiler is
YACC (=Yet another compiler compiler).
This YACC program was created in 1970 by S. C. Johnson.
---
YACC is e.g. available as a command on the UNIX or LINUX
operating systems, and has been used to help implement
hundreds of compilers.
---
---
To get YACC in Linux:
---
For example, if you open in Linux Red Hat v9.0 a console
(right click on the desktop, then 'console')
and you type 'yacc' on the command line, it will show
'usage: yacc [-dlrtv] [-b fileprefix] [-o outputfilename] [-p
symbolprefix] filename'
---
---
To get YACC and LEX in Windows, you use e.g. 'FLEX' and 'BISON'
So download and run
'flex.exe' (which is the same as LEX)
and
'bison.exe' (which is a GNU program similar to YACC)
'bison.simple'
at:
http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html
---
To see help in bison.exe, type
bison -h
---
---
YACC reads a syntax description written in a language similar to a
Backus Naur Form, and generates a bottom up parser, that is a 'shift
reduce' parser for your source language (e.g. TSE).
---
---
Steps: Overview:
---
1. you describe all of the syntax of this language in this YACCs Backus
Naur Form like language
2. fed this resulting text into YACC
3. then YACC will generate a text file containing a parser for this
language
4. his resulting parser is in the form of a C program
1. The name of this C program is 'yyparse.c'
1. This 'yyparse.c' C program will include:
1. The parse stack
2. A value stack
3. The parse table
4. The semantic code (supplied by you in {})
2. You must write yourself a 'main()' procedure
for 'yyparse.c'.
2. If necessary, the 'yyparse' parser calls the scanner procedure
(which must be named 'yylex')
1. This scanner you can write yourself
2. or you can use a program Lex, which generates
a lexical analyzer (=a scanner) for you, default
named 'yylex').
Lex will read a textual description of the tokens of your
source language (e.g. TSE) in order to generate automatically
this scanner program.
3. To inform about possible errors, the 'yyparse' parser
calls the error procedure
(which must be named 'yyerror')
5. to perform any semantic actions during parsing
1. you will have to write sequences of C code to perform these
actions
2. then embed these code sequences in this Backus Naur Form like
language
3. When YACC generates the C parser, it will then include these code
sequences
---
---
Steps: Worked out:
1. describe all of the syntax of this language in this YACCs Backus
Naur Form like language
1. Suppose you want to write an interpreter for a very simple
language, consisting of a single expression that has only
add and subtract operators.
This is the syntax description of this language, along with the
embedded semantic code (which will tell what to do when
recognized)
1. You inform that 'NUMBER' is a token:
%token NUMBER
2. You inform that '+' and '-' are left associative operators
%left '+' '-'
3. You inform about the syntax of the language
expression : expression '+' expression { $$ = $1 + $3; }
| expression '-' expression { $$ = $1 - $3; }
| '(' expression ')' { $$ = $2; }
| NUMBER { $$ = $1; }
;
So this tells that an
expression is an expression plus an expression. And the
resulting semantic action is to make the result $$ equal to
the left expression $1, plus the right expression $2.
or
expression is an expression minus an expression. And the
resulting semantic action is to make the result $$ equal to
the left expression $1, minus the right expression $2.
or
expression is open bracket followed by an expression followed
by a closed bracket. And the result $$ is equal to the middle
expression $2
or
an expression is a number. And the resulting semantic action
is to make the result $$ equal to the first expression $1
2. fed this into YACC
3. then YACC will generate a parser for this language
4. this resulting parser is in the form of a C program
1. The name of this C program is 'yyparse.c'
1. This 'yyparse.c' C program includes:
1. The parse stack
2. A value stack
3. The parse table
4. The semantic code (supplied by you in {})
2. You must write yourself a 'main()' procedure
for 'yyparse.c'.
2. If necessary, the 'yyparse' parser calls the scanner procedure
(which must be named 'yylex')
1. This scanner you can write yourself
2. or you can use a program Lex, which generates
a lexical analyzer (=a scanner) for you, default
named 'yylex').
For example:
%%
yylex() {
int c;
c = getchar();
if ( isdigit( c ) ) {
yylval = c - '0';
return NUMBER;
}
return( c );
}
Lex will then read a textual description of the tokens of your
source language (e.g. TSE) in order to generate automatically
this scanner program.
3. To inform about possible errors, the 'yyparse' parser
calls the error procedure
(which must be named 'yyerror')
5. to perform any semantic actions during parsing
1. you will have to write sequences of C code to perform these
actions
2. then embed these code sequences in this Backus Naur Form like
language
3. When YACC generates the C parser, it will then include these code
sequences
---
---
So in the above description, it states that NUMBER is a token returned
by the scanner, and that the operators '+' and '-' are left
associative.
---
In the definition of the expression, 'expression' is a non-terminal
symbol
(because it is on the left side, with a ':' following it.
---
A semicolon ';' terminates each production rule.
---
Each alternate form of a rule can have semantic actions attached to it.
---
The C code to perform the semantic actions are enclosed in the '{'
and '}'
braces.
---
Within this code,
$$ represents the value of that production rule
$1, $2, $3, ... represent the value of the first, second, third, ...
element of the right side
---
Semantic expressions:
---
{ $$ = $1 + $3 }
For example, the first semantic action says that the value of parsing
and executing an expression plus another expression, is the value of the
first expression plus the value of the second expression.
---
{ $$ = $1 }
For example, the fourth semantic action says that
whenever the scanner fetches a NUMBER token, the
scanner must also specify the token's value
(by setting the special variable 'yylval', which
is by default supplied by YACC).
---
When YACC compiles this description, it produces a C routine named
'yyparse' to parse and evaluate an expression.
---
This 'yyparse.c' program includes the parse stack, a value stack,
the parse table, and the semantic code.
---
Within the semantic code, the $$ and the $1, $2, $3, ... are replaced
by references to the value stack to obtain the appropriate values.
---
The parser yyparse calls the scanner (which must be named 'yylex'),
whenever it needs the next input token.
---
When it detects a syntax error, the parser calls an error routine
named 'yyerror'.
---
A companion to YACC is LEX, which generates a lexical analyzer
(=a scanner) named 'yylex'.
---
Thus YACC and LEX are designed to work together.
---
LEX reads a textual description of the tokens of your source language,
in order to generate the scanner.
---
---
Book: see also:
---
[book: Mak, Ronald - writing compilers and interpreters (an applied
approach using C++) - p. 812 - 'Compiler compilers' -
http://www.amazon.com/exec/obidos/tg/detail/-
/0471113530/qid=1066522305/sr=1-1/ref=sr_1_1/103-7895116-2553433?
v=glance&s=books]
---
[book: Aho, Alfred V. / Sethi, Ravi / Ullman, Jeffrey D. - compilers
(principles, techniques and tools) - p. 257 - 'Parser generators' -
http://www.amazon.com/exec/obidos/tg/detail/-
/0201100886/qid=1066522256/sr=8-1/ref=sr_8_1/103-7895116-2553433?
v=glance&s=books&n=507846]
---
[book: Levine, John R. / Mason, Tony / Brown, Doug - Lex & Yacc -
http://www.amazon.com/exec/obidos/tg/detail/-
/1565920007/qid=1066525729/sr=1-1/ref=sr_1_1/103-7895116-2553433?
v=glance&s=books]
---
[book: Louden, Kenneth C. - Compiler Construction -
http://www.amazon.com/exec/obidos/tg/detail/-
/0534939724/qid=1066978603/sr=1-1/ref=sr_1_1/103-7895116-2553433?
v=glance&s=books]
---
---
Internet: see also:
---
The Pat Terry compiler book: Online: Compilers and Compiler Generators
(An Introduction With C++)
http://www.scifac.ru.ac.za/cspt/compbook.htm
---
Compiler: Syntax: Backus-Naur form: Yacc: Windows: Bison: Can you give
a simple example of using Bison?
http://www.faqts.com/knowledge_base/view.phtml/aid/25752/fid/1250
---
a YACC based TINY compiler (only source)
http://www.mathcs.sjsu.edu/faculty/louden/cmptext/
----------------------------------------------------------------------