Friday, November 30, 2007

02. The "labelled BNF" Formalism

Since there is plenty of information related to the "labelled BNF" formalism, I would like to refer to [1]: It explains what it is about and how you can use it; since it is only 13 pages long it does not take much time to work through. But if you prefer not to, it is also fine - provided you know what BNF is about! Basically, "labelled BNF" grammars are the same as "BNF" grammars with the exception that every rule has got a label attached. I cite the following short information from [1]:
... the grammar formalism Labelled BNF (LBNF), which is used in the compiler construction tool BNF Converter. Given a grammar written in LBNF, the BNF Converter produces a complete compiler front end (up to, but excluding, type checking), i.e. a lexer, a parser, and an abstract syntax definition. Moreover, it produces a pretty-printer ...
So, it is about what we already know from the introduction; below you see the grammar for the arithmetic expressions:
--Begin LBNF file

Program. Program ::= [Formula] ;
Formula. Formula ::= Exp ;

Term. Exp ::= Exp TermOp Exp1 ;
Fact. Exp1 ::= Exp1 FactOp Exp2 ;

Nat. Exp2 ::= Integer ;
PNat. Exp2 ::= "+" Integer ;
NNat. Exp2 ::= "-" Integer ;

Real. Exp2 ::= Double ;
PReal. Exp2 ::= "+" Double ;
NReal. Exp2 ::= "-" Double ;

Var. Exp2 ::= Ident ;
PVar. Exp2 ::= "+" Ident ;
NVar. Exp2 ::= "-" Ident ;

Func. Exp2 ::= Ident "(" [Exp] ")" ;
PFunc. Exp2 ::= "+" Ident "(" [Exp] ")" ;
NFunc. Exp2 ::= "-" Ident "(" [Exp] ")" ;

POE. Exp2 ::= PreOp Exp ;
Pos. PreOp ::= "+" ;
Neg. PreOp ::= "-" ;

AddOp. TermOp ::= "+" ;
SubOp. TermOp ::= "-" ;

MulOp. FactOp ::= "*" ;
DivOp. FactOp ::= "/" ;
ModOp. FactOp ::= "%" ;

-- extra stuff

separator Formula ";" ;
separator Exp "," ;
coercions Exp 2 ;
comment "#" ;

entrypoints Program ;

--LBNF file ends here
As you see, the input is treated as a 'program' of formulas, where every formula is made of expressions. There are terms and factors, natural and real numbers, variables and functions. Every number, variable and function can have a positive or a negative sign pre-attached.

Further a "POE" is an expression with a sign in front; it was necessary to split the handling of signs between the primitive expressions (numbers, variables and functions) and the others, since otherwise the BNF Converter had difficulties to produce the desired parser for arithmetic expressions we look for. Please do not dismiss this rule, since we will have to deal with it later (for the 'unifier') quite a lot!

Then there are the operators and some extra stuff which I do not wish to discuss here. But if you are interested to learn more about those rules please have a look at [1] or [2].

I`m not sure if this is the most simple grammar to capture arithmetic expressions, but it did the job for me. But I think it might be possible to avoid the positive and negative versions of the primitive expressions: This would reduce the amount of code produced for the parser, the number of AST classes, and finally we would need to apply less changes to it, to get an infix to postfix converter.

No comments: