Legend of the Regular Grammar
The regular grammar defines the basic language elements, i.e. tokens, as
certain classes of character sequences such as numbers, identifiers,
operators and strings.
Each rule defining such a class of character sequences has the following
structure:
<Class Type> <Class Identifier> :: <Regular Expression>
We distinguish four types of classes:
-
let
Helper classes, used to define more complex tokens
They do not belong to the language definition.
-
com
Comments
They do not belong to the language definition.
-
tok
Tokens
They represent the regular grammar of the language definition.
-
ign
Character sequences which should be ignored, i.e. skipped by the scanner
They do not belong to the language definition.
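The rule structure and the four class types above can be illustrated with a minimal Python sketch. It only splits a rule line into its three parts and checks the class type; the names `Rule`, `parse_rule` and `CLASS_TYPES`, as well as the sample rule, are assumptions made for this illustration and are not part of the grammar notation itself.

```python
from typing import NamedTuple

# The four class types from the legend and whether they belong
# to the language definition.
CLASS_TYPES = {
    "let": False,  # helper class
    "com": False,  # comment
    "tok": True,   # token
    "ign": False,  # ignored character sequence
}


class Rule(NamedTuple):
    class_type: str
    identifier: str
    expression: str


def parse_rule(line: str) -> Rule:
    """Split '<Class Type> <Class Identifier> :: <Regular Expression>'."""
    head, sep, expression = line.partition("::")
    if not sep:
        raise ValueError("missing '::' separator")
    parts = head.split()
    if len(parts) != 2:
        raise ValueError("expected '<Class Type> <Class Identifier>' before '::'")
    class_type, identifier = parts
    if class_type not in CLASS_TYPES:
        raise ValueError(f"unknown class type: {class_type!r}")
    return Rule(class_type, identifier, expression.strip())


def belongs_to_language(rule: Rule) -> bool:
    """Only 'tok' rules are part of the language definition."""
    return CLASS_TYPES[rule.class_type]


# Hypothetical sample rule, written in the notation of the legend.
example = parse_rule("tok Ide :: Letter { Letter | Digit }")
```

Here `example` holds the class type `tok`, the identifier `Ide` and the unparsed expression text; interpreting the expression itself is left out of this sketch.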
A regular expression specifies the character sequences belonging to the
class. Such a description usually consists of the following elements and
operators:
-
Expression1 Expression2 ... ExpressionN
Concatenation of partial expressions
-
Expression1 | Expression2 | ... | ExpressionN
Union of partial expressions ( alternatives )
-
Expression1 - Expression2 - ... - ExpressionN
Difference of partial expressions
-
[ Expression ]
Optional partial expression
-
{ Expression }
Iteration of a partial expression ( 0 .. )
-
Expression +
Iteration of a partial expression ( 1 .. )
-
( Expression )
Combination of a partial expression ( subexpression )
-
Class identifier
Abbreviation for the corresponding regular expression
-
"String"
Literal: string / character sequence
-
'Characterset'
Literal: character set ( 1 .. )
Case-insensitive character classes can be specified with an [I] after the
class identifier.
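To make the meaning of the operators listed above concrete, the following Python sketch models each of them as a small matcher function. It is an illustration under stated assumptions, not the scanner's actual implementation: every function name is invented for this example, a 'Characterset' literal is assumed to match exactly one character from the set, and the [I] modifier is not covered. A matcher takes a text and a start position and returns the set of all positions at which a match may end.

```python
import string
from typing import Callable, Set

Matcher = Callable[[str, int], Set[int]]


def lit(s: str) -> Matcher:
    """ "String" : literal character sequence."""
    return lambda text, pos: {pos + len(s)} if text.startswith(s, pos) else set()


def charset(chars: str) -> Matcher:
    """ 'Characterset' : assumed to match one character out of the set."""
    return lambda text, pos: (
        {pos + 1} if pos < len(text) and text[pos] in chars else set()
    )


def seq(*parts: Matcher) -> Matcher:
    """Expression1 Expression2 ... : concatenation."""
    def match(text: str, pos: int) -> Set[int]:
        ends = {pos}
        for part in parts:
            ends = {e2 for e in ends for e2 in part(text, e)}
        return ends
    return match


def alt(*parts: Matcher) -> Matcher:
    """Expression1 | Expression2 | ... : union."""
    return lambda text, pos: set().union(*(p(text, pos) for p in parts))


def diff(first: Matcher, *rest: Matcher) -> Matcher:
    """Expression1 - Expression2 - ... : difference."""
    return lambda text, pos: first(text, pos) - alt(*rest)(text, pos)


def opt(part: Matcher) -> Matcher:
    """[ Expression ] : optional."""
    return lambda text, pos: {pos} | part(text, pos)


def star(part: Matcher) -> Matcher:
    """{ Expression } : iteration, zero or more times."""
    def match(text: str, pos: int) -> Set[int]:
        ends, frontier = {pos}, {pos}
        while frontier:
            frontier = {e2 for e in frontier for e2 in part(text, e)} - ends
            ends |= frontier
        return ends
    return match


def plus(part: Matcher) -> Matcher:
    """Expression + : iteration, one or more times."""
    return seq(part, star(part))


def full_match(matcher: Matcher, text: str) -> bool:
    """True if the whole text belongs to the class."""
    return len(text) in matcher(text, 0)


# Hypothetical classes: identifiers, and identifiers other than "let".
Letter = charset(string.ascii_letters)
Digit = charset(string.digits)
Ide = seq(Letter, star(alt(Letter, Digit)))
NonKeyword = diff(Ide, lit("let"))
```

With these definitions, `full_match(Ide, "x1")` is true and `full_match(Ide, "1x")` is false, while `NonKeyword` accepts `"letter"` but rejects `"let"`. Parentheses for subexpressions need no function of their own, because nesting the calls already groups them.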
Legend of the Context-free Grammar
The syntax rules are described in
EBNF ( Extended Backus-Naur Form ).
A start symbol must exist for each source file type. That is, the syntax
within each file has to conform to the corresponding start rule.
The other rules are internal helper rules.
Each rule is structured as follows:
<Rule Type: start or let> <Rule Identifier> :: <EBNF-conformant Expression>
An EBNF-conformant expression defines a part of the language syntax. It
consists of a set of alternative productions, i.e. partial expressions,
separated by the character '|'.
A production can be specified with the help of the following elements and
operators:
-
Expression1 Expression2 ... ExpressionN
Concatenation of partial expressions
-
Expression1 | Expression2 | ... | ExpressionN
Union of partial expressions ( alternatives )
-
[ Expression ]
Optional partial expression
-
{ Expression }
Iteration of a partial expression ( 0 .. )
-
Expression +
Iteration of a partial expression ( 1 .. )
-
( Expression )
Combination of a partial expression ( subexpression )
-
Token Identifier
... from the regular grammar
-
Rule Identifier
... from the context-free grammar
-
Keyword
Constant string / character sequence
-
_other_
Special keyword denoting the
character sequences from the set 'Sigma* \ Tokenset'
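As an illustration of how the EBNF elements above translate into a parser, here is a minimal recursive-descent sketch in Python. The two grammar rules in the comment, the token classes `Ide` and `Int`, and all function and class names are hypothetical and chosen only for this example; the special keyword `_other_` is not covered.

```python
import re
from typing import List, Tuple

# Hypothetical grammar, written in the notation of the legend:
#   start List :: "(" [ Item { "," Item } ] ")"
#   let   Item :: Ide | Int

Token = Tuple[str, str]  # (kind, text)

# Assumed token classes: Ide, Int, plus the keywords "(", ")" and ",".
TOKEN_RE = re.compile(
    r"\s*(?:(?P<Ide>[A-Za-z][A-Za-z0-9]*)|(?P<Int>[0-9]+)|(?P<Key>[(),]))"
)


def scan(text: str) -> List[Token]:
    """Turn the input into tokens, skipping whitespace."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Token:
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", "")

    def expect(self, kind: str, text: str = None) -> str:
        tok = self.peek()
        if tok[0] != kind or (text is not None and tok[1] != text):
            raise SyntaxError(f"unexpected {tok!r} at token {self.pos}")
        self.pos += 1
        return tok[1]

    def item(self) -> str:
        # Ide | Int : alternatives are chosen by looking at the next token.
        kind = self.peek()[0]
        if kind in ("Ide", "Int"):
            return self.expect(kind)
        raise SyntaxError(f"expected Ide or Int at token {self.pos}")

    def list_rule(self) -> List[str]:
        self.expect("Key", "(")                 # keyword
        items = []
        if self.peek()[0] in ("Ide", "Int"):    # [ ... ] : optional part
            items.append(self.item())
            while self.peek() == ("Key", ","):  # { ... } : zero or more
                self.expect("Key", ",")
                items.append(self.item())
        self.expect("Key", ")")
        return items


def parse(text: str) -> List[str]:
    """Parse a whole source text according to the start rule."""
    parser = Parser(scan(text))
    result = parser.list_rule()
    parser.expect("EOF")
    return result
```

For example, `parse("(a, 42, b)")` returns `["a", "42", "b"]` and `parse("()")` returns `[]`, while `parse("(a,)")` raises a `SyntaxError` because the iteration requires an `Item` after each comma.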