&cfgid; Language Reference
Regular Grammar
The regular grammar defines the basic language elements, i.e. the tokens, as
classes of character sequences such as numbers, identifiers, operators
and strings.
Each rule defining such a class of character
sequences has the following structure:
<Class Type> <[Member group :] Class Identifier [! Next group to activate]> :: <Regular Expression>
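To make the rule structure concrete, here is a minimal sketch that splits one rule into its parts. It assumes the parts are separated by blanks, that "::" separates the head from the regular expression, and that the class types are the six listed in this section. The names `parse_rule` and `RegRule` and the sample rules are hypothetical illustrations, not part of the tool.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical reader for one regular-grammar rule, following the structure
# <Class Type> <[Member group :] Class Identifier [! Next group]> :: <Regular Expression>.
# Assumption: blanks separate the parts and "::" ends the rule head.
RULE = re.compile(
    r"^\s*(?P<kind>let|com|tok|ign|ind|lan)\s+"
    r"(?:(?P<group>\w+)\s*:\s*)?"      # optional member group
    r"(?P<ident>\w+)"                  # class identifier
    r"(?:\s*!\s*(?P<next>\w+))?"       # optional next group to activate
    r"\s*::\s*(?P<expr>.+?)\s*$"       # regular expression, kept as raw text
)


class RegRule(NamedTuple):
    kind: str                  # let, com, tok, ign, ind or lan
    group: Optional[str]       # member group, if given
    ident: str                 # class identifier
    next_group: Optional[str]  # group activated after a match, if given
    expr: str                  # right-hand side, not parsed further


def parse_rule(line: str) -> RegRule:
    """Split a single rule line into its components."""
    m = RULE.match(line)
    if m is None:
        raise ValueError(f"not a regular-grammar rule: {line!r}")
    return RegRule(m["kind"], m["group"], m["ident"], m["next"], m["expr"])


print(parse_rule("tok Ident :: Letter { Letter | Digit }"))
print(parse_rule("tok Main : Open ! Str :: Quote"))
```

Keeping the right-hand side as raw text keeps the sketch small; a real implementation would parse it further using the operators described in this section.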
We distinguish six types of classes:
-
let
Helper classes used to define more complex tokens
They do not belong to the language definition.
-
com
Comments
They do not belong to the language definition.
-
tok
Tokens
They represent the regular grammar of the language definition.
-
ign
Character sequences that are ignored, i.e. skipped by the scanner
They do not belong to the language definition.
-
ind
Indent and dedent tokens
Indent and dedent events are forwarded to the parser; the underlying
character sequences themselves are skipped by the scanner.
-
lan
Embedded language tokens
These are special token classes, introduced in order to
integrate embedded languages.
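The effect of the let, tok, ign and com class types on what the parser sees can be sketched as a small longest-match scanner. This is an illustration under assumptions: the class table, the helper names `Letter` and `Digit`, the `{Name}` expansion scheme and the `#` comment syntax are hypothetical, Python `re` syntax stands in for the notation of this reference, and the ind and lan classes are not modelled.

```python
import re
from typing import List, Tuple

# Hypothetical "let" helpers: building blocks only, never emitted as tokens.
LET = {
    "Letter": r"[A-Za-z]",
    "Digit": r"[0-9]",
}

# Hypothetical class table: (class type, class identifier, Python regex).
# "{Name}" refers to a let helper and is expanded before compiling.
CLASSES = [
    ("ign", "Space", r"[ \t\n]+"),
    ("com", "Comment", r"#[^\n]*"),
    ("tok", "Int", r"{Digit}+"),
    ("tok", "Ident", r"{Letter}({Letter}|{Digit})*"),
    ("tok", "Op", r"[-+*/=]"),
]


def expand(expr: str) -> str:
    """Inline let helpers as non-capturing groups."""
    for name, body in LET.items():
        expr = expr.replace("{" + name + "}", "(?:" + body + ")")
    return expr


COMPILED = [(kind, name, re.compile(expand(rx))) for kind, name, rx in CLASSES]


def scan(text: str) -> List[Tuple[str, str]]:
    """Return the (class identifier, lexeme) pairs passed on to the parser."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Longest match wins; on a tie the class listed first wins.
        best = None
        for kind, name, rx in COMPILED:
            m = rx.match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[2]):
                best = (kind, name, m.end())
        if best is None:
            raise ValueError(f"no class matches at offset {pos}")
        kind, name, end = best
        if kind == "tok":
            # Only tok classes reach the parser.
            tokens.append((name, text[pos:end]))
        # ign and com matches are consumed but not passed on.
        pos = end
    return tokens


print(scan("x1 = 42 # a comment\n"))
```

Longest match with ties resolved by declaration order is a common scanner convention and is assumed here; the sketch does not claim this is the exact disambiguation rule of the tool described.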
A regular expression specifies
the character sequences belonging to the class. Such a description usually
consists of the following elements and operators:
-
Expression1 Expression2 ... ExpressionN
Concatenation of partial expressions
-
Expression1 | Expression2 | ... | ExpressionN
Union of partial expressions ( alternatives )
-
Expression1 - Expression2 - ... - ExpressionN
Difference of partial expressions
-
[ Expression ]
Optional partial expression
-
{ Expression }
Iteration of a partial expression ( 0 .. )
-
Expression +
Iteration of a partial expression ( 1 .. )
-
( Expression )
Grouping of a partial expression ( subexpression )
-
< LeftParenthesis > InnerExpression < RightParenthesis >
Non-regular Dyck expression ( balanced, possibly nested delimiters )
-
Class identifier
Abbreviation for the corresponding regular expression
-
"String"
Literal: string / character sequence
-
'Characterset'
Literal: character set ( 1 .. )
Case-insensitive character classes
can be specified by writing [I]
after the class identifier.
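The operators above can be approximated with Python's `re` module, which helps to see which of them are genuinely regular. In this sketch the combinator names (`lit`, `charset`, `cat`, `alt`, `opt`, `star`, `plus`, `ci`) are hypothetical. Python `re` has no difference operator, so difference is shown as a membership predicate, and the Dyck expression is shown as a depth counter, which no finite automaton can implement. The inner expression of the Dyck form is simplified to "any character", and `/*` and `*/` are assumed example delimiters.

```python
import re
import string
from typing import Optional


def lit(s: str) -> str:              # "String"
    return re.escape(s)

def charset(chars: str) -> str:      # 'Characterset': one character from the set
    return "[" + "".join(re.escape(c) for c in chars) + "]"

def cat(*es: str) -> str:            # Expression1 Expression2 ...
    return "".join(f"(?:{e})" for e in es)

def alt(*es: str) -> str:            # Expression1 | Expression2 | ...
    return "(?:" + "|".join(f"(?:{e})" for e in es) + ")"

def opt(e: str) -> str:              # [ Expression ]
    return f"(?:{e})?"

def star(e: str) -> str:             # { Expression }: zero or more
    return f"(?:{e})*"

def plus(e: str) -> str:             # Expression +: one or more
    return f"(?:{e})+"

def ci(e: str) -> str:               # case-insensitive variant, cf. [I]
    return f"(?i:{e})"


def in_difference(word: str, first: str, *others: str) -> bool:
    """Expression1 - Expression2 - ...: in the first language, in none of the others."""
    return re.fullmatch(first, word) is not None and all(
        re.fullmatch(o, word) is None for o in others)


def match_dyck(text: str, pos: int, left: str, right: str) -> Optional[int]:
    """End offset of a balanced left ... right region starting at pos, else None."""
    if not text.startswith(left, pos):
        return None
    depth, i = 0, pos
    while i < len(text):
        if text.startswith(left, i):
            depth, i = depth + 1, i + len(left)
        elif text.startswith(right, i):
            depth, i = depth - 1, i + len(right)
            if depth == 0:
                return i
        else:
            i += 1  # inner expression simplified to "any character"
    return None


LETTER = charset(string.ascii_letters)
IDENT = cat(LETTER, star(alt(LETTER, charset(string.digits))))
NUMBER = cat(opt(lit("-")), plus(charset(string.digits)))

print(in_difference("iff", IDENT, lit("if")))
print(match_dyck("/* a /* b */ c */ rest", 0, "/*", "*/"))
```

Expressing identifiers as IDENT minus a keyword literal is one typical use of difference; whether the tool uses it that way is not stated in this reference.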
Context-free Grammar
The syntax rules are described in
EBNF ( Extended Backus-Naur Form ).
A start symbol must exist for each source file type, i.e. the syntax
of each file has to conform to the corresponding start rule.
All other rules are internal helper rules.
Each rule
is structured as follows:
<Rule Type: start or let> <Rule Identifier> :: <EBNF-conformant Expression>
An EBNF-conformant expression defines part
of the language syntax. It consists of a set of alternative productions, i.e.
partial expressions separated by the character '|'.
A production can be specified with the following elements and operators:
-
Expression1 Expression2 ... ExpressionN
Concatenation of partial expressions
-
Expression1 | Expression2 | ... | ExpressionN
Union of partial expressions ( alternatives )
-
[ Expression ]
Optional partial expression
-
{ Expression }
Iteration of a partial expression ( 0 .. )
-
Expression +
Iteration of a partial expression ( 1 .. )
-
( Expression )
Grouping of a partial expression ( subexpression )
-
Token Identifier
... from the regular grammar
-
Rule Identifier
... from the context-free grammar
-
Keyword
Constant string / character sequence
-
_other_
Special keyword denoting the
character sequences from the set 'Sigma* \ Tokenset', i.e. all sequences that are not tokens
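How the EBNF operators combine into a recognizer can be sketched with parser combinators over a token list. Each combinator returns the set of positions where a match may end, so alternatives are explored exhaustively rather than committing to the first one. The grammar, the keyword `var` and the token classes `Ident` and `Int` are hypothetical; keywords are matched on the lexeme; `_other_` is not modelled; and left-recursive rules would recurse forever in this scheme.

```python
from typing import Callable, Dict, List, Set, Tuple

Token = Tuple[str, str]                          # (token class, lexeme)
Parser = Callable[[List[Token], int], Set[int]]  # start index -> possible end indices


def tok(kind: str) -> Parser:                    # Token Identifier
    return lambda ts, i: {i + 1} if i < len(ts) and ts[i][0] == kind else set()


def kw(text: str) -> Parser:                     # Keyword (matched on the lexeme)
    return lambda ts, i: {i + 1} if i < len(ts) and ts[i][1] == text else set()


def seq(*ps: Parser) -> Parser:                  # concatenation
    def run(ts: List[Token], i: int) -> Set[int]:
        ends = {i}
        for p in ps:
            ends = {e for s in ends for e in p(ts, s)}
        return ends
    return run


def alt(*ps: Parser) -> Parser:                  # union ( alternatives )
    return lambda ts, i: set().union(*(p(ts, i) for p in ps))


def opt(p: Parser) -> Parser:                    # [ Expression ]
    return lambda ts, i: {i} | p(ts, i)


def many(p: Parser) -> Parser:                   # { Expression }: zero or more
    def run(ts: List[Token], i: int) -> Set[int]:
        seen, frontier = {i}, {i}
        while frontier:
            frontier = {e for s in frontier for e in p(ts, s)} - seen
            seen |= frontier
        return seen
    return run


def many1(p: Parser) -> Parser:                  # Expression +: one or more
    return seq(p, many(p))


RULES: Dict[str, Parser] = {}


def ref(name: str) -> Parser:                    # Rule Identifier, resolved lazily
    return lambda ts, i: RULES[name](ts, i)


# Hypothetical grammar:
#   start Program :: Stmt +
#   let   Stmt    :: [ "var" ] Ident "=" Expr ";"
#   let   Expr    :: Term { ( "+" | "-" ) Term }
#   let   Term    :: Int | Ident | "(" Expr ")"
RULES["Program"] = many1(ref("Stmt"))
RULES["Stmt"] = seq(opt(kw("var")), tok("Ident"), kw("="), ref("Expr"), kw(";"))
RULES["Expr"] = seq(ref("Term"), many(seq(alt(kw("+"), kw("-")), ref("Term"))))
RULES["Term"] = alt(tok("Int"), tok("Ident"), seq(kw("("), ref("Expr"), kw(")")))


def accepts(tokens: List[Token], start: str = "Program") -> bool:
    """True if the whole token list derives from the start rule."""
    return len(tokens) in RULES[start](tokens, 0)


print(accepts([("Ident", "x"), ("Op", "="), ("Int", "1"), ("Op", ";")]))
```

Returning end-position sets trades efficiency for simplicity; a production tool would more likely generate a table-driven parser, which this reference does not describe.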