contents
 

xml Language Reference



Regular Grammar


The regular grammar defines the basic language elements i.e. tokens as certain classes of character sequences like numbers, identifiers, operators and strings.

Each rule defining such a class of character sequences has the following structure:
<Class Type> < [Member group :] Class Identifier [! Next group to activate]> :: <Regular Expression>

We distinguish five types of classes:
 


A regular expression spezifies the character sequences belonging to the class. Such a description usually consists of the following elements and operators:
 

Case ignore character classes can be spezified with an [I] behind the class identifier.
 
 

 [xml.sty] Grammatik fuer [.xml]-Dateien


 Extensible Markup Language
 [xml2.lex] Token definitions for [.xml]-Files
 [xml1.lex] Token definitions for [.xml]-Files


 Extensible Markup Language ( base tokens 1 )
letChar  :: 

'\09' | '\0a' | '\0d' | '\x00000020' .. '\x0000d7ff' | '\x0000e000' .. '\x0000fffd' | '\x00010000' .. '\x0010ffff'


 
     
letSpace  :: 

'\09' | '\0a' | '\0d' | '\20'


 
     
letBaseChar  :: 

'\x00000041' .. '\x0000005A' | '\x00000061' .. '\x0000007A' | '\x000000C0' .. '\x000000D6' | '\x000000D8' .. '\x000000F6' | '\x000000F8' .. '\x000000FF' | '\x00000100' .. '\x00000131' | '\x00000134' .. '\x0000013E' | '\x00000141' .. '\x00000148' | '\x0000014A' .. '\x0000017E' | '\x00000180' .. '\x000001C3' | '\x000001CD' .. '\x000001F0' | '\x000001F4' .. '\x000001F5' | '\x000001FA' .. '\x00000217' | '\x00000250' .. '\x000002A8' | '\x000002BB' .. '\x000002C1' | '\x00000386' | '\x00000388' .. '\x0000038A' | '\x0000038C' | '\x0000038E' .. '\x000003A1' | '\x000003A3' .. '\x000003CE' | '\x000003D0' .. '\x000003D6' | '\x000003DA' | '\x000003DC' | '\x000003DE' | '\x000003E0' | '\x000003E2' .. '\x000003F3' | '\x00000401' .. '\x0000040C' | '\x0000040E' .. '\x0000044F' | '\x00000451' .. '\x0000045C' | '\x0000045E' .. '\x00000481' | '\x00000490' .. '\x000004C4' | '\x000004C7' .. '\x000004C8' | '\x000004CB' .. '\x000004CC' | '\x000004D0' .. '\x000004EB' | '\x000004EE' .. '\x000004F5' | '\x000004F8' .. '\x000004F9' | '\x00000531' .. '\x00000556' | '\x00000559' | '\x00000561' .. '\x00000586' | '\x000005D0' .. '\x000005EA' | '\x000005F0' .. '\x000005F2' | '\x00000621' .. '\x0000063A' | '\x00000641' .. '\x0000064A' | '\x00000671' .. '\x000006B7' | '\x000006BA' .. '\x000006BE' | '\x000006C0' .. '\x000006CE' | '\x000006D0' .. '\x000006D3' | '\x000006D5' | '\x000006E5' .. '\x000006E6' | '\x00000905' .. '\x00000939' | '\x0000093D' | '\x00000958' .. '\x00000961' | '\x00000985' .. '\x0000098C' | '\x0000098F' .. '\x00000990' | '\x00000993' .. '\x000009A8' | '\x000009AA' .. '\x000009B0' | '\x000009B2' | '\x000009B6' .. '\x000009B9' | '\x000009DC' .. '\x000009DD' | '\x000009DF' .. '\x000009E1' | '\x000009F0' .. '\x000009F1' | '\x00000A05' .. '\x00000A0A' | '\x00000A0F' .. '\x00000A10' | '\x00000A13' .. '\x00000A28' | '\x00000A2A' .. '\x00000A30' | '\x00000A32' .. '\x00000A33' | '\x00000A35' .. '\x00000A36' | '\x00000A38' .. '\x00000A39' | '\x00000A59' .. '\x00000A5C' | '\x00000A5E' | '\x00000A72' .. '\x00000A74' | '\x00000A85' .. '\x00000A8B' | '\x00000A8D' | '\x00000A8F' .. '\x00000A91' | '\x00000A93' .. '\x00000AA8' | '\x00000AAA' .. '\x00000AB0' | '\x00000AB2' .. '\x00000AB3' | '\x00000AB5' .. '\x00000AB9' | '\x00000ABD' | '\x00000AE0' | '\x00000B05' .. '\x00000B0C' | '\x00000B0F' .. '\x00000B10' | '\x00000B13' .. '\x00000B28' | '\x00000B2A' .. '\x00000B30' | '\x00000B32' .. '\x00000B33' | '\x00000B36' .. '\x00000B39' | '\x00000B3D' | '\x00000B5C' .. '\x00000B5D' | '\x00000B5F' .. '\x00000B61' | '\x00000B85' .. '\x00000B8A' | '\x00000B8E' .. '\x00000B90' | '\x00000B92' .. '\x00000B95' | '\x00000B99' .. '\x00000B9A' | '\x00000B9C' | '\x00000B9E' .. '\x00000B9F' | '\x00000BA3' .. '\x00000BA4' | '\x00000BA8' .. '\x00000BAA' | '\x00000BAE' .. '\x00000BB5' | '\x00000BB7' .. '\x00000BB9' | '\x00000C05' .. '\x00000C0C' | '\x00000C0E' .. '\x00000C10' | '\x00000C12' .. '\x00000C28' | '\x00000C2A' .. '\x00000C33' | '\x00000C35' .. '\x00000C39' | '\x00000C60' .. '\x00000C61' | '\x00000C85' .. '\x00000C8C' | '\x00000C8E' .. '\x00000C90' | '\x00000C92' .. '\x00000CA8' | '\x00000CAA' .. '\x00000CB3' | '\x00000CB5' .. '\x00000CB9' | '\x00000CDE' | '\x00000CE0' .. '\x00000CE1' | '\x00000D05' .. '\x00000D0C' | '\x00000D0E' .. '\x00000D10' | '\x00000D12' .. '\x00000D28' | '\x00000D2A' .. '\x00000D39' | '\x00000D60' .. '\x00000D61' | '\x00000E01' .. '\x00000E2E' | '\x00000E30' | '\x00000E32' .. '\x00000E33' | '\x00000E40' .. '\x00000E45' | '\x00000E81' .. '\x00000E82' | '\x00000E84' | '\x00000E87' .. '\x00000E88' | '\x00000E8A' | '\x00000E8D' | '\x00000E94' .. '\x00000E97' | '\x00000E99' .. '\x00000E9F' | '\x00000EA1' .. '\x00000EA3' | '\x00000EA5' | '\x00000EA7' | '\x00000EAA' .. '\x00000EAB' | '\x00000EAD' .. '\x00000EAE' | '\x00000EB0' | '\x00000EB2' .. '\x00000EB3' | '\x00000EBD' | '\x00000EC0' .. '\x00000EC4' | '\x00000F40' .. '\x00000F47' | '\x00000F49' .. '\x00000F69' | '\x000010A0' .. '\x000010C5' | '\x000010D0' .. '\x000010F6' | '\x00001100' | '\x00001102' .. '\x00001103' | '\x00001105' .. '\x00001107' | '\x00001109' | '\x0000110B' .. '\x0000110C' | '\x0000110E' .. '\x00001112' | '\x0000113C' | '\x0000113E' | '\x00001140' | '\x0000114C' | '\x0000114E' | '\x00001150' | '\x00001154' .. '\x00001155' | '\x00001159' | '\x0000115F' .. '\x00001161' | '\x00001163' | '\x00001165' | '\x00001167' | '\x00001169' | '\x0000116D' .. '\x0000116E' | '\x00001172' .. '\x00001173' | '\x00001175' | '\x0000119E' | '\x000011A8' | '\x000011AB' | '\x000011AE' .. '\x000011AF' | '\x000011B7' .. '\x000011B8' | '\x000011BA' | '\x000011BC' .. '\x000011C2' | '\x000011EB' | '\x000011F0' | '\x000011F9' | '\x00001E00' .. '\x00001E9B' | '\x00001EA0' .. '\x00001EF9' | '\x00001F00' .. '\x00001F15' | '\x00001F18' .. '\x00001F1D' | '\x00001F20' .. '\x00001F45' | '\x00001F48' .. '\x00001F4D' | '\x00001F50' .. '\x00001F57' | '\x00001F59' | '\x00001F5B' | '\x00001F5D' | '\x00001F5F' .. '\x00001F7D' | '\x00001F80' .. '\x00001FB4' | '\x00001FB6' .. '\x00001FBC' | '\x00001FBE' | '\x00001FC2' .. '\x00001FC4' | '\x00001FC6' .. '\x00001FCC' | '\x00001FD0' .. '\x00001FD3' | '\x00001FD6' .. '\x00001FDB' | '\x00001FE0' .. '\x00001FEC' | '\x00001FF2' .. '\x00001FF4' | '\x00001FF6' .. '\x00001FFC' | '\x00002126' | '\x0000212A' .. '\x0000212B' | '\x0000212E' | '\x00002180' .. '\x00002182' | '\x00003041' .. '\x00003094' | '\x000030A1' .. '\x000030FA' | '\x00003105' .. '\x0000312C' | '\x0000AC00' .. '\x0000D7A3'


 
     
letCombChar  :: 

'\x00000300' .. '\x00000345' | '\x00000360' .. '\x00000361' | '\x00000483' .. '\x00000486' | '\x00000591' .. '\x000005A1' | '\x000005A3' .. '\x000005B9' | '\x000005BB' .. '\x000005BD' | '\x000005BF' | '\x000005C1' .. '\x000005C2' | '\x000005C4' | '\x0000064B' .. '\x00000652' | '\x00000670' | '\x000006D6' .. '\x000006DC' | '\x000006DD' .. '\x000006DF' | '\x000006E0' .. '\x000006E4' | '\x000006E7' .. '\x000006E8' | '\x000006EA' .. '\x000006ED' | '\x00000901' .. '\x00000903' | '\x0000093C' | '\x0000093E' .. '\x0000094C' | '\x0000094D' | '\x00000951' .. '\x00000954' | '\x00000962' .. '\x00000963' | '\x00000981' .. '\x00000983' | '\x000009BC' | '\x000009BE' | '\x000009BF' | '\x000009C0' .. '\x000009C4' | '\x000009C7' .. '\x000009C8' | '\x000009CB' .. '\x000009CD' | '\x000009D7' | '\x000009E2' .. '\x000009E3' | '\x00000A02' | '\x00000A3C' | '\x00000A3E' | '\x00000A3F' | '\x00000A40' .. '\x00000A42' | '\x00000A47' .. '\x00000A48' | '\x00000A4B' .. '\x00000A4D' | '\x00000A70' .. '\x00000A71' | '\x00000A81' .. '\x00000A83' | '\x00000ABC' | '\x00000ABE' .. '\x00000AC5' | '\x00000AC7' .. '\x00000AC9' | '\x00000ACB' .. '\x00000ACD' | '\x00000B01' .. '\x00000B03' | '\x00000B3C' | '\x00000B3E' .. '\x00000B43' | '\x00000B47' .. '\x00000B48' | '\x00000B4B' .. '\x00000B4D' | '\x00000B56' .. '\x00000B57' | '\x00000B82' .. '\x00000B83' | '\x00000BBE' .. '\x00000BC2' | '\x00000BC6' .. '\x00000BC8' | '\x00000BCA' .. '\x00000BCD' | '\x00000BD7' | '\x00000C01' .. '\x00000C03' | '\x00000C3E' .. '\x00000C44' | '\x00000C46' .. '\x00000C48' | '\x00000C4A' .. '\x00000C4D' | '\x00000C55' .. '\x00000C56' | '\x00000C82' .. '\x00000C83' | '\x00000CBE' .. '\x00000CC4' | '\x00000CC6' .. '\x00000CC8' | '\x00000CCA' .. '\x00000CCD' | '\x00000CD5' .. '\x00000CD6' | '\x00000D02' .. '\x00000D03' | '\x00000D3E' .. '\x00000D43' | '\x00000D46' .. '\x00000D48' | '\x00000D4A' .. '\x00000D4D' | '\x00000D57' | '\x00000E31' | '\x00000E34' .. '\x00000E3A' | '\x00000E47' .. '\x00000E4E' | '\x00000EB1' | '\x00000EB4' .. '\x00000EB9' | '\x00000EBB' .. '\x00000EBC' | '\x00000EC8' .. '\x00000ECD' | '\x00000F18' .. '\x00000F19' | '\x00000F35' | '\x00000F37' | '\x00000F39' | '\x00000F3E' | '\x00000F3F' | '\x00000F71' .. '\x00000F84' | '\x00000F86' .. '\x00000F8B' | '\x00000F90' .. '\x00000F95' | '\x00000F97' | '\x00000F99' .. '\x00000FAD' | '\x00000FB1' .. '\x00000FB7' | '\x00000FB9' | '\x000020D0' .. '\x000020DC' | '\x000020E1' | '\x0000302A' .. '\x0000302F' | '\x00003099' | '\x0000309A'


 
     
letExtender  :: 

'\x000000B7' | '\x000002D0' | '\x000002D1' | '\x00000387' | '\x00000640' | '\x00000E46' | '\x00000EC6' | '\x00003005' | '\x00003031' .. '\x00003035' | '\x0000309D' .. '\x0000309E' | '\x000030FC' .. '\x000030FE'


 
     
letDigit  :: 

'\x00000030' .. '\x00000039' | '\x00000660' .. '\x00000669' | '\x000006F0' .. '\x000006F9' | '\x00000966' .. '\x0000096F' | '\x000009E6' .. '\x000009EF' | '\x00000A66' .. '\x00000A6F' | '\x00000AE6' .. '\x00000AEF' | '\x00000B66' .. '\x00000B6F' | '\x00000BE7' .. '\x00000BEF' | '\x00000C66' .. '\x00000C6F' | '\x00000CE6' .. '\x00000CEF' | '\x00000D66' .. '\x00000D6F' | '\x00000E50' .. '\x00000E59' | '\x00000ED0' .. '\x00000ED9' | '\x00000F20' .. '\x00000F29'


 
     
letIdeograf  :: 

'\x00004E00' .. '\x00009FA5' | '\x00003007' | '\x00003021' .. '\x00003029'


 
     
letLetter  :: 

BaseChar | Ideograf


 
     
letPubChar  :: 

'\20' | '\0d' | '\0a' | HexChar | '-\'()+,./:=?;!*#@$_%'


 
     
letHexChar  :: 

'0' .. '9' | 'a' .. 'z' | 'A' .. 'Z'


 
     
letNameChar  :: 

Letter | Digit | '.:-_' | CombChar | Extender


 
     
letKANY  :: 

'Aa' 'Nn' 'Yy'


 
     
letKATTLIST  :: 

'Aa' 'Tt' 'Tt' 'Ll' 'Ii' 'Ss' 'Tt'


 
     
letKCDATA  :: 

'Cc' 'Dd' 'Aa' 'Tt' 'Aa'


 
     
letKELEMENT  :: 

'Ee' 'Ll' 'Ee' 'Mm' 'Ee' 'Nn' 'Tt'


 
     
letKEMPTY  :: 

'Ee' 'Mm' 'Pp' 'Tt' 'Yy'


 
     
letKENCODING  :: 

'Ee' 'Nn' 'Cc' 'Oo' 'Dd' 'Ii' 'Nn' 'Gg'


 
     
letKENTITY  :: 

'Ee' 'Nn' 'Tt' 'Ii' 'Tt' 'Yy'


 
     
letKENTITIES  :: 

'Ee' 'Nn' 'Tt' 'Ii' 'Tt' 'Ii' 'Ee' 'Ss'


 
     
letKFIXED  :: 

'Ff' 'Ii' 'Xx' 'Ee' 'Dd'


 
     
letKID  :: 

'Ii' 'Dd'


 
     
letKIDREF  :: 

'Ii' 'Dd' 'Rr' 'Ee' 'Ff'


 
     
letKIDREFS  :: 

'Ii' 'Dd' 'Rr' 'Ee' 'Ff' 'Ss'


 
     
letKIGNORE  :: 

'Ii' 'Gg' 'Nn' 'Oo' 'Rr' 'Ee'


 
     
letKIMPLIED  :: 

'Ii' 'Mm' 'Pp' 'Ll' 'Ii' 'Ee' 'Dd'


 
     
letKINCLUDE  :: 

'Ii' 'Nn' 'Cc' 'Ll' 'Uu' 'Dd' 'Ee'


 
     
letKNDATA  :: 

'Nn' 'Dd' 'Aa' 'Tt' 'Aa'


 
     
letKNMTOKEN  :: 

'Nn' 'Mm' 'Tt' 'Oo' 'Kk' 'Ee' 'Nn'


 
     
letKNMTOKENS  :: 

'Nn' 'Mm' 'Tt' 'Oo' 'Kk' 'Ee' 'Nn' 'Ss'


 
     
letKNOTATION  :: 

'Nn' 'Oo' 'Tt' 'Aa' 'Tt' 'Ii' 'Oo' 'Nn'


 
     
letKNO  :: 

'Nn' 'Oo'


 
     
letKPCDATA  :: 

'Pp' 'Cc' 'Dd' 'Aa' 'Tt' 'Aa'


 
     
letKPUBLIC  :: 

'Pp' 'Uu' 'Bb' 'Ll' 'Ii' 'Cc'


 
     
letKREQUIRED  :: 

'Rr' 'Ee' 'Qq' 'Uu' 'Ii' 'Rr' 'Ee' 'Dd'


 
     
letKSTANDALONE  :: 

'Ss' 'Tt' 'Aa' 'Nn' 'Dd' 'Aa' 'Ll' 'Oo' 'Nn' 'Ee'


 
     
letKSYSTEM  :: 

'Ss' 'Yy' 'Ss' 'Tt' 'Ee' 'Mm'


 
     
letKVERSION  :: 

'Vv' 'Ee' 'Rr' 'Ss' 'Ii' 'Oo' 'Nn'


 
     
letKXML  :: 

'Xx' 'Mm' 'Ll'


 
     
letKYES  :: 

'Yy' 'Ee' 'Ss'


 
     
letKDOCTYPE  :: 

'Dd' 'Oo' 'Cc' 'Tt' 'Yy' 'Pp' 'Ee'


 
     
letKeyword  :: 

KANY | KATTLIST | KCDATA | KDOCTYPE | KELEMENT | KEMPTY | KENCODING | KENTITY | KENTITIES | KFIXED | KID | KIDREF | KIDREFS | KIGNORE | KIMPLIED | KINCLUDE | KNDATA | KNMTOKEN | KNMTOKENS | KNO | KNOTATION | KPCDATA | KPUBLIC | KREQUIRED | KSTANDALONE | KSYSTEM | KVERSION | KXML | KYES


 
     
letIde  :: 

( Letter | '_' | ':' ) { NameChar }


 
     
tokDCharRef  :: 

"&#" ( '0' .. '9' ) + ";"


 
     
tokHCharRef  :: 

"&#x" HexChar + ";"


 
     
tokERef  :: 

"&" Ide ";"


 
     
tokPERef  :: 

"%" Ide ";"


 
     
 Extensible Markup Language ( base tokens 2 )
tokNmtoken  :: 

( NameChar - ( Letter | '_' | ':' ) ) { NameChar }


 
     
letEQ  :: 

{ Space } '=' { Space }


 
     
tokXMLDecl  :: 

"<?" KXML [ Space + KVERSION EQ '\'\"' ( HexChar | '.:-_' ) + '\'\"' ] [ Space + KENCODING EQ '\'\"' ( HexChar | '-_' ) + '\'\"' ] [ Space + KSTANDALONE EQ '\'\"' ( KYES | KNO ) '\'\"' ] { Space } "?>"


 
     
tokPI  :: 

( "<?" Ide ( { Char } - ( { Char } "?>" { Char } ) ) "?>" ) - XMLDecl


 
     
tokCDSect  :: 

"<![" KCDATA "[" ( { Char } - ( { Char } "]]>" { Char } ) ) "]]>"


 
     
tokElmStart  :: 

"<" | "</" | "<!"


 
     
tokTagEnd  :: 

">" | "/>"


 
     
tokXMLOpr  :: 

'=,*?+|()[]%#' | "<![" | "]]>" | "?>" | ")*"


 
     
 xml-relevant tokens
tokName  :: 

Ide


 
     
landtd:dtdEmbed:dtd  :: 

dtd

 

 
     
tokDTDStart:dtdEmbed  :: 

"<!" KDOCTYPE


 
     
tokLiteral  :: 

"\"" { Char - '<\"' } "\"" | "\'" { Char - '<\'' } "\'"


 
     
letDChar  :: 

Char - ( '<>&=,*?+|()[]%#' | Space | NameChar )


 
     
tokCharData  :: 

DChar + - ( { DChar } ( Literal | TagEnd ) { DChar } )


 
     
tokEmpty  :: 

Space +


 
     
tokComment  :: 

"<!--" ( { Char } - ( { Char } "--" { Char } ) ) "-->"


 



Context-free Grammar


The syntax rules are described in EBNF ( Extended Backus-Naur-Form ). A startsymbol must exist for each source file type. That means the syntax within each file has to be conform to the corresponding start rule. The other are internal helper rules.

Each rule is structured as follows: <Rule Type: start or let <Rule Identifier> :: <EBNF-konform Expression>
An EBNF-konform expression defines a part of the language syntax. It consists of a set of alternative productions i.e. partial expressions, separated by the character '|'.

A production can be spezified with the help of the following elements and operators:
 


startXDoc  :: 

[ XMLDecl ] Content


 
     
letAttrs  :: 

[ ( Empty Name [ Empty ] = [ Empty ] Literal Attrs ) + ] { Empty }


 
     
letContent  :: 

[ CDecl Content ]


 
     
letEBody  :: 

> Content </ Name [ Empty ] >


 
     
   | 

/>

 

 
     
letCDecl  :: 

HCharRef


 
     
   | 

=

 

 
     
   | 

Nmtoken

 

 
     
   | 

ERef

 

 
     
   | 

CharData

 

 
     
   | 

XMLOpr

 

 
     
   | 

DCharRef

 

 
     
   | 

CDSect

 

 
     
   | 

PERef

 

 
     
   | 

Name

 

 
     
   | 

>

 

 
     
   | 

< Name Attrs EBody

 

 
     
   | 

Literal

 

 
     
   | 

_other_

 

 
     
   | 

/>

 

 
     
   | 

DTDStart dtdEmbed

 

 
     
   | 

Comment

 

 
     
   | 

Empty

 

 
     
   | 

PI