parserFwk.pyparsing

Module parserFwk.pyparsing

pyparsing module - Classes and methods to define and execute parsing grammars

The pyparsing module is an alternative to the traditional lex/yacc approach, or to regular expressions, for creating and executing simple grammars. With pyparsing, you don't need to learn a new syntax for defining grammars or matching expressions - the pyparsing module provides a library of classes that you use to construct the grammar directly in Python.

Here is a program to parse "Hello, World!" (or any greeting of the form "<salutation>, <addressee>!"):

   from pyparsing import Word, alphas
   
   # define grammar of a greeting
   greet = Word( alphas ) + "," + Word( alphas ) + "!" 
   
   hello = "Hello, World!"
   # formatted print works the same under Python 2 and Python 3
   print("%s -> %s" % (hello, greet.parseString(hello)))

The program outputs the following:

   Hello, World! -> ['Hello', ',', 'World', '!']

The Python representation of the grammar is quite readable, owing to the self-explanatory class names, and the use of '+', '|' and '^' operators.

The parsed results returned from parseString() can be accessed as a nested list, a dictionary, or an object with named attributes.

The pyparsing module handles some of the problems that are typically vexing when writing text parsers:

- extra or missing whitespace (the above program will also handle "Hello,World!", "Hello , World !", etc.)
- quoted strings
- embedded comments

Classes
`And`	Requires all given ParseExpressions to be found in the given order.
`CaselessKeyword`
`CaselessLiteral`	Token to match a specified string, ignoring case of letters.
`CharsNotIn`	Token for matching words composed of characters not in a given set.
`Combine`	Converter to concatenate all matching tokens to a single string.
`Dict`	Converter to return a repetitive expression as a list, but also as a dictionary.
`Each`	Requires all given ParseExpressions to be found, but in any order.
`Empty`	An empty token, will always match.
`FollowedBy`	Lookahead matching of the given parse expression.
`Forward`	Forward declaration of an expression to be defined later - used for recursive grammars, such as algebraic infix notation.
`GoToColumn`	Token to advance to a specific column of input text; useful for tabular report scraping.
`Group`	Converter to return the matched tokens as a list - useful for returning tokens of ZeroOrMore and OneOrMore expressions.
`Keyword`	Token to exactly match a specified string as a keyword, that is, it must be immediately followed by a non-keyword character.
`LineEnd`	Matches if current position is at the end of a line within the parse string
`LineStart`	Matches if current position is at the beginning of a line within the parse string
`Literal`	Token to exactly match a specified string.
`MatchFirst`	Requires that at least one ParseExpression is found; if several match, the first one listed is used ('|' operator).
`NoMatch`	A token that will never match.
`NotAny`	Lookahead to disallow matching with the given parse expression.
`OneOrMore`	Repetition of one or more of the given expression.
`OnlyOnce`	Wrapper for parse actions, to ensure they are only called once.
`Optional`	Optional matching of the given expression.
`Or`	Requires that at least one ParseExpression is found; if several match, the one matching the longest string is used ('^' operator).
`ParseElementEnhance`	Abstract subclass of ParserElement, for combining and post-processing parsed tokens.
`ParseExpression`	Abstract subclass of ParserElement, for combining and post-processing parsed tokens.
`ParserElement`	Abstract base level parser element class.
`ParseResults`	Structured parse results, providing multiple means of access to the parsed data (as a nested list, a dictionary, or an object with named attributes).
`PositionToken`
`QuotedString`	Token for matching strings that are delimited by quoting characters.
`Regex`	Token for matching strings that match a given regular expression.
`SkipTo`	Token for skipping over all undefined text until the matched expression is found.
`StringEnd`	Matches if current position is at the end of the parse string
`StringStart`	Matches if current position is at the beginning of the parse string
`Suppress`	Converter for ignoring the results of a parsed expression.
`Token`	Abstract ParserElement subclass, for defining atomic matching patterns.
`TokenConverter`	Abstract subclass of ParseExpression, for converting parsed results.
`Upcase`	Converter to upper case all matching tokens.
`White`	Special matching class for matching whitespace.
`Word`	Token for matching words composed of allowed character sets.
`ZeroOrMore`	Optional repetition of zero or more of the given expression.

Exceptions
`ParseBaseException`	base exception class for all parsing runtime exceptions
`ParseException`	exception thrown when a parse expression does not match the input
`ParseFatalException`	user-throwable exception thrown when inconsistent parse content is found; stops all parsing immediately
`RecursiveGrammarException`	exception thrown by validate() if the grammar could be improperly recursive
`ReparseException`

Function Summary
	`_expanded(p)`
	`col(loc, strg)` Returns current column within a string, counting newlines as line separators.
	`countedArray(expr)` Helper to define a counted list of expressions.
	`delimitedList(expr, delim, combine)` Helper to define a delimited list of expressions - the delimiter defaults to ','.
	`dictOf(key, value)` Helper to easily and clearly define a dictionary by specifying the respective patterns for the key and value.
	`downcaseTokens(s, l, t)` Helper parse action to convert tokens to lower case.
	`keepOriginalText(s, startLoc, t)`
	`line(loc, strg)` Returns the line of text containing loc within a string, counting newlines as line separators.
	`lineno(loc, strg)` Returns current line number within a string, counting newlines as line separators.
	`makeHTMLTags(tagStr)` Helper to construct opening and closing tag expressions for HTML, given a tag name
	`makeXMLTags(tagStr)` Helper to construct opening and closing tag expressions for XML, given a tag name
	`matchPreviousExpr(expr)` Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a 'repeat' of a previous expression.
	`matchPreviousLiteral(expr)` Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a 'repeat' of a previous expression.
	`nullDebugAction(*args)` 'Do-nothing' debug action, to suppress debugging output during parsing.
	`oneOf(strs, caseless, useRegex)` Helper to quickly define a set of alternative Literals. Longer alternatives are tested first when there is a conflict, regardless of the input order, while a MatchFirst is still returned for best performance.
	`operatorPrecedence(baseExpr, opList)` Helper method for constructing grammars of expressions made up of operators working in a precedence hierarchy.
	`removeQuotes(s, l, t)` Helper parse action for removing quotation marks from parsed quoted strings.
	`replaceHTMLEntity(t)`
	`replaceWith(replStr)` Helper method for common parse actions that simply return a literal value.
	`srange(s)` Helper to easily define string ranges for use in Word construction.
	`traceParseAction(f)` Decorator for debugging parse actions.
	`upcaseTokens(s, l, t)` Helper parse action to convert tokens to upper case.

Variable Summary
`str`	`__author__` = `'Paul McGuire <ptmcg@users.sourceforge.net>...`
`str`	`__version__` = `'1.4.5'`
`str`	`__versionTime__` = `'16 December 2006 07:20'`
`str`	`alphanums` = `'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQ...`
`str`	`alphas` = `'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRST...`
`unicode`	`alphas8bit` = `u'\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\...`
`Combine`	`anyCloseTag` = `</W:(abcd...,abcd...)>`
`And`	`anyOpenTag` = `<W:(abcd...,abcd...)>`
`And`	`commaSeparatedList` = `commaSeparatedList`
`Combine`	`commonHTMLEntity` = `Combine:({{"&" Re:('gt\|lt\|amp\|nbsp\|qu...`
`Regex`	`cppStyleComment` = `C++ style comment`
`Regex`	`cStyleComment` = `C style comment`
`Regex`	`dblQuotedString` = `string enclosed in double quotes`
`Regex`	`dblSlashComment` = `// comment`
`Empty`	`empty` = `empty`
`str`	`hexnums` = `'0123456789ABCDEFabcdef'`
`Regex`	`htmlComment` = `Re:('<!--[\\s\\S]*?-->')`
`Regex`	`javaStyleComment` = `C++ style comment`
`LineEnd`	`lineEnd` = `lineEnd`
`LineStart`	`lineStart` = `lineStart`
`str`	`nums` = `'0123456789'`
`_Constants`	`opAssoc` = `<parserFwk.pyparsing._Constants object at 0x00...`
`str`	`printables` = `'0123456789abcdefghijklmnopqrstuvwxyzABCDEF...`
`unicode`	`punc8bit` = `u'\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xa...`
`Regex`	`pythonStyleComment` = `Python style comment`
`Regex`	`quotedString` = `quotedString using single or double quote...`
`Regex`	`restOfLine` = `Re:('.*')`
`Regex`	`sglQuotedString` = `string enclosed in single quotes`
`StringEnd`	`stringEnd` = `stringEnd`
`StringStart`	`stringStart` = `stringStart`

Function Details

col(loc, strg)

Returns current column within a string, counting newlines as line separators. The first column is number 1.

countedArray(expr)

Helper to define a counted list of expressions. This helper defines a pattern of the form:

   integer expr expr expr...

where the leading integer tells how many expr expressions follow. The matched tokens are returned as a list of the expr tokens; the leading count token is suppressed.
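
The counted pattern can be illustrated without pyparsing. The following is a minimal plain-Python sketch, assuming the input has already been split into whitespace-separated tokens; `counted_array_sketch` is a hypothetical name and not part of the module.

```python
def counted_array_sketch(tokens):
    """Return the items announced by the leading count, dropping the count itself."""
    count = int(tokens[0])
    items = tokens[1:1 + count]
    if len(items) < count:
        raise ValueError("expected %d items, found %d" % (count, len(items)))
    return items

# The leading "2" says that two items follow; "ef" is left unconsumed.
print(counted_array_sketch("2 ab cd ef".split()))  # ['ab', 'cd']
```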

delimitedList(expr, delim=',', combine=False)

Helper to define a delimited list of expressions - the delimiter defaults to ','. By default, the list elements and delimiters can have intervening whitespace and comments, but this can be overridden by passing 'combine=True' in the call. If combine is set to True, the matching tokens are returned as a single token string, with the delimiters included; otherwise, the matching tokens are returned as a list of tokens, with the delimiters suppressed.

dictOf(key, value)

Helper to easily and clearly define a dictionary by specifying the respective patterns for the key and value. Takes care of defining the Dict, ZeroOrMore, and Group tokens in the proper order. The key pattern can include delimiting markers or punctuation, as long as they are suppressed, thereby leaving the significant key text. The value pattern can include named results, so that the Dict results can include named token fields.

downcaseTokens(s, l, t)

Helper parse action to convert tokens to lower case.

line(loc, strg)

Returns the line of text containing loc within a string, counting newlines as line separators.

lineno(loc, strg)

Returns current line number within a string, counting newlines as line separators. The first line is number 1.
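
The three location helpers (col, line and lineno) can be approximated in a few lines of plain Python. This is a sketch of the documented behaviour only, assuming loc is a 0-based index into strg; the `*_sketch` names are hypothetical and this is not the module's own source.

```python
def lineno_sketch(loc, strg):
    """1-based line number of the character at 0-based index loc."""
    return strg.count("\n", 0, loc) + 1

def col_sketch(loc, strg):
    """1-based column of the character at 0-based index loc."""
    # rfind returns -1 on the first line, so numbering starts at column 1.
    return loc - strg.rfind("\n", 0, loc)

def line_sketch(loc, strg):
    """Text of the line containing loc, without its trailing newline."""
    start = strg.rfind("\n", 0, loc) + 1
    end = strg.find("\n", loc)
    return strg[start:] if end == -1 else strg[start:end]

text = "abc\ndef\nghi"
loc = text.index("e")
print(lineno_sketch(loc, text), col_sketch(loc, text), line_sketch(loc, text))  # 2 2 def
```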

makeHTMLTags(tagStr)

Helper to construct opening and closing tag expressions for HTML, given a tag name

makeXMLTags(tagStr)

Helper to construct opening and closing tag expressions for XML, given a tag name

matchPreviousExpr(expr)

Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a 'repeat' of a previous expression. For example:

   first = Word(nums)
   second = matchPreviousExpr(first)
   matchExpr = first + ":" + second

will match "1:1", but not "1:2". Because this matches by expression, it will *not* match the leading "1:1" in "1:10"; the expressions are evaluated first, and then compared, so "1" is compared with "10". Do *not* use with packrat parsing enabled.

matchPreviousLiteral(expr)

Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a 'repeat' of a previous expression. For example:

   first = Word(nums)
   second = matchPreviousLiteral(first)
   matchExpr = first + ":" + second

will match "1:1", but not "1:2". Because this matches the previous literal text, it will also match the leading "1:1" in "1:10". If this is not desired, use matchPreviousExpr. Do *not* use with packrat parsing enabled.
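
The difference between the two helpers can be seen with the standard `re` module standing in for pyparsing. This is only an analogy under that assumption: a regex backreference repeats the literal text (like matchPreviousLiteral), while parsing both sides and comparing them afterwards behaves like matchPreviousExpr. The names `literal_style` and `expr_style_match` are hypothetical.

```python
import re

# Stand-in for matchPreviousLiteral: \1 repeats the literal text matched before.
literal_style = re.compile(r"(\d+):\1")

def expr_style_match(text):
    """Stand-in for matchPreviousExpr: parse both sides fully, then compare."""
    m = re.match(r"(\d+):(\d+)", text)
    return m is not None and m.group(1) == m.group(2)

for sample in ("1:1", "1:2", "1:10"):
    lit = literal_style.match(sample)
    print(sample, lit.group(0) if lit else None, expr_style_match(sample))
# 1:1 1:1 True
# 1:2 None False
# 1:10 1:1 False
```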

nullDebugAction(*args)

'Do-nothing' debug action, to suppress debugging output during parsing.

oneOf(strs, caseless=False, useRegex=True)

Helper to quickly define a set of alternative Literals. Longer alternatives are tested first when there is a conflict, regardless of the input order, while a MatchFirst is still returned for best performance.

Parameters:

strs - a string of space-delimited literals, or a list of string literals
caseless - (default=False) - treat all literals as caseless
useRegex - (default=True) - as an optimization, will generate a Regex object; otherwise, will generate a MatchFirst object (if caseless=True, or if creating a Regex raises an exception)
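
Why longest-first testing matters can be shown with a small stand-in built on the standard `re` module. This sketch is a simplification: it sorts every alternative by length, whereas the description above only promises reordering when alternatives conflict. `one_of_sketch` is a hypothetical name.

```python
import re

def one_of_sketch(strs, caseless=False):
    """Build a regex that tries longer alternatives before shorter ones."""
    alternatives = strs.split() if isinstance(strs, str) else list(strs)
    # Longest first, so "<=" is tried before its prefix "<".
    ordered = sorted(alternatives, key=len, reverse=True)
    flags = re.IGNORECASE if caseless else 0
    return re.compile("|".join(re.escape(a) for a in ordered), flags)

comparison = one_of_sketch("< = > <= >= !=")
print(comparison.match("<= 3").group(0))          # <=
# A first-match alternation in the original order stops at the prefix.
print(re.compile("<|<=").match("<= 3").group(0))  # <
```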

operatorPrecedence(baseExpr, opList)

Helper method for constructing grammars of expressions made up of operators working in a precedence hierarchy. Operators may be unary or binary, left- or right-associative. Parse actions can also be attached to operator expressions.

Parameters:

baseExpr - expression representing the most basic element for the nested expression grammar
opList - list of tuples, one for each operator precedence level in the expression grammar; each tuple is of the form (opExpr, numTerms, rightLeftAssoc, parseAction), where:
- opExpr is the pyparsing expression for the operator; may also be a string, which will be converted to a Literal
- numTerms is the number of terms for this operator (must be 1 or 2)
- rightLeftAssoc is the indicator whether the operator is right or left associative, using the pyparsing-defined constants opAssoc.RIGHT and opAssoc.LEFT.
- parseAction is the parse action to be associated with expressions matching this operator expression (the parse action tuple member may be omitted)
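
The shape of opList can be illustrated with a small recursive-descent grouper written in plain Python 3. This is a sketch, not the module's implementation: it works on a pre-split token list, treats every operator as a single token, omits parse actions, assumes the first tuple binds tightest, and nests one list per operation, which may differ from the grouping pyparsing returns. `LEFT`, `RIGHT` and `parse_with_precedence` are hypothetical names.

```python
LEFT, RIGHT = "left", "right"  # hypothetical stand-ins for opAssoc.LEFT / opAssoc.RIGHT

def parse_with_precedence(tokens, op_list):
    """Group a flat token list into nested lists according to op_list.

    op_list holds (op, num_terms, assoc) tuples, tightest-binding first.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_level(level):
        if level < 0:
            return advance()                      # base expression: one operand
        op, num_terms, assoc = op_list[level]
        if num_terms == 1 and assoc == RIGHT:     # prefix operator, e.g. - x
            if peek() == op:
                advance()
                return [op, parse_level(level)]
            return parse_level(level - 1)
        node = parse_level(level - 1)
        if num_terms == 1:                        # postfix operator, e.g. x !
            while peek() == op:
                advance()
                node = [node, op]
        elif assoc == LEFT:                       # a op b op c -> (a op b) op c
            while peek() == op:
                advance()
                node = [node, op, parse_level(level - 1)]
        elif peek() == op:                        # a op b op c -> a op (b op c)
            advance()
            node = [node, op, parse_level(level)]
        return node

    result = parse_level(len(op_list) - 1)
    if pos != len(tokens):
        raise ValueError("unexpected token %r" % tokens[pos])
    return result

arith = [("-", 1, RIGHT), ("*", 2, LEFT), ("+", 2, LEFT)]
print(parse_with_precedence("1 + 2 * 3".split(), arith))  # ['1', '+', ['2', '*', '3']]
```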

removeQuotes(s, l, t)

Helper parse action for removing quotation marks from parsed quoted strings. To use, add this parse action to a quoted string expression using:

   quotedString.setParseAction( removeQuotes )
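
What such a parse action does can be sketched in plain Python. The sketch assumes the usual parse action arguments (s is the original string, l the match location, t the list of matched tokens) and that the first token still carries its surrounding quote characters; `remove_quotes_sketch` is a hypothetical name, not the module's function.

```python
def remove_quotes_sketch(s, l, t):
    # s: original input string, l: location of the match, t: matched tokens.
    # Drop the first and last character of the quoted token.
    return t[0][1:-1]

print(remove_quotes_sketch('say "hi"', 4, ['"hi"']))  # hi
```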

replaceWith(replStr)

Helper method for common parse actions that simply return a literal value. Especially useful when used with transformString().
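
The idea is a closure: the helper receives the replacement value once and returns a parse action that ignores what was matched. A minimal plain-Python sketch follows, assuming the usual (s, l, t) parse action arguments; `replace_with_sketch` is a hypothetical name.

```python
def replace_with_sketch(repl):
    """Return a parse action that always yields repl, whatever was matched."""
    def action(s, l, t):
        return [repl]
    return action

to_zero = replace_with_sketch(0)
print(to_zero("value: N/A", 7, ["N/A"]))  # [0]
```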

srange(s)

Helper to easily define string ranges for use in Word construction. Borrows syntax from regexp '[]' string range definitions:

  srange("[0-9]")   -> "0123456789"
  srange("[a-z]")   -> "abcdefghijklmnopqrstuvwxyz"
  srange("[a-z$_]") -> "abcdefghijklmnopqrstuvwxyz$_"

The input string must be enclosed in []'s, and the returned string is the expanded character set joined into a single string. The values enclosed in the []'s may be:

  a single character
  an escaped character with a leading backslash (such as \- or \])
  an escaped hex character with a leading '\0x' (\0x21, which is a '!' character)
  an escaped octal character with a leading '\0' (\041, which is a '!' character)
  a range of any of the above, separated by a dash ('a-z', etc.)
  any combination of the above ('aeiouy', 'a-zA-Z0-9_$', etc.)
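
The expansion of single characters and dash-separated ranges can be sketched in plain Python. This simplified version deliberately leaves out the backslash, hex and octal escapes listed above; `expand_char_set_sketch` is a hypothetical name and not the module's implementation.

```python
def expand_char_set_sketch(spec):
    """Expand "[a-z$_]"-style specs; escapes are not handled in this sketch."""
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError("spec must be enclosed in []")
    body = spec[1:-1]
    out = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # A range such as a-z: emit every character from start to end.
            out.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            out.append(body[i])  # a single character
            i += 1
    return "".join(out)

print(expand_char_set_sketch("[0-9]"))    # 0123456789
print(expand_char_set_sketch("[a-z$_]"))  # abcdefghijklmnopqrstuvwxyz$_
```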

traceParseAction(f)

Decorator for debugging parse actions.

upcaseTokens(s, l, t)

Helper parse action to convert tokens to upper case.

Variable Details

__author__

Type:: str
Value:: 'Paul McGuire <ptmcg@users.sourceforge.net>'

__version__

Type:: str
Value:: '1.4.5'

__versionTime__

Type:: str
Value:: '16 December 2006 07:20'

alphanums

Type:: str
Value:: 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

alphas

Type:: str
Value:: 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

alphas8bit

Type:: unicode
Value:: u'\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

anyCloseTag

Type:: Combine
Value:: </W:(abcd...,abcd...)>

anyOpenTag

Type:: And
Value:: <W:(abcd...,abcd...)>

commaSeparatedList

Type:: And
Value:: commaSeparatedList

commonHTMLEntity

Type:: Combine
Value:: Combine:({{"&" Re:('gt|lt|amp|nbsp|quot')} ";"})

cppStyleComment

Type:: Regex
Value:: C++ style comment

cStyleComment

Type:: Regex
Value:: C style comment

dblQuotedString

Type:: Regex
Value:: string enclosed in double quotes

dblSlashComment

Type:: Regex
Value:: // comment

empty

Type:: Empty
Value:: empty

hexnums

Type:: str
Value:: '0123456789ABCDEFabcdef'

htmlComment

Type:: Regex
Value:: Re:('<!--[\\s\\S]*?-->')

javaStyleComment

Type:: Regex
Value:: C++ style comment

lineEnd

Type:: LineEnd
Value:: lineEnd

lineStart

Type:: LineStart
Value:: lineStart

nums

Type:: str
Value:: '0123456789'

opAssoc

Type:: _Constants
Value:: <parserFwk.pyparsing._Constants object at 0x00BC6C30>

printables

Type:: str
Value:: '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

punc8bit

Type:: unicode
Value:: u'\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xd7\xf7'

pythonStyleComment

Type:: Regex
Value:: Python style comment

quotedString

Type:: Regex
Value:: quotedString using single or double quotes

restOfLine

Type:: Regex
Value:: Re:('.*')

sglQuotedString

Type:: Regex
Value:: string enclosed in single quotes

stringEnd

Type:: StringEnd
Value:: stringEnd

stringStart

Type:: StringStart
Value:: stringStart

Generated by Epydoc 2.1 on Fri Dec 22 02:04:35 2006

http://epydoc.sf.net