========== Change Log ========== Version 1.4.5 - December, 2006 ------------------------------ - Removed debugging print statement from QuotedString class. Sorry for not stripping this out before the 1.4.4 release! - A significant performance improvement, the first one in a while! For my Verilog parser, this version of pyparsing is about double the speed - YMMV. - Added support for pickling of ParseResults objects. (Reported by Jeff Poole, thanks Jeff!) - Fixed minor bug in makeHTMLTags that did not recognize tag attributes with embedded '-' or '_' characters. Also, added support for passing expressions to makeHTMLTags and makeXMLTags, and used this feature to define the globals anyOpenTag and anyCloseTag. - Fixed error in alphas8bit, I had omitted the y-with-umlaut character. - Added punc8bit string to complement alphas8bit - it contains all the non-alphabetic, non-blank 8-bit characters. - Added commonHTMLEntity expression, to match common HTML "ampersand" codes, such as "<", ">", "&", " ", and """. This expression also defines a results name 'entity', which can be used to extract the entity field (that is, "lt", "gt", etc.). Also added built-in parse action replaceHTMLEntity, which can be attached to commonHTMLEntity to translate "<", ">", "&", " ", and """ to "<", ">", "&", " ", and "'". - Added example, htmlStripper.py, that strips HTML tags and scripts from HTML pages. It also translates common HTML entities to their respective characters. Version 1.4.4 - October, 2006 ------------------------------- - Fixed traceParseAction decorator to also trap and record exception returns from parse actions, and to handle parse actions with 0, 1, 2, or 3 arguments. - Enhanced parse action normalization to support using classes as parse actions; that is, the class constructor is called at parse time and the __init__ function is called with 0, 1, 2, or 3 arguments. If passing a class as a parse action, the __init__ method must use one of the valid parse action parameter list formats. (This technique is useful when using pyparsing to compile parsed text into a series of application objects - see the new example simpleBool.py.) - Fixed bug in ParseResults when setting an item using an integer index. (Reported by Christopher Lambacher, thanks!) - Fixed whitespace-skipping bug, patch submitted by Paolo Losi - grazie, Paolo! - Fixed bug when a Combine contained an embedded Forward expression, reported by cie on the pyparsing wiki - good catch! - Fixed listAllMatches bug, when a listAllMatches result was nested within another result. (Reported by don pasquale on comp.lang.python, well done!) - Fixed bug in ParseResults items() method, when returning an item marked as listAllMatches=True - Fixed bug in definition of cppStyleComment (and javaStyleComment) in which '//' line comments were not continued to the next line if the line ends with a '\'. (Reported by eagle-eyed Ralph Corderoy!) - Optimized re's for cppStyleComment and quotedString for better re performance - also provided by Ralph Corderoy, thanks! - Added new example, indentedGrammarExample.py, showing how to define a grammar using indentation to show grouping (as Python does for defining statement nesting). Instigated by an e-mail discussion with Andrew Dalke, thanks Andrew! - Added new helper operatorPrecedence (based on e-mail list discussion with Ralph Corderoy and Paolo Losi), to facilitate definition of grammars for expressions with unary and binary operators. For instance, this grammar defines a 6-function arithmetic expression grammar, with unary plus and minus, proper operator precedence,and right- and left-associativity: expr = operatorGrammar( operand, [("!", 1, opAssoc.LEFT), ("^", 2, opAssoc.RIGHT), (oneOf("+ -"), 1, opAssoc.RIGHT), (oneOf("* /"), 2, opAssoc.LEFT), (oneOf("+ -"), 2, opAssoc.LEFT),] ) Also added example simpleArith.py and simpleBool.py to provide more detailed code samples using this new helper method. - Added new helpers matchPreviousLiteral and matchPreviousExpr, for creating adaptive parsing expressions that match the same content as was parsed in a previous parse expression. For instance: first = Word(nums) matchExpr = first + ":" + matchPreviousLiteral(first) will match "1:1", but not "1:2". Since this matches at the literal level, this will also match the leading "1:1" in "1:10". In contrast: first = Word(nums) matchExpr = first + ":" + matchPreviousExpr(first) will *not* match the leading "1:1" in "1:10"; the expressions are evaluated first, and then compared, so "1" is compared with "10". - Added keepOriginalText parse action. Sometimes pyparsing's whitespace-skipping leaves out too much whitespace. Adding this parse action will restore any internal whitespace for a parse expression. This is especially useful when defining expressions for scanString or transformString applications. - Added __add__ method for ParseResults class, to better support using Python sum built-in for summing ParseResults objects returned from scanString. - Added reset method for the new OnlyOnce class wrapper for parse actions (to allow a grammar to be used multiple times). - Added optional maxMatches argument to scanString and searchString, to short-circuit scanning after 'n' expression matches are found. Version 1.4.3 - July, 2006 ------------------------------ - Fixed implementation of multiple parse actions for an expression (added in 1.4.2). . setParseAction() reverts to its previous behavior, setting one (or more) actions for an expression, overwriting any action or actions previously defined . new method addParseAction() appends one or more parse actions to the list of parse actions attached to an expression Now it is harder to accidentally append parse actions to an expression, when what you wanted to do was overwrite whatever had been defined before. (Thanks, Jean-Paul Calderone!) - Simplified interface to parse actions that do not require all 3 parse action arguments. Very rarely do parse actions require more than just the parsed tokens, yet parse actions still require all 3 arguments including the string being parsed and the location within the string where the parse expression was matched. With this release, parse actions may now be defined to be called as: . fn(string,locn,tokens) (the current form) . fn(locn,tokens) . fn(tokens) . fn() The setParseAction and addParseAction methods will internally decorate the provided parse actions with compatible wrappers to conform to the full (string,locn,tokens) argument sequence. - REMOVED SUPPORT FOR RETURNING PARSE LOCATION FROM A PARSE ACTION. I announced this in March, 2004, and gave a final warning in the last release. Now you can return a tuple from a parse action, and it will be treated like any other return value (i.e., the tuple will be substituted for the incoming tokens passed to the parse action, which is useful when trying to parse strings into tuples). - Added setFailAction method, taking a callable function fn that takes the arguments fn(s,loc,expr,err) where: . s - string being parsed . loc - location where expression match was attempted and failed . expr - the parse expression that failed . err - the exception thrown The function returns no values. It may throw ParseFatalException if it is desired to stop parsing immediately. (Suggested by peter21081944 on wikispaces.com) - Added class OnlyOnce as helper wrapper for parse actions. OnlyOnce only permits a parse action to be called one time, after which all subsequent calls throw a ParseException. - Added traceParseAction decorator to help debug parse actions. Simply insert "@traceParseAction" ahead of the definition of your parse action, and each invocation will be displayed, along with incoming arguments, and returned value. - Fixed bug when copying ParserElements using copy() or setResultsName(). (Reported by Dan Thill, great catch!) - Fixed bug in asXML() where token text contains <, >, and & characters - generated XML now escapes these as <, > and &. (Reported by Jacek Sieka, thanks!) - Fixed bug in SkipTo() when searching for a StringEnd(). (Reported by Pete McEvoy, thanks Pete!) - Fixed "except Exception" statements, the most critical added as part of the packrat parsing enhancement. (Thanks, Erick Tryzelaar!) - Fixed end-of-string infinite looping on LineEnd and StringEnd expressions. (Thanks again to Erick Tryzelaar.) - Modified setWhitespaceChars to return self, to be consistent with other ParserElement modifiers. (Suggested by Erick Tryzelaar.) - Fixed bug/typo in new ParseResults.dump() method. - Fixed bug in searchString() method, in which only the first token of an expression was returned. searchString() now returns a ParseResults collection of all search matches. - Added example program removeLineBreaks.py, a string transformer that converts text files with hard line-breaks into one with line breaks only between paragraphs. - Added example program listAllMatches.py, to illustrate using the listAllMatches option when specifying results names (also shows new support for passing lists to oneOf). - Added example program linenoExample.py, to illustrate using the helper methods lineno, line, and col, and returning objects from a parse action. - Added example program parseListString.py, to which can parse the string representation of a Python list back into a true list. Taken mostly from my PyCon presentation examples, but now with support for tuple elements, too! Version 1.4.2 - April 1, 2006 (No foolin'!) ------------------------------------------- - Significant speedup from memoizing nested expressions (a technique known as "packrat parsing"), thanks to Chris Lesniewski-Laas! Your mileage may vary, but my Verilog parser almost doubled in speed to over 600 lines/sec! This speedup may break existing programs that use parse actions that have side-effects. For this reason, packrat parsing is disabled when you first import pyparsing. To activate the packrat feature, your program must call the class method ParserElement.enablePackrat(). If your program uses psyco to "compile as you go", you must call enablePackrat before calling psyco.full(). If you do not do this, Python will crash. For best results, call enablePackrat() immediately after importing pyparsing. - Added new helper method countedArray(expr), for defining patterns that start with a leading integer to indicate the number of array elements, followed by that many elements, matching the given expr parse expression. For instance, this two-liner: wordArray = countedArray(Word(alphas)) print wordArray.parseString("3 Practicality beats purity")[0] returns the parsed array of words: ['Practicality', 'beats', 'purity'] The leading token '3' is suppressed, although it is easily obtained from the length of the returned array. (Inspired by e-mail discussion with Ralf Vosseler.) - Added support for attaching multiple parse actions to a single ParserElement. (Suggested by Dan "Dang" Griffith - nice idea, Dan!) - Added support for asymmetric quoting characters in the recently-added QuotedString class. Now you can define your own quoted string syntax like "<>". To define this custom form of QuotedString, your code would define: dblAngleQuotedString = QuotedString('<<',endQuoteChar='>>') QuotedString also supports escaped quotes, escape character other than '\', and multiline. - Changed the default value returned internally by Optional, so that None can be used as a default value. (Suggested by Steven Bethard - I finally saw the light!) - Added dump() method to ParseResults, to make it easier to list out and diagnose values returned from calling parseString. - A new example, a search query string parser, submitted by Steven Mooij and Rudolph Froger - a very interesting application, thanks! - Added an example that parses the BNF in Python's Grammar file, in support of generating Python grammar documentation. (Suggested by J H Stovall.) - A new example, submitted by Tim Cera, of a flexible parser module, using a simple config variable to adjust parsing for input formats that have slight variations - thanks, Tim! - Added an example for parsing Roman numerals, showing the capability of parse actions to "compile" Roman numerals into their integer values during parsing. - Added a new docs directory, for additional documentation or help. Currently, this includes the text and examples from my recent presentation at PyCon. - Fixed another typo in CaselessKeyword, thanks Stefan Behnel. - Expanded oneOf to also accept tuples, not just lists. This really should be sufficient... - Added deprecation warnings when tuple is returned from a parse action. Looking back, I see that I originally deprecated this feature in March, 2004, so I'm guessing people really shouldn't have been using this feature - I'll drop it altogether in the next release, which will allow users to return a tuple from a parse action (which is really handy when trying to reconstuct tuples from a tuple string representation!). Version 1.4.1 - February, 2006 ------------------------------ - Converted generator expression in QuotedString class to list comprehension, to retain compatibility with Python 2.3. (Thanks, Titus Brown for the heads-up!) - Added searchString() method to ParserElement, as an alternative to using "scanString(instring).next()[0][0]" to search through a string looking for a substring matching a given parse expression. (Inspired by e-mail conversation with Dave Feustel.) - Modified oneOf to accept lists of strings as well as a single string of space-delimited literals. (Suggested by Jacek Sieka - thanks!) - Removed deprecated use of Upcase in pyparsing test code. (Also caught by Titus Brown.) - Removed lstrip() call from Literal - too aggressive in stripping whitespace which may be valid for some grammars. (Point raised by Jacek Sieka). Also, made Literal more robust in the event of passing an empty string. - Fixed bug in replaceWith when returning None. - Added cautionary documentation for Forward class when assigning a MatchFirst expression, as in: fwdExpr << a | b | c Precedence of operators causes this to be evaluated as: (fwdExpr << a) | b | c thereby leaving b and c out as parseable alternatives. Users must explicitly group the values inserted into the Forward: fwdExpr << (a | b | c) (Suggested by Scot Wilcoxon - thanks, Scot!) Version 1.4 - January 18, 2006 ------------------------------ - Added Regex class, to permit definition of complex embedded expressions using regular expressions. (Enhancement provided by John Beisley, great job!) - Converted implementations of Word, oneOf, quoted string, and comment helpers to utilize regular expression matching. Performance improvements in the 20-40% range. - Added QuotedString class, to support definition of non-standard quoted strings (Suggested by Guillaume Proulx, thanks!) - Added CaselessKeyword class, to streamline grammars with, well, caseless keywords (Proposed by Stefan Behnel, thanks!) - Fixed bug in SkipTo, when using an ignoreable expression. (Patch provided by Anonymous, thanks, whoever-you-are!) - Fixed typo in NoMatch class. (Good catch, Stefan Behnel!) - Fixed minor bug in _makeTags(), using string.printables instead of pyparsing.printables. - Cleaned up some of the expressions created by makeXXXTags helpers, to suppress extraneous <> characters. - Added some grammar definition-time checking to verify that a grammar is being built using proper ParserElements. - Added examples: . LAparser.py - linear algebra C preprocessor (submitted by Mike Ellis, thanks Mike!) . wordsToNum.py - converts word description of a number back to the original number (such as 'one hundred and twenty three' -> 123) . updated fourFn.py to support unary minus, added BNF comments Version 1.3.3 - September 12, 2005 ---------------------------------- - Improved support for Unicode strings that would be returned using srange. Added greetingInKorean.py example, for a Korean version of "Hello, World!" using Unicode. (Thanks, June Kim!) - Added 'hexnums' string constant (nums+"ABCDEFabcdef") for defining hexadecimal value expressions. - NOTE: ===THIS CHANGE MAY BREAK EXISTING CODE=== Modified tag and results definitions returned by makeHTMLTags(), to better support the looseness of HTML parsing. Tags to be parsed are now caseless, and keys generated for tag attributes are now converted to lower case. Formerly, makeXMLTags("XYZ") would return a tag with results name of "startXYZ", this has been changed to "startXyz". If this tag is matched against '', the matched keys formerly would be "Abc", "DEF", and "ghi"; keys are now converted to lower case, giving keys of "abc", "def", and "ghi". These changes were made to try to address the lax case sensitivity agreement between start and end tags in many HTML pages. No changes were made to makeXMLTags(), which assumes more rigorous parsing rules. Also, cleaned up case-sensitivity bugs in closing tags, and switched to using Keyword instead of Literal class for tags. (Thanks, Steve Young, for getting me to look at these in more detail!) - Added two helper parse actions, upcaseTokens and downcaseTokens, which will convert matched text to all uppercase or lowercase, respectively. - Deprecated Upcase class, to be replaced by upcaseTokens parse action. - Converted messages sent to stderr to use warnings module, such as when constructing a Literal with an empty string, one should use the Empty() class or the empty helper instead. - Added ' ' (space) as an escapable character within a quoted string. - Added helper expressions for common comment types, in addition to the existing cStyleComment (/*...*/) and htmlStyleComment () . dblSlashComment = // ... (to end of line) . cppStyleComment = cStyleComment or dblSlashComment . javaStyleComment = cppStyleComment . pythonStyleComment = # ... (to end of line) Version 1.3.2 - July 24, 2005 ----------------------------- - Added Each class as an enhanced version of And. 'Each' requires that all given expressions be present, but may occur in any order. Special handling is provided to group ZeroOrMore and OneOrMore elements that occur out-of-order in the input string. You can also construct 'Each' objects by joining expressions with the '&' operator. When using the Each class, results names are strongly recommended for accessing the matched tokens. (Suggested by Pradam Amini - thanks, Pradam!) - Stricter interpretation of 'max' qualifier on Word elements. If the 'max' attribute is specified, matching will fail if an input field contains more than 'max' consecutive body characters. For example, previously, Word(nums,max=3) would match the first three characters of '0123456', returning '012' and continuing parsing at '3'. Now, when constructed using the max attribute, Word will raise an exception with this string. - Cleaner handling of nested dictionaries returned by Dict. No longer necessary to dereference sub-dictionaries as element [0] of their parents. === NOTE: THIS CHANGE MAY BREAK SOME EXISTING CODE, BUT ONLY IF PARSING NESTED DICTIONARIES USING THE LITTLE-USED DICT CLASS === (Prompted by discussion thread on the Python Tutor list, with contributions from Danny Yoo, Kent Johnson, and original post by Liam Clarke - thanks all!) Version 1.3.1 - June, 2005 ---------------------------------- - Added markInputline() method to ParseException, to display the input text line location of the parsing exception. (Thanks, Stefan Behnel!) - Added setDefaultKeywordChars(), so that Keyword definitions using a custom keyword character set do not all need to add the keywordChars constructor argument (similar to setDefaultWhitespaceChars()). (suggested by rzhanka on the SourceForge pyparsing forum.) - Simplified passing debug actions to setDebugAction(). You can now pass 'None' for a debug action if you want to take the default debug behavior. To suppress a particular debug action, you can pass the pyparsing method nullDebugAction. - Refactored parse exception classes, moved all behavior to ParseBaseException, and the former ParseException is now a subclass of ParseBaseException. Added a second subclass, ParseFatalException, as a subclass of ParseBaseException. User-defined parse actions can raise ParseFatalException if a data inconsistency is detected (such as a begin-tag/end-tag mismatch), and this will stop all parsing immediately. (Inspired by e-mail thread with Michele Petrazzo - thanks, Michelle!) - Added helper methods makeXMLTags and makeHTMLTags, that simplify the definition of XML or HTML tag parse expressions for a given tagname. Both functions return a pair of parse expressions, one for the opening tag (that is, '') and one for the closing tag (''). The opening tagame also recognizes any attribute definitions that have been included in the opening tag, as well as an empty tag (one with a trailing '/', as in '' which is equivalent to ''). makeXMLTags uses stricter XML syntax for attributes, requiring that they be enclosed in double quote characters - makeHTMLTags is more lenient, and accepts single-quoted strings or any contiguous string of characters up to the next whitespace character or '>' character. Attributes can be retrieved as dictionary or attribute values of the returned results from the opening tag. - Added example minimath2.py, a refinement on fourFn.py that adds an interactive session and support for variables. (Thanks, Steven Siew!) - Added performance improvement, up to 20% reduction! (Found while working with Wolfgang Borgert on performance tuning of his TTCN3 parser.) - And another performance improvement, up to 25%, when using scanString! (Found while working with Henrik Westlund on his C header file scanner.) - Updated UML diagrams to reflect latest class/method changes. Version 1.3 - March, 2005 ---------------------------------- - Added new Keyword class, as a special form of Literal. Keywords must be followed by whitespace or other non-keyword characters, to distinguish them from variables or other identifiers that just happen to start with the same characters as a keyword. For instance, the input string containing "ifOnlyIfOnly" will match a Literal("if") at the beginning and in the middle, but will fail to match a Keyword("if"). Keyword("if") will match only strings such as "if only" or "if(only)". (Proposed by Wolfgang Borgert, and Berteun Damman separately requested this on comp.lang.python - great idea!) - Added setWhitespaceChars() method to override the characters to be skipped as whitespace before matching a particular ParseElement. Also added the class-level method setDefaultWhitespaceChars(), to allow users to override the default set of whitespace characters (space, tab, newline, and return) for all subsequently defined ParseElements. (Inspired by Klaas Hofstra's inquiry on the Sourceforge pyparsing forum.) - Added helper parse actions to support some very common parse action use cases: . replaceWith(replStr) - replaces the matching tokens with the provided replStr replacement string; especially useful with transformString() . removeQuotes - removes first and last character from string enclosed in quotes (note - NOT the same as the string strip() method, as only a single character is removed at each end) - Added copy() method to ParseElement, to make it easier to define different parse actions for the same basic parse expression. (Note, copy is implicitly called when using setResultsName().) (The following changes were posted to CVS as Version 1.2.3 - October-December, 2004) - Added support for Unicode strings in creating grammar definitions. (Big thanks to Gavin Panella!) - Added constant alphas8bit to include the following 8-bit characters: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ - Added srange() function to simplify definition of Word elements, using regexp-like '[A-Za-z0-9]' syntax. This also simplifies referencing common 8-bit characters. - Fixed bug in Dict when a single element Dict was embedded within another Dict. (Thanks Andy Yates for catching this one!) - Added 'formatted' argument to ParseResults.asXML(). If set to False, suppresses insertion of whitespace for pretty-print formatting. Default equals True for backward compatibility. - Added setDebugActions() function to ParserElement, to allow user-defined debugging actions. - Added support for escaped quotes (either in \', \", or doubled quote form) to the predefined expressions for quoted strings. (Thanks, Ero Carrera!) - Minor performance improvement (~5%) converting "char in string" tests to "char in dict". (Suggested by Gavin Panella, cool idea!) Version 1.2.2 - September 27, 2004 ---------------------------------- - Modified delimitedList to accept an expression as the delimiter, instead of only accepting strings. - Modified ParseResults, to convert integer field keys to strings (to avoid confusion with list access). - Modified Combine, to convert all embedded tokens to strings before combining. - Fixed bug in MatchFirst in which parse actions would be called for expressions that only partially match. (Thanks, John Hunter!) - Fixed bug in fourFn.py example that fixes right-associativity of ^ operator. (Thanks, Andrea Griffini!) - Added class FollowedBy(expression), to look ahead in the input string without consuming tokens. - Added class NoMatch that never matches any input. Can be useful in debugging, and in very specialized grammars. - Added example pgn.py, for parsing chess game files stored in Portable Game Notation. (Thanks, Alberto Santini!) Version 1.2.1 - August 19, 2004 ------------------------------- - Added SkipTo(expression) token type, simplifying grammars that only want to specify delimiting expressions, and want to match any characters between them. - Added helper method dictOf(key,value), making it easier to work with the Dict class. (Inspired by Pavel Volkovitskiy, thanks!). - Added optional argument listAllMatches (default=False) to setResultsName(). Setting listAllMatches to True overrides the default modal setting of tokens to results names; instead, the results name acts as an accumulator for all matching tokens within the local repetition group. (Suggested by Amaury Le Leyzour - thanks!) - Fixed bug in ParseResults, throwing exception when trying to extract slice, or make a copy using [:]. (Thanks, Wilson Fowlie!) - Fixed bug in transformString() when the input string contains 's (Thanks, Rick Walia!). - Fixed bug in returning tokens from un-Grouped And's, Or's and MatchFirst's, where too many tokens would be included in the results, confounding parse actions and returned results. - Fixed bug in naming ParseResults returned by And's, Or's, and Match First's. - Fixed bug in LineEnd() - matching this token now correctly consumes and returns the end of line "\n". - Added a beautiful example for parsing Mozilla calendar files (Thanks, Petri Savolainen!). - Added support for dynamically modifying Forward expressions during parsing. Version 1.2 - 20 June 2004 -------------------------- - Added definition for htmlComment to help support HTML scanning and parsing. - Fixed bug in generating XML for Dict classes, in which trailing item was duplicated in the output XML. - Fixed release bug in which scanExamples.py was omitted from release files. - Fixed bug in transformString() when parse actions are not defined on the outermost parser element. - Added example urlExtractor.py, as another example of using scanString and parse actions. Version 1.2beta3 - 4 June 2004 ------------------------------ - Added White() token type, analogous to Word, to match on whitespace characters. Use White in parsers with significant whitespace (such as configuration file parsers that use indentation to indicate grouping). Construct White with a string containing the whitespace characters to be matched. Similar to Word, White also takes optional min, max, and exact parameters. - As part of supporting whitespace-signficant parsing, added parseWithTabs() method to ParserElement, to override the default behavior in parseString of automatically expanding tabs to spaces. To retain tabs during parsing, call parseWithTabs() before calling parseString(), parseFile() or scanString(). (Thanks, Jean-Guillaume Paradis for catching this, and for your suggestions on whitespace-significant parsing.) - Added transformString() method to ParseElement, as a complement to scanString(). To use transformString, define a grammar and attach a parse action to the overall grammar that modifies the returned token list. Invoking transformString() on a target string will then scan for matches, and replace the matched text patterns according to the logic in the parse action. transformString() returns the resulting transformed string. (Note: transformString() does *not* automatically expand tabs to spaces.) Also added scanExamples.py to the examples directory to show sample uses of scanString() and transformString(). - Removed group() method that was introduced in beta2. This turns out NOT to be equivalent to nesting within a Group() object, and I'd prefer not to sow more seeds of confusion. - Fixed behavior of asXML() where tags for groups were incorrectly duplicated. (Thanks, Brad Clements!) - Changed beta version message to display to stderr instead of stdout, to make asXML() easier to use. (Thanks again, Brad.) Version 1.2beta2 - 19 May 2004 ------------------------------ - *** SIMPLIFIED API *** - Parse actions that do not modify the list of tokens no longer need to return a value. This simplifies those parse actions that use the list of tokens to update a counter or record or display some of the token content; these parse actions can simply end without having to specify 'return toks'. - *** POSSIBLE API INCOMPATIBILITY *** - Fixed CaselessLiteral bug, where the returned token text was not the original string (as stated in the docs), but the original string converted to upper case. (Thanks, Dang Griffith!) **NOTE: this may break some code that relied on this erroneous behavior. Users should scan their code for uses of CaselessLiteral.** - *** POSSIBLE CODE INCOMPATIBILITY *** - I have renamed the internal attributes on ParseResults from 'dict' and 'list' to '__tokdict' and '__toklist', to avoid collisions with user-defined data fields named 'dict' and 'list'. Any client code that accesses these attributes directly will need to be modified. Hopefully the implementation of methods such as keys(), items(), len(), etc. on ParseResults will make such direct attribute accessess unnecessary. - Added asXML() method to ParseResults. This greatly simplifies the process of parsing an input data file and generating XML-structured data. - Added getName() method to ParseResults. This method is helpful when a grammar specifies ZeroOrMore or OneOrMore of a MatchFirst or Or expression, and the parsing code needs to know which expression matched. (Thanks, Eric van der Vlist, for this idea!) - Added items() and values() methods to ParseResults, to better support using ParseResults as a Dictionary. - Added parseFile() as a convenience function to parse the contents of an entire text file. Accepts either a file name or a file object. (Thanks again, Dang!) - Added group() method to And, Or, and MatchFirst, as a short-cut alternative to enclosing a construct inside a Group object. - Extended fourFn.py to support exponentiation, and simple built-in functions. - Added EBNF parser to examples, including a demo where it parses its own EBNF! (Thanks to Seo Sanghyeon!) - Added Delphi Form parser to examples, dfmparse.py, plus a couple of sample Delphi forms as tests. (Well done, Dang!) - Another performance speedup, 5-10%, inspired by Dang! Plus about a 20% speedup, by pre-constructing and cacheing exception objects instead of constructing them on the fly. - Fixed minor bug when specifying oneOf() with 'caseless=True'. - Cleaned up and added a few more docstrings, to improve the generated docs. Version 1.1.2 - 21 Mar 2004 --------------------------- - Fixed minor bug in scanString(), so that start location is at the start of the matched tokens, not at the start of the whitespace before the matched tokens. - Inclusion of HTML documentation, generated using Epydoc. Reformatted some doc strings to better generate readable docs. (Beautiful work, Ed Loper, thanks for Epydoc!) - Minor performance speedup, 5-15% - And on a process note, I've used the unittest module to define a series of unit tests, to help avoid the embarrassment of the version 1.1 snafu. Version 1.1.1 - 6 Mar 2004 -------------------------- - Fixed critical bug introduced in 1.1, which broke MatchFirst(!) token matching. **THANK YOU, SEO SANGHYEON!!!** - Added "from future import __generators__" to permit running under pre-Python 2.3. - Added example getNTPservers.py, showing how to use pyparsing to extract a text pattern from the HTML of a web page. Version 1.1 - 3 Mar 2004 ------------------------- - ***Changed API*** - While testing out parse actions, I found that the value of loc passed in was not the starting location of the matched tokens, but the location of the next token in the list. With this version, the location passed to the parse action is now the starting location of the tokens that matched. A second part of this change is that the return value of parse actions no longer needs to return a tuple containing both the location and the parsed tokens (which may optionally be modified); parse actions only need to return the list of tokens. Parse actions that return a tuple are deprecated; they will still work properly for conversion/compatibility, but this behavior will be removed in a future version. - Added validate() method, to help diagnose infinite recursion in a grammar tree. validate() is not 100% fool-proof, but it can help track down nasty infinite looping due to recursively referencing the same grammar construct without some intervening characters. - Cleaned up default listing of some parse element types, to more closely match ordinary BNF. Instead of the form :[contents-list], some changes are: . And(token1,token2,token3) is "{ token1 token2 token3 }" . Or(token1,token2,token3) is "{ token1 ^ token2 ^ token3 }" . MatchFirst(token1,token2,token3) is "{ token1 | token2 | token3 }" . Optional(token) is "[ token ]" . OneOrMore(token) is "{ token }..." . ZeroOrMore(token) is "[ token ]..." - Fixed an infinite loop in oneOf if the input string contains a duplicated option. (Thanks Brad Clements) - Fixed a bug when specifying a results name on an Optional token. (Thanks again, Brad Clements) - Fixed a bug introduced in 1.0.6 when I converted quotedString to use CharsNotIn; I accidentally permitted quoted strings to span newlines. I have fixed this in this version to go back to the original behavior, in which quoted strings do *not* span newlines. - Fixed minor bug in HTTP server log parser. (Thanks Jim Richardson) Version 1.0.6 - 13 Feb 2004 ---------------------------- - Added CharsNotIn class (Thanks, Lee SangYeong). This is the opposite of Word, in that it is constructed with a set of characters *not* to be matched. (This enhancement also allowed me to clean up and simplify some of the definitions for quoted strings, cStyleComment, and restOfLine.) - **MINOR API CHANGE** - Added joinString argument to the __init__ method of Combine (Thanks, Thomas Kalka). joinString defaults to "", but some applications might choose some other string to use instead, such as a blank or newline. joinString was inserted as the second argument to __init__, so if you have code that specifies an adjacent value, without using 'adjacent=', this code will break. - Modified LineStart to recognize the start of an empty line. - Added optional caseless flag to oneOf(), to create a list of CaselessLiteral tokens instead of Literal tokens. - Added some enhancements to the SQL example: . Oracle-style comments (Thanks to Harald Armin Massa) . simple WHERE clause - Minor performance speedup - 5-15% Version 1.0.5 - 19 Jan 2004 ---------------------------- - Added scanString() generator method to ParseElement, to support regex-like pattern-searching - Added items() list to ParseResults, to return named results as a list of (key,value) pairs - Fixed memory overflow in asList() for deeply nested ParseResults (Thanks, Sverrir Valgeirsson) - Minor performance speedup - 10-15% Version 1.0.4 - 8 Jan 2004 --------------------------- - Added positional tokens StringStart, StringEnd, LineStart, and LineEnd - Added commaSeparatedList to pre-defined global token definitions; also added commasep.py to the examples directory, to demonstrate the differences between parsing comma-separated data and simple line-splitting at commas - Minor API change: delimitedList does not automatically enclose the list elements in a Group, but makes this the responsibility of the caller; also, if invoked using 'combine=True', the list delimiters are also included in the returned text (good for scoped variables, such as a.b.c or a::b::c, or for directory paths such as a/b/c) - Performance speed-up again, 30-40% - Added httpServerLogParser.py to examples directory, as this is a common parsing task Version 1.0.3 - 23 Dec 2003 --------------------------- - Performance speed-up again, 20-40% - Added Python distutils installation setup.py, etc. (thanks, Dave Kuhlman) Version 1.0.2 - 18 Dec 2003 --------------------------- - **NOTE: Changed API again!!!** (for the last time, I hope) + Renamed module from parsing to pyparsing, to better reflect Python linkage. - Also added dictExample.py to examples directory, to illustrate usage of the Dict class. Version 1.0.1 - 17 Dec 2003 --------------------------- - **NOTE: Changed API!** + Renamed 'len' argument on Word.__init__() to 'exact' - Performance speed-up, 10-30% Version 1.0.0 - 15 Dec 2003 --------------------------- - Initial public release Version 0.1.1 thru 0.1.17 - October-November, 2003 -------------------------------------------------- - initial development iterations: - added Dict, Group - added helper methods oneOf, delimitedList - added helpers quotedString (and double and single), restOfLine, cStyleComment - added MatchFirst as an alternative to the slower Or - added UML class diagram - fixed various logic bugs