- Changes for 0.9

INCOMPATIBLE CHANGE: ParseRecord and HeaderFooter now take an attrs
dictionary, for use with startElement.  (This should have been done
last summer with the changes for Group to take an attr.)  There was no
easy and elegant way to make a backwards compatible solution, but I
did make it give a TypeError if you pass in what would have been the
proper argument under the old API.  To make old code work, use {} for
attrs.

Fast path optimizations in iteration (for 'characters') - 10%
performance boost for my test case.

There is now a default iterator boundary tag: 'record'

It's possible for an expression to go to completion but allow some
text to remain unparsed.  This now throws a new exception (a subtype of
the old one) to allow the handlers to do something different for that
case.  This is used for the Bioformat format recognition code.

Martel.SimpleRecordFilter is used by the Bioformat code to write a
quick test filter, to determine if more identification work should be
done.

Added Martel.NoCase to produce a case insensitive version of the given
expression, as in

    >>> import Martel
    >>> print str(Martel.NoCase(Martel.Str("JAN")))
    [Jj][Aa][Nn]
    >>>

Added Martel.Debug to print a message if the matching reaches that
part of the pattern.

The IterParser class handles iteration of records when on the record
boundary.  It uses 'yield', so it only works with Python 2.2.  The
older Iterator.py classes still work for iteration support on pre-2.2
installations.

The Dispatch class helps with "well-formed" ContentHandler calls by
mapping elements like <spam> into 'start_spam'.  The Parser class
contains special-case performance code to handle Dispatch calls.

New function 'replace_groups' takes an expression and a list of
replacements, each given as (old_tag, new_expression).  It replaces
the contents of the matching group with the new expression.

    >>> exp = Martel.Group("spam", Martel.Str("viking"))
    >>> exp = Martel.replace_groups(exp, [("spam", Martel.Str("rabbit"))])
    >>> from xml.sax import saxutils
    >>> p = exp.make_parser()
    >>> p.setContentHandler(saxutils.XMLGenerator())
    >>> p.parseString("rabbit")
    <?xml version="1.0" encoding="iso-8859-1"?>
    <spam>rabbit</spam>
    >>>

Added:
  Martel.UntilSep = read up to a separator character, don't consume it
  Martel.UntilEol = read up to a newline character, don't consume it
  Time.make_expression("%(MM)") parses '01' .. '12' (must have two digits)
  Time.make_expression("%(DD)") parses '01' .. '31' (must have two digits)

Changed ToSep and DelimitedFields to take a 'sep' instead of a
'delimiter'.

- Changes for 0.8

Added new, standard definitions
  Martel.Punctuation = Any(string.punctuation)
  Martel.Unprintable = AnyBut(string.printable)
  Martel.Word = \w+
  Martel.Spaces = [\t\v\f ]+  (whitespace except newline characters)
  Martel.ToSep = read up to a separator character
  Martel.DelimitedFields = read fields separated by a separator
                           character, up to a \R

Renamed
  Martel.Integer to Martel.Digits
  Martel.SignedInteger to Martel.Integer

Both the additions and the renames take an optional name and
attributes, which are used for a Group around the term.

Added a new type of Expression -- NullOp.  This simplified the
implementation of Time.py.

New submodule "Time.py" for building patterns and/or expressions for
parsing date/times.

Added "LAX" as a new way to handle "simple" XML records.

XX ContentHandler Factory

Bug fixed! - someone in personal email pointed out that the named
group backreferences (the "(?P=name)" construct) weren't working.  It
turned out I didn't even have a regression test for that case.  Both
problems are now fixed.

Bug fixed! - Brad pointed out that the debug code didn't trim to the
min/max sizes of the string, so negative indexing sometimes caused
large and useless output.

Regression tests added for all the new code.

Some cleanup here and there.

- Changes during CVS

Group attributes: Group takes an optional 'attrs' object (?P