;๒ Vษ]@c@sฏdZdkZdkZdkZdkZdkZdklZlZl Z l Z ydk l Z Wne j odk Z nXydklZWn e j odklZnXdkZdeifd„ƒYZdefd„ƒYZd efd „ƒYZd efd „ƒYZhZd eifd„ƒYZegƒZd„Zd„Zd„Zdeifd„ƒYZdeifd„ƒYZdeifd„ƒYZ dS(sImplement Martel parsers. The classes in this module are used by other Martel modules and not typically by external users. There are two major parsers, 'Parser' and 'RecordParser.' The first is the standard one, which parses the file as one string in memory then generates the SAX events. The other reads a record at a time using a RecordReader and generates events after each read. The generated event callbacks are identical. At some level, both parsers use "_do_callback" to convert mxTextTools tags into SAX events. XXX finish this documentation XXX need a better way to get closer to the likely error position when parsing. XXX need to implement Locator N(s xmlreaders _exceptionsshandlerssaxutils(s TextTools(sStringIOsParserExceptioncBstZdZd„ZRS(s used when a parse cannot be donecCs|idt|ƒ7_dS(Ns; in %s(sselfs_msgsreprstext(sselfstext((sMartel/Parser.pys setLocation-s(s__name__s __module__s__doc__s setLocation(((sMartel/Parser.pysParserException+s sParserPositionExceptioncBstZd„Zd„ZRS(NcCs$ti|d|tƒ||_dS(Ns'error parsing at or beyond character %d(sParserExceptions__init__sselfspossNone(sselfspos((sMartel/Parser.pys__init__1s cCs'|i|7_d|i|_|SdS(Ns'error parsing at or beyond character %d(sselfspossoffsets_msg(sselfsoffset((sMartel/Parser.pys__iadd__6s(s__name__s __module__s__init__s__iadd__(((sMartel/Parser.pysParserPositionException0s sParserIncompleteExceptioncBstZd„Zd„ZRS(NcCs#ti||ƒ|id7_dS(Ns (unparsed text remains)(sParserPositionExceptions__init__sselfsposs_msg(sselfspos((sMartel/Parser.pys__init__<scCs#ti||ƒ|id7_dS(Ns (unparsed text remains)(sParserPositionExceptions__iadd__sselfsoffsets_msg(sselfsoffset((sMartel/Parser.pys__iadd__?s(s__name__s __module__s__init__s__iadd__(((sMartel/Parser.pysParserIncompleteException;s sParserRecordExceptioncBstZdZRS(s4used by the RecordParser when it can't read a record(s__name__s __module__s__doc__(((sMartel/Parser.pysParserRecordExceptionCs sMartelAttributeListcBs}tZd„Zd„Zd„Zd„Zd„Zd„Zd„Zd„Z d „Z d „Z d „Z d „Z d „ZRS(NcCsdSdS(Ni((sself((sMartel/Parser.pys getLengthXscCs t|‚dS(N(s IndexErrorsi(sselfsi((sMartel/Parser.pysgetNameZscCs t|‚dS(N(s IndexErrorsi(sselfsi((sMartel/Parser.pysgetType\scCs t|‚dS(N(s IndexErrorsi(sselfsi((sMartel/Parser.pysgetValue^scCsdSdS(Ni((sself((sMartel/Parser.pys__len__`scCs3t|ƒtdƒjo t|‚n t|‚dS(Ni(stypeskeys IndexErrorsKeyError(sselfskey((sMartel/Parser.pys __getitem__bs cCsgSdS(N((sself((sMartel/Parser.pyskeysgscCsgSdS(N((sself((sMartel/Parser.pysvaluesiscCsgSdS(N((sself((sMartel/Parser.pysitemskscCsdSdS(Ni((sselfskey((sMartel/Parser.pyshas_keymscCs|SdS(N(s alternative(sselfskeys alternative((sMartel/Parser.pysgetoscCsdSdS(Ns{}((sself((sMartel/Parser.pys__repr__qscCsdSdS(Ns{}((sself((sMartel/Parser.pys__str__ss(s__name__s __module__s getLengthsgetNamesgetTypesgetValues__len__s __getitem__skeyssvaluessitemsshas_keysgets__repr__s__str__(((sMartel/Parser.pysMartelAttributeListWs            cCsœ|i} |i} |i}xX|D]P\} }} } ||jpt d||f‚||jo| |||!ƒn| i dƒoW| dj oE| i dƒpt dt | ƒ‚|| \}}| ||ƒq๋n| | tƒ| ot||| | ||ƒn| ||| !ƒ| }| i dƒo2| i dƒo|| \}}||ƒqrq"|| ƒq"W||jo| |||!ƒndS(s‰internal function to convert the tagtable into ContentHandler events 's' is the input text 'begin' is the current position in the text 'end' is 1 past the last position of the text allowed to be parsed 'taglist' is the tag list from mxTextTools.parse 'cont_handler' is the SAX ContentHandler 'attrlookup' is a dict mapping the encoded tag name to the element info sbegin = %d and l = %ds>s>ignores>GsUnknown special tag %sN(s cont_handlers characterss startElements endElementstagliststagslsrssubtagssbeginsAssertionErrorsss startswithsreprs attrlookupsrealtagsattrss_attribute_lists _do_callbacksend(sssbeginsendstaglists cont_handlers attrlookupsrealtags endElementslssubtagss startElementstags characterssrsattrs((sMartel/Parser.pys _do_callbackzs4    ! $  c  Cs์xท|D]ฏ\} } }} || jptd|| f‚|| jo|o|i ||| !7_ n|| ƒ} | t j o| | tƒnW|i| ƒ}|t j o:|\} }|| ƒ} | t j o| | |ƒq่n| o&t|| || |||||ƒ n"|o|i || |!7_ n|}|| ƒ} | t j o| | ƒq|i| ƒ}|t j o7|\} }|| ƒ} | t j o| | ƒqถqqW||jo|o|i |||!7_ ndS(sPinternal function to convert the tagtable into ContentHandler events THIS IS A SPECIAL CASE FOR Dispatch.Dispatcher objects 's' is the input text 'begin' is the current position in the text 'end' is 1 past the last position of the text allowed to be parsed 'taglist' is the tag list from mxTextTools.parse 'start_table_get' is the Dispatcher._start_table 'cont_handler' is the Dispatcher 'end_table_get' is the Dispatcher._end_table 'cont_handler' is the SAX ContentHandler 'attrlookup' is a dict mapping the encoded tag name to the element info sbegin = %d and l = %dN(stagliststagslsrssubtagssbeginsAssertionErrors save_stacks cont_handlers _save_textsssstart_table_getsfsNones_attribute_lists attrlookupsgetsxsrealtagsattrss_do_dispatch_callbacks end_table_getsend(sssbeginsendstaglistsstart_table_gets cont_handlers save_stacks end_table_gets attrlookupsrealtagsfslssubtagsstagsxsrsattrs((sMartel/Parser.pys_do_dispatch_callbackฒsF!             c Cs |odk}d|_nti||dt|ƒƒ\}}}t |t iƒo5t|d|||ii||i|ii|ƒ n1|itijot|d||||ƒn| o&|ot|iƒSq t|ƒSn |t|ƒjo|SntSdS(sฤparse the string with the tagtable and send the ContentHandler events Specifically, it sends the startElement, endElement and characters events but not startDocument and endDocument. Ni(s debug_levelsGenerates _positions TextToolsstagssstagtableslensresultstaglistsposs isinstances cont_handlersDispatchs Dispatchers_do_dispatch_callbacks _start_tablesgets _save_stacks _end_tables attrlookups __class__shandlersContentHandlers _do_callbacksParserPositionExceptionsNone( ssstagtables cont_handlers debug_levels attrlookupsposstaglistsGeneratesresult((sMartel/Parser.pys_parse_elementss(  '    sParsercBsYtZdZddhfd„Zd„Zd„Zd„Zd„Zd„Zd „Z RS( s"Parse the input data all in memoryiicCsj|\}}}tii|ƒt|ƒtfƒjp t d‚||_||_||_||_dS(Ns(mxTextTools only allows a tuple tagtable( swant_groupref_namess debug_levels attrlookups xmlreaders XMLReaders__init__sselfstypestagtablesAssertionError(sselfstagtables.4swant_groupref_namess debug_levels attrlookup((sMartel/Parser.pys__init__&s #   cCset|i|i|i|ifƒ}|i|iƒƒ|i |i ƒƒ|i |i ƒƒ|SdS(N( sParsersselfstagtableswant_groupref_namess debug_levels attrlookupsparserssetContentHandlersgetContentHandlerssetErrorHandlersgetErrorHandlers setDTDHandlers getDTDHandler(sselfsparser((sMartel/Parser.pyscopy4s $cCs*tƒ}ti|i|ƒ|iƒSdS(N(sStringIOsxspprintsselfstagtablesgetvalue(sselfsx((sMartel/Parser.pys__str__<s cCs|i|iƒƒdS(s“parse using the input file object XXX will be removed with the switch to Python 2.0, where parse() takes an 'InputSource' N(sselfs parseStringsfileobjsread(sselfsfileobj((sMartel/Parser.pys parseFileAscCs3ti|ƒ}|i|iƒp |iƒƒdS(s"parse using the URL or file handleN(ssaxutilssprepare_input_sourcessourcesselfs parseFilesgetCharacterStreams getByteStream(sselfssource((sMartel/Parser.pysparseJscCsฎ|iiƒ|iotiƒnt||i|i|i |i ƒ}|t jonDt |tiƒo|ii|ƒn|}|iit|ƒƒ|iiƒdS(sŽparse using the given string XXX will be removed with the switch to Python 2.0, where parse() takes an 'InputSource' N(sselfs _cont_handlers startDocumentswant_groupref_namess _match_groupsclears_parse_elementsssstagtables debug_levels attrlookupsresultsNones isinstances _exceptionss SAXExceptions _err_handlers fatalErrorspossParserIncompleteExceptions endDocument(sselfssspossresult((sMartel/Parser.pys parseStringOs   cCsdS(N((sself((sMartel/Parser.pysclosens( s__name__s __module__s__doc__s__init__scopys__str__s parseFilesparses parseStringsclose(((sMartel/Parser.pysParser#s     s RecordParsercBsPtZdZfd„Zd„Zd„Zd„Zd„Zd„Zd„Z RS(s'Parse the input data a record at a timec CsŽ|\}}} tii|ƒ||_||_t |ƒt fƒjp t d‚||_ ||_||_| |_||_ ||_dS(s.parse the input data a record at a time format_name - XML tag name for the whole data file record_tagtable - mxTexTools tag table for each record want_groupref_names - flag to say if the match_group table needs to be reset (will disappear with better support from mxTextTools) make_reader - callable object which creates a RecordReader; first parameter will be an input file object reader_args - optional arguments to pass to make_reader after the input file object s(mxTextTools only allows a tuple tagtableN(swant_groupref_namess debug_levels attrlookups xmlreaders XMLReaders__init__sselfs format_namesattrsstypesrecord_tagtablesAssertionErrorstagtables make_readers reader_args( sselfs format_namesattrssrecord_tagtables.8s make_readers reader_argsswant_groupref_namess debug_levels attrlookup((sMartel/Parser.pys__init__ss  #     cCs}t|i|i|i|i|i|if|i|i ƒ}|i |i ƒƒ|i |iƒƒ|i|iƒƒ|SdS(N(s RecordParsersselfs format_namesattrsstagtableswant_groupref_namess debug_levels attrlookups make_readers reader_argssparserssetContentHandlersgetContentHandlerssetErrorHandlersgetErrorHandlers setDTDHandlers getDTDHandler(sselfsparser((sMartel/Parser.pyscopyscCs.tƒ}ti|i|ƒd|iƒSdS(Nsparse records: (sStringIOsxspprintsselfstagtablesgetvalue(sselfsx((sMartel/Parser.pys__str__šs cCs<|iiƒy|i|f|iŒ}Wnuttfj o ‚n[t ƒ}t i d|ƒ|i it|iƒtiƒdƒƒ|iiƒdSnX|iotiƒn|ii|i|iƒd}x<no4y|iƒ}Wnuttfj o ‚n[t ƒ}t i d|ƒ|i it|iƒtiƒdƒƒ|iiƒdSnX|tjoPnt||i|i|i |i!ƒ}|tjonRt#|t$i%ƒo||7}|i i&|ƒn!||}|i i&t(|ƒƒ|t)|ƒ}qใW|ii*|iƒ|iiƒdS(s“parse using the input file object XXX will be removed with the switch to Python 2.0, where parse() takes an 'InputSource' sfileiNi(+sselfs _cont_handlers startDocuments make_readersfileobjs reader_argssreadersKeyboardInterrupts SystemExitsStringIOsoutfiles tracebacks print_excs _err_handlers fatalErrorsParserRecordExceptionsgetvaluessyssexc_infos endDocumentswant_groupref_namess _match_groupsclears startElements format_namesattrssfilepossnextsrecordsNones_parse_elementsstagtables debug_levels attrlookupsresults isinstances _exceptionss SAXExceptionserrorspossParserPositionExceptionslens endElement(sselfsfileobjsfileposspossoutfilesresultsreadersrecord((sMartel/Parser.pys parseFileŸsT  )   )     cCs3ti|ƒ}|i|iƒp |iƒƒdS(s"parse using the URL or file handleN(ssaxutilssprepare_input_sourcessourcesselfs parseFilesgetCharacterStreams getByteStream(sselfssource((sMartel/Parser.pysparse฿scCst|ƒ}|i|ƒdS(sŽparse using the given string XXX will be removed with the switch to Python 2.0, where parse() takes an 'InputSource' N(sStringIOsssstrfilesselfs parseFile(sselfsssstrfile((sMartel/Parser.pys parseStringไs cCsdS(N((sself((sMartel/Parser.pysclose๎s( s__name__s __module__s__doc__s__init__scopys__str__s parseFilesparses parseStringsclose(((sMartel/Parser.pys RecordParserqs    @  sHeaderFooterParsercBsDtZdZd„Zd„Zd„Zd„Zd„Zd„ZRS(s9Header followed by 0 or more records followed by a footerc Csก| \} }}tii|ƒ||_||_||_ ||_ ||_ ||_ ||_ ||_| |_| |_| |_| |_||_||_dS(N(swant_groupref_namess debug_levels attrlookups xmlreaders XMLReaders__init__sselfs format_namesattrssmake_header_readersheader_reader_argssheader_tagtables make_readers reader_argssrecord_tagtablesmake_footer_readersfooter_reader_argssfooter_tagtable(sselfs format_namesattrssmake_header_readersheader_reader_argssheader_tagtables make_readers reader_argssrecord_tagtablesmake_footer_readersfooter_reader_argssfooter_tagtables.24swant_groupref_namess debug_levels attrlookup((sMartel/Parser.pys__init__๓s             cCs=tƒ}ti|i|i|if|ƒd|iƒSdS(Nsheader footer records: (sStringIOsxspprintsselfsheader_tagtablesrecord_tagtablesfooter_tagtablesgetvalue(sselfsx((sMartel/Parser.pys__str__ s cCsกt|i|i|i|i|i|i|i|i |i |i |i |i |i|ifƒ }|i|iƒƒ|i|iƒƒ|i|iƒƒ|SdS(N(sHeaderFooterParsersselfs format_namesattrssmake_header_readersheader_reader_argssheader_tagtables make_readers reader_argssrecord_tagtablesmake_footer_readersfooter_reader_argssfooter_tagtableswant_groupref_namess debug_levels attrlookupsparserssetContentHandlersgetContentHandlerssetErrorHandlersgetErrorHandlers setDTDHandlers getDTDHandler(sselfsparser((sMartel/Parser.pyscopyscCst|ƒ}|i|ƒdS(N(sStringIOsssstrfilesselfs parseFile(sselfsssstrfile((sMartel/Parser.pys parseString s cCs3ti|ƒ}|i|iƒp |iƒƒdS(s"parse using the URL or file handleN(ssaxutilssprepare_input_sourcessourcesselfs parseFilesgetCharacterStreams getByteStream(sselfssource((sMartel/Parser.pysparse$scCs |iiƒ|ii|i|iƒ|iotiƒnd}d}|i t j oly)|i |f|iŒ}|iƒ}Wn{ttfj o ‚natƒ} tid| ƒt| iƒtiƒdƒ} |ii| ƒ|iiƒdSnX|t jo d}n|t |ƒ7}t!||i"|i|i#|i$ƒ}|t joqฦt&|t'i(ƒo%|ii|ƒ|iiƒdSqฦ|}|iit*|ƒƒ|iiƒdSn|i t jo|df\}}n|i,ƒ\}}|i-t joy)|i.|f|i/hd|<Ž} Wn{ttfj o ‚natƒ} tid| ƒt| iƒtiƒdƒ} |ii| ƒ|iiƒdSnXxYnoMy| iƒ}Wn{ttfj o ‚natƒ} tid| ƒt| iƒtiƒdƒ} |ii| ƒ|iiƒdSnX|t jo(|ii2|iƒ|iiƒdSnt!||i3|i|i#|i$ƒ}|t jonBt&|t'i(ƒo||7}nt*||ƒ}|ii4|ƒ|t |ƒ7}qฟWn|i-t j p t5d‚t } y)|i.|f|i/hd|<Ž} WnZttfj o ‚n@tƒ} tid| ƒt| iƒtiƒdƒ} nXx]| t joOy| iƒ}Wn[ttfj o ‚nAtƒ} tid| ƒt| iƒtiƒdƒ} PnX|t jot*|ƒ} Pnt!||i3|i|i#|i$ƒ}|t jon{t&|t'i(ƒo||7}nt*||ƒ}|} yQd}| i,ƒ\}}|i-|f|i8hd||<Ž}|iƒ}Wnีttfj o ‚nป|ii4| ƒt } |i,ƒ\}}y-|i.|f|i/hd||<Ž} Wq่ttfj o ‚q่tƒ} tid| ƒt| iƒtiƒdƒ} Pq่XnXt!||i:|i|i#|i$ƒ}|t joŒ|i,ƒ\}} | p |i;dƒo;t*|t |ƒƒ} |ii| ƒ|iiƒdSn|ii2|iƒ|iiƒdSn^|i,ƒ\}} |i.|f|i/hd|| <Ž} | iƒ}|ii4| ƒt } |t |ƒ7}qบW| i,ƒ\}} y5|i-|f|i8hd| <Ž}|iƒ}WnCttfj o ‚n)|ii| ƒ|iiƒdSnX|t jo d}nt!||i:|i|i#|i$ƒ}|t joŒ|i,ƒ\}} | p |i;dƒo;t*|t |ƒƒ} |ii| ƒ|iiƒdSn|ii2|iƒ|iiƒdSn"|ii| ƒ|iiƒdSdS(Nissfileis lookaheadsinternal error(<sselfs _cont_handlers startDocuments startElements format_namesattrsswant_groupref_namess _match_groupsclearsfileposs lookaheadsmake_header_readersNonesfileobjsheader_reader_argss header_readersnextsheadersKeyboardInterrupts SystemExitsStringIOsoutfiles tracebacks print_excsParserRecordExceptionsgetvaluessyssexc_infosexcs _err_handlers fatalErrors endDocumentslens_parse_elementssheader_tagtables debug_levels attrlookupsresults isinstances _exceptionss SAXExceptionspossParserPositionExceptionsxs remaindersmake_footer_readers make_readers reader_argssreadersrecords endElementsrecord_tagtableserrorsAssertionErrors record_excsfootersfooter_reader_argss footer_readersfooter_tagtablesread(sselfsfileobjsfileposs header_readerspossheadersresults footer_readersfootersreadersoutfiles remainders record_excsexcs lookaheadsrecordsx((sMartel/Parser.pys parseFile)sn             )            )           )-         ) %          ( s__name__s __module__s__doc__s__init__s__str__scopys parseStringsparses parseFile(((sMartel/Parser.pysHeaderFooterParser๑s     (!s__doc__surllibspprints tracebackssyssstringsxml.saxs xmlreaders _exceptionsshandlerssaxutilssmxs TextToolss ImportErrors cStringIOsStringIOsDispatchs SAXExceptionsParserExceptionsParserPositionExceptionsParserIncompleteExceptionsParserRecordExceptions _match_groupsAttributesImplsMartelAttributeLists_attribute_lists _do_callbacks_do_dispatch_callbacks_parse_elementss XMLReadersParsers RecordParsersHeaderFooterParser(sParsers xmlreadersMartelAttributeLists _match_groupsParserRecordExceptionspprints_do_dispatch_callbacks RecordParsersurllibshandlersHeaderFooterParsers_parse_elementssstringsParserPositionExceptionssaxutilss_attribute_listssyss _exceptionssStringIOs tracebacksDispatchsParserExceptions _do_callbacks TextToolssParserIncompleteException((sMartel/Parser.pys?s0-     8 N #N€