package Lire::ReportParser;

use strict;

use base qw/ Lire::DocBookParser Lire::Config::Parser /;

use Lire::Report::TableInfo;
use Lire::Config::ListSpec;
use Lire::Config::ChartSpec;

=pod

=head1 NAME

Lire::ReportParser - Lire::XMLParser which parses XML reports

=head1 SYNOPSIS

    package MyParser;

    use base qw/ Lire::ReportParser /;

    sub parse_end {
        return "Finished";
    }

    package main;

    use Carp;

    my $parser = new MyParser;
    my $result = eval { $parser->parsefile( "report.xml" ) };
    croak "Error parsing report.xml: $@" if $@;
    print $result, "\n";

=head1 DESCRIPTION

This is a Lire::XMLParser(3pm) subclass which handles XML documents
adhering to the Lire Report Markup Language format. Its primary
purpose is to make it easy to write custom handlers for the Lire XML
Report format.

=head1 USAGE

You create an instance of a subclass of Lire::ReportParser and use
either the parse() or the parsefile() method to process the XML
reports. You'll probably never use the Lire::ReportParser module
directly; you'll more likely use a subclass which actually does
something when processing the document.

=head2 new( %args )

    my $parser = new Lire::ReportParser::ReportBuilder();

The new() method takes parameters in the form of 'key' => value pairs.
The available parameters are specific to each processor. There are no
generic parameters.

=cut

sub namespaces {
    my $self = $_[0];

    return { %{$self->Lire::Config::Parser::namespaces()},
             'http://www.logreport.org/LRML/' => 'lire' };
}

sub elements_spec {
    my $self = $_[0];

    return {
            %{$self->Lire::Config::Parser::elements_spec()},
            %{$self->Lire::DocBookParser::elements_spec()},
            'lire:report' => [ 'lire:title', 'lire:date', 'lire:timespan',
                               'lire:description', 'lire:section' ],
            'lire:section' => [ 'lire:title', 'lire:description',
                                'lire:subreport', 'lire:missing-subreport' ],
            'lire:subreport' => [ 'lire:title', 'lire:description',
                                  'lire:chart-configs', 'lire:table' ],
            'lire:missing-subreport' => [],
            'lire:title' => [ 'PCDATA' ],
            'lire:description' => [ @Lire::DocBookParser::top_levels ],
            'lire:date' => [ 'PCDATA' ],
            'lire:timespan' => [ 'PCDATA' ],
            'lire:table' => [ 'lire:table-info', 'lire:group-summary',
                              'lire:entry' ],
            'lire:entry' => [ 'lire:name', 'lire:value', 'lire:group' ],
            'lire:name' => [ 'PCDATA' ],
            'lire:value' => [ 'PCDATA' ],
            'lire:group' => [ 'lire:group-summary', 'lire:entry' ],
            'lire:chart-configs' => [ 'lrcml:param' ],
            'lire:table-info' => [ 'lire:column-info', 'lire:group-info' ],
            'lire:group-info' => [ 'lire:column-info', 'lire:group-info' ],
            'lire:column-info' => [],
            'lire:group-summary' => [ 'lire:value' ],
           }
}

=pod

=head1 WRITING AN XML REPORT PROCESSOR

Using Lire::ReportParser, one can write an XML report processor.

The programming model is similar to the expat event-based interface
or the well-known SAX model. The principal difference from those
models is that this module offers hooks specifically tailored for
Lire's XML reports. For example, instead of having one generic
element-start event, you have methods for each specific type of
element, making it easy to hook only onto the elements you're
interested in. It also offers some functions that make it easy to
determine the context (always a difficulty in event-based
programming).

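
The "one method per element type" idea can be sketched in isolation.
The following is only an illustration: C<SketchParser>,
C<TitleCounter> and C<element_start()> are hypothetical names, and
the real dispatching is done by the parent parser classes, whose
implementation is not shown here.

```perl

    use strict;
    use warnings;

    # Hypothetical stand-in for the parser. It is NOT Lire::ReportParser;
    # it only mimics the "one method per element" dispatch idea.
    package SketchParser;

    sub new {
        my $class = shift;
        return bless { 'seen' => [] }, $class;
    }

    # Map an element name such as 'lire:missing-subreport' to a method
    # name such as 'missing_subreport_start' and call it if it exists.
    sub element_start {
        my ( $self, $name, $attr ) = @_;

        ( my $method = $name ) =~ s/^lire://;
        $method =~ tr/-/_/;
        $method .= '_start';
        $self->$method( $name, $attr ) if $self->can( $method );

        return;
    }

    # A subclass only defines hooks for the elements it cares about.
    package TitleCounter;
    our @ISA = ( 'SketchParser' );

    sub title_start {
        my ( $self, $name, $attr ) = @_;
        push @{ $self->{'seen'} }, $name;
        return;
    }

    package main;

    my $parser = TitleCounter->new();
    $parser->element_start( 'lire:title', {} );
    $parser->element_start( 'lire:table', {} );   # no hook defined: ignored
    print scalar @{ $parser->{'seen'} }, "\n";    # prints 1

```

Elements without a matching hook are silently skipped in this sketch,
which is what lets a processor stay small when it only needs a few
element types.
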
If you are uncomfortable with that kind of programming, there is also
an object-oriented API available for the XML reports. That API is
closer to the DOM style of programming. Its principal drawback is that
it is slower and uses more memory, since it has to load the whole XML
document in memory to build an object representation. But if you need
to "navigate" the document, it's a lot better than the event-based
API.

The main way of using the event-based API to write a custom XML report
handler is by subclassing the Lire::ReportParser module and overriding
the methods related to the elements you are interested in.

There are 3 categories of methods you can override.

=over 4

=item Customization Methods

Those are methods that customize the way the Lire::ReportParser will
operate. The most important one is the new() "constructor".

=item Generic element methods

Those are methods that are invoked on each element before the more
specific or higher-level ones and can be used to hook in before the
other events are synthesized.

=back

The third category, the high-level event methods, is documented in
the next section.

=head1 HIGH-LEVEL EVENT METHODS

For each element defined, an I<element>_start() and an
I<element>_end() method are invoked. For elements that contain
character data, an I<element>_char() method will also be invoked,
although you probably want to hook onto the easier
handle_I<element>() methods in these cases.

When you override any of those methods (except the handle_I<element>()
ones), you B<must> also invoke the parent method:

    sub subreport_start {
        my ( $self, $name, $attr ) = @_;

        $self->SUPER::subreport_start( $name, $attr );

        # Processor specific handling.
    }

=head2 report_start( $name, $attr )

Called when the report element start tag is encountered.

The only defined attribute is C<version>. Versions 2.0 and 2.1 are
accepted; a missing attribute or any other value is reported as an
error.

=cut

sub report_start {
    my ( $self, $name, $attr ) = @_;

    $self->init_stack( 'config_spec' );
    $self->init_stack( 'config_value' );
    $self->error( "missing 'version' attribute on root element\n" )
      unless defined $attr->{'version'};

    $self->error( "'version' attribute should be 2.0 or 2.1: $attr->{'version'}")
      unless $attr->{'version'} eq '2.0' || $attr->{'version'} eq '2.1';

    $self->{'lrp_subreport_count'} = 0;

    return;
}

=pod

=head2 report_end( $name )

Called when the report element end tag is encountered.

=cut

sub report_end {
}

sub title_start {
    my ( $self, $name, $attr ) = @_;

    $self->collector_start( $name );

    return;
}

sub title_char {
    my ( $self, $char ) = @_;

    $self->collector_char( $char );

    return;
}

sub title_end {
    my ( $self, $name ) = @_;

    my $title = $self->get_collector( $name );
    $title =~ s/\s+/ /g;
    $self->handle_title( $title );

    return;
}

=pod

=head2 handle_title( $title )

Method invoked after the C<title> element was processed. The $title
parameter contains the content of the element, with each run of
whitespace collapsed to a single space. This can be a report's,
subreport's or section's title. You'll need to use the in_element()
method to determine the context.

=cut

sub handle_title {
    my ( $self, $title ) = @_;

    return;
}

=pod

=head2 handle_description( $docbook_desc )

Unless the description_start() and description_end() methods are
overridden, the content of the description is collected and passed to
the handle_description() method in the $docbook_desc parameter.

=cut

sub description_start {
    my $self = $_[0];

    $self->dbk_init();

    return;
}

sub description_end {
    my $self = $_[0];

    $self->handle_description( $self->dbk_string() );

    return;
}

sub date_start {
    my ( $self, $name, $attr ) = @_;

    $self->{'lrp_curr_date'} = { 'date' => '',
                                 'time' => $attr->{'time'} };

    return;
}

sub date_char {
    $_[0]->{'lrp_curr_date'}{'date'} .= $_[1];

    return;
}

sub date_end {
    my ( $self, $name ) = @_;

    $self->handle_date( $self->{'lrp_curr_date'}{'date'},
                        $self->{'lrp_curr_date'}{'time'} );

    return;
}

=pod

=head2 handle_date( $date, $date_epoch )

Called after the C<date> element was parsed. The formatted date is
available in the $date parameter; the date in number of seconds since
the epoch is available in the $date_epoch parameter.

This can be the report's or a subreport's date; you'll need to use the
in_element() method to determine the appropriate context.

=cut

sub handle_date {}

sub timespan_start {
    my ( $self, $name, $attr ) = @_;

    $self->{'lrp_curr_timespan'} = { 'timespan' => '',
                                     'start' => $attr->{'start'},
                                     'end' => $attr->{'end'},
                                     'period' => $attr->{'period'} };

    return;
}

sub timespan_char {
    my ( $self, $char ) = @_;

    $self->{'lrp_curr_timespan'}{'timespan'} .= $char;

    return;
}

sub timespan_end {
    my ( $self ) = @_;

    $self->handle_timespan( $self->{'lrp_curr_timespan'}{'timespan'},
                            $self->{'lrp_curr_timespan'}{'start'},
                            $self->{'lrp_curr_timespan'}{'end'},
                            $self->{'lrp_curr_timespan'}{'period'} );

    return;
}

=pod

=head2 handle_timespan( $timespan, $epoch_start, $epoch_end, $period )

Called after the C<timespan> element was parsed. The formatted
timespan is available in the $timespan parameter; the starting and
ending dates of the timespan are available as number of seconds since
the epoch in the $epoch_start and $epoch_end parameters. The $period
parameter contains the timespan's C<period> attribute.

This can be the timespan of the report or of a subreport; you'll need
to use the in_element() method to determine the appropriate context.

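
As a rough illustration of what a handle_timespan() override might do
with its arguments, the sketch below computes the length of the
timespan from the two epoch values. C<describe_timespan()> is a
hypothetical helper that is not part of Lire, and the sample label and
C<'weekly'> period value are made up for the example.

```perl

    use strict;
    use warnings;

    # Hypothetical helper, not part of Lire: summarizes the arguments that
    # handle_timespan() receives. Epoch values are seconds since the epoch.
    sub describe_timespan {
        my ( $timespan, $epoch_start, $epoch_end, $period ) = @_;

        my $days = ( $epoch_end - $epoch_start ) / 86_400;

        return sprintf( "%s (%s): %.1f days", $timespan,
                        defined $period ? $period : 'no period', $days );
    }

    # Made-up sample values covering exactly seven days.
    print describe_timespan( 'Jan 1 - Jan 8', 0, 7 * 86_400, 'weekly' ), "\n";

```

In a real processor the same computation would live inside the
overridden handle_timespan() method of a Lire::ReportParser subclass.
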
=cut

sub handle_timespan {}

=pod

=head2 section_start( $name, $attr )

Called when the opening tag of a C<section> element is encountered.

=cut

sub section_start {
    my ( $self, $name, $attr ) = @_;

    $self->{'lrp_section_subreport_count'} = 0;

    return;
}

=pod

=head2 section_end( $name )

Called when the closing tag of a C<section> element is encountered.

=cut

sub section_end {
    my $self = $_[0];

    $self->{'lrp_section_subreport_count'} = 0;

    return;
}

=pod

=head2 missing_subreport_start( $name, $attr )

Called when the opening tag of a C<missing-subreport> element is
encountered. The C<superservice> attribute contains the subreport's
superservice, the C<type> attribute contains the report specification
ID and the C<reason> attribute contains the reason why the subreport
is missing.

=cut

sub missing_subreport_start {}

=pod

=head2 missing_subreport_end( $name )

Called when the closing tag of a C<missing-subreport> element is
encountered.

=cut

sub missing_subreport_end {}

=pod

=head2 subreport_start( $name, $attr )

Called when the opening tag of the C<subreport> element is
encountered. The C<superservice> attribute contains the subreport's
superservice and the C<type> attribute contains the ID of the report
specification that was used to generate that subreport. Both
attributes are required; an error is reported if either is missing.

=cut

sub subreport_start {
    my ( $self, $name, $attr ) = @_;

    $self->error( "missing 'superservice' attribute" )
      unless defined $attr->{'superservice'};
    $self->error( "missing 'type' attribute" )
      unless defined $attr->{'type'};

    $self->{'lrp_section_subreport_count'}++;
    $self->init_stack( 'lrp_group' );
    $self->init_stack( 'lrp_entry' );
    $self->{'lrp_subreport'} = $attr;

    return;
}

=pod

=head2 subreport_end( $name )

Called when the C<subreport>'s closing tag is encountered.

=cut

sub subreport_end {
    my $self = $_[0];

    $self->error( "ASSERTION failed: 'lrp_group' stack should be empty\n" )
      unless $self->is_stack_empty( 'lrp_group' );
    $self->error( "ASSERTION failed: 'lrp_entry' stack should be empty\n" )
      unless $self->is_stack_empty( 'lrp_entry' );

    delete $self->{'lrp_subreport'};
    $self->{'lrp_curr_table_info'} = undef;
    $self->{'lrp_curr_group_info'} = undef;
    $self->{'lrp_subreport_count'}++;

    return;
}

=pod

=head2 table_start( $name, $attr )

Called when the opening tag of the C<table> element is encountered.
The C<show> attribute contains the maximum number of entries that
should be displayed (there may be more entries than this number).

=cut

sub table_start {
    my ( $self, $name, $attr ) = @_;

    $self->stack_push( 'lrp_group', { 'entry_count' => 0,
                                      'show' => $attr->{'show'},
                                    } );
    $self->{'lrp_curr_table'} = $self->stack_peek( 'lrp_group' );

    return;
}

=pod

=head2 table_end( $name )

Called when the C<table>'s closing tag is encountered.

=cut

sub table_end {
    my $self = $_[0];

    $self->stack_pop( 'lrp_group' );
    delete $self->{'lrp_curr_table'};

    return;
}

=pod

=head2 table_info_start( $name, $attr )

Called when the C<table-info>'s opening tag is encountered.

There should be no reason for subclasses to override this method. The
Lire::ReportParser takes care of parsing the C<table-info> content and
offers that information through a Lire::Report::TableInfo object which
is accessible through the current_table_info() method.

=cut

sub table_info_start {
    my ( $self, $name, $attr ) = @_;

    $self->{'lrp_curr_table_info'} = new Lire::Report::TableInfo();
    $self->init_stack( 'lrp_group-info' );
    $self->stack_push( 'lrp_group-info', $self->{'lrp_curr_table_info'} );

    return;
}

=pod

=head2 table_info_end( $name )

Called when the C<table-info>'s closing tag is encountered.

See table_info_start() documentation for important comments.

=cut

sub table_info_end {
    my $self = $_[0];

    $self->stack_pop( 'lrp_group-info' );
    $self->stack_peek( 'lrp_group' )->{'group_info'} =
      $self->current_table_info();

    $self->error( "ASSERTION failed: stack 'lrp_group-info' should be empty" )
      unless $self->is_stack_empty( 'lrp_group-info' );

    return;
}

=pod

=head2 group_info_start( $name, $attr )

Called when the C<group-info>'s opening tag is encountered.

See table_info_start() documentation for important comments.

=cut

sub group_info_start {
    my ( $self, $name, $attr ) = @_;

    my $curr_info = $self->stack_peek( 'lrp_group-info' );
    $self->stack_push( 'lrp_group-info',
                       $curr_info->create_group_info( $attr->{'name'} ) );

    return;
}

=pod

=head2 group_info_end( $name )

Called when the C<group-info>'s closing tag is encountered.

See table_info_start() documentation for important comments.

=cut

sub group_info_end {
    my $self = $_[0];

    $self->stack_pop( 'lrp_group-info' );

    return;
}

=pod

=head2 column_info_start( $name, $attr )

Called when the C<column-info>'s opening tag is encountered.

See table_info_start() documentation for important comments.

=cut

sub column_info_start {
    my ( $self, $name, $attr ) = @_;

    my $group_info = $self->stack_peek( 'lrp_group-info' );
    my $info = $group_info->create_column_info( $attr->{'name'},
                                                $attr->{'class'},
                                                $attr->{'type'},
                                                $attr->{'label'} );
    $info->max_chars( $attr->{'max-chars'} );
    $info->avg_chars( $attr->{'avg-chars'} );
    $info->col_start( $attr->{'col-start'} );
    $info->col_end( $attr->{'col-end'} );
    $info->col_width( $attr->{'col-width'} );

    return;
}

=pod

=head2 column_info_end( $name )

Called when the C<column-info>'s closing tag is encountered.

See table_info_start() documentation for important comments.

=cut

sub column_info_end {}

=pod

=head2 group_summary_start( $name, $attr )

Called when the C<group-summary>'s opening tag is encountered.

=cut

sub group_summary_start {
    my ( $self, $name, $attr ) = @_;

    return;
}

=pod

=head2 group_summary_end( $name )

Called when the C<group-summary>'s closing tag is encountered.

=cut

sub group_summary_end {
    my $self = $_[0];

    return;
}

=pod

=head2 group_start( $name, $attr )

Called when the opening tag of the C<group> element is encountered.
C<group> elements introduce a kind of nested table. The C<show>
attribute contains the maximum number of entries that should be
displayed, although more entries may be present in the report.

=cut

sub group_start {
    my ( $self, $name, $attr ) = @_;

    my $entry = $self->stack_peek( 'lrp_entry' );
    $entry->{'child_idx'}++;
    my $info = $entry->{'group_info'}->info_by_index( $entry->{'child_idx'} );
    $self->stack_push( 'lrp_group', { 'entry_count' => 0,
                                      'show' => $attr->{'show'},
                                      'group_info' => $info,
                                    } );

    return;
}

=pod

=head2 group_end( $name )

Called when the C<group>'s closing tag is encountered.

=cut

sub group_end {
    my $self = $_[0];

    $self->stack_pop( 'lrp_group' );

    return;
}

=pod

=head2 entry_start( $name, $attr )

Called when the opening tag of an C<entry> element is encountered.

=cut

sub entry_start {
    my ( $self, $name, $attr ) = @_;

    my $group = $self->stack_peek( 'lrp_group' );
    $group->{'entry_count'}++;
    $self->stack_push( 'lrp_entry', { %$attr,
                                      'child_idx' => -1,
                                      'group_info' => $group->{'group_info'},
                                    } );

    return;
}

=pod

=head2 entry_end( $name )

Called when the C<entry>'s closing tag is encountered.

=cut

sub entry_end {
    my $self = $_[0];

    $self->stack_pop( 'lrp_entry' );

    return;
}

sub name_start {
    my ( $self, $name, $attr ) = @_;

    my $entry = $self->stack_peek( 'lrp_entry' );
    $entry->{'child_idx'}++;
    my $info = ( $attr->{'col'} ?
                 $self->current_table_info()->column_info_by_name( $attr->{'col'} )
                 : $entry->{'group_info'}->info_by_index( $entry->{'child_idx'} ) );
    $self->{'lrp_curr_name'} = { %$attr,
                                 'content' => "",
                                 'col_info' => $info,
                               };

    return;
}

sub name_char {
    my ( $self, $char ) = @_;

    $self->{'lrp_curr_name'}{'content'} .= $char;

    return;
}

sub name_end {
    my $self = $_[0];

    $self->{'lrp_curr_name'}{'value'} = $self->{'lrp_curr_name'}{'content'}
      unless defined $self->{'lrp_curr_name'}{'value'};

    $self->handle_name( $self->{'lrp_curr_name'} );

    return;
}

=pod

=head2 handle_name( $name_rec )

Called after a C<name> element was parsed. The $name_rec parameter is
a hash reference which contains the different values of the name
datum. Keys that are defined in this hash:

=over 4

=item content

That's the actual content of the name element. This contains the name
in a format suitable for display.

=item value

This contains the unformatted value of the name. For example, when the
name is a time string, this attribute will contain the time in seconds
since the epoch. When the element has no C<value> attribute, this key
is set to the same string as C<content>.

=item range

For some names, the actual content expresses a range (time, size,
etc.). This attribute contains the length of the range.

=item col_info

The Lire::ColumnInfo object describing the column in which this name
appears.

=back

=cut

sub handle_name {}

sub value_start {
    my ( $self, $name, $attr ) = @_;

    # Values in group-summary are handled differently because
    # they aren't part of the entry's children.
    $self->stack_peek( 'lrp_entry' )->{'child_idx'}++
      unless $self->within_element( "lire:group-summary" );

    $self->{'lrp_curr_value'} =
      { %$attr,
        'content' => "",
        'col_info' =>
          $self->current_table_info()->column_info_by_name( $attr->{'col'} ),
      };

    return;
}

sub value_char {
    my ( $self, $char ) = @_;

    $self->{'lrp_curr_value'}{'content'} .= $char;

    return;
}

sub value_end {
    my $self = $_[0];

    $self->{'lrp_curr_value'}{'value'} = $self->{'lrp_curr_value'}{'content'}
      unless defined $self->{'lrp_curr_value'}{'value'};

    if ( $self->within_element( "lire:group-summary" ) ) {
        $self->handle_summary_value( $self->{'lrp_curr_value'} );
    } else {
        $self->handle_value( $self->{'lrp_curr_value'} );
    }

    return;
}

=pod

=head2 handle_value( $value_rec )

Called after a C<value> element was parsed. The $value_rec parameter
is a hash reference which contains the different values of the value
datum. Keys that are defined in this hash:

=over 4

=item content

That's the actual content of the value element. This contains the
value in a format suitable for display.

=item value

This contains the unformatted value. For example, when bytes are
displayed using "1M" or "1.1G", this will contain the value in bytes.
When the element has no C<value> attribute, this key is set to the
same string as C<content>.

=item total

This is used by values that represent an average. It contains the
total which makes up the average.

=item n

This is used by values that represent an average. It contains the
count by which the total was divided to compute the average.

=item col_info

The Lire::ColumnInfo object describing the column in which this value
appears.

=back

=cut

sub handle_value {}

=pod

=head2 handle_summary_value( $value_rec )

Called after a C<value> element located in the group-summary element
was parsed. The $value_rec parameter has the same structure as the one
passed to the handle_value() method.

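
The relation between the C<total>, C<n> and C<value> keys can be
illustrated with a stand-alone sketch. The record below only mimics
the shape of a $value_rec: all its numbers are made up, and
C<numeric_value()> is a hypothetical helper, not a Lire function.

```perl

    use strict;
    use warnings;

    # Hypothetical record shaped like the $value_rec described above.
    # All numbers are made up for illustration.
    my $value_rec = {
        'content' => '1.5k',
        'value'   => 1536,
        'total'   => 15360,
        'n'       => 10,
    };

    # Recompute the average from 'total' and 'n' when both are usable,
    # otherwise fall back to the unformatted 'value'.
    sub numeric_value {
        my ( $rec ) = @_;

        return $rec->{'total'} / $rec->{'n'}
          if defined $rec->{'total'} && $rec->{'n'};

        return $rec->{'value'};
    }

    print numeric_value( $value_rec ), "\n";    # prints 1536

```

A handler that aggregates averages across several records would
typically sum C<total> and C<n> separately rather than average the
already-averaged C<value> fields.
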
=cut

sub handle_summary_value {}

sub chart_configs_start {
    my $self = $_[0];

    my $spec = new Lire::Config::ListSpec( 'name' => 'chart_configs' );
    $spec->add( new Lire::Config::ChartSpec( 'name' => 'chart' ) );
    $self->stack_push( 'config_spec', $spec );
    $self->stack_push( 'config_value', $spec->instance() );

    return;
}

sub chart_configs_end {
    my $self = $_[0];

    $self->stack_pop( 'config_spec' );
    $self->handle_chart_configs( $self->stack_pop( 'config_value' )->as_value() );

    return;
}

=pod

=head2 handle_chart_configs( $configs )

If the subreport contains chart configurations, an array reference of
Lire::Report::ChartConfig objects will be passed to this event
handler.

=cut

sub handle_chart_configs {
    my ( $self, $configs ) = @_;

    return;
}

=pod

=head1 CONTEXT METHODS

Finally, here are some additional methods that can be used to query
context information when processing elements.

=cut

=pod

=head2 current_subreport_count( )

Returns the number of subreports that have been processed so far in
the report. That number is equal to the number of completed
C<subreport> elements, i.e. the current subreport isn't counted until
its closing tag has been processed.

=cut

sub current_subreport_count {
    return $_[0]{'lrp_subreport_count'};
}

=pod

=head2 current_section_subreport_count( )

Returns the number of subreports that have been encountered so far in
the current section. That number is incremented when the opening tag
of a C<subreport> element is processed, so the current subreport is
included in the count. It is reset to 0 at the start and at the end of
each section.

=cut

sub current_section_subreport_count {
    return $_[0]{'lrp_section_subreport_count'};
}

=pod

=head2 current_date( )

Returns the content of the C<date> element that applies to the current
element. This will either be the current subreport's date or the
default one taken from the C<report> element.

The date is returned as a hash reference which contains the formatted
date in the C<date> key and the date in seconds since the epoch in the
C<time> key.

=cut

sub current_date {
    return $_[0]{'lrp_curr_date'};
}

=pod

=head2 current_timespan( )

Returns the content of the C<timespan> element that applies to the
current element. This will either be the current subreport's timespan
or the default one taken from the C<report> element.

The timespan is returned as a hash reference which contains the
formatted timespan in the C<timespan> key. The starting and ending
dates of the timespan are available as seconds since the epoch in the
C<start> and C<end> keys. The C<period> key contains the timespan's
C<period> attribute.

=cut

sub current_timespan {
    return $_[0]{'lrp_curr_timespan'};
}

=pod

=head2 current_superservice( )

Useful in C<subreport> context, it returns the superservice of the
current subreport.

=cut

sub current_superservice {
    return $_[0]{'lrp_subreport'}{'superservice'};
}

=pod

=head2 current_type( )

Useful in C<subreport> context, it returns the ID of the report
specification that was used to generate the current subreport.

=cut

sub current_type {
    return $_[0]{'lrp_subreport'}{'type'};
}

=pod

=head2 current_table_info()

Useful when processing C<group> and C<entry>, this returns a
Lire::Report::TableInfo object which describes the layout of the
current table.

=cut

sub current_table_info {
    return $_[0]{'lrp_curr_table_info'};
}

=pod

=head2 current_group_entry_show( )

Useful in C<table> and C<group> context, it returns the maximum number
of entries that should be displayed.

=cut

sub current_group_entry_show {
    return $_[0]->stack_peek( 'lrp_group' )->{'show'};
}

=pod

=head2 show_current_entry( )

Useful in C<entry> context, this can be used to test whether or not
the current C<entry> should be displayed, based on the current entry
index and the parent's C<show> attribute. When the parent has no
C<show> attribute, every entry should be displayed.

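
The test can be restated as a stand-alone sketch. C<should_show()> is
a hypothetical function, and the $group hash only mimics the record
the parser keeps for the enclosing C<table> or C<group> (a C<show>
limit and a running C<entry_count>); the entry names are made up.

```perl

    use strict;
    use warnings;

    # Stand-alone restatement of the test: show the entry when there is
    # no limit, or when the 1-based entry index is within the limit.
    sub should_show {
        my ( $group ) = @_;

        return 1 unless defined $group->{'show'};
        return $group->{'entry_count'} <= $group->{'show'} ? 1 : 0;
    }

    my $group = { 'show' => 2, 'entry_count' => 0 };
    my @shown;
    for my $entry ( qw/ a b c d / ) {
        $group->{'entry_count'}++;    # incremented as each entry starts
        push @shown, $entry if should_show( $group );
    }
    print "@shown\n";    # prints "a b"

```

Because the counter is incremented before the test, a C<show> value of
2 lets exactly the first two entries through.
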
=cut

sub show_current_entry {
    my $group = $_[0]->stack_peek( 'lrp_group' );

    return ( !defined $group->{'show'}
             || $group->{'entry_count'} <= $group->{'show'} );
}

=pod

=head2 current_table_entry_count( )

Useful in C<table> context, it returns the number of entries that were
processed so far. This only counts the entries directly in the
C<table> element, not the ones in nested C<group> elements.

=cut

sub current_table_entry_count {
    return $_[0]{'lrp_curr_table'}{'entry_count'};
}

1;

__END__

=pod

=head1 SEE ALSO

Lire::Report(3pm), Lire::XMLParser(3pm),
Lire::ReportParser::AsciiDocBookFormatter(3pm),
Lire::ReportParser::AsciiWriter(3pm),
Lire::ReportParser::HTMLDocBookFormatter(3pm),
Lire::ReportParser::HTMLWriter(3pm),
Lire::ReportParser::ReportBuilder(3pm),
Lire::ReportParser::ExcelWriter(3pm),
Lire::Report::TableInfo(3pm)

=head1 AUTHOR

  Francis J. Lacoste <flacoste@logreport.org>

=head1 VERSION

$Id: ReportParser.pm,v 1.52 2006/07/23 13:16:29 vanbaal Exp $

=head1 COPYRIGHT

Copyright (C) 2001-2004 Stichting LogReport Foundation LogReport@LogReport.org

This file is part of Lire.

Lire is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program (see COPYING); if not, check with
http://www.gnu.org/copyleft/gpl.html.

=cut