mp4h -- Documentation for mp4h
Copyright 2000-2002, Denis Barbier
Introduction
The &mp4h; software is a macro-processor specifically designed to deal with HTML documents. It allows powerful programming constructs, with a syntax familiar to HTML authors. This software is based on &Meta-HTML;, written by Brian J. Fox. Even though both syntaxes look similar, the source code is completely different. Indeed, a subset of &Meta-HTML; was used as part of a more complex program, &WML;, written by Ralf S. Engelschall, which I have maintained since January 1999. For licensing reasons, it was hard to hack &Meta-HTML;, so I decided to write my own macro-processor. Instead of rewriting it from scratch, I preferred to use another macro-processor engine. I chose GNU &m4;, written by René Seindal, because of its numerous advantages: this software is stable, robust and very well documented. This version of &mp4h; is derived from GNU &m4; version 1.4n, which is a development version.

The &mp4h; software is not an HTML editor; its only goal is to provide an easy way to define one's own macros inside HTML documents. There is no plan to add functionality to automagically produce valid HTML documents; if you want to clean up or validate your code, simply use an HTML post-processor.
Command line options
Optional arguments are enclosed within square brackets. All option synonyms have a similar syntax, so when a long option accepts an argument, the short option does too. The calling syntax is

mp4h [options] [filename [filename] ...]

Options are described below. If no filename is specified, or if a filename is -, characters are read from standard input.

Operation modes

Preprocessor features

Parser features

NUMBER is a combination of the expansion flags described at the end of this document. In version <__version__ />, the default value is 3114=2+8+32+1024+2048.

Limits control

Debugging

Flags are any of:
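NUMBER is a bit sum, so any value can be split back into the individual flags it enables. The following Python sketch is not part of mp4h, and the helper name split_flags is hypothetical. It checks the default value quoted above:

```python
def split_flags(number: int) -> list[int]:
    """Return the powers of two (set bits) whose sum is `number`."""
    if number < 0:
        raise ValueError("expansion flags must be non-negative")
    flags = []
    bit = 1
    while bit <= number:
        if number & bit:
            flags.append(bit)
        bit <<= 1
    return flags

default = 2 + 8 + 32 + 1024 + 2048
print(default)               # 3114
print(split_flags(default))  # [2, 8, 32, 1024, 2048]
```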
Description
The &mp4h; software is a macro-processor, which means that keywords are replaced by other text. This chapter describes all primitives. As &mp4h; has been specially designed for HTML documents, its syntax is very similar to HTML, with tags and attributes. One important feature has no equivalent in HTML: comments until end of line. All text following three semicolons is discarded until end of line, like

;;; This is a comment

Function Macros

All examples in this documentation are processed through &mp4h; with expansion flags set to zero (see the description of possible expansion flags at the end of this document), which is why simple tags contain a trailing slash. But &mp4h; can output plain HTML files with other expansion flags.

The definition of new tags is the most common task performed with &mp4h;. As with HTML, macro names are case insensitive, unless a command-line option is used to change this default behaviour. In this documentation, only lowercase letters are used. There are two kinds of tags: simple and complex. A simple tag has the following form:

<name [attributes] />

whereas a complex tag looks like:

<name [attributes]>body</name>

Since version 0.9.1, &mp4h; knows XHTML syntax too, so your input file may conform to HTML or XHTML syntax. In this manual, we adopt the latter, which is why simple tags end with a trailing slash. If you want to produce HTML files from such an input file, you may either choose an adequate flag or use a post-processor. When a simple tag is defined by &mp4h;, it can be parsed even if the trailing slash is omitted, because &mp4h; knows that this tag is simple. But it is good practice to always append a trailing slash to simple tags.

In macro descriptions below, a slash indicates a simple tag, and a V letter indicates that attributes are read verbatim (without expansion) (see the chapter on macro expansion for further details).

[attributes=verbatim] [endtag=required] [whitespace=delete] This function lets you define your own tags. The first argument is the command name.
Replacement text is the function body.

Even though spaces usually have little effect on HTML syntax, note that a body written on the same line as the opening and closing tags and the same body written on a line of its own are not equivalent: the latter form contains two newlines that were not present in the former.

whitespace=delete: some spaces are suppressed in replacement text, in particular any leading or trailing spaces, and newlines not enclosed within angle brackets.

endtag=required: define a complex tag; its replacement text can refer to the body with %body.

attributes=verbatim: by default attributes are expanded before text is replaced. If this attribute is used, attributes are inserted into replacement text without expansion.

[attributes=verbatim] [endtag=required] [whitespace=delete] This command is similar to the previous one, except that no operation is performed if this command was already defined.

Copy a function. This command is useful to save a macro definition before redefining it.

Delete a command definition.

[position=before|after] [action=insert|append|replace] Add text to a predefined macro. This mechanism allows modifications of existing macros without having to worry about their type, whether complex or not.

[position=before|after] Print the current hooks of a macro.

Prints attribute pairs like %attributes, except that pairs are printed with double quotes surrounding attribute values, and a leading space is added if some text is printed.

name[,name[,...]] Extract from the attribute list the pairs whose names match any of the given names.

name[,name[,...]] Remove from the attribute list the pairs whose names match any of the given names.

The two previous functions are special because, unlike all other macros, their expansion does not form a group. This is necessary to parse the resulting list of attributes. In these two functions, names of attributes may be regular expressions. The main goal of these primitives is to help write macros that accept any kind of attributes without having to declare them.
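In these two functions attribute names may be regular expressions, so the filtering can be modelled in Python. The sketch below also applies a rule explained with the canonical example that follows: when a pattern contains a subexpression, the captured text is printed instead of the whole name. It rests on several assumptions. Attributes are taken as name=value strings, and a name must match the whole pattern. Python's re module stands in for PCRE. The function extract_attributes is hypothetical and is not the mp4h primitive.

```python
import re

def extract_attributes(patterns, attributes):
    """Keep name=value pairs whose name fully matches one of `patterns`.

    If the matching pattern has a subexpression, the text it captured
    replaces the attribute name (":img:src" -> "src" with ":img:(.*)").
    """
    kept = []
    for attr in attributes:
        name, sep, value = attr.partition("=")
        for pattern in patterns:
            match = re.fullmatch(pattern, name)
            if match is None:
                continue
            captured = match.group(1) if match.re.groups else None
            kept.append((captured if captured is not None else name) + sep + value)
            break  # the first matching pattern wins
    return kept

attrs = ['href="index.html"', ':img:src="foo.gif"', ':img:alt="Foo"']
print(extract_attributes([r":img:(.*)"], attrs))  # ['src="foo.gif"', 'alt="Foo"']
print(extract_attributes([r"href"], attrs))        # ['href="index.html"']
```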
A canonical example is a macro that generates a link and passes an href attribute on to the anchor tag. Now suppose we want to add an image attribute, so that the macro can also produce an img tag with src and alt attributes. We need a mechanism to tell &mp4h; that some attributes refer to specific HTML tags. One solution is to prefix the attribute name with the tag name. This shows that regular expressions may be used within attribute names, but the approach is still incomplete, because we want to remove the prefix from attributes. The prefix could be stripped afterwards, but there is a more elegant solution: when there are subexpressions within regular expressions, they are printed instead of the whole expression. Note also that I put a colon before the prefix so that it is not confused with XML namespaces.

Entities

Entities are macros in the same way as tags, but they do not take any arguments. Whereas tags are normally used to mark up text, entities contain already marked up text. Also note that, unlike tags, entities are case sensitive by default. An entity has the following form:

&entity;

This function lets you define your own entities. The first argument is the entity name. Replacement text is the function body; for instance, if the entity foo is defined with the body bar, &foo; expands to bar.

Variables

Variables are a special case of simple tags, because they do not accept attributes. In fact their use is different, because variables contain text whereas macros act like operators. A nice feature of variables is that they can be manipulated as arrays. Indeed, variables can be considered as newline separated lists, which allows powerful manipulation functions, as we will see below.

name[=value] [name[=value]] ... This command sets variables.

name[=value] [name[=value]] ... As above, but attributes are read verbatim.

name=variable This command assigns a variable the value of the body of the command. This is particularly useful when variable values contain newlines and/or quotes. Note that the variable cannot be indexed with this command.
Note also that this command behaves like set-var-verbatim: the body is not expanded until the variable is shown with get-var.

Show variable contents. If a numeric value within square brackets is appended to a variable name, it represents an index in an array. By convention, the first index of an array is 0.

As above, but attributes are not expanded.

All variables are global; there is no variable or macro scope. For this reason a stack is used to preserve variables. When this command is invoked, its arguments are names of variables, whose values are pushed onto the stack, and the variables are reset to an empty string.

This is the opposite: arguments are names of variables, which are set to the value found at the top of the stack, and the stack is popped. The former command pushes its last argument first, whereas the latter pops its first argument first.

Undefine variables.

Returns true when this variable exists.

[by=amount] Increment the variable whose name is the first argument. The default increment is one. by=amount: change the increment amount.

[by=amount] Decrement the variable whose name is the first argument. The default decrement is one. by=amount: change the decrement amount.

Copy a variable into another.

If this variable is not defined or is defined to an empty string, then it is set to the second argument.

Show information on symbols. If the argument is a variable name, a keyword is printed as well as the number of lines contained within this variable. If it is a macro name, one of several fixed messages is printed.

String Functions

Prints the length of the string.

Convert to lowercase letters.

Convert to uppercase letters.

Convert to a title, with a capital letter at the beginning of every word.

[start [end]] Extracts a substring from a string.
The first argument is the original string; the second and third are respectively the start and end indexes. By convention, the first character has index 0.

[caseless=true] Returns true if the first two arguments are equal. caseless=true: comparison is case insensitive.

[caseless=true] Returns true if the first two arguments are not equal. caseless=true: comparison is case insensitive.

[caseless=true] Compares two strings and returns one of the values less, greater or equal, depending on this comparison. caseless=true: comparison is case insensitive.

[caseless=true] Prints an array containing the indexes where the character appears in the string. caseless=true: comparison is case insensitive.

Prints its arguments according to a given format. Currently only a single flag character is recognized, and an extension is supported to change the order of arguments.

Regular Expressions

Regular expression support is provided by the PCRE (Perl Compatible Regular Expressions) library package, which is open source software, copyright by the University of Cambridge. This is a very nice piece of software, and its latest versions are available online.

Before version 1.0.6, POSIX regular expressions were implemented. For this reason, the following macros recognize two attributes, caseless and singleline. But Perl allows much better control over regular expressions with so-called modifiers, which are passed to the new reflags attribute. It may contain one or more of the modifiers i, m, s and x. The caseless and singleline attributes are kept as synonyms for some of these modifiers. This behaviour was different up to &mp4h; 1.0.6.

[caseless=true] [singleline=true|false] [reflags=[imsx]] Replace a regular expression in a string by a replacement text, which may contain back-references such as \\1.

[caseless=true] [singleline=true|false] [reflags=[imsx]] Performs substitutions inside variable content.
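The reflags letters follow Perl conventions, so their effect can be previewed with any Perl-compatible engine. The Python sketch below uses the re module, not PCRE, so minor differences are possible. It shows what the m and x modifiers change for patterns similar to the ones used with the substitution macros. The input strings are made up for illustration.

```python
import re

text = "abc\ndef"

# Without the m modifier, "$" matches only at the end of the string,
# so only the final character is removed.
no_m = re.sub(r".$", "", text)

# With the m modifier (re.M), "$" also matches before each newline,
# so the last character of every line is removed.
with_m = re.sub(r".$", "", text, flags=re.M)

# The x modifier (re.X) ignores unescaped whitespace in the pattern.
# An alternative whose group did not match substitutes an empty string for \1.
spaced = re.sub(r" ([a-c]) | [0-9] ", r":\1:", "a1d", flags=re.X)

print(repr(no_m))    # 'abc\nde'
print(repr(with_m))  # 'ab\nde'
print(spaced)        # :a:::d
```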
[caseless=true] [singleline=true|false] [reflags=[imsx]] [action=report|extract|delete|startpos|endpos|length] Matches a regular expression against a string. The action attribute selects what is printed:
action=report: prints true if the string contains the regexp.
action=extract: prints the expression matching the regexp in the string.
action=delete: prints the string without the expression matching the regexp.
action=startpos: prints the position of the first character of the expression matching the regexp in the string. If there is no match, returns -1.
action=endpos: prints the position of the last character of the expression matching the regexp in the string. If there is no match, returns -1.
action=length: prints the length of the expression matching the regexp in the string.

Arrays

With &mp4h; one can easily deal with string arrays. Variables can be treated as a single value or as a newline separated list of strings. Thus, once a variable is defined, one can view its whole content or any one of its values.

Returns the array size, which is the number of lines present in the variable.

Adds a value (or more than one, if this value contains newlines) at the end of an array.

Removes the top-level value of an array and returns this string.

Prints the last entry of an array.

[caseless=true] Adds a value at the end of an array if this value is not already present in this variable. caseless=true: comparison is case insensitive.

Concatenates all arrays into the first one.

[caseless=true] If the value is contained in the array, returns its index, otherwise returns -1. caseless=true: comparison is case insensitive.

[start=origin] Shifts an array. If the offset is negative, indexes below 0 are lost. If the offset is positive, the first indexes are filled with empty strings. start=origin: changes the origin of shifts (default is 0).

[caseless=true] [numeric=true] [sortorder=reverse] Sorts the lines of an array in place. The default is to sort lines alphabetically. caseless=true: comparison is case insensitive. numeric=true: sort lines numerically. sortorder=reverse: reverse the sort order.

Numerical operators

These operators perform basic arithmetic operations. When all operands are integers, the result is an integer too; otherwise it is a float. These operators are self-explanatory, and each accepts any number of arguments.
Unlike the functions listed above, the modulo function cannot handle more than 2 arguments, and these arguments must be integers.

The following functions compare two numbers and return true when the comparison holds. If one argument is not a number, the comparison is false.
Returns true if the first argument is greater than the second.
Returns true if the first argument is less than the second.
Returns true if the arguments are equal.
Returns true if the arguments are not equal.

Relational operators

Returns true if the string is empty, otherwise returns an empty string.

Returns the last argument if all arguments are non-empty.

Returns the first non-empty argument.

Flow functions

[separator=string] This function groups multiple statements into a single one, which is useful with conditional operations. A less intuitive but very helpful use of this macro is to preserve newlines when whitespace=delete is specified; otherwise those newlines are suppressed and the result is certainly unwanted.

[separator=string] Like the previous function, but this tag is complex. separator=string: by default arguments are put side by side; this attribute defines a separator inserted between arguments.

Does the opposite job to the grouping functions: its argument is no longer treated as a single object when processed by another command.

Prints its arguments without expansion. They will never be expanded unless the next tag is used to cancel this one.

Cancels the previous tag.

If the string is non-empty, the second argument is evaluated; otherwise the third argument is evaluated.

If the first two arguments are identical strings, the third argument is evaluated; otherwise the fourth argument is evaluated.

If the first two arguments are not identical strings, the third argument is evaluated; otherwise the fourth argument is evaluated.

When the argument is not empty, its body is evaluated.
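The relational operators and conditional functions above share one convention: an empty string is false and any non-empty string is true. The Python sketch below models a few of them under that convention. Branches are passed as zero-argument functions so that only the selected branch is evaluated, as described above. The truth value "true" returned by the negation and all function names are assumptions of this sketch.

```python
def not_(s: str) -> str:
    # Assumed truth value "true"; an empty result means false.
    return "true" if s == "" else ""

def and_(*args: str) -> str:
    # Last argument if all arguments are non-empty, otherwise empty.
    return args[-1] if args and all(args) else ""

def or_(*args: str) -> str:
    # First non-empty argument, otherwise empty.
    return next((a for a in args if a), "")

def if_(test: str, then, otherwise=lambda: ""):
    # Only the selected branch is evaluated.
    return then() if test else otherwise()

def ifeq(a: str, b: str, then, otherwise=lambda: ""):
    # Identical strings select the first branch (case sensitive).
    return then() if a == b else otherwise()

calls = []

def branch(name):
    # Records which branches actually ran.
    def run():
        calls.append(name)
        return name
    return run

print(repr(not_("x")))                                          # ''
print(and_("a", "b", "c"))                                      # c
print(or_("", "", "fallback"))                                  # fallback
print(if_(or_("", "x"), branch("yes"), branch("no")))           # yes
print(ifeq("a", "A", branch("same"), branch("different")))      # different
print(calls)                                                    # ['yes', 'different']
```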
While the condition is true, the body is evaluated.

[start=n] [end=n] [step=n] This macro is similar to Perl's foreach loop: a variable loops over array values and the function body is evaluated for each value. The first argument is a generic variable name, and the second is the name of an array. start=n: skips the first indexes. end=n: stops after the index has reached that value. step=n: changes the index increment (default is 1). If step is negative, the array is treated in reverse order.

This command tests multiple conditions with a single instruction.

Breaks out of the innermost loop.

[up=number] This command immediately exits from the innermost macro. A message may also be inserted. But this macro changes token parsing, so its use may become very hazardous in some situations. up=number: determines how many levels are exited. By default only one level is exited. With a value of zero, all current macros are exited. A negative value does the same, and also stops processing the current file.

Prints a warning on standard error.

[status=code] [message=string] Immediately exits the program. message=string: prints a message to standard error. status=code: selects the code returned by the program (-1 by default).

This is a special command: its content is stored and will be expanded after the end of input.

File functions

[matching=regexp] Returns a newline separated list of the files contained in a given directory.

pathname=path Resolves all symbolic links, extra ``/'' characters and references to /./ and /../ in pathname, and expands into the resulting absolute pathname. All but the last component of pathname must exist when real-path is called. This tag is particularly useful for checking whether file or directory names are identical.

Returns true if the file exists.

Returns an array of information on this file: size, type, ctime, mtime, atime, owner and group.
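The canonicalization performed by real-path resembles Python's os.path.realpath, which also resolves symbolic links and removes redundant separators and . or .. components. The sketch below shows why canonical paths are handy when comparing names. The helper same_file_name is hypothetical. One difference from the tag above: os.path.realpath does not require the path components to exist.

```python
import os
import tempfile

def same_file_name(a: str, b: str) -> bool:
    """Compare two path names after canonicalizing them."""
    return os.path.realpath(a) == os.path.realpath(b)

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    plain = os.path.join(root, "file.html")
    # Same location spelled with "..", "." and a doubled separator.
    messy = os.sep.join([root, "sub", "..", ".", "", "file.html"])
    result = same_file_name(plain, messy)

print(result)  # True
```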
file=filename | command=cmdline [alt=action] [verbatim=true] Inserts into the input stream either the contents of a file in the file system, if the file attribute is given, or the output of a system command, if the command attribute is given. For backwards compatibility, if neither the file nor the command attribute is given, the first argument is taken as a file to include.

file=filename: the given file is read and inserted into the input stream. This attribute cannot be combined with the command attribute. Files are searched for first in the current directory, then in directories specified on the command line with the appropriate option, next in directories listed in the MP4HLIB environment variable (it used to be MP4HPATH for versions prior to 1.3), and last under the compile-time location (/usr/local/lib/mp4h/<__version__ />:/usr/local/share/mp4h by default).

command=cmdline: the given command line is executed on the operating system, and its output is inserted into the input stream. This attribute cannot be combined with the file attribute. The command line is executed with the popen(3) standard C library routine, which runs it through the standard system shell; on POSIX compliant systems this is sh(1).

alt=action: if the file is not found, this alternate action is performed. If this attribute is not set and the file is not found, an error is raised. This attribute has no effect when the command attribute is specified.

verbatim=true: the file content is included without expansion. This is similar to using the &m4; undivert macro with a filename as argument.

name=package Loads definitions from a package file.

This tag does nothing; its body is simply discarded.

Change comment characters.

[begin end] [display=visible] By default, all characters between pairs of quoting delimiters are read without parsing. When called without arguments, this macro disables this feature. When called with two arguments, it redefines the begin and end delimiters. The begin delimiter must begin with a left-angle bracket, and the end delimiter must end with a right-angle bracket. display=visible: the delimiters are also written into the output.
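The search order for included files can be summarised as a small Python function. This is a sketch, not mp4h's implementation. The directories given with the command-line option are passed in explicitly. MP4HLIB is assumed to be a colon-separated list, like the compile-time default shown above. The function name find_include is hypothetical.

```python
import os
import tempfile

def find_include(filename, cmdline_dirs=(), environ=None, default_dirs=()):
    """Return the first existing path for `filename`, or None if not found.

    Search order: current directory, command-line directories,
    MP4HLIB entries (assumed colon-separated), compile-time directories.
    """
    environ = os.environ if environ is None else environ
    candidates = [os.curdir]
    candidates += list(cmdline_dirs)
    candidates += [d for d in environ.get("MP4HLIB", "").split(":") if d]
    candidates += list(default_dirs)
    for directory in candidates:
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            return path
    return None

# Usage: a file present in two places is found in the earlier directory.
with tempfile.TemporaryDirectory() as cmd_dir, tempfile.TemporaryDirectory() as lib_dir:
    open(os.path.join(lib_dir, "pkg.mp4h"), "w").close()
    found_in_lib = find_include("pkg.mp4h", [cmd_dir], {"MP4HLIB": lib_dir})
    open(os.path.join(cmd_dir, "pkg.mp4h"), "w").close()
    found_in_cmd = find_include("pkg.mp4h", [cmd_dir], {"MP4HLIB": lib_dir})

print(found_in_lib == os.path.join(lib_dir, "pkg.mp4h"))  # True
print(found_in_cmd == os.path.join(cmd_dir, "pkg.mp4h"))  # True
```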
Diversion functions

Diversions are a way of temporarily saving output. The output of &mp4h; can at any time be diverted to a temporary file, and be reinserted into the output stream, undiverted, at a later time. Numbered diversions are counted from 0 upwards, diversion number 0 being the normal output stream. The number of simultaneous diversions is limited mainly by the memory used to describe them, because &mp4h; tries to keep diversions in memory. However, there is a limit to the overall memory usable by all diversions taken together. When this maximum is about to be exceeded, a temporary file is opened to receive the contents of the biggest diversion still in memory, freeing this memory for other diversions. So, in theory, the number of diversions may be limited by the number of available file descriptors.

[divnum=diversion-number] Output is diverted using this tag, where diversion-number is the diversion to be used. If the divnum attribute is left out, diversion-number is assumed to be zero. If output is diverted to a non-existent diversion, it is simply discarded; this can be used to suppress unwanted output. When all &mp4h; input has been processed, all existing diversions are automatically undiverted, in numerical order. Several calls of divert with the same argument do not overwrite the previously diverted text, but append to it.

[divnum=diversion-number] This tag explicitly undiverts the text saved in the diversion with the specified number. If the divnum attribute is not given, all diversions are undiverted, in numerical order. When diverted text is undiverted, it is not reread by &mp4h;, but rather copied directly to the current output. It is therefore not an error to undivert into a diversion. Unlike &m4;, the &mp4h; undivert tag does not allow a file name as argument.
The same can be accomplished with the include tag and its verbatim="true" attribute.

This tag expands to the number of the current diversion.

Debugging functions

When constructs become complex, they can be hard to debug. The functions listed below are very useful when you cannot figure out what is wrong. These functions are not perfect yet and must be improved in future releases.

Prints the replacement text of a user defined macro.

This command acts like the corresponding debugging flag, but can be changed dynamically.

Selects a file where debugging messages are diverted. If this filename is empty, debugging messages are sent back to standard error, and if it is set to a special value, these messages are discarded. There is no way to print these debugging messages into the document being processed.

Declares these macros traced, i.e. information about these macros will be printed if the corresponding flag or macro is used.

These macros are no longer traced.

Miscellaneous

Without an argument, this macro prints the current input filename. With an argument, this macro sets the string returned by future invocations of this macro.

Without an argument, this macro prints the current line number in the input file. With an argument, this macro sets the number returned by future invocations of this macro.

This is <__file__/>, line <__line__/>.

If you look closely at the source code, you will see that this line number is wrong: it is the line number at the end of the entire block containing this instruction.

Prints the version of &mp4h;.

Discards all characters until a newline is reached. This macro ensures that the following string is a comment and does not depend on the value of the comment characters.

[epoch] Prints local time according to the epoch passed as argument.
If there is no argument, the current local time is printed. The argument is an epoch time specification. A format specification, as used with the strftime(3) C library routine, can also be given.

Prints the time spent since the last call to this macro. The printed value is a number of clock ticks, so it depends on your CPU.

Sets locale-specific variables. By default, the portable "C" locale is selected. As locales have different names on different platforms, you must refer to your system documentation to find which values suit your system.

Changes the output format of floats by setting the number of digits after the decimal point. The default is to print numbers in the "%6.f" format.
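The time-printing macro described above takes an optional epoch, converted to local time, and an optional strftime(3) format. Python's time module follows the same model. In this sketch the function name format_local_time and its "%c" default format are assumptions, not mp4h's defaults.

```python
import time

def format_local_time(epoch=None, fmt="%c"):
    """Format `epoch` (seconds since 1970, default: now) as local time."""
    seconds = time.time() if epoch is None else float(epoch)
    return time.strftime(fmt, time.localtime(seconds))

print(format_local_time())               # current local time
print(format_local_time(0, "%Y-%m-%d"))  # depends on the local time zone
```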
External packages
It is possible to include external files with the include command. Files are searched for first in the current directory, then in directories specified on the command line with the appropriate option, next in directories listed in the MP4HLIB environment variable (it used to be MP4HPATH for versions prior to 1.3), and last under the compile-time location (/usr/local/lib/mp4h/<__version__ />:/usr/local/share/mp4h by default).

Another way to include packages is with the package-loading command (name=package) described under File functions. The two commands differ in two ways. First, the package name has no suffix. Second, and more importantly, a package cannot be loaded more than once.
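The load-once rule for packages can be modelled by remembering which package names have already been read. In this Python sketch, the class name, its method names and the .mp4h suffix appended to package names are assumptions made for illustration. The text above says only that package names are given without a suffix.

```python
class Loader:
    """Toy model contrasting repeated includes with load-once packages."""

    SUFFIX = ".mp4h"  # assumed package file suffix, for illustration only

    def __init__(self):
        self.loaded_packages = set()
        self.reads = []  # files read, in order

    def include(self, filename):
        # A file can be included any number of times.
        self.reads.append(filename)

    def use(self, package):
        # The package name has no suffix, and each package is read at most once.
        if package in self.loaded_packages:
            return
        self.loaded_packages.add(package)
        self.reads.append(package + self.SUFFIX)

loader = Loader()
loader.include("header.html")
loader.include("header.html")
loader.use("tables")
loader.use("tables")
print(loader.reads)  # ['header.html', 'header.html', 'tables.mp4h']
```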
Macro expansion
This part describes the internal mechanism of macro expansion. It is meant to be as precise and exhaustive as possible, so suggestions are welcome.

Basics

User defined macros may have attributes like HTML tags. To handle these attributes in replacement text, the following conventions have been adopted (mostly derived from &Meta-HTML;):

A dedicated sequence is replaced by the command name.

Attributes are numbered from 0. In replacement text, %0 is replaced by the first argument, %1 by the second, etc. As there is no limit on the number of arguments, %20 is the 21st argument and not the third followed by the digit 0.

Sequence %# prints the number of attributes.

Sequence %% is replaced by %, which is useful in nested definitions.

Sequence %attributes is replaced by the space separated list of attributes.

Sequence %body is replaced by the body of a complex macro.

The two forms above accept modifiers. With one of them, a newline separated list of attributes is printed. Another alternate form is obtained by replacing % with %U (as in %Uattributes or %Ubody), in which case text is replaced but will not be expanded. This only makes sense when the macro has been defined with attributes=verbatim; otherwise attributes are expanded before replacement. Modifiers can be combined.

Input expansion is completely different in &Meta-HTML; and in &mp4h;. With &Meta-HTML; it is sometimes necessary to use other constructs. To improve compatibility with &Meta-HTML;, these constructs are recognized and interpreted like their &mp4h; equivalents. Another compatibility feature is that, for simple tags, two of these sequences are equivalent. These features are present in the current &mp4h; version but may disappear in future releases.
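The substitution conventions above can be summarised in a short Python sketch. It handles only %0, %1, ... (digits are read greedily, so %20 is the 21st argument), %#, %%, %attributes and %body. Everything is done in a single left-to-right pass, so %% yields a literal % that is not substituted again. The command-name sequence and the modifiers are left out. Replacing an out-of-range index with an empty string is an assumption, and expand_sequences is a hypothetical name.

```python
import re

SEQUENCE = re.compile(r"%(%|#|\d+|attributes|body)")

def expand_sequences(template, args, body=""):
    """Replace mp4h-style % sequences in a replacement text (simplified)."""
    def replace(match):
        key = match.group(1)
        if key == "%":
            return "%"                 # %% -> literal %, for nested definitions
        if key == "#":
            return str(len(args))      # number of attributes
        if key == "attributes":
            return " ".join(args)      # space separated list of attributes
        if key == "body":
            return body
        index = int(key)               # %0 is the first argument
        return args[index] if index < len(args) else ""  # assumption
    return SEQUENCE.sub(replace, template)

args = [f"a{i}" for i in range(21)]
print(expand_sequences("%0 %1 %20", args))                   # a0 a1 a20
print(expand_sequences("%# / %%#", ["x", "y"]))              # 2 / %#
print(expand_sequences("<b>%body</b>", [], body="Dr. Foo"))  # <b>Dr. Foo</b>
```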
Attributes

Attributes are separated by spaces, tabs or newlines, and each attribute must be a valid &mp4h; entity. For instance, with the definitions above, the opening part of a complex tag cannot be an attribute on its own, since it must be finished by its end tag; a complete tag, however, is valid. In such cases the tag counts as only one argument. Under certain circumstances it is necessary to group multiple statements into a single one. This can be done with double quotes or with the grouping primitive described under Flow functions. Unlike in HTML, single quotes cannot replace double quotes for this purpose. If double quotes appear in an argument, they must be escaped by a backslash (\").

Macro evaluation

Macros are characterized by the following properties:
- their name;
- their container status (simple or complex);
- whether their attributes are expanded or not;
- their function type (primitive or user defined macro);
- for primitives, the address of the corresponding code in memory, and for user defined macros, the replacement text.

Characters are read from input until a left angle bracket is found. Then the macro name is read. After that, attributes are read, verbatim or not depending on how this macro has been defined. If this macro is complex, its body is read verbatim. When this is finished, some special sequences in replacement text are replaced (like %body, %attributes, %0, %1, etc.) and the resulting text is pushed onto the input stack in order to be rescanned.

By default attributes are evaluated before any replacement. Consider, for example, a macro that changes text to typewriter font. This definition has a major drawback when such macros are nested: we would like the inner tags to be removed. A first idea is to use an auxiliary variable to know whether we are still inside such an environment (the presence of asterisks in HTML tags is explained in the next section). But if we use simple tags instead, our definition does not seem to work. This is because attributes are expanded before they are put into replacement text.
To prevent this problem, we have to forbid attribute expansion with attributes=verbatim.

Expansion flags

When you embed a server-side scripting language in your pages, you face some strange problems, for instance when a script construct containing a right-angle bracket appears inside an HTML tag. The question is: how does &mp4h; know that this input has some extra delimiters? The answer is that &mp4h; should not try to handle special delimiters, because it cannot handle all of them (there are ASP, ePerl, PHP, ... and some of them are customizable). Remember that &mp4h; is a macro-processor, not an XML parser. So we must focus on macros, and format our input file so that it can be parsed without any problem. The previous example may be written with the attribute value enclosed in double quotes, because quotes prevent the inner right-angle bracket from closing the tag.

Another common problem arises when we need to print a begin tag or an end tag alone. For instance, it is very desirable to define one's own headers and footers, where the header opens elements that only the footer closes. Asterisks mark these tags as pseudo-simple tags: they are complex HTML tags, but they are used as simple tags within &mp4h;, because otherwise the tags would not be well nested. This asterisk is called the ``trailing star''; it appears at the end of the tag name.

Sometimes HTML tags are not parsable, as in this JavaScript code:

    ...
    document.write('<*img src="foo.gif"');
    if (text) document.write(' alt="'+text+'"');
    document.write('>');
    ...

The ``leading star'' is an asterisk between the left-angle bracket and the tag name, which prevents this tag from being parsed.

That said, we can now understand what the expansion flags option is for. It controls how expansion is performed by mp4h. It is followed by an integer, which is a bit sum of flag values. Run mp4h -h to find the default value. The current default matches HTML syntax, and it will tend toward zero as XHTML syntax becomes more familiar.
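The Attributes section above gives these rules for attribute syntax:
- whitespace separates attributes;
- double quotes group text;
- double quotes inside an argument are escaped with a backslash;
- single quotes have no grouping role.

The small Python tokenizer below applies them. It is a sketch only. It does not handle nested tags, which mp4h counts as a single attribute. Dropping the grouping quotes from the result is an assumption, and split_attributes is a hypothetical name.

```python
def split_attributes(text):
    """Split an attribute string on unquoted spaces, tabs and newlines."""
    attrs, current = [], []
    in_quotes = has_token = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] == '"':
            current.append('"')        # escaped double quote is literal
            has_token = True
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes  # grouping quotes, dropped in this sketch
            has_token = True
        elif ch in " \t\n" and not in_quotes:
            if has_token:
                attrs.append("".join(current))
                current, has_token = [], False
        else:
            current.append(ch)         # single quotes are ordinary characters
            has_token = True
        i += 1
    if has_token:
        attrs.append("".join(current))
    return attrs

print(split_attributes('one "two words" three'))  # ['one', 'two words', 'three']
print(split_attributes("'not grouped'"))          # ["'not", "grouped'"]
print(split_attributes('say "\\"hi\\""'))         # ['say', '"hi"']
```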
Author
Mp4h has its own .
Thanks
Sincere thanks to Brian J. Fox for writing &Meta-HTML; and to René Seindal for maintaining this wonderful macro parser called GNU &m4;.