(* $Id: pxp_document.mli 696 2004-08-20 14:49:57Z gerd $ * ---------------------------------------------------------------------- * PXP: The polymorphic XML parser for Objective Caml. * Copyright by Gerd Stolpmann. See LICENSE for details. *) (**********************************************************************) (* *) (* Pxp_document: *) (* Object model of the document/element instances *) (* *) (**********************************************************************) (* QUESTIONS: * - T_attribute of (string * att_value) * may be better. Attributes do not have attributes (XPATH?) *) (* ====================================================================== * OVERVIEW * * class type node ............. The common class type of the nodes of * the element tree. Nodes are either * elements (inner nodes) or data nodes * (leaves) * class type extension ........ The minimal properties of the so-called * extensions of the nodes: Nodes can be * customized by applying a class parameter * that adds methods/values to nodes. * class data_impl : node ...... Implements data nodes. * class element_impl : node ... Implements element nodes * class document .............. A document is an element with some additional * properties * * ====================================================================== * * THE STRUCTURE OF NODE TREES: * * Every node except the root node has a parent node. The parent node is * always an element, because data nodes never contain other nodes. * In the other direction, element nodes may have children; both elements * and data nodes are possible as children. * Every node knows its parent (if any) and all its children (if any); * the linkage is maintained in both directions. A node without a parent * is called a root. * It is not possible that a node is the child of two nodes (two different nodes * or a multiple child of the same node). 
* You can break the connection between a node and its parent; the method * "delete" performs this operation and deletes the node from the parent's * list of children. The node is now a root, for itself and for all * subordinate nodes. In this context, the node is also called an orphan, * because it has lost its parent (this is a bit misleading because the * parent is not always the creator of a node). * In order to simplify complex operations, you can also set the list of * children of an element. Nodes that have been children before are unchanged; * new nodes are added (and the linkage is set up), nodes no longer occurring * in the list are handled as if they had been deleted. * If you try to add a node that is not a root (either by an "add" or by a * "set" operation) the operation fails. * * CREATION OF NODES * * The class interface supports creation of nodes by cloning a so-called * exemplar. The idea is that it is sometimes useful to implement different * element types by different classes, and to implement this by looking up * exemplars. * Imagine you have three element types A, B, and C, and three classes * a, b, and c implementing the node interface (for example, by providing * different extensions, see below). The XML parser can be configured to * have a lookup table * { A --> a0, B --> b0, C --> c0 } * where a0, b0, c0 are exemplars of the classes a, b, and c, i.e. empty * objects belonging to these classes. If the parser finds an instance of * A, it looks up the exemplar a0 of A and clones it (actually, the method * "create_element" performs this for elements, and "create_data" for data * nodes). Clones belong to the same class as the original nodes, so the * instances of the elements have the same classes as the configured * exemplars. * Note: This technique assumes that the interface of all exemplars is the * same! 
* * THE EXTENSION * * The class type node and all its implementations have a class parameter * 'ext which must at least fulfil the properties of the class type "extension". * The idea is that you can add properties, for example: * * class my_extension = * object * (* minimal properties required by class type "extension": *) * method clone = ... * method node = ... * method set_node n = ... * (* here my own methods: *) * method do_this_and_that ... * end * * class my_element_impl = [ my_extension ] element_impl * class my_data_impl = [ my_extension ] data_impl * * The whole XML parser is parameterized with 'ext, so your extension is * visible everywhere (this is the reason why extensibility is solved by * parametric polymorphism and not by inclusive polymorphism (subtyping)). * * * SOME COMPLICATED TYPE EXPRESSIONS * * Sometimes the following type expressions turn out to be necessary: * * 'a node extension as 'a * This is the type of an extension that belongs to a node that * has an extension that is the same as we started with. * * 'a extension node as 'a * This is the type of a node that has an extension that belongs to a * node of the type we started with. * * * DOCUMENTS * ... * * ====================================================================== * * SIMPLE USAGE: ... *) (* ====================================================================== * THE DYNAMIC MODIFICATION OF NODE TREES AND VALIDATION * ====================================================================== * * The parser creates a node tree while parsing the input text, and the * node tree can be modified later by some transformation algorithm. For * both tasks the same interface may be used. However, PXP 1.0 introduced * an interface that did not separate the two aspects "modification of the * tree" and "validation of the tree", i.e. modification methods also * did some validation. The following two sections describe: The PXP 1.0 * model, and the PXP 1.1 changes. 
* * ------- * PXP 1.0 * ------- * * Method add_node: There are two different modes selected by the optional * argument ~force. ~force:true simply adds the node as last child to the * current node. However, ~force:false (the default) performs some validation * checks that may have three results: (1) The node is added, (2) The node * is silently dropped, (3) An error condition is detected, and an exception * is raised. The mode ~force:false is used by the parser, and historically, * add_node was designed as the parser's method of adding new nodes; ~force * was added later. * * The checks are only performed if the added node is a text node (node type * is T_data), and if the current element node has a type restricting the * addition of text nodes. In detail, the following is checked: * - If the element has type EMPTY, the addition of whitespace text is * not rejected, but the text is dropped (case 2). The addition of * other text material is an error (case 3). * - If the element has a regexp type, the addition of whitespace text is * not rejected, but the text is dropped (case 2); however there is * a special mode forcing to add such whitespace text nodes (see below). * The addition of other text material is an error (case 3). * Furthermore, it is also an error if whitespace text is added, and the * document is stand-alone, and the element is declared in an external * entity. * * Method keep_always_whitespace_mode: turns a special mode on forcing that * whitespace text nodes inside regexp-type elements are always added. * * Method internal_init (i.e. object creation): When an element node is * created, the attribute list is passed as (string * string) list. 
This * method compares this list with the declared attlist of the DTD, and * - adds missing attributes if the DTD has a default value * - rejects nondeclared attributes * - checks whether required attributes are passed * - parses and normalizes attribute values * - checks some conditions for stand-alone documents * * Method local_validate: Checks whether the subnodes of the element match * the type of the element. * * --------------------- * PROBLEMS WITH PXP 1.0 * --------------------- * * - It is not very obvious when validation checks are performed (which * methods do them and under which conditions) * - It is difficult to transform trees because the transformation algorithm * might call a modification method that also performs some validation checks, * but the tree is not yet valid because the algorithm is in the middle of * the transformation * * ------- * PXP 1.1 * ------- * * New method append_node: always adds the node to the node list (same as * add_node ~force:true) * * New method classify_data_node: performs the checks of add_node ~force:false, * and returns the result: * - CD_other: The node to add is not a text node and cannot be classified * - CD_normal: The text node can be added * - CD_empty: The node is ignorable (= empty), and the containing * element is declared as EMPTY. The parser must not add the node. * - CD_ignorable: The node contains ignorable whitespace, and the parser * should not add the node unless a special configuration forces the * addition * - CD_error: the rules do not allow to add the text node here * * Method add_node: is now deprecated. For compatibility, the method * classifies the node to add, and decides whether to add the node, not * to add the node, or whether to raise an exception. * * Method keep_always_whitespace_mode: is removed. A new parser option * modifies the behaviour of the parser such that ignorable whitespace is * added anyway (option drop_ignorable_whitespace = false). 
* * Object creation: You can pass attributes as (string * string) list, and * as (string * att_value) list; internal_init simply processes both lists. * Attributes passed as att_value are already normalized (and compatible * with the stand-alone declaration, if any); the method does not normalize * them again. Two options control validation: * ~valcheck: (default true) It is checked that there * is an element type declaration, or that the DTD is in well-formedness * mode. Passing 'false' means that it is not checked whether the * element type exists, and that you can add any attributes. * * New method validate_contents: The new name for local_validate; it is * checked whether the elements contained in the list of sub nodes match * the declared content model. * (The name local_validate is deprecated.) * * New method validate_attlist: Checks whether the attlist matches the * ATTLIST declaration. * (Impl.: Call create_element again with valcheck options ON.) * * New method validate: This method can be called after manual modifications * of the tree to ensure that the changed tree is still valid: * - All text subnodes must be classified as non-erroneous * - All element subnodes are validated by validate_subelements * - The attributes are validated by validate_attlist * Note that this method is not used by the parser. 
*) open Pxp_types open Pxp_dtd type node_type = T_element of string | T_data | T_super_root (* XPath calls them simply root nodes *) | T_pinstr of string (* The string is the target of the PI *) | T_comment | T_none | T_attribute of string (* The string is the name of the attribute *) | T_namespace of string (* The string is the namespace normprefix *) (* * * [node_type] * AUTO * This type enumerates the possible node types: * - [T_element name]: The node is an element and has element type [name] * - [T_data]: The node is a data node * - [T_super_root]: The node is a super root node * - [T_pinstr name]: The node contains a processing instruction with * target [name] * - [T_comment]: The node is a comment * - [T_attribute name]: The node contains an attribute called [name] * - [T_namespace prefix]: The node identifies a namespace for the * normalized [prefix] * - [T_none]: This is a "bottom value" used if there is no reasonable * type. * -- * *) (* About T_super_root, T_pinstr, T_comment: * These types are extensions to my original design. They have mainly * been added to simplify the implementation of standards (such as * XPath) that require that nodes of these types are included into the * main document tree. * There are options (see Pxp_yacc) forcing the parser to insert such * nodes; in this case, the nodes are actually element nodes serving * as wrappers for the additional data structures. The options are: * enable_super_root_node, enable_pinstr_nodes, enable_comment_nodes. * By default, such nodes are not created. *) (* About T_attribute, T_namespace: * These types are fully virtual. This means that it is impossible * to make the parser insert such nodes into the regular tree. They are * normally created by special methods to allow additional views on the * document tree. 
*) (* The result type of the method classify_data_node: *) type data_node_classification = CD_normal | CD_other | CD_empty | CD_ignorable | CD_error of exn (* * [data_node_classification] * AUTO * This type enumerates the result values of the method * [classify_data_node]. See the description of this method. * *) (* QUESTION: Perhaps we should reexport att_value here. It is the only * type from Pxp_types that is needed regularly. *) (* Regular definition: *) class type [ 'node ] extension = object ('self) method clone : 'self (* "clone" should return an exact deep copy of the object. *) method node : 'node (* "node" returns the corresponding node of this extension. This method * is intended to return exactly what has previously been set by "set_node". *) method set_node : 'node -> unit (* "set_node" is invoked once the extension is associated to a new * node object. *) end ;; class type [ 'ext ] node = (* * 'ext [node] * class type 'ext node = object ... end * This is the common class type of all classes representing * nodes. * * Not all classes implement all methods. As the type system of O'Caml * demands that there must always be a method definition for all * methods of the type, methods will raise the exception * [Method_not_applicable] if they are called on a class not supporting * them. The exception [Namespace_method_not_applicable] is reserved * for the special case that a namespace method is invoked on a * class that does not support namespaces. * sig-class-type-node * *) object ('self) constraint 'ext = 'ext node #extension method extension : 'ext (* * * obj # [extension] * AUTO * Returns the extension object of the node object [obj]. * Applicable to element, data, comment, processing instruction, * and super root nodes. * *) method remove : unit -> unit (* * obj # [remove] () * AUTO * Removes [obj] from the tree. After this * operation, [obj] is no longer the child of the former father node, * i.e. 
it does neither occur in the former father's list of children * nor is the former father the parent of [obj]. The node [obj] * becomes orphaned. * * If [obj] is already a root, [remove] does nothing. * * Note: This method does not check whether the modified XML tree * is still valid. * Elements, comments, processing instructions, data nodes, * super root nodes. * node-delete * *) method delete : unit (* DEPRECATED METHOD * remove() does exactly the same *) method remove_nodes : ?pos:int -> ?len:int -> unit -> unit (* * obj # [remove_nodes] ~pos ~len () * AUTO * Removes the specified nodes from the list of children of * [obj]. The method deletes the nodes from position [pos] to * [pos+len-1]. The optional argument [pos] defaults to 0. The * optional argument [len] defaults to the length of the children * list. * * Note: This method does not check whether the modified XML tree * is still valid. * Elements. * *) method parent : 'ext node (* * obj # [parent] * AUTO * Get the parent node, or raise [Not_found] if this node is * a root node. For attribute and namespace nodes, the parent is * artificially defined as the element to which these nodes apply. * All node types. * *) method root : 'ext node (* * obj # [root] * AUTO * Gets the root node of the tree. * Every node is contained in a tree with a root, so this method always * succeeds. Note that this method searches for the root, * which costs time proportional to the length of the path to the root. * All node types. * *) method orphaned_clone : 'self (* * obj # [orphaned_clone] * AUTO * Returns a clone of the node and the complete tree below * this node (deep clone). The clone does not have a parent (i.e. the * reference to the parent node is not cloned). While copying the * subtree strings are skipped; normally the original tree and the * copy tree share strings. 
Extension objects are cloned by invoking * the [clone] method on the original objects; how much of * the extension objects is cloned depends on the implementation of * this method. * All node types. * node-clone * *) method orphaned_flat_clone : 'self (* * obj # [orphaned_flat_clone] * AUTO * Returns a clone of this element where all subnodes are omitted. * The type of the node, and the attributes are the same as in the * original node. The clone has no parent. * All node types. * *) method append_node : 'ext node -> unit (* * obj # [append_node] n * AUTO * Adds the node [n] to the list of children of [obj]. The * method expects that [n] is a root, and it requires that [n] and * [obj] share the same DTD. * * Note: This method does not check whether the modified XML tree * is still valid. * This method is only applicable to element nodes. * node-add * *) method classify_data_node : 'ext node -> data_node_classification (* * obj # [classify_data_node] n * AUTO * Classifies the passed data node [n], and returns whether it * is reasonable to append the data node to the list of subnodes * (using [append_node]). The following return values are possible: * - [CD_normal]: Adding [n] does not violate any validation * constraint * - [CD_other]: [n] is not a data node * - [CD_empty]: The element [obj] is declared as [EMPTY], and * [n] contains the empty string. It is allowed to append * [n] but it does not make sense * - [CD_ignorable]: The element [obj] is declared such that * it is forbidden to put character data into it. However, * the node [n] only contains white space which is allowed * as an exception to this rule. This means that it is allowed * to append [n] but [n] would not contain any information * except formatting hints. * - [CD_error e]: It is an error to append [n]. The exception * [e], usually a [Validation_error], contains details about * the problem. * -- * Note that the method always returns and never raises an exception. * Elements. 
* *) method add_node : ?force:bool -> 'ext node -> unit (* add_node is now DEPRECATED; use append_node instead! *) (* Append new sub nodes -- mainly used by the parser itself, but * of course open for everybody. If an element is added, it must be * an orphan (i.e. does not have a parent node); and after addition * *this* node is the new parent. * The method performs some basic validation checks if the current node * has a regular expression as content model, or is EMPTY. You can * turn these checks off by passing ~force:true to the method. *) method insert_nodes : ?pos:int -> 'ext node list -> unit (* * obj # [insert_nodes] ~pos nl * AUTO * Inserts the list of nodes [nl] in-place into the list of * children of [obj]. The insertion is performed at position [pos], * i.e. in the modified list of children, the first element of * [nl] will have position [pos]. If the optional argument [pos] * is not passed to the method, the list [nl] is appended * to the list of children. * * The method requires that all elements of * the list [nl] are roots, and that all elements and [obj] * share the same DTD. * * Note: This method does not check whether the modified XML tree * is still valid. * Elements. * *) method set_nodes : 'ext node list -> unit (* * obj # [set_nodes] l * AUTO * Sets the list of children to [l]. It is required that * every member of [l] is either a root or was already a child * of this node before the method call, and it is required that * all members and the current object share the same DTD. * * Former children which are not members of [l] are removed from * the tree and get orphaned (see method [remove]). * * Note: This method does not check whether the modified XML tree * is still valid. * Elements. * *) method add_pinstr : proc_instruction -> unit (* * obj # [add_pinstr] pi * AUTO * Adds the processing instruction [pi] to the set of * processing instructions contained in [obj]. 
If [obj] is an * element node, you can add any number of processing instructions. * If [obj] is a processing instruction node, you can put at most * one processing instruction into this node. * Elements, and processing instruction nodes. * *) method pinstr : string -> proc_instruction list (* * obj # [pinstr] n * AUTO * Returns all processing instructions that are * directly contained in [obj] and that have a target * specification of [n]. * All node types. However, this method is only reasonable * for processing instruction nodes, and for elements; for all * other node types the method will return the empty list. Note * that the parser can be configured such that it creates * processing instruction nodes or not; in the first case, only * the processing instruction nodes contain processing instructions; * in the latter case, only the elements enclosing the instructions * contain them. * *) method pinstr_names : string list (* * obj # [pinstr_names] * AUTO * Returns the targets of all processing instructions that are * directly contained in [obj]. * All node types. However, this method is only reasonable * for processing instruction nodes, and for elements; for all * other node types the method will return the empty list. Note * that the parser can be configured such that it creates * processing instruction nodes or not; in the first case, only * the processing instruction nodes contain processing instructions; * in the latter case, only the elements enclosing the instructions * contain them. * *) method node_position : int (* * obj # [node_position] * AUTO * Returns the position of [obj] among all children of the parent * node. Positions are counted from 0. There are several cases: * - The regular nodes get positions from 0 to l-1 where l is the * length of the list of regular children. * - Attribute nodes and namespace nodes are irregular nodes, * which means here that their positions are counted separately. 
* All attribute nodes have positions from 0 to m-1; all namespace * nodes have positions from 0 to n-1. * - If [obj] is a root, this method raises [Not_found]. * -- * All node types. * *) method node_path : int list (* * obj # [node_path] * AUTO * Returns the list of node positions describing * the location of this node in the whole tree. The list describes * the path from the root node down to this node; the first path * element is the index of the child of the root, the second path * element is the index of the child of the child, and so on, and * the last path element is the index of this node. The method returns * [[[]]] if this node is the root node. * * Attribute and namespace nodes are not part of the regular tree, so * there is a special rule for them. Attribute nodes of an element * node [x] have the node path [[x # node_path @ [-1; p]]] where * [p] is the position of the attribute node. Namespace nodes of an * element node [x] have the node path [[x # node_path @ [-2; p]]] * where [p] is the position of the namespace node. * (This definition respects the document order.) * All node types. * *) method sub_nodes : 'ext node list (* * obj # [sub_nodes] * AUTO * Returns the regular children of the node as list. Only * elements, data nodes, comments, and processing instructions can * occur in this list; attributes and namespace nodes are not * considered as regular nodes, and super root nodes can only * be root nodes and will never be children of another node. * The returned list is always empty if [obj] is a data node, * comment, processing instruction, attribute, or namespace. * All node types. * *) method iter_nodes : ('ext node -> unit) -> unit (* * obj # [iter_nodes] f * AUTO * Iterates over the regular children of [obj], and * calls the function [f] for every child ch: [f ch]. The * regular children are the nodes returned by [sub_nodes], see * there for an explanation. * All node types. 
* document-iterators * *) method iter_nodes_sibl : ('ext node option -> 'ext node -> 'ext node option -> unit) -> unit (* * obj # [iter_nodes_sibl] f * AUTO * Iterates over the regular children of [obj], and * calls the function [f] for every child: [f pred ch succ]. * - [ch] is the child * - [pred] is [None] if the child is the first in the list, * and [Some p] otherwise; [p] is the predecessor of [ch] * - [succ] is [None] if the child is the last in the list, * and [Some s] otherwise; [s] is the successor of [ch] * -- * The * regular children are the nodes returned by [sub_nodes], see * there for an explanation. * All node types. * document-iterators * *) method nth_node : int -> 'ext node (* * obj # [nth_node] n * AUTO * Returns the n-th regular child of [obj], [n >= 0]. * Raises [Not_found] if the index [n] is out of the valid range. * All node types. * *) method previous_node : 'ext node (* * obj # [previous_node] * AUTO * Returns the predecessor of [obj] * in the list of regular children of the parent, or raise [Not_found] * if this node is the first child. This is equivalent to * [obj # parent # nth_node (obj # node_position - 1)]. * All node types. * *) method next_node : 'ext node (* * obj # [next_node] * AUTO * Returns the successor of [obj] * in the list of regular children of the parent, or raise [Not_found] * if this node is the last child. This is equivalent to * [obj # parent # nth_node (obj # node_position + 1)]. * All node types. 
* *) method data : string (* * obj # [data] * AUTO * This method returns what is considered as * the data of the node which depends on the node type: * - Data nodes: the method returns the character string the node * represents * - Element nodes, super root nodes: the method returns the * concatenated character strings of all (direct or indirect) * data nodes below [obj] * - Comment nodes: the method returns the * comment string (without delimiters), or it raises Not_found if the * comment string is not set * - Processing instructions: the * method returns the data part of the instruction, or "" if the data * part is missing * - Attribute nodes: the method returns the attribute * value as string, or it raises [Not_found] if the attribute * is implied. * - Namespace nodes: the method returns the namespace * URI * -- * All node types. * *) method set_data : string -> unit (* * obj # [set_data] s * AUTO * This method sets the character string contained in * data nodes. * * Note: This method does not check whether the modified XML tree * is still valid. * Data nodes. * *) method node_type : node_type (* * obj # [node_type] * AUTO * Returns the type of [obj]: * - [T_element t]: The node is an element with type [t] * - [T_data]: The node is a data node * - [T_comment]: The node is a comment node * - [T_pinstr n]: The node is a processing instruction with * target [n] * - [T_super_root]: The node is a super root node * - [T_attribute n]: The node is an attribute with name [n] * - [T_namespace p]: The node is a namespace with normalized prefix [p] * -- * All node types. * * XXX: Where attribute and namespace nodes are discussed *) method position : (string * int * int) (* * obj # [position] * AUTO * Returns a triple [(entity,line,pos)] describing the * location of the element in the original XML text. This triple is * only available for elements, and only if the parser has been * configured to store positions (see parser option * [store_element_positions]). 
If available, [entity] describes * the entity where the element occurred, [line] is the line number * [>= 1], and [pos] is the byte position of the first character * of the element in the line. * * If unavailable, the method will return the triple [("?",0,0)]. * All node types. Note that the method will always return * [("?",0,0)] for non-element nodes. * *) method attribute : string -> Pxp_core_types.att_value (* * obj # [attribute] name * AUTO * Returns the value of the attribute [name]. * * If the parser is in validating mode, the method is able to return * values for declared attributes, and it raises [Not_found] for any * undeclared attribute. Note that it even returns a value if the * attribute is actually missing but is declared as [#IMPLIED] or * has a default value. * * If the parser (more precisely, the DTD object) is in * well-formedness mode, the method is able to return values for * defined attributes, and it raises [Not_found] for any * unknown attribute name. * * Possible return values are: * - [Implied_value]: The attribute has been declared with the * keyword [#IMPLIED], and the attribute definition is missing * in the attribute list of the element. * - [Value s]: The attribute has been declared as type [CDATA], * as [ID], as [IDREF], as [ENTITY], or as [NMTOKEN], or as * enumeration or notation, and one of the two conditions holds: * (1) The attribute value is defined in the attribute list in * which case this value is returned in the string [s]. (2) The * attribute has been omitted, and the DTD declares the attribute * with a default value. The default value is returned in [s]. * * Summarized, [Value s] is returned for non-implied, non-list * attribute values. * * Furthermore, [Value s] is returned for non-declared attributes * if the DTD object allows this, for instance, if the DTD * object specifies well-formedness mode. 
* - [Valuelist l]: The attribute has been declared as type * [IDREFS], as [ENTITIES], or [NMTOKENS], and one of the two * conditions holds: (1) The attribute value is defined in the * attribute list in which case the space-separated tokens of * the value are returned in the string list [l]. (2) The * attribute has been omitted, and the DTD declares the attribute * with a default value. The default value is returned in [l]. * * Summarized, [Valuelist l] is returned for all list-type * attribute values. * -- * Note that before the attribute value is returned, the value is * normalized. This means that newlines are converted to spaces, and * that references to character entities (i.e. [&#n;]) and * general entities (i.e. [&name;]) are expanded; if necessary, * the expansion is performed recursively. * All node types. However, only elements and attribute nodes * will return values, all other node types always raise [Not_found]. * *) method attribute_names : string list (* * obj # [attribute_names] * AUTO * Returns the list of all attribute names of this element. * In validating mode, this list is simply the list of declared * attributes. In well-formedness mode, this list is the list of * defined attributes. * All node types. However, only elements and attribute nodes * will return a non-empty list, all other node types always return * the empty list. * *) method attribute_type : string -> Pxp_core_types.att_type (* * obj # [attribute_type] name * AUTO * Returns the type of the attribute [name]. If the attribute * is declared, the declared type is returned. If the attribute is * defined but undeclared, the type [A_cdata] will be returned. * (The module [Pxp_types] contains the Caml type of attribute types.) * This method raises [Not_found] if the attribute is unknown. * All node types. However, only elements and attribute nodes * will return values, all other node types always raise [Not_found]. 
* *) method attributes : (string * Pxp_core_types.att_value) list (* * obj # [attributes] * AUTO * Returns the list of [(name,value)] pairs describing * all attributes (declared attributes plus defined attributes). * All node types. However, only elements and attribute nodes * will return non-empty values, all other node types always * return the empty list. * *) method required_string_attribute : string -> string (* * obj # [required_string_attribute] name * AUTO * Returns the value of the attribute [name] as string, * i.e. if the value of the attribute is [Value s], this method * will return simply [s], and if the value is [Valuelist l], * this method will return the elements of [l] separated by * spaces. If the attribute value is [Implied_value], the method * will fail. * All node types. However, only elements and attribute nodes * will return values, all other node types always fail. * *) method required_list_attribute : string -> string list (* * obj # [required_list_attribute] name * AUTO * Returns the value of the attribute [name] as string list, * i.e. if the value of the attribute is [Valuelist l], this method * will return simply [l], and if the value is [Value s], * this method will return the one-element list [[[s]]]. * If the attribute value is [Implied_value], the method * will fail. * All node types. However, only elements and attribute nodes * will return values, all other node types always fail. * *) method optional_string_attribute : string -> string option (* * obj # [optional_string_attribute] name * AUTO * Returns the value of the attribute [name] as optional string, * i.e. if the value of the attribute is [Value s], this method * will return [Some s], and if the value is [Valuelist l], * this method will return [Some s] where [s] consists of the * concatenated elements of [l] separated by spaces. If the * attribute value is [Implied_value], the method will return [None]. * All node types. 
However, only elements and attribute nodes * will return [Some] values, all other node types always return [None]. * *) method optional_list_attribute : string -> string list (* * obj # [optional_list_attribute] name * AUTO * Returns the value of the attribute [name] as string list, * i.e. if the value of the attribute is [Valuelist l], this method * will return simply [l], and if the value is [Value s], * this method will return the one-element list [[[s]]]. * If the attribute value is [Implied_value], the method * will return the empty list [[[]]]. * All node types. However, only elements and attribute nodes * will return non-empty values, all other node types always * return the empty list. * *) method id_attribute_name : string (* * obj # [id_attribute_name] * AUTO * Returns the name of the (at most one) attribute being * declared as type [ID]. The method raises [Not_found] if there * is no declared [ID] attribute for the element type. * All node types. However, only elements and attribute nodes * will return names, all other node types always raise [Not_found]. * *) method id_attribute_value : string (* * obj # [id_attribute_value] * AUTO * Returns the string value of the (at most one) attribute being * declared as type [ID]. The method raises [Not_found] if there * is no declared [ID] attribute for the element type. * All node types. However, only elements and attribute nodes * will return values, all other node types always raise [Not_found]. * *) method idref_attribute_names : string list (* * obj # [idref_attribute_names] * AUTO * Returns the names of the attributes being * declared as type [IDREF] or [IDREFS]. * All node types. However, only elements and attribute nodes * will return names, all other node types always return the empty * list. * *) method quick_set_attributes : (string * Pxp_core_types.att_value) list -> unit (* DEPRECATED METHOD! set_attributes does exactly the same.
*) method set_attributes : (string * Pxp_core_types.att_value) list -> unit (* * obj # [set_attributes] al * AUTO * Sets the attributes of this element to [al]. * * Note that this method does not add missing attributes that are * declared in the DTD. It also never rejects undeclared attributes. * The passed values are not checked. * * Note: This method does not check whether the modified XML tree * is still valid. * Elements. * *) method set_attribute : ?force:bool -> string -> Pxp_core_types.att_value -> unit (* * obj # [set_attribute] ~force n v * AUTO * Sets the attribute [n] of this element to the value [v]. * By default, it is required that the attribute [n] already has * some value. If you pass ~force:true, the attribute is added * to the attribute list if it is missing. * * Note: This method does not check whether the modified XML tree * is still valid. * Elements. * *) method reset_attribute : string -> unit (* * obj # [reset_attribute] n * AUTO * If the attribute [n] is a declared attribute, it is set * to its default value, or to [Implied_value] if there is no default * (the latter is performed even if the attribute is [#REQUIRED]). * If the attribute is an undeclared attribute, it is removed * from the attribute list. * * The idea of this method is to simulate what would have happened if [n] * had not been defined in the attribute list of the XML element. * In validating mode, the parser would have chosen the default * value if possible, or [Implied_value] otherwise, and in * well-formedness mode, the attribute would be simply missing * in the attribute list. * * Note: It is intentionally not possible to remove a declared * attribute. (However, you can remove it by calling * set_attributes, but this would be very inefficient.) * * Note: This method does not check whether the modified XML tree * is still valid. * Elements. * *) method attributes_as_nodes : 'ext node list (* * obj # [attributes_as_nodes] * AUTO * Returns all attributes (i.e.
declared plus defined * attributes) as a list of attribute nodes with node type * [T_attribute name]. * * This method should be used if it is required for typing reasons * that the attributes have also type [node]. A common example * are sets that may both contain elements and attributes, as they * are used in the XPath language. * * The attribute nodes are read-only; any call to a method * modifying their contents will raise [Method_not_applicable]. * In order to get the value of such an attribute node [anode], * one can invoke the method [attribute]: * * [anode # attribute name] * * where [name] is the name of the attribute represented by * [anode]. This will return the attribute value as [att_value]. Of * course, the other attribute observers can be applied as well. * Furthermore, the method [data] will return the attribute value as * string. However, every attribute node only contains the value of the * one attribute it represents, and it does not make sense to pass * names of other attributes to the observer methods. * * The attribute nodes live outside of the regular XML tree, and * they are not considered as children of the element node. However, * the element node is the parent node of the attribute nodes * (i.e. the children/parent relationship is asymmetric). * * The method [attributes_as_nodes] computes the list of attribute * nodes when it is first invoked, and it will return the same list * again in subsequent invocations. * This method is only applicable to elements. * *) method set_comment : string option -> unit (* * obj # [set_comment] c * AUTO * Sets the comment string contained in comment nodes, if * [c = Some s]. Otherwise, this method removes the comment string * ([c = None]). * * Note that the comment string must not include the delimiters * [<!--] and [-->]. Furthermore, it must not contain any character * or character sequence that is forbidden in comments, such * as ["--"]. However, this method does not check this condition. * Comment nodes.
* *) method comment : string option (* * obj # [comment] * AUTO * Returns [Some text] if the node is a comment node and if * [text] is the comment string (without the delimiters [<!--] and [-->]). Otherwise, [None] is passed back. * * Note: The [data] method also returns the comment string, but it * raises [Not_found] if the string is not available. * All node types. Note that the method will always return * [None] for non-comment nodes. * *) method normprefix : string (* * obj # [normprefix] * AUTO * For namespace-aware implementations of the node class, this * method returns the normalized prefix of the element or attribute. * If the object does not have a prefix, "" will be passed back. * * The normalized prefix is the part of the name before the * colon. It is normalized because the parser ensures that every * prefix corresponds only to one namespace. Note that the * prefix can be different than in the parsed XML source because * the normalization step needs to change the prefix to avoid * prefix conflicts. * Elements and attributes supporting namespaces. * *) method display_prefix : string (* * obj # [display_prefix] * AUTO * For namespace-aware implementations of the node class, this * method returns the display prefix of the element or attribute. * If the object does not have a prefix, "" will be passed back. * * The display prefix is the prefix in the XML text. Unlike * the normprefix, it is not unique in the document. * * Actually, this method does not return the real display prefix * that was found in the XML text but the most recently declared * display prefix bound to the namespace URI of this element or * attribute, i.e. this method infers the display prefix. The * result can be a different prefix than the original prefix * if the same namespace URI is bound several times in the * current namespace scope. * * This method is quite slow. * Elements and attributes supporting namespaces.
* *) method localname : string (* * obj # [localname] * AUTO * For namespace-aware implementations of the node class, this * method returns the local part of the name of the element or * attribute. * * The local name is the part of the name after the colon, or * the whole name if there is no colon. * Elements and attributes supporting namespaces. * *) method namespace_uri : string (* * obj # [namespace_uri] * AUTO * For namespace-aware implementations of the node class, this * method returns the namespace URI of the element, attribute or * namespace. It is required that a namespace manager is available. * * If the node does not have a namespace prefix, and there is no * default namespace, this method returns "". * * The namespace URI is the unique name of the namespace. * Elements and attributes supporting namespaces; furthermore * namespace nodes. * *) method namespace_manager : namespace_manager (* * obj # [namespace_manager] * AUTO * For namespace-aware implementations of the node class, * this method returns the namespace manager. If the namespace * manager has not been set, the exception [Not_found] is raised. * * The namespace manager is an object that holds the mapping * from namespace prefixes to namespace URIs, and vice versa. * It is contained in the DTD. * Elements and attributes supporting namespaces; furthermore * namespace nodes. * *) method namespace_scope : namespace_scope (* * obj # [namespace_scope] * AUTO * Returns additional information about the namespace * structure in the parsed XML text. In particular, the namespace * scope describes the original (unprocessed) namespace prefixes * in the XML text, and how they are mapped to the namespace URIs. * * When printing XML text, the namespace scope may be used * to give the printer hints where to introduce namespaces, and * which namespace prefixes are preferred. 
* Elements and attributes supporting namespaces *) method set_namespace_scope : namespace_scope -> unit (* * obj # [set_namespace_scope] scope * AUTO * Sets the namespace scope object. It is required that * this object is connected to the same namespace manager as * the document tree. * Elements and attributes supporting namespaces *) method namespaces_as_nodes : 'ext node list (* * obj # [namespaces_as_nodes] * AUTO * Returns the namespaces found in the [namespace_scope] * object and all parent scope objects (except declarations that * are hidden by more recent declarations). The namespaces are * returned as node objects with type [T_namespace name] where * [name] is the normalized prefix. * * This method should be used if it is required for typing reasons * that the namespaces have also type [node]. A common example * are sets that may both contain elements and namespaces, as they * are used in the XPath language. * * The namespace nodes are read-only; any call to a method * modifying their contents will raise [Method_not_applicable]. * See the class [namespace_impl] below for more information * about the returned nodes. * * The namespace nodes live outside of the regular XML tree, and * they are not considered as children of the element node. However, * the element node is the parent node of the namespace nodes * (i.e. the children/parent relationship is asymmetric). * * The method [namespaces_as_nodes] computes the list of namespace * nodes when it is first invoked, and it will return the same list * again in subsequent invocations. * This method is only applicable to elements that * support namespaces. * *) (* -- namespace_info is withdrawn method namespace_info : 'ext namespace_info (* * obj # [namespace_info] * AUTO * Returns additional information about the namespace prefixes * in the parsed XML source. This method has been added for * better XPath conformance. Note that it is still experimental * and it is likely that it will be changed.
* * This record is only available if the parser has been configured * to support namespaces, and if the parser has been configured * to set this record (requires a lot of memory). Furthermore, only * the implementation namespace_element_impl supports this method. * * This method raises [Not_found] if the [namespace_info] field has not * been set. * Elements supporting namespaces. * *) *) method dtd : dtd (* * obj # [dtd] * AUTO * Returns the DTD. * All node types. Note (1) that exemplars need not have * an associated DTD, in which case this method fails. (2) Even * in well-formedness mode every node has a DTD object; * this object specifies well-formedness mode. * *) method encoding : Pxp_core_types.rep_encoding (* * obj # [encoding] * AUTO * Get the encoding which is always the same as the encoding of * the DTD. See also method [dtd]. (Note: This method fails, too, if * no DTD is present.) * All node types. Note that exemplars need not have * an associated DTD, in which case this method fails. * *) method create_element : ?name_pool_for_attribute_values:Pxp_core_types.pool -> ?position:(string * int * int) -> ?valcheck:bool -> (* default: true *) ?att_values:((string * Pxp_core_types.att_value) list) -> dtd -> node_type -> (string * string) list -> 'ext node (* * obj # [create_element] ~name_pool_for_attribute_values ~position ~valcheck ~att_values * dtd ntype att_list * AUTO * Returns a flat copy of this element node with the following * modifications: * - The DTD is set to [dtd] * - The node type is set to [ntype] (which must be [T_element name]) * - The attribute list is set to the concatenation of * [att_list] and [att_values]; [att_list] passes attribute values * as strings while [att_values] passes attribute values as * type [att_value] * - The copy has neither children nor a parent * - The copy does not contain processing instructions. * - The position triple is set to [position] * -- * Note that the extension object is copied, too.
* * If [valcheck = true] (the default), it is checked whether the * element type exists and whether the passed attributes match the * declared attribute list. Missing attributes are automatically * added, if possible. If [valcheck = false], any element type * and any attributes are accepted. * * If a [name_pool_for_attribute_values] is passed, the attribute * values in [att_list] are put into this pool. * * The optional arguments have the following defaults: * - [~name_pool_for_attribute_values]: No pool is used * - [~position]: The position is not available in the copy * - [~valcheck]: true * - [~att_values]: empty * -- * Elements. * type-node-ex-create-element * *) method create_data : dtd -> string -> 'ext node (* * obj # [create_data] dtd cdata * AUTO * Returns a flat copy of this data node with the following * modifications: * - The DTD is set to [dtd] * - The character string is set to [cdata] * -- * Note that the extension object is copied, too. * Data nodes. * type-node-ex-create-data * *) method create_other : ?position:(string * int * int) -> dtd -> node_type -> 'ext node (* * obj # [create_other] ~position dtd ntype * AUTO * Returns a flat copy of this node with the following * modifications: * - The DTD is set to [dtd] * - The position triple is set to [position] * -- * Note that the extension object is copied, too. * * The passed node type [ntype] must match the node type * of [obj]. * Super root nodes, processing instruction nodes, * comment nodes * *) method local_validate : ?use_dfa:bool -> ?check_data_nodes:bool -> unit -> unit (* DEPRECATED NAME of validate_contents. *) method validate_contents : ?use_dfa:bool -> ?check_data_nodes:bool -> unit -> unit (* * obj # [validate_contents] ?use_dfa ?check_data_nodes () * AUTO * Checks that the subnodes of this element match the declared * content model of this element. The method returns [()] if * the element is okay, and it raises an exception if an error * is found (in most cases [Validation_error]).
* * This check is always performed by the parser, such that * software that only reads parsed XML trees need not call * this method. However, if software modifies the tree itself, * an invocation of this method ensures that the validation * constraints about content models are fulfilled. * * Note that the check is not performed recursively. * * - Option [~use_dfa]: If true, the deterministic finite automaton of * regexp content models is used for validation, if available. * Defaults to false. * - Option [~check_data_nodes]: If true, it is checked whether data * nodes only occur at valid positions. If false, these checks * are left out. Defaults to true. (Usually, the parser turns * this feature off because the parser already performs a similar * check.) * * See [classify_data_node] for details about what is checked. * -- * * In previous releases of PXP, this method was called [local_validate]. * All node types. However, there are only real checks for * elements; for other nodes, this method is a no-op. * *) method complement_attlist : unit -> unit (* * obj # [complement_attlist] () * AUTO * Adds attributes that are declared in the DTD but are * currently missing: [#IMPLIED] attributes are added with * [Implied_value], and if there is a default value for an attribute, * this value is added. [#REQUIRED] attributes are set to * [Implied_value], too. * * It is only necessary to call this method if the element is created * with ~valcheck:false, or the attribute list has been modified, * and the element must be validated. * Elements. * *) method validate_attlist : unit -> unit (* * obj # [validate_attlist] () * AUTO * Checks whether the attribute list of the element [obj] * matches the declared attribute list. The method returns [()] * if the attribute list is formed correctly, and it raises an * exception (usually a [Validation_error]) if there is an error. * * This check is implicitly performed by [create_element] unless * the option [~valcheck:false] has been passed.
This means that it * is usually not necessary to call this method; however, if the * attribute list has been changed by [set_attributes] or if * [~valcheck:false] is in effect, the invocation of this method * ensures the validity of the attribute list. * * Note that the method complains about missing attributes even * if these attributes have been declared with a default value or as * being [#IMPLIED]; this method only checks the attributes but does * not modify the attribute list. If you know that attributes are * missing and you want to add them automatically just as * [create_element] does, you can call [complement_attlist] before * doing this check. * All node types. However, for non-element nodes this * check is a no-op. * *) method validate : unit -> unit (* * obj # [validate] () * AUTO * Calls [validate_contents] and [validate_attlist], and * ensures that this element is locally valid. The method * returns [()] if the element is valid, and raises an exception * otherwise. * All node types. However, for non-element nodes this * check is a no-op. * *) (* method keep_always_whitespace_mode : unit *) (* This method has been removed. You can now set the handling of * ignorable whitespace by a new Pxp_yacc.config option: * [drop_ignorable_whitespace] *) method write : ?prefixes:string list -> ?default:string -> Pxp_core_types.output_stream -> Pxp_core_types.encoding -> unit (* * obj # [write] ~prefixes stream enc * AUTO * Write the contents of this node and the subtrees to the passed * [stream] encoded as [enc]. The generated output is again XML. * The output style is rather compact and should not be considered * as "pretty printing". * * The namespace-aware nodes use a notation with normalized * prefixes. The namespace scope is ignored. * * Option [~prefixes]: The class [namespace_element_impl] interprets * this option and passes it recursively to subordinate invocations of * [write]. 
The meaning is that the normprefixes enumerated by this list * have already been declared by surrounding elements. The option * defaults to [] forcing the method to output all necessary prefix * declarations. * * Option [~default]: Specifies the normprefix that becomes the * default namespace in the output. * All regular node types (elements, data nodes, comments, * processing instructions, super root nodes). * *) method display : ?prefixes:string StringMap.t -> Pxp_core_types.output_stream -> Pxp_core_types.encoding -> unit (* * obj # [display] ~prefixes stream enc * AUTO * Write the contents of this node and the subtrees to the passed * [stream] encoded as [enc]. The generated output is again XML. * The output style is rather compact and should not be considered * as "pretty printing". * * The namespace-aware nodes try to follow the namespace scoping * found in the nodes. The generated namespace prefixes are * display prefixes. Missing prefixes are complemented, but this * is slow. * * Option [~prefixes]: The class [namespace_element_impl] interprets * this option and passes it recursively to subordinate invocations of * [display]. The mapping contains the declarations currently in * effect as pairs of [(prefix,uri)]. The option * defaults to the empty mapping, forcing the method to output all necessary prefix * declarations. * All regular node types (elements, data nodes, comments, * processing instructions, super root nodes).
* *) (* ---------------------------------------- *) (* internal methods: *) method internal_adopt : 'ext node option -> int -> unit method internal_set_pos : int -> unit method internal_delete : 'ext node -> unit method internal_init : (string * int * int) -> Pxp_core_types.pool option -> bool -> dtd -> string -> (string * string) list -> (string * Pxp_core_types.att_value) list -> unit method internal_init_other : (string * int * int) -> dtd -> node_type -> unit method dump : Format.formatter -> unit end ;; (* class type namespace_info: removed *) class [ 'ext ] data_impl : 'ext -> [ 'ext ] node (* * * 'ext [data_impl] * AUTO * This class is an implementation of [node] which * realizes data nodes. You can create a new object by * * [let exemplar = new data_impl ext_obj] * * which creates a special form of empty data node which already contains a * reference to the [ext_obj], but is otherwise empty. This special form * is called a data exemplar. In order to get a working data node * that can be used in a node tree it is required to apply the method * [create_data] on the exemplar object. * *) class [ 'ext ] element_impl : 'ext -> [ 'ext ] node (* * * 'ext [element_impl] * AUTO * This class is an implementation of [node] which * realizes element nodes. You can create a new object by * * [let exemplar = new element_impl ext_obj] * * which creates a special form of empty element which already contains a * reference to the [ext_obj], but is otherwise empty. This special form * is called an element exemplar. In order to get a working element * that can be used in a node tree it is required to apply the method * [create_element] on the exemplar object. * * Note that the class [element_impl] is not namespace-aware. * *) class [ 'ext ] comment_impl : 'ext -> [ 'ext ] node ;; (* * * 'ext [comment_impl] * AUTO * This class is an implementation of [node] which * realizes comment nodes. 
You can create a new object by * * [let exemplar = new comment_impl ext_obj] * * which creates a special form of empty comment node which already contains a * reference to the [ext_obj], but is otherwise empty. This special form * is called a comment exemplar. In order to get a working comment node * that can be used in a node tree it is required to apply the method * [create_other] on the exemplar object, e.g. * * [let comment = exemplar # create_other dtd T_comment] * *) class [ 'ext ] super_root_impl : 'ext -> [ 'ext ] node ;; (* * * 'ext [super_root_impl] * AUTO * This class is an implementation of [node] which * realizes super root nodes. You can create a new object by * * [let exemplar = new super_root_impl ext_obj] * * which creates a special form of empty super root which already contains a * reference to the [ext_obj], but is otherwise empty. This special form * is called a super root exemplar. In order to get a working node * that can be used in a node tree it is required to apply the method * [create_other] on the exemplar object, e.g. * * [let root = exemplar # create_other dtd T_super_root] * *) class [ 'ext ] pinstr_impl : 'ext -> [ 'ext ] node ;; (* * * 'ext [pinstr_impl] * AUTO * This class is an implementation of [node] which * realizes processing instruction nodes. You can create a new object by * * [let exemplar = new pinstr_impl ext_obj] * * which creates a special form of empty node which already contains a * reference to the [ext_obj], but is otherwise empty. This special form * is called a processing instruction exemplar. In order to get a working node * that can be used in a node tree it is required to apply the method * [create_other] on the exemplar object, e.g. * * [let pi = exemplar # create_other dtd (T_pinstr target)] * * where [target] is the target name of the processing instruction. * *) val pinstr : 'ext node -> proc_instruction (* * * [pinstr] n * AUTO * Returns the processing instruction contained in a * processing instruction node. * This function raises [Invalid_argument] if invoked for a different node * type than T_pinstr.
* *) class [ 'ext ] attribute_impl : element:string -> name:string -> Pxp_core_types.att_value -> dtd -> [ 'ext ] node ;; (* Creation: * new attribute_impl element_name attribute_name attribute_value dtd * Note that attribute nodes intentionally do not have extensions. * * Attribute nodes are created on demand by the first invocation of * attributes_as_nodes of the element node. Attribute nodes are * created directly and not by copying exemplar nodes, so you never * need to create them yourself. * * Attribute nodes have the following properties: * - The node type is T_attribute name. * - The parent node is the element node. * - The method "attributes" returns [ name, value ], i.e. such nodes * have a single attribute "name". To get the value, call * n # attribute name. * - The method "data" returns the string representation of the * attribute value. * - Attribute nodes are leaves of the tree. * * Attribute nodes are designed to be members of XPath node sets, and * are only useful if you need such sets. *) val attribute_name : 'ext node -> string (* * * [attribute_name] n * AUTO * Returns the name of the attribute contained in an attribute * node. Raises [Invalid_argument] if [n] does not have node type * [T_attribute]. * *) val attribute_value : 'ext node -> Pxp_core_types.att_value (* * * [attribute_value] n * AUTO * Returns the value of the attribute contained in an attribute * node. Raises [Invalid_argument] if [n] does not have node type * [T_attribute]. * *) val attribute_string_value : 'ext node -> string (* * * [attribute_string_value] n * AUTO * Returns the string value of the attribute contained in an attribute * node. Raises [Invalid_argument] if [n] does not have node type * [T_attribute]. * *) class [ 'ext ] namespace_element_impl : 'ext -> [ 'ext ] node (* * * 'ext [namespace_element_impl] * AUTO * This class is an implementation of [node] which * realizes element nodes. In contrast to [element_impl], this class * also implements the namespace methods.
* You can create a new object by * * [let exemplar = new namespace_element_impl ext_obj] * * which creates a special form of empty element which already contains a * reference to the [ext_obj], but is otherwise empty. This special form * is called an element exemplar. In order to get a working element * that can be used in a node tree it is required to apply the method * [create_element] on the exemplar object. * *) (* namespace_element_impl: the namespace-aware implementation of element * nodes. * * This class has an extended definition of the create_element method. * It accepts element names of the form "normprefix:localname" where * normprefix must be a prefix managed by the namespace_manager. Note * that create_element does not itself normalize prefixes; it is expected * that the prefixes are already normalized. * * In addition to calling create_element, one can set the namespace scope * after creation (set_namespace_scope) to save the mapping of unprocessed * namespace prefixes to normalized prefixes. This is voluntary. * * Such nodes have the node type T_element "normprefix:localname". * * Furthermore, this class implements the methods: * - normprefix * - localname * - namespace_uri * - namespace_scope * - set_namespace_scope * - namespace_manager *) class [ 'ext ] namespace_attribute_impl : element:string -> name:string -> Pxp_core_types.att_value -> dtd -> [ 'ext ] node ;; (* namespace_attribute_impl: the namespace-aware implementation of * attribute nodes. *) class [ 'ext ] namespace_impl : (* dspprefix: *) string -> (* normprefix: *) string -> dtd -> [ 'ext ] node ;; (* Namespace objects are only used to represent the namespace declarations * occurring in the attribute lists of elements. *) val namespace_normprefix : 'ext node -> string val namespace_display_prefix : 'ext node -> string val namespace_uri : 'ext node -> string (* These functions return the normprefix, the display prefix, and the URI * stored in a namespace object. 
* If invoked for a different node type, the functions raise Invalid_argument. *) (********************************** spec *********************************) type 'ext spec constraint 'ext = 'ext node #extension (* * * 'ext [spec] * AUTO * The abstract data type specifying which objects are actually * created by the parser. * *) val make_spec_from_mapping : ?super_root_exemplar : 'ext node -> ?comment_exemplar : 'ext node -> ?default_pinstr_exemplar : 'ext node -> ?pinstr_mapping : (string, 'ext node) Hashtbl.t -> data_exemplar: 'ext node -> default_element_exemplar: 'ext node -> element_mapping: (string, 'ext node) Hashtbl.t -> unit -> 'ext spec (* * * [make_spec_from_mapping] * ~super_root_exemplar ~comment_exemplar ~default_pinstr_exemplar * ~pinstr_mapping ~data_exemplar ~default_element_exemplar * ~element_mapping * () * AUTO * Creates a [spec] from the arguments. Some arguments are optional, * some arguments are mandatory. * - [~super_root_exemplar]: Specifies the exemplar to be used for * new super root nodes. This exemplar is optional. * - [~comment_exemplar]: Specifies the exemplar to be used for * new comment nodes. This exemplar is optional. * - [~pinstr_mapping]: Specifies the exemplar to be used for * new processing instruction nodes by a hashtable mapping target * names to exemplars. This hashtable is optional. * - [~default_pinstr_exemplar]: Specifies the exemplar to be used for * new processing instruction nodes. This exemplar will be used * for targets that are not contained in the [~pinstr_mapping] * hashtable. This exemplar is optional. * - [~data_exemplar]: Specifies the exemplar to be used for * new data nodes. This exemplar is mandatory. * - [~element_mapping]: Specifies the exemplar to be used for * new element nodes by a hashtable mapping element types to * exemplars. This hashtable is mandatory (but may be empty). * - [~default_element_exemplar]: Specifies the exemplar to be used for * new element nodes.
This exemplar will be used * for element types that are not contained in the [~element_mapping] * hashtable. This exemplar is mandatory. * -- * *) val make_spec_from_alist : ?super_root_exemplar : 'ext node -> ?comment_exemplar : 'ext node -> ?default_pinstr_exemplar : 'ext node -> ?pinstr_alist : (string * 'ext node) list -> data_exemplar: 'ext node -> default_element_exemplar: 'ext node -> element_alist: (string * 'ext node) list -> unit -> 'ext spec (* * * [make_spec_from_alist] * ~super_root_exemplar ~comment_exemplar ~default_pinstr_exemplar * ~pinstr_alist ~data_exemplar ~default_element_exemplar * ~element_alist * () * AUTO * Creates a [spec] from the arguments. This is a convenience * function for [make_spec_from_mapping]; instead of requiring hashtables * the function accepts association lists. * *) val create_data_node : 'ext spec -> dtd -> string -> 'ext node (* * * [create_data_node] spec dtd datastring * AUTO * Creates a new data node from the exemplar contained in [spec]. * The new node contains [datastring] and is connected with the [dtd]. * *) val create_element_node : ?name_pool_for_attribute_values:Pxp_core_types.pool -> ?position:(string * int * int) -> ?valcheck:bool -> ?att_values:((string * Pxp_core_types.att_value) list) -> 'ext spec -> dtd -> string -> (string * string) list -> 'ext node (* * [create_element_node] ~name_pool_for_attribute_values * ~position ~valcheck ~att_values spec dtd eltype * att_list * AUTO * Creates a new element node from the exemplar(s) contained in * [spec]: * - The new node will be connected to the passed [dtd]. * - The new node will have the element type [eltype]. * - The attributes of the new node will be the concatenation of * [att_list] and [att_values]; [att_list] passes attribute values * as strings while [att_values] passes attribute values as * type [att_value] * - The source position is set to [~position] (if passed) * - The [~name_pool_for_attribute_values] will be used, if passed.
* - If [~valcheck = true] (the default), the attribute list is * immediately validated. If [~valcheck = false], the validation * is left out; in this case you can pass any element type and * any attributes, and it does not matter whether and how * they are declared. * -- * *) val create_super_root_node : ?position:(string * int * int) -> 'ext spec -> dtd -> 'ext node (* * [create_super_root_node] ~position spec dtd * AUTO * Creates a new super root node from the exemplar contained in * [spec]. The new node is connected to [dtd], and the position * triple is set to [~position]. * * The function fails if there is no super root exemplar in [spec]. * *) val create_comment_node : ?position:(string * int * int) -> 'ext spec -> dtd -> string -> 'ext node (* * [create_comment_node] ~position spec dtd commentstring * AUTO * Creates a new comment node from the exemplar contained in * [spec]. The new node is connected to [dtd], and the position * triple is set to [~position]. The contents of the node are set * to [commentstring]. * * The function fails if there is no comment exemplar in [spec]. * *) val create_pinstr_node : ?position:(string * int * int) -> 'ext spec -> dtd -> proc_instruction -> 'ext node (* * [create_pinstr_node] ~position spec dtd pi * AUTO * Creates a new processing instruction node from the exemplar * contained in [spec]. The new node is connected to [dtd], and the * position triple is set to [~position]. The contents of the node are set * to [pi]. * * The function fails if there is no processing instruction exemplar in * [spec]. * *) val create_no_node : ?position:(string * int * int) -> 'ext spec -> dtd -> 'ext node (* Creates a T_none node with limited functionality * NOTE: This function is conceptually broken and may be dropped in the * future.
*) val get_data_exemplar : 'ext spec -> 'ext node val get_element_exemplar : 'ext spec -> string -> (string * string) list -> 'ext node val get_super_root_exemplar : 'ext spec -> 'ext node val get_comment_exemplar : 'ext spec -> 'ext node val get_pinstr_exemplar : 'ext spec -> proc_instruction -> 'ext node (* These functions just return the exemplars (or raise Not_found). * Notes: * (1) In future versions, it may be possible that the element exemplar * depends on attributes, too, so the attlist must be passed * to get_element_exemplar * (2) In future versions, it may be possible that the pinstr exemplar * depends on the full value of the processing instruction and * not only on the target, so the full proc_instruction must be * passed to get_pinstr_exemplar. *) (*********************** Ordering of nodes ******************************) (* The functions compare and ord_compare implement the so-called * "document order". The basic principle is that the nodes are linearly * ordered by their occurrence in the textual XML representation of the * tree. While this is clear for element nodes, data nodes, comments, and * processing instructions, a more detailed definition is necessary for the * other node types. In particular, attribute nodes of an element node * occur before any regular subnode of the element, and namespace nodes * of that element occur even before the attribute nodes. So the order * of nodes of * <sample a1="..." a2="..."><subnode/></sample> * is * 1. element "sample" * 2. attribute "a1" * 3. attribute "a2" * 4. element "subnode" * Note that the order of the attributes of the same element is unspecified, * so "a2" may alternatively be ordered before "a1". If there were namespace * nodes, they would occur between 1 and 2. * If there is a super root node, it will be handled as the very first * node. *) val compare : 'ext node -> 'ext node -> int (* * * [compare] n1 n2 * AUTO * Returns -1 if [n1] occurs before [n2], or +1 if [n1] occurs * after [n2], or 0 if both nodes are identical.
* If the nodes are unrelated (do not have a common ancestor), the result * is undefined (Note: this case is different from [ord_compare]). * This test is rather slow, but it works even if the XML tree changes * dynamically (in contrast to [ord_compare] below). * *) type 'ext ord_index constraint 'ext = 'ext node #extension (* * * 'ext [ord_index] * AUTO * The type of ordinal indexes. * *) val create_ord_index : 'ext node -> 'ext ord_index (* * * [create_ord_index] startnode * AUTO * * Creates an ordinal index for the subtree starting at [startnode]. * This index assigns to every node an ordinal number (beginning with 0) such * that nodes are numbered in the order of the first character in the XML * representation (document order). * Note that the index is not automatically updated when the tree is * modified. * *) val ord_number : 'ext ord_index -> 'ext node -> int (* Returns the ordinal number of the node, or raises Not_found. * Note that attribute nodes and namespace nodes are treated specially: * All attribute nodes for a certain element node have the _same_ * ordinal index. All namespace nodes for a certain element node * have the _same_ ordinal index. * (So ord_number x = ord_number y does not imply x == y for these * nodes. However, this is true for the other node types.) * It is not recommended to work with the ordinal number directly but * to call ord_compare which already handles the special cases. *) val ord_compare : 'ext ord_index -> 'ext node -> 'ext node -> int (* * * [ord_compare] idx n1 n2 * AUTO * * Compares two nodes like [compare]: * Returns -1 if [n1] occurs before [n2], or +1 if [n1] occurs * after [n2], or 0 if both nodes are identical. * If one of the nodes does not occur in the ordinal index, [Not_found] * is raised. (Note that this behaviour differs from that of * [compare].) * * This test is much faster than [compare].
* *) (***************************** Iterators ********************************) (* General note: The iterators ignore attribute and namespace nodes *) val find : ?deeply:bool -> ('ext node -> bool) -> 'ext node -> 'ext node (* * * [find] ~deeply f startnode * AUTO * Searches the first node in the tree below [startnode] for which * the predicate f is true, and returns it. Raises [Not_found] * if there is no such node. * * By default, [~deeply=false]. In this case, only the children of * [startnode] are searched. * * If passing [~deeply=true], the children are searched recursively * (depth-first search). Note that even in this case [startnode] itself * is not checked. * * Attribute and namespace nodes are ignored. * *) val find_all : ?deeply:bool -> ('ext node -> bool) -> 'ext node -> 'ext node list (* * [find_all] ~deeply f startnode * AUTO * Searches all nodes in the tree below [startnode] for which * the predicate f is true, and returns them. * * By default, [~deeply=false]. In this case, only the children of * [startnode] are searched. * * If passing [~deeply=true], the children are searched recursively * (depth-first search). Note that even in this case [startnode] itself * is not checked. * * Attribute and namespace nodes are ignored. * *) val find_element : ?deeply:bool -> string -> 'ext node -> 'ext node (* * * [find_element] ~deeply eltype startnode * AUTO * Searches the first element in the tree below [startnode] * that has the element type [eltype], and returns it. Raises [Not_found] * if there is no such node. * * By default, [~deeply=false]. In this case, only the children of * [startnode] are searched. * * If passing [~deeply=true], the children are searched recursively * (depth-first search). Note that even in this case [startnode] itself * is not checked. 
* *) val find_all_elements : ?deeply:bool -> string -> 'ext node -> 'ext node list (* * * [find_all_elements] ~deeply eltype startnode * AUTO * Searches all elements in the tree below [startnode] * having the element type [eltype], and returns them. * * By default, [~deeply=false]. In this case, only the children of * [startnode] are searched. * * If passing [~deeply=true], the children are searched recursively * (depth-first search). Note that even in this case [startnode] itself * is not checked. * *) exception Skip (* * * [Skip] * AUTO * This exception can be used in the functions passed to * [map_tree], [map_tree_sibl], [iter_tree], and [iter_tree_sibl] * to skip the current node, and to proceed with the next node. * See these functions for details. * *) val map_tree : pre:('exta node -> 'extb node) -> ?post:('extb node -> 'extb node) -> 'exta node -> 'extb node (* * * [map_tree] ~pre ~post startnode * AUTO * Maps the tree beginning at [startnode] to a second tree * using the following algorithm. * * [startnode] and the whole tree below it are recursively traversed. * After entering a node, the function ~pre is called. The result of * this function must be a new node; it must not have children nor a * parent. For example, you can pass * [~pre:(fun n -> n # orphaned_flat_clone)] * to copy the original node. After that, the children are processed * in the same way (from left to right) resulting in a list of * mapped children. These are added to the mapped node as its * children. * * Now, the ~post function is invoked with the mapped node as argument, and * the result is the result of the function (~post should return a root * node, too; if not specified, the identity is the ~post function). * * Both ~pre and ~post may raise [Skip] which causes the node to be * left out (i.e. the mapped tree contains neither the node nor * any children of the node). * If the top node is skipped, the exception [Not_found] is * raised.
* * For example, the following piece of code duplicates a tree, but * removes all comment nodes: * * [ map_tree ~pre:(fun n -> if n # node_type = T_comment then raise Skip else n # orphaned_flat_clone) startnode ] * * Attribute and namespace nodes are ignored. * *) val map_tree_sibl : pre: ('exta node option -> 'exta node -> 'exta node option -> 'extb node) -> ?post:('extb node option -> 'extb node -> 'extb node option -> 'extb node) -> 'exta node -> 'extb node (* * * [map_tree_sibl] ~pre ~post startnode * AUTO * Maps the tree beginning at [startnode] to a second tree * using the following algorithm. * * [startnode] and the whole tree below it are recursively traversed. * After entering a node, the function ~pre is called with three * arguments: some previous node, the current node, and some next node. * The previous and the next node may not exist because the current * node is the first or the last in the current list of nodes. * In this case, [None] is passed as previous or next node, resp. * The result of this function invocation must be a new node; * it must not have children nor a parent. For example, you can pass * [~pre:(fun prev n next -> n # orphaned_flat_clone)] * to copy the original node. After that, the children are processed * in the same way (from left to right) resulting in a list of * mapped children. * * Now, the ~post function is applied to the list of mapped children * resulting in a list of postprocessed children. (Note: this part * works rather differently than [map_tree].) ~post has three arguments: * some previous child, the current child, and some next child. * The previous and the next child are [None] if non-existing. * The postprocessed children are appended to the mapped node resulting * in the mapped tree. * * Both ~pre and ~post may raise [Skip] which causes the node to be * left out (i.e. the mapped tree contains neither the node nor * any children of the node).
* If the top node is skipped, the exception [Not_found] is * raised. * * Attribute and namespace nodes are ignored. * *) val iter_tree : ?pre:('ext node -> unit) -> ?post:('ext node -> unit) -> 'ext node -> unit (* * * [iter_tree] ~pre ~post startnode * AUTO * Iterates over the tree beginning at [startnode] * using the following algorithm. * * [startnode] and the whole tree below it are recursively traversed. * After entering a node, the function ~pre is called. Now, the children * are processed recursively. Finally, the ~post function is invoked. * * The ~pre function may raise [Skip], in which case the children * and the invocation of the ~post function are skipped. * If the ~post function raises [Skip] nothing special happens. * * Attribute and namespace nodes are ignored. * *) val iter_tree_sibl : ?pre: ('ext node option -> 'ext node -> 'ext node option -> unit) -> ?post:('ext node option -> 'ext node -> 'ext node option -> unit) -> 'ext node -> unit (* * * [iter_tree_sibl] ~pre ~post startnode * AUTO * Iterates over the tree beginning at [startnode] * using the following algorithm. * * [startnode] and the whole tree below it are recursively traversed. * After entering a node, the function ~pre is called with three * arguments: some previous node, the current node, and some next node. * The previous and the next node may be [None] if non-existing. * Now, the children are processed recursively. * Finally, the ~post function is invoked with the same three * arguments. * * The ~pre function may raise [Skip], in which case the children * and the invocation of the ~post function are skipped. * If the ~post function raises [Skip] nothing special happens. * * Attribute and namespace nodes are ignored.
* *) (************************ Whitespace handling ***************************) type stripping_mode = [ `Strip_one_lf | `Strip_one | `Strip_seq | `Disabled ] (* * * [stripping_mode] * AUTO * The different ways to strip whitespace from a single * data node: * - [`Strip_one_lf]: If there is a linefeed character at the beginning/at * the end, it will be removed. If there are more linefeed characters, * only the first/the last is removed. * (This is the SGML rule to strip whitespace.) * - [`Strip_one]: If there is a whitespace character at the beginning/at * the end, it will be removed. If there are more whitespace characters, * only the first/the last is removed. Whitespace characters are space, * newline, carriage return, and tab. * - [`Strip_seq]: All whitespace characters at the beginning/at the end are * removed. * - [`Disabled]: Do not strip whitespace. * -- * *) val strip_whitespace : ?force:bool -> ?left:stripping_mode -> ?right:stripping_mode -> ?delete_empty_nodes:bool -> 'ext node -> unit (* * * [strip_whitespace] ~force ~left ~right ~delete_empty_nodes * startnode * AUTO * * Modifies the passed tree in-place according to the following rules: * - In general, whitespace stripping is not applied to nodes inside * an [xml:space="preserve"] region, unless [~force:true] is passed * to the function (default is [~force:false]). The following rules * are applied only if whitespace stripping is allowed. * Note that the detection of regions with preserved whitespace takes * the parent nodes of the passed [startnode] into account. * - If applied to a data node, whitespace at the beginning of the node * is removed according to [~left], and whitespace at the end of the node * is removed according to [~right]. * - If applied to an element, whitespace at the beginning of the first * data subnode is removed according to [~left], and whitespace at the end * of the last data subnode is removed according to [~right].
Furthermore, * these rules are recursively applied to all subelements (but not to * other node types). * - If applied to the super root node, this node is treated as if it * were an element. * - Whitespace of other node types is left as-is, as is whitespace occurring * in attributes. * - Option [~delete_empty_nodes] (default true): * If data nodes become empty after removal of whitespace, they are * deleted from the XML tree. * -- * * Defaults: * - [~force:false] * - [~left:`Disabled] * - [~right:`Disabled] * *) (****************************** normalization ****************************) val normalize : 'ext node -> unit (* * * [normalize] startnode * AUTO * Normalizes the tree denoted by [startnode] such that * neither empty data nodes nor adjacent data nodes exist. Normalization * works in-place. * *) (******************************** validation *****************************) val validate : 'ext node -> unit (* * * [validate] startnode * AUTO * Validates the tree denoted by [startnode]. In contrast to * [startnode # validate()] this function validates recursively. * *) (******************************* document ********************************) class [ 'ext ] document : ?swarner:Pxp_core_types.symbolic_warnings -> Pxp_core_types.collect_warnings -> Pxp_core_types.rep_encoding -> object (* Documents: These are containers for root elements and for DTDs. * * Important invariant: A document is either empty (no root element, * no DTD), or it has both a root element and a DTD. * * A fresh document created by 'new' is empty. *) method init_xml_version : string -> unit (* Set the XML version string of the XML declaration. *) method init_root : 'ext node -> string -> unit (* Set the root element. It is expected that the root element has * a DTD. * The second argument is the original name of the root element * (without namespace prefix processing). * Note that 'init_root' checks whether the passed root element * has the type expected by the DTD.
The check takes into account * that the root element might be a virtual root node. *) method xml_version : string (* Returns the XML version from the XML declaration. Returns "1.0" * if the declaration is missing. *) method xml_standalone : bool (* Returns whether this document is declared as being standalone. * This method returns the same value as 'standalone_declaration' * of the DTD (if there is a DTD). * Returns 'false' if there is no DTD. *) method dtd : dtd (* Returns the DTD of the root element. * Fails if there is no root element. *) method encoding : Pxp_core_types.rep_encoding (* Returns the string encoding of the document = the encoding of * the root element = the encoding of the element tree = the * encoding of the DTD. * Fails if there is no root element. *) method root : 'ext node (* Returns the root element, or fails if there is not any. *) method raw_root_name : string (* The unprocessed name of the root element (second arg of * init_root) *) method add_pinstr : proc_instruction -> unit (* Adds a processing instruction to the document container. * The parser does this for PIs occurring outside the DTD and outside * the root element. *) method pinstr : string -> proc_instruction list (* Return all PIs for a passed target string. *) method pinstr_names : string list (* Return all target strings of all PIs. *) method write : ?default : string -> ?prefer_dtd_reference : bool -> Pxp_core_types.output_stream -> Pxp_core_types.encoding -> unit (* Write the document to the passed * output stream; the passed encoding is used. The format * is compact (the opposite of "pretty printing"). * If a DTD is present, the DTD is included into the internal subset. * * Option [~default]: Specifies the normprefix that becomes the * default namespace in the output. * * Option [~prefer_dtd_reference]: If true, the method tries to print * the DTD as a reference, i.e. with SYSTEM or PUBLIC identifier. * This works only if the DTD has an [External] identifier.
If * the DTD cannot be printed as a reference, it is included as text. * The default is not to try DTD references, i.e. to always include * the DTD as text. *) method display : ?prefer_dtd_reference : bool -> Pxp_core_types.output_stream -> Pxp_core_types.encoding -> unit (* Write the document to the passed * output stream; the passed encoding is used. The format * is compact (the opposite of "pretty printing"). * If a DTD is present, the DTD is included into the internal subset. * In contrast to [write], this method uses the display namespace * prefixes instead of the normprefixes. * * Option [~prefer_dtd_reference]: If true, the method tries to print * the DTD as a reference, i.e. with SYSTEM or PUBLIC identifier. * This works only if the DTD has an [External] identifier. If * the DTD cannot be printed as a reference, it is included as text. * The default is not to try DTD references, i.e. to always include * the DTD as text. *) method dump : Format.formatter -> unit end ;; (* Printers for toploop: *) val print_node : 'ext node -> unit ;; val print_doc : 'ext document -> unit ;; (**********************************************************************) (* Experimental: event streams and node trees *) (**********************************************************************) exception Error_event of exn (* The event stream contains an E_error event *) type 'ext solid_xml = [ `Node of 'ext node | `Document of 'ext document ] val solidify : ?dtd:dtd -> config -> 'ext spec -> (unit -> event option) -> 'ext solid_xml (* Reads the event stream by calling the unit->event function, and * creates a node tree according to config, dtd, spec. * * The event stream may be either: * - A document event stream (as generated by `Entry_document). * In this case `Document d is returned. * - A content event stream (as generated by `Entry_content). * In this case `Node n is returned. * * Document streams contain a DTD. The found DTD is used for the * node tree.
Content streams, on the contrary, do not contain DTDs. * In this case, an empty DTD is created (in well-formedness mode). * * The [dtd] argument overrides any DTD, no matter whether found * in the stream or freshly created. * * If the DTD allows validation, the returned tree is validated. * * The data nodes are not normalized unless the arriving data events * are already normalized. To get this effect, filter the stream * with Pxp_ev_parser.norm_cdata_filter before calling solidify. * * Ignorable whitespace is not automatically removed. To get this * effect, filter the stream with * Pxp_ev_parser.drop_ignorable_whitespace_filter before calling solidify. * * The uniqueness of ID attributes is not checked. *) val liquefy : ?omit_end: bool -> ?omit_positions:bool -> 'ext solid_xml -> ('a -> event option) (* The converse of [solidify]: The passed node or document is transformed * into an event stream. * * omit_end: If true, the E_end_of_stream event is omitted at the end. * Useful to concatenate several streams. Default: false. * omit_positions: If true, no E_position events are generated. * Default: false. *)
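(* Usage sketch for [solidify] and [liquefy] (illustrative only):
 *
 * The following is a minimal sketch of how the two functions might be
 * combined to copy a node tree by way of an event stream. It uses only
 * the signatures declared above. The names [copy_via_events], [config],
 * [spec], [n], and [copied] are hypothetical and supplied by the caller.
 * The sketch assumes that the stream produced by [liquefy] for a `Node
 * value is accepted by [solidify] as a content event stream; this
 * assumption is not confirmed by the documentation above.
 *
 * [ let copy_via_events config spec n =
 *     match solidify config spec (liquefy (`Node n)) with
 *       | `Node copied -> copied
 *       | `Document _ -> failwith "copy_via_events: unexpected document" ]
 *
 * Because [solidify] neither normalizes data nodes nor removes ignorable
 * whitespace on its own, a caller wanting either effect would first
 * apply the filters named in the description of [solidify].
 *)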