.\" Copyright (c) 2000, 2001 Udo Erdelhoff. All rights reserved.
.\" Written for the FreeBSD German Documentation Project
.\"
.\" Redistribution and use in source and compiled forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above
.\" copyright notice, this list of conditions and the following
.\" disclaimer as the first lines of this file unmodified.
.\"
.\" 2. Redistributions in compiled form must reproduce the above
.\" copyright notice, this list of conditions and the following
.\" disclaimer in the documentation and/or other materials provided
.\" with the distribution.
.\"
.\" THIS DOCUMENTATION IS PROVIDED BY UDO ERDELHOFF "AS IS" AND ANY EXPRESS
.\" OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
.\" DISCLAIMED. IN NO EVENT SHALL UDO ERDELHOFF BE LIABLE FOR ANY DIRECT,
.\" INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
.\" (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
.\" SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
.\" STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
.\" IN ANY WAY OUT OF THE USE OF THIS DOCUMENTATION, EVEN IF ADVISED OF
.\" THE POSSIBILITY OF SUCH DAMAGE.
.\"
.\" $Id: suppe.1,v 1.4 2002/11/24 09:42:08 ue Exp $
.Dd July 26, 2000
.Dt SUPPE 1
.Os FreeBSD
.Sh NAME
.Nm suppe
.Nd SGML UnPretty Printer (Experimental)
.Sh SYNOPSIS
.Nm
.\" .Op Fl Fl firstlevelindent Ar value
.\" .Op Fl Fl debug
.Op Fl Fl indent Ar count
.Op Fl Fl linecount Ar offset
.Op Fl Fl maxlinelength Ar count
.\" .Op Fl Fl secondlevelindent Ar value
.Sh DESCRIPTION
Chapter 10 of the
.Do
.Fx
Documentation Project Primer for New Contributors
.Dc
defines a set of guidelines for the formatting and indentation of
documents written for the
.Fx
Documentation Project.
It is easy to follow these rules while writing a new document,
especially with the
.Dq sgml-mode
of
.Xr emacs 1
and the scripts and settings for
.Xr vim 1
that can be found in
.Pa doc/share/examples/vim .
Fixing the formatting and indentation of a large existing document,
however, is another matter.
.Pp
The
.Nm
utility was written to reduce the amount of mindless labor involved in
this process.
It rebuilds the formatting and indentation of an SGML document in
accordance with the style guidelines outlined in Chapter 10 of the FDP
Primer.
.Pp
The following arguments are supported by
.Nm No :
.Bl -tag -width "--maxlinelength count"
.It Fl Fl indent Ar count
Start with indentation set at column
.Ar count
instead of
.Ar 0 .
Use this option to format a section of a document
.Pq like a single entry for the FAQ .
.It Fl Fl linecount Ar offset
Start the counter for the number of lines in the output with
.Ar offset
instead of
.Ar 0 .
Use this option while fixing a section of a document to get more useful
.Pq i.e. document-relative instead of section-relative
line numbers in the error or warning messages.
.It Fl Fl maxlinelength Ar count
Maximum length of a line in the output.
.El
.Pp
The arguments obey the usual rules of the
.Xr Getopt::Long 3
package:
.Bl -bullet -compact
.It
The second dash is optional.
.It
Argument names can be abbreviated to uniqueness, i.e.
.Pq Fl i , Fl l , Fl m .
.It
Arguments can be specified more than once; the value of the last
occurrence prevails.
.El
.Pp
Unformatted input is read from stdin, the formatted output is written
to stdout, and warning and error messages are written to stderr.
.Ss Basic Concepts
The starting point for the development of
.Nm
was the reformatting of a 300 KByte document that violated almost every
formatting and indentation guideline outlined in the FDP primer.
The basic concept of
.Nm
was shaped by this starting point:
.Nm
does not attempt to
.Dq fix
the formatting and indentation instructions
.Pq spaces, tabs, and newlines ;
.Nm
.Em removes
them completely and
.Em reformats
the document.
There are three exceptions to this rule:
.Bl -bullet -compact
.It
The FPI is not modified by
.Nm .
.It
Whitespace and line breaks are significant inside multi-line SGML
comments and several special entities
.Pq e.g., Aq programlisting .
These sections of the input stream
.Po Do protblocks Dc in suppe-speak Pc
will keep their original formatting instructions.
.It
A full stop followed by two
.Pq or more
spaces or a newline is treated as the end of a sentence.
This attribute will be conserved in the output.
.El
The formatting engine in
.Nm
is a combination of a rather braindead lex-like scanner, a minimal
state machine, and a line-filling algorithm.
Input is parsed into chunks; each chunk contains either a complete SGML
tag, a complete SGML comment, or character data.
An SGML tag has two attributes:
it is either an opening or a closing tag, and it belongs to one of
these three classes:
.Bl -tag -width "``inline''" -compact
.It Dq single
A tag that cannot have any kind of content.
.It Dq normal
A tag for an element that cannot contain character data by itself.
.It Dq inline
A tag for an element that can contain character data.
.El
The classification is based on three lists of tags that are probably
incomplete.
An unknown tag causes a fatal error:
the remaining unformatted input is sent to stdout and the program aborts.
A closing
.Dq single
tag is evidence of a false entry in the lists of tags and also causes a
fatal error.
.Pp
Every opening
.Dq normal
or
.Dq inline
tag increases the current indentation level by two, and every closing
.Dq normal
or
.Dq inline
tag decreases the current indentation level by two.
.Dq Normal
tags are printed immediately and on a separate line.
.Dq Single
tags,
.Dq inline
tags, character data and SGML comments are joined to form complete SGML
entities.
The entity is passed to the line-filling algorithm when it is complete
or if a
.Dq normal
tag is encountered.
Precious areas outside inline elements are restored and printed
immediately.
Precious areas inside inline elements are restored and printed by the
line-filling algorithm.
In both cases, the opening tag is always set at column 0.
.Sh DIAGNOSTICS
All error messages are sent to the standard error.
You should capture them in a separate file.
.Ss Could not find the opening
If the document contains something remotely resembling a Formal Public
Identifier
.Pq FPI ,
.Nm
will parse it to get the type of document.
If
.Nm
cannot find an opening tag matching the document type, it will print
this message and abort.
.Ss Hey, inline protblock at XXX
This warning message is sent if
.Nm
detects a precious entity inside an inline element.
The usual cause is a
.Aq programlisting
element inside a
.Aq para .
This causes
.Xr tidy 1
to issue a warning message because it has to insert
.Aq Br
tags into the HTML source, which is why
.Nm
complains about these constructs, too.
Suggested course of action: Move the precious entity out of the inline
element.
.Ss Whoa, character data outside inline element in XXX
.Ss Whoa, character data before normal tag in XXX
These messages indicate that there is something seriously wrong with
the document.
In both cases, character data was found outside an inline element.
The only difference is the type of tag that follows.
The character data will be formatted like a legal inline element, but
an in-depth check of the resulting SGML is a good idea.
.Ss Hey, normal tag XXX inside inline element in XXX
This is a variation of the
.Dq inline protblock
warning message.
The suggested course of action is to fix the source and close the
.Dq inline
element before this tag.
.Ss Hey, XXX closed with
It is possible to close the
.Sq last
tag by simply using
.Aq /
instead of
.Aq /foo .
This is considered bad style and a guideline to that effect will be
included in the FDP primer.
The safest course of action is to pipe the source through
.Xr slashexpand 1
.Em before
you use
.Nm
on this source.
The formatting engine contains a minimal fix for this problem which
should result in correct formatting.
This is just a workaround; use the stack-based correcting engine of
.Xr slashexpand 1
to
.Em fix
the problem at the source.
.Ss Hey, incomplete entity, bailing out
If you have forgotten to close an entity,
.Nm
will reach the end of the file while looking for the end of the entity.
The most frequent cause of this message is a typo in the line numbers
when you are piping parts of a document through
.Nm
from within an editor.
If you cannot find the error, try piping the source through
.Xr slashexpand 1
and use its tag stack to detect the position of the problem.
.Pp
The
.Nm
utility exits 0 on success, and >0 if an unrecoverable error occurs.
.Sh COMPATIBILITY
The formatting and indentation should follow the guidelines given in
Chapter 10 of the
.Fx
Documentation Project Primer for New Contributors.
.Sh SEE ALSO
.Xr emacs 1 ,
.Xr perl 1 ,
.Xr slashexpand 1 ,
.Xr tidy 1 ,
.Xr vim 1 ,
.Xr Getopt::Long 3
.Sh HISTORY
The first version of the
.Nm
utility was written in July 2000.
.Sh AUTHORS
The
.Nm
utility and this manual page were written by
.An Udo Erdelhoff Aq ue@nathan.ruhr.de .
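.Sh EXAMPLES
The indentation rule described under
.Sx Basic Concepts
can be illustrated with a minimal Python sketch.
It is not taken from the
.Nm
source: the three tiny tag lists, the names
.Fn classify
and
.Fn indent_levels ,
and the tokenizing pattern are hypothetical, and line filling and
protblocks are left out entirely.
The sketch only tracks the indentation level after each tag.
.Bd -literal
```python
import re

# Hypothetical, deliberately tiny tag lists (assumption: the real lists
# live in the suppe source and are not reproduced here).
SINGLE = {"xref", "co"}
NORMAL = {"sect1", "itemizedlist", "listitem"}
INLINE = {"para", "title", "emphasis"}

# A chunk is either one complete tag or a run of character data.
TOKEN = re.compile(r"<[^>]+>|[^<]+")


def classify(name):
    """Return the class of a tag name; an unknown tag is an error."""
    for label, names in (("single", SINGLE),
                         ("normal", NORMAL),
                         ("inline", INLINE)):
        if name in names:
            return label
    raise ValueError("unknown tag: " + name)


def indent_levels(text, start=0):
    """Return (tag, level) pairs, level being the indentation after the tag.

    Opening normal and inline tags raise the level by two, closing ones
    lower it by two, and single tags leave it unchanged.
    """
    level = start
    result = []
    for chunk in TOKEN.findall(text):
        if not chunk.startswith("<"):
            continue  # character data does not change the level
        closing = chunk.startswith("</")
        name = chunk.strip("</>").split()[0].lower()
        kind = classify(name)
        if kind == "single":
            if closing:
                raise ValueError("closing single tag: " + name)
        elif closing:
            level -= 2
        else:
            level += 2
        result.append((chunk, level))
    return result
```
.Ed
.Pp
With these assumed lists, the input
.Ql <sect1><para>x</para></sect1>
yields the levels 2, 4, 2 and 0.
The
.Ar start
parameter plays the role of the
.Fl Fl indent
option, and a tag missing from all three lists raises an error, which
mirrors the fatal error on unknown tags.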
.Sh BUGS
Complex enough to guarantee the existence of some bugs.
The program name contains two of them; this is
.Em not
a generic SGML formatting tool, but a rather restricted one.
And everybody who was subjected to the end result of a
.Dq dump
caused by an unknown tag knows why this program has this name.
It is necessary to modify the source code of
.Nm
to add additional tags.
This
.Em will
change in a later version.