Introduction | The authoring process |
Server application Block diagram. | Perl programming. |
Scheme programming. | C programming. |
Examples. | Source and Binary Code Availability. |
The generic Next script. | Limitations and future directions, support. |
Credits. | Copyright. |
This paper describes Chtml, a way to separate HTML tricks from programming tricks by treating HTML files, or portions of HTML files, as templates or chunks of HTML upon which simple substitutions are made. A programmer can use the techniques described here directly or through another layer of abstraction (such as a component doing result.write inside an active server page, or document.write inside a client-side scripting environment). Chtml is therefore not in opposition to those techniques and is potentially complementary to them.
In fact the methods here have to do with generating arbitrary text, as seen by the use of a template in the chtml compiler mode which generates C code that can be statically linked. By specifying a template in some other language, such as Java, Javascript, Visual Basic, or Perl, it becomes possible to create data structures suitable for use by runtime library components written in other languages.
The goals of Chtml are to:
CHTML works by transforming static HTML documents residing on the server into dynamic HTML documents that are actually seen by the user. This is similar to but different from server-side-include (e.g. shtml) or web-sql/gsql mechanisms in that it:
An ultimate implementation method for this goal could be made using tools capable of full semantic parsing of a formal application of ISO Standard 8879:1986 - Standard Generalized Markup Language (http://www.sil.org/sgml/sgml.html), such as the proposed ISO/DIS 10179.2, Document Style Semantics and Specification Language, DSSSL (http://www.jclark.com/dsssl/), which provides a good candidate formalism with its STTP (SGML Tree Transformation Process).
However, the implementation of Chtml had to be simple enough to meet strict time-to-market goals, and the author could not assume that tools capable of manipulating SGML were available, as only basic HTML tools were generally available at the time. By adhering to a minimal set of conventions, the abstractions needed to achieve the goals can be obtained through simple string substitution techniques which can be implemented quickly.
Note from 19-DEC-1997. A more likely future implementation for 1998 would be based on XML and an api into the Document Object Model. See http://www.w3.org/xml. However, it turns out that XML does not obsolete the utility of this package, which has now found use in making it easier to produce various XML output formats, and is especially helpful in that applications which were producing HTML can be converted to XML output with little effort. The formatting of email messages, both in plain text and in compound mime formats is also facilitated.
Note the capability for chtml templates to be compiled into data structures that are compiled by the C compiler and linked into efficient programs. It still remains a fact of life that embedding printf/output statements containing markup in C code is ugly and potentially extremely inefficient due to the need to compute and buffer all output before an accurate content-length can be calculated. With the chunked-encoding technique available in HTTP/1.1 the accurate content-length calculation is no longer needed to support persistent connections, but having the expected size of the page before downloading it is still a good user-interface feature in situations involving large files and/or slow network links or servers.
The authoring process of derivative works differs slightly in that html-capture techniques (such as the personal proxy server described in siod) might be used in reverse engineering an existing application in order to obtain the raw materials for the new html templates.
Derivative works may also be readily synthesized from idioms and components created for previous applications. In particular the chtml-link technique is used to build up libraries of reusable objects. See the description under the heading CHTML Link Example.
The key strings are chosen so as to be easy to identify in an HTML document. The external names provide the handles by which the CGI scripts can operate on the document in the following ways:
<!--CHTML-INTERFACE
fname :: __fname_value__ :: user visible datum.
lname :: __lname_value__ :: user visible datum.
username :: __username_value__ :: user visible datum.
password :: __password_value__ :: passed along hidden input.
signature :: __MD5_SIGNATURE__ :: hidden, verifies fields.
-->
If the external name is empty, and the internal name is the string include, then the comment field is taken as the name of a file from which to read lines to incorporate into the interface specification. This is a bit of a kludge but has proven to be convenient when many files are to be parsed against a standard header.
<!--CHTML-INTERFACE :: include :: standard-interface.txt -->
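The interface comment and the substitution it drives can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not one of the Chtml implementations (which are in Perl, Scheme and C). The function names parse_interface and substitute are hypothetical, the sketch assumes one entry per line, and it only parses an include entry without reading the named file. Whether the real tools strip the interface comment from the output is not stated here; the sketch removes it as an assumption.

```python
import re

# Matches the interface comment and captures its body.
INTERFACE_RE = re.compile(r"<!--CHTML-INTERFACE(.*?)-->", re.DOTALL)


def parse_interface(template):
    """Return a list of (external, internal, comment) tuples.

    Assumes one "external :: internal :: comment" entry per line.
    """
    match = INTERFACE_RE.search(template)
    if match is None:
        return []
    entries = []
    for line in match.group(1).splitlines():
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split("::", 2)]
        fields += [""] * (3 - len(fields))  # the comment field is optional
        entries.append(tuple(fields))
    return entries


def substitute(template, values):
    """Replace each internal key string by the value bound to its external name."""
    entries = parse_interface(template)
    # Assumption: drop the interface comment before substituting.
    body = INTERFACE_RE.sub("", template, count=1)
    for external, internal, _comment in entries:
        if external in values and internal:
            body = body.replace(internal, str(values[external]))
    return body


TEMPLATE = """<!--CHTML-INTERFACE
fname :: __fname_value__ :: user visible datum.
lname :: __lname_value__ :: user visible datum.
-->
<P>Hello __fname_value__ __lname_value__</P>
"""

if __name__ == "__main__":
    print(parse_interface(TEMPLATE))
    print(substitute(TEMPLATE, {"fname": "Ada", "lname": "Lovelace"}))
```

The caller only ever sees the external names (fname, lname); the internal key strings stay private to the template, which is what lets an HTML author rename them without touching the program.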
An object can also be bracketed using the pair BEGIN-OBJECT and END-OBJECT. This is used in the linking feature. Outside of that feature it has the effect of a region of text with a repeat count of 0.
<TABLE BORDER CELLSPACING=0 CELLPADDING=5>
<TR>
<!--BEGIN-REPEATING-OBJECT-__COLUMNCOUNT__-->
<TH ALIGN=TOP VALIGN=TOP>__COLUMN__</TH>
<!--END-REPEATING-OBJECT-->
</TR>
<!--BEGIN-REPEATING-OBJECT-__ROWCOUNT__-->
<TR ALIGN=TOP>
<!--BEGIN-REPEATING-OBJECT-__COLUMNCOUNT__-->
<TD VALIGN=TOP>__ITEM__</TD>
<!--END-REPEATING-OBJECT-->
</TR>
<!--END-REPEATING-OBJECT-->
</TABLE>
In order to save on screen space and eyestrain the less verbose words REPEAT, /REPEAT, and OBJECT, /OBJECT may be used in place of BEGIN-REPEATING-OBJECT, END-REPEATING-OBJECT, and BEGIN-OBJECT, END-OBJECT.
In order to make it easier to balance nested repeating objects you are allowed (in the C and Scheme versions) to specify an object name in the END-REPEATING-OBJECT call:
<!--BEGIN-REPEATING-OBJECT-__COLUMNCOUNT__-->
<!--END-REPEATING-OBJECT-__COLUMNCOUNT__-->
If you do not specify an ending object name then the most recently opened object will be closed; otherwise the object name must match the currently opened object or an error will result. If you specify a verbose level of 2 to the chtml command then the line numbers of the various begin/end sections will be printed out.
$ chtml -v2 something.html
{Chtml Templates}------------> [Chtml Server Application]
                               [ such as:               ]
{SQL style Servers} <--------> [ - registration/signup  ]
                               [ - polling, surveying   ]
{RPC style Servers} <--------> [ - role playing, games  ]
                               [ - info search          ]
{Other Files} <--------------> [------------------------]
                                          ||
                                          ||
                                          \/
{HTML, GIF, etc Files} -----------> [HTTP SERVER]
                                          ||
                                          ||
                                          \/
                                 [WEB BROWSER CLIENT]
&ChtmlFilterFile($html_file, %assoc_array);
Each key in the associative array is the name of an item specified on the left hand side of the CHTML-INTERFACE comment in the template file.
The value of each key in the associative array is the replacement text to be used in place of the internal key string from the template file (which may represent a repeating object count).
Note: When the value of a key (the replacement text) contains the ascii rubout character \177 (^?) ($chtml_rub) then it is specially handled during the filtering process. The value used in a single substitution will be the substring of everything from the beginning of the string up to the rubout character. The substring of everything after the rubout character is then stored back into the associative array as the value of the key to be used for the rest of the file processing. This mechanism is essential to the usefulness of the REPEATING-OBJECT construct, but can also be used in other situations.
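The rubout convention can be sketched as follows. This is a Python illustration of the behaviour described in the note, not the Perl code of ChtmlFilterFile; the names next_value and filter_text, and the dictionary mapping internal key strings to external names, are assumptions made for the example.

```python
import re

RUBOUT = "\x7f"  # ASCII rubout (DEL), octal 177


def next_value(bindings, external):
    """Return the text for one substitution and advance the stored value.

    Everything before the first rubout is used now; everything after it is
    stored back for later substitutions. A value without a rubout is used
    unchanged and left in place.
    """
    head, separator, tail = bindings[external].partition(RUBOUT)
    if separator:
        bindings[external] = tail
    return head


def filter_text(text, internal_to_external, bindings):
    """Substitute every occurrence of each internal key string, left to right."""
    pattern = re.compile("|".join(re.escape(key) for key in internal_to_external))
    return pattern.sub(
        lambda match: next_value(bindings, internal_to_external[match.group(0)]),
        text,
    )


if __name__ == "__main__":
    bindings = {"username": RUBOUT.join(["alice", "bob", "carol"])}
    # Three copies of the row stand in for a repeating object with count 3.
    rows = "<LI>__username__</LI>\n" * 3
    print(filter_text(rows, {"__username__": "username"}, bindings))
```

Because each substitution consumes one element, a single string value can feed successive copies of a repeated region with different data, which is why the mechanism matters for REPEATING-OBJECT.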
(define *doc1* (load-chtml "filename1"))
The representation of an optimized document is inductively defined to be:
(define *chunk1* (chtml-object 'table *doc1*))
Or it can output a document to a stream:
(write-chtml stream hash-table *doc2*)
The hash-table plays the same role as the associative array does in the Perl implementation. It gives a mapping between external module interface names and substitution text and/or object repeat counts. The value of a key in this mapping may be a string, a number, or a chtml object, which is handled recursively, allowing objects to be substituted into other objects. The value can also be a list or a lexically-scoped procedure that can be called upon to generate a sequence of values. In this way a *chunk1* from *doc1* could be inserted into *doc2* to be used as an idiom once or multiple times, while *chunk1* may itself have substitutions made into it from other bindings established in the hash table by the caller of write-chtml.
If the value of a key is a list then it is decremented as long as this would not make the list empty. An exception to the lookup convention is that if the key starts with the character "." then the rest of the name after the "." is used as the actual key and no list decrementing will take place.
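One reading of this lookup rule, sketched in Python rather than SIOD Scheme: "decrementing" is taken to mean advancing the list to its tail after each use, stopping at the last element. The function name lookup and the use of a plain dictionary in place of the SIOD hash table are assumptions for illustration, and the list is assumed to be non-empty.

```python
def lookup(table, key):
    """Return the current value for key, advancing list values as described.

    A key starting with "." refers to the entry named by the rest of the key
    and never advances the list.
    """
    advance = True
    if key.startswith("."):
        key = key[1:]
        advance = False
    value = table[key]
    if isinstance(value, list):
        current = value[0]
        # Advance only if at least one element would remain afterwards.
        if advance and len(value) > 1:
            table[key] = value[1:]
        return current
    return value


if __name__ == "__main__":
    table = {"fullname": ["Ann", "Bob"], "sitename": "example"}
    print(lookup(table, ".fullname"))  # peeks without advancing
    print(lookup(table, "fullname"))   # uses the first element, then advances
    print(lookup(table, "fullname"))   # last element, list is left alone
    print(lookup(table, "sitename"))   # non-list values are returned as-is
```

Under this reading the last element keeps being returned once the list is exhausted, so a repeat count larger than the list length repeats the final value instead of failing.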
As a consequence of the implementation in the lisp dialect SIOD, a canonical, fast-loading, binary disk-file representation of compiled chtml is available, because any lisp object can be saved and restored using the fast-save and fast-load procedures. These are packaged into a compilation command for Chtml:
(compile-chtml "filename" "output-filename")
The resulting output file may be loaded using:
(car (load "output-filename" t))
even in a SIOD environment without chtml.scm loaded. The swrite built-in function may be used instead of write-chtml.
The binary file format of data is based on one-byte opcodes followed by opcode-dependent arguments. Lengths are written as longs in the manner native to the cpu architecture. This format could be read by Chtml implementations in Perl and C/C++, in addition to being natural for scripts written in the SIOD dialect. Note that this binary format is not portable across different machine architectures.
opcode | type | format | Description |
---|---|---|---|
2 | number | DATA | DATA is a double, read sizeof(double). |
3 | symbol | LENGTH STRING | Print name of symbol follows as LENGTH more bytes |
13 | string | LENGTH STRING | string follows as LENGTH more bytes |
16 | array | LENGTH * | What follows are LENGTH more objects. |
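A reader for the four opcodes in this table might look like the following Python sketch. It follows only what the table states (a one-byte opcode, native longs for lengths, native doubles for numbers); files written by SIOD itself may use further opcodes or framing not covered here, and the sketch has not been checked against a real compiled template. The Symbol class, the encode helper used for the round trip, and the latin-1 text encoding are assumptions.

```python
import struct


class Symbol(str):
    """Marker type so that symbols can be told apart from strings."""


OP_NUMBER, OP_SYMBOL, OP_STRING, OP_ARRAY = 2, 3, 13, 16
LONG = "@l"    # native long, as used for LENGTH
DOUBLE = "@d"  # native double, as used for numbers


def encode(obj):
    """Serialize a number, Symbol, str, or list using the table's opcodes."""
    if isinstance(obj, Symbol):
        data = obj.encode("latin-1")
        return bytes([OP_SYMBOL]) + struct.pack(LONG, len(data)) + data
    if isinstance(obj, str):
        data = obj.encode("latin-1")
        return bytes([OP_STRING]) + struct.pack(LONG, len(data)) + data
    if isinstance(obj, (int, float)):
        return bytes([OP_NUMBER]) + struct.pack(DOUBLE, float(obj))
    if isinstance(obj, list):
        body = b"".join(encode(item) for item in obj)
        return bytes([OP_ARRAY]) + struct.pack(LONG, len(obj)) + body
    raise TypeError("unsupported object: %r" % (obj,))


def decode(buf, pos=0):
    """Decode one object starting at pos; return (object, next position)."""
    opcode = buf[pos]
    pos += 1
    if opcode == OP_NUMBER:
        (value,) = struct.unpack_from(DOUBLE, buf, pos)
        return value, pos + struct.calcsize(DOUBLE)
    if opcode in (OP_SYMBOL, OP_STRING, OP_ARRAY):
        (length,) = struct.unpack_from(LONG, buf, pos)
        pos += struct.calcsize(LONG)
        if opcode == OP_ARRAY:
            items = []
            for _ in range(length):
                item, pos = decode(buf, pos)
                items.append(item)
            return items, pos
        text = buf[pos:pos + length].decode("latin-1")
        return (Symbol(text) if opcode == OP_SYMBOL else text), pos + length
    raise ValueError("unknown opcode %d" % opcode)


if __name__ == "__main__":
    doc = [1, "<TD>", Symbol("fname"), "</TD>", [Symbol("res_count"), "<TR>"]]
    decoded, _ = decode(encode(doc))
    print(decoded)
```

The use of native sizes for longs and doubles is what makes such a file non-portable: the same bytes decode differently on a machine with a different long size or byte order.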
A compiled file might contain the following data, displayed using the modern lisp printing conventions:
#(1 " " "<P><INPUT TYPE=submit VALUE=\"OK" title "\"></Center> " " " #(res_count "<TR><TD><INPUT TYPE=RADIO NAME=\"prinum\" VALUE=\"" title "\"> " "<TD>" fname "</TD><TD>" lname "</TD> " "<TD>(" fname ") ") "</TABLE> " "<P><INPUT TYPE=submit VALUE=\"OK" title "\"></Center> " " ")
Note: The scheme and therefore the chtml C versions support nested repeating objects, and also allow lines with an object marker comment to contain markup before and after the object marker. The Perl version is more restricted. Marker comments should be on an isolated line and nested objects are not supported.
The html templates are compiled using the chtml command. Documentation on the arguments is available using the unix man command or in the file chtml.txt.
The preparsed templates are loaded into memory at runtime, manipulated, and finally freed. Or the templates may be output as C code data structures to be compiled with the C compiler and statically or dynamically linked into the program.
Fundamental to using these templates is the concept of tabular sequenced data. There are many ways to represent tables and sequences and do output in C programs. If your programming environment already has suitable libraries and/or C++ classes for handling this then you can use those ways, the chtml api is flexible enough to handle them. But if you don't have strong support for strings and tables in your environment you should seriously consider using the string item library provided by the stritem.c module, the functions for which are also declared in the chtml.h file.
In general using chtml for output allows a program to do all "data/business/logical computations" before getting into the actual final output phase.
Here are three advantages of doing all the data computations needed for output ahead of the actual output phase:
On the other hand, if you do not want to do all the data computations before going to the output phase, for example, because you are implementing a time-consuming operation and you want to send the browser partial results; there are two obvious implementation possibilities:
In common applications we have found these idioms to be useful:
<!--CHTML-INTERFACE
?COLOR,GREEN,RED,BLUE :: __COLOR__
=.ITEM,.SELECTED_ITEM,SELECTED, :: xxxSELECTEDxxx
'0 :: __DEBUG__ :: Set to 1 for debug output 0 otherwise.
@debug_print,3000 :: __debug_print__
-->
<!--BEGIN-REPEATING-OBJECT-__DEBUG__-->
<!-- __debug_print__ -->
<!--END-REPEATING-OBJECT-->
<!doctype html public "-//IETF//DTD HTML//EN//2.0">
<!--
 name: homes.html
 purpose: example use of Chunks of HTML.
 author: george j. carrette
 $Id: chtml.html,v 1.40 1998/06/19 16:28:38 gjc Exp $
-->
<!--CHTML-INTERFACE
sitename ::__sitename__ :: name of web site.
usercount ::__usercount__ :: number of users to list.
username ::__username__ :: a list of user names.
fullname ::__fullname__ :: a list of full names.
urlname ::__urlname__ :: a list of urls.
querytime ::__querytime__ :: for performance measurements.
-->
<html>
<head><title>Home Pages on __sitename__</title></head>
<body>
<CENTER><H1>Home Pages on __sitename__</H1></CENTER>
<P><CENTER><TABLE BORDER=1>
<TR><TH ALIGN=LEFT>username</TH>
<TH ALIGN=LEFT>Full Name</TH></TR>
<!--BEGIN-REPEATING-OBJECT-__usercount__-->
<TR>
<TD><A href="__urlname__">__username__</A></TD>
<TD>__fullname__</TD>
</TR>
<!--END-REPEATING-OBJECT-->
</TABLE></CENTER>
<!-- query took __querytime__ seconds cpu time. -->
</BODY>
</HTML>
#!/usr/local/bin/siod -v0,-m3 -*-mode:lisp-*-
;; name: homes-scm.cgi
;; purpose: illustrate chunks of html cgi application
;; author: george j. carrette
;; $Id: chtml.html,v 1.40 1998/06/19 16:28:38 gjc Exp $

(require'chtml.scm)

(define (get-homes)
  (let ((item nil)
        (homes nil)
        (gecos nil))
    (while (set! item (getpwent))
      (if (not (access-problem? (string-append (cdr (assq 'dir item))
                                               "/public_html")
                                "r"))
          (begin (set! gecos (cdr (assq 'gecos item)))
                 (set! homes (cons (list (cdr (assq 'name item))
                                         (substring gecos
                                                    0
                                                    (string-search "," gecos)))
                                   homes)))))
    (qsort homes string-lessp car)))

(define (main)
  (let ((h (cons-array 10))
        (l (get-homes))
        (form (load-chtml "homes.html")))
    (hset h 'usercount (length l))
    (hset h 'username (mapcar car l))
    (hset h 'fullname (mapcar cadr l))
    (hset h 'urlname (mapcar (lambda (x) (string-append "/~" (car x))) l))
    (hset h 'querytime (car (runtime)))
    (hset h 'sitename (getenv "SERVER_NAME"))
    (writes nil "Content-type: text/html\n\n")
    (write-chtml nil h form)))
chtml link.html
chtml link1.html
chtml link2.html
chtml :action=link link.html-bin link1.html-bin link2.html-bin
chtmlt link.html-bin-bin
<!--CHTML-INTERFACE
IDIOM1::__IDIOM1__
IDIOM2::__IDIOM2__
TITLE::__TITLE__
-->
<html>
<head>
<TITLE>Example link technique</TITLE>
</HEAD>
<BODY>
<H1>__TITLE__</H1>
<P>Here will pull in IDIOM1: __IDIOM1__
<P>Here will pull in IDIOM2: __IDIOM2__
<P>That is all.
</BODY>
</HTML>
<!--CHTML-INTERFACE
IDIOM1::$$IDIOM1$$ :: we define this
IDIOM3::$$IDIOM3$$ :: we reference this
X::__X__
Y::__Y__
-->
<html>
<head>
<TITLE>Define Some Idioms</TITLE>
</HEAD>
<BODY>
<H1>Define Some Idioms</H1>
<!--BEGIN-OBJECT-$$IDIOM1$$-->
[Actually, we expand into references to __X__ and __Y__
and also expect IDIOM3 = $$IDIOM3$$ to be available.]
<!--END-OBJECT-->
</BODY>
</HTML>
<!--CHTML-INTERFACE
IDIOM2::$$IDIOM2$$ :: we define this
IDIOM3::$$IDIOM3$$ :: we define this too
A::__A__
B::__B__
-->
<html>
<head>
<TITLE>Define Some Idioms</TITLE>
</HEAD>
<BODY>
<H1>Define Some Idioms</H1>
<P>
<!--BEGIN-OBJECT-$$IDIOM2$$-->
[Idiom 2 expands into references to __A__ and __B__]
<!--END-OBJECT-->
<P>
<!--BEGIN-OBJECT-$$IDIOM3$$-->
[Idiom 3 is just this chunk of text]
<!--END-OBJECT-->
</BODY>
</HTML>
The author would also like to thank Joan O'Brien and Laird Popkin for their helpful comments.
The C implementation of the CHTML api saw first use at Information Access Company, where Evan Morton, Lynne Dao, Tom Vancor, Leonid Gernovski, and Tim Strickland provided particularly good suggestions from the point of view of serious application writers under considerable time pressure to get new products out the door. The linking and inclusion features responded to some of their needs as application templates became larger and more complex over time.
<FORM action="next.cgi/homes.html-bin?fullname=somebody" METHOD="POST">
<INPUT TYPE="HIDDEN" NAME="sitename" VALUE="__SITE__">
<INPUT TYPE="HIDDEN" NAME="usercount" VALUE="2">
User1: <INPUT TYPE=TEXT NAME="username" VALUE="user1">
User2: <INPUT TYPE=TEXT NAME="username" VALUE="user2">
<INPUT TYPE="SUBMIT" VALUE="Click Here">
</FORM>
With a bit of javascript and java you could get a lot of real application work done using next.cgi, even if that and perhaps a simple mailer are the only cgi scripts your web hosting service makes available to you.
The next.cgi can also be used to interactively test the output of a template given a set of inputs, for example, this tests the sgml validity of a result:
#!/bin/sh
QUERY_STRING="sitename=TEST_SITENAME&usercount=3&\
username=u1&username=u2&username=u3&fullname=f1&fullname=f2&\
fullname=f3&querytime=100"
REQUEST_METHOD="GET"
PATH_INFO="/homes.html-bin"
NEXT_NOTYPE=1
export QUERY_STRING REQUEST_METHOD PATH_INFO NEXT_NOTYPE
./next.cgi | nsgmls -s
Todo: Release netscape server nsapi and apache versions of next.
Also, document other scripts include, sql_sybase.cgi, cookie.cgi, sp_help.cgi provided with this distribution.
If the runtime overhead of page creation is a concern then simply use the techniques in make files and batch processes instead of using them in dynamic server applications.
It is important to use subroutines to hide the awkwardness of establishing sequences for substitution into the default selection values for <SELECT> style and other inputs. Or utilize the "?" construct with chtml_stritem_eval.
Finally the support for nested substitution of chunks of html is not fully developed.
This is unsupported software but if you have ideas or comments feel free to send email to gjc@delphi.com.
The distribution is in the form of a compressed tar file: chtml.tgz. You must have the gnu gunzip to decompress and some unix compatible tar utility to extract the individual files. Unix sources for gzip and tar may be obtained from gatekeeper.dec.com under pub/GNU.
An INFO-ZIP archive of sources chtml.zip is provided too.
To use the template compiler for the C version you need SIOD. The chtml.sh script sometimes helps people deal with the inflexibilities of Unix directory structures for installing SIOD in nonstandard ways.
The file chtml_i386.zip contains binaries which may be used with windows SIOD dll's. Note that the command to create the chtml.exe is:
> csiod :o=chtml.exe chtml-cmp.scm chtml.scm
The existence of the binaries in chtml_i386.zip is mostly just proof of the ability to run this stuff under windows with the VC++ compiler, because anyone using it with the VC++ compiler would obviously be able to recompile from sources at will, and the use of the perl and scheme versions is a portable no-brainer. Note that the WIN32 port has shown some problems with the code, particularly when it is run in debug mode with the standard VC++ debug assertions.
Master location for all of these is under http://people.delphi.com/gjc.
ENHANCEMENTS COPYRIGHT (c) 1997 BY INFORMATION ACCESS COMPANY. ALL RIGHTS RESERVED. COPYRIGHT (c) 1995-1996 BY NEWS INTERNET SERVICES ALL RIGHTS RESERVED. Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both the copyright notice and this permission notice appear in supporting documentation, and that the name of News Internet Services not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. THIS SOFTWARE IS MADE AVAILABLE WITHOUT CHARGE, AS-IS. NEWS INTERNET SERVICES DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL NEWS INTERNET BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.