Chapter 4 - HTMLDOC from the Command-Line

This chapter describes how to use HTMLDOC from the command-line to convert web pages and generate books.

Getting to the Command-Line on Windows

Follow these steps to open the command-line on Windows:

  1. Click on Start at the bottom left corner of your screen
  2. Click on All Programs
  3. Click on Accessories
  4. Click on Command Prompt

After you click Command Prompt, your screen should look something like Figure 4-1.

Figure 4-1: Command prompt window

To see what's in the current directory, type the following command:

    dir ENTER

You now have a list of available files and directories that you can use. To change to a different directory, type cd followed by the name of that directory. For example, type the following if you want to change to a directory called Steve:

    cd Steve ENTER

The Basics of Command-Line Access

To convert a single web page, type:

    htmldoc --webpage -f output.pdf filename.html ENTER

What Are All These Commands?

htmldoc is the name of the software.

--webpage is the document type that specifies unstructured files with page breaks between each file.

-f output.pdf names the file that all the converted documents are saved into and also indicates the type of file. In this example it is a PDF file.

filename.html is the name of the file that you want to convert and also indicates its type. In this example it is an HTML file.

Try the following exercise: You want to convert the file myhtml.html into a PDF file. The new file will be called mypdf.pdf. How would you do this? (Don't worry, it's answered for you on the next line. But try first.)

To accomplish this, type:

    htmldoc --webpage -f mypdf.pdf myhtml.html ENTER
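
If you convert files often, you may prefer to run the same command from a script. The following is a minimal sketch in Python, not part of HTMLDOC itself; the function names webpage_command and convert_webpage are made up for this illustration, and the sketch assumes the htmldoc program can be found on your PATH.

```python
import shutil
import subprocess


def webpage_command(output, source):
    """Return the argument list for: htmldoc --webpage -f OUTPUT SOURCE."""
    # Hypothetical helper: it only assembles the arguments shown in this chapter.
    return ["htmldoc", "--webpage", "-f", output, source]


def convert_webpage(output, source):
    """Run the conversion; raise if htmldoc is missing or reports failure."""
    # Assumption: htmldoc is installed and reachable through PATH.
    if shutil.which("htmldoc") is None:
        raise FileNotFoundError("htmldoc was not found on PATH")
    # check=True raises CalledProcessError when htmldoc exits with an error.
    subprocess.run(webpage_command(output, source), check=True)
```

Calling convert_webpage("mypdf.pdf", "myhtml.html") runs the same command as the exercise above. Passing the arguments as a list means file names that contain spaces do not need extra quoting.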

Converting Multiple HTML Files

To convert more than one web page with page breaks between each HTML file, type:

    htmldoc --webpage -f output.pdf file1.html file2.html ENTER

All we are doing is adding another file. In this example we are converting two files: file1.html and file2.html.

Try this example: Convert one.html and two.html into a PDF file named 12pdf.pdf. Again, the answer is on the next line.

Your command line should look like this:

    htmldoc --webpage -f 12pdf.pdf one.html two.html ENTER

We've been using HTML files, but you can also use URLs. For example:

    htmldoc --webpage -f output.pdf http://slashdot.org/ ENTER
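
When you mix local files and URLs on one command line, it can help to check the local files before you run HTMLDOC. The sketch below is one way to do that in Python. The helper names (is_url, missing_files, multi_webpage_command) are hypothetical, and treating only http and https addresses as URLs is an assumption made for this example.

```python
import os
from urllib.parse import urlparse


def is_url(source):
    """Return True when source looks like an http or https URL."""
    # Assumption: only these two schemes are treated as URLs here.
    return urlparse(source).scheme in ("http", "https")


def missing_files(sources):
    """Return the local sources that do not exist; URLs are skipped."""
    return [s for s in sources if not is_url(s) and not os.path.isfile(s)]


def multi_webpage_command(output, sources):
    """Build: htmldoc --webpage -f OUTPUT SOURCE [SOURCE ...]."""
    if not sources:
        raise ValueError("at least one HTML file or URL is required")
    # The sources keep the order in which they were given.
    return ["htmldoc", "--webpage", "-f", output, *sources]
```

For example, multi_webpage_command("12pdf.pdf", ["one.html", "two.html"]) produces the arguments for the exercise above, and missing_files(["one.html", "two.html"]) lists whichever of those files cannot be found in the current directory.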

Generating Books

Type one of the following commands to generate a book from one or more HTML files:

    htmldoc --book -f output.html file1.html file2.html ENTER
    htmldoc --book -f output.pdf file1.html file2.html ENTER
    htmldoc --book -f output.ps file1.html file2.html ENTER

What Are All These Commands?

htmldoc is the name of the software.

--book is a type of document that specifies that the input files are structured with headings.

-f output.html is where you want the converted output to go. In this case, we requested an HTML file. We could have made it a PDF (-f output.pdf) or PostScript (-f output.ps) file, too.

file1.html and file2.html are the files you want to convert.

HTMLDOC will build a table of contents for the book using the heading elements (H1, H2, etc.) in your HTML files. It will also add a title page using the document TITLE text (you're going to learn about title files shortly) and other META information you supply in your HTML files. See Chapter 6 - HTML Reference for more information on the META variables that are supported.

Note:

When using book mode, HTMLDOC starts rendering with the first H1 element. Any text, images, tables, and other viewable elements that precede the first H1 element are silently ignored. Because of this, make sure you have an H1 element in your HTML file; otherwise, HTMLDOC will not convert anything!
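
Because a missing H1 element is easy to overlook, you may want to check your files before generating a book. The following is a rough sketch that uses Python's standard html.parser module; it is not how HTMLDOC itself reads HTML, and the names FirstHeadingFinder and has_h1 are invented for this example.

```python
from html.parser import HTMLParser


class FirstHeadingFinder(HTMLParser):
    """Record whether an H1 start tag appears anywhere in the document."""

    def __init__(self):
        super().__init__()
        self.found_h1 = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so <H1> and <h1> both match.
        if tag == "h1":
            self.found_h1 = True


def has_h1(html_text):
    """Return True if the HTML text contains at least one H1 element."""
    finder = FirstHeadingFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found_h1
```

You could read each input file into a string and call has_h1 on it; a file for which it returns False is a likely cause of an empty book.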

Setting the Title File

The --titlefile option sets the HTML file or image to use on the title page:

    htmldoc --titlefile filename.bmp ... ENTER
    htmldoc --titlefile filename.gif ... ENTER
    htmldoc --titlefile filename.jpg ... ENTER
    htmldoc --titlefile filename.png ... ENTER
    htmldoc --titlefile filename.html ... ENTER

HTMLDOC supports BMP, GIF, JPEG, and PNG images, as well as generic HTML text you supply for the title page(s).

Putting It All Together

    htmldoc --book -f 12book.pdf 1book.html 2book.html --titlefile bookcover.jpg ENTER

Take a look at the entire command line. Dissect the information. Can you see what the new filename is? What are the names of the files being converted? Do you see the title file? What kind of file is the title file?

Figure it out? The new file is 12book.pdf. The files converted were 1book.html and 2book.html. A title page was created using the JPEG image file bookcover.jpg.
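
The same command line can be assembled from a script. Here is a small sketch in Python, under a few assumptions: the name book_command is hypothetical, the list of accepted file extensions is our own reading of the title file types named in this chapter, and the sketch only builds the argument list rather than running htmldoc.

```python
import os

# Title file types named in this chapter: BMP, GIF, JPEG, PNG, and HTML.
# Assumption: the exact extension spellings accepted here are illustrative.
TITLE_EXTENSIONS = {".bmp", ".gif", ".jpg", ".jpeg", ".png", ".htm", ".html"}


def book_command(output, sources, titlefile=None):
    """Build: htmldoc --book -f OUTPUT SOURCE ... [--titlefile TITLEFILE]."""
    command = ["htmldoc", "--book", "-f", output, *sources]
    if titlefile is not None:
        extension = os.path.splitext(titlefile)[1].lower()
        if extension not in TITLE_EXTENSIONS:
            raise ValueError("unexpected title file type: " + titlefile)
        # Keep the same argument order as the example command line above.
        command += ["--titlefile", titlefile]
    return command
```

Calling book_command("12book.pdf", ["1book.html", "2book.html"], "bookcover.jpg") returns the arguments of the command shown at the start of this section, ready to pass to subprocess.run.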

Chapter 8 - Command Line Reference digs deeper into what you can do with the command-line.