This chapter describes how to use HTMLDOC from the command line to convert web pages and generate books.
Follow these steps to open the command line on Windows:
After you have clicked command prompt, your screen should look something like Figure 4-1.
Figure 4-1: Command prompt window
To see what's in this directory, type the following command:
dir ENTER
You now have a list of the files and directories in the current directory. To change to a different directory, type cd followed by the name of that directory. For example, type the following to change to a directory called Steve:
cd Steve ENTER
To convert a single web page type:
htmldoc --webpage -f output.pdf filename.html ENTER
htmldoc is the name of the software.
--webpage is the document type that specifies unstructured files with page breaks between each file.
-f output.pdf is the name of the file that the converted output is saved to. The file extension also selects the output type. In this example it is a PDF file.
filename.html is the name of the file that you want to convert. In this example it is an HTML file.
Try the following exercise: You want to convert the file myhtml.html into a PDF file. The new file will be called mypdf.pdf. How would you do this? (Don't worry, it's answered for you on the next line. But try first.)
To accomplish this type:
htmldoc --webpage -f mypdf.pdf myhtml.html ENTER
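If you convert pages often, the same command can be assembled by a script. The following is a minimal Python sketch, not part of HTMLDOC itself: the function names webpage_command and convert are hypothetical, and it assumes that htmldoc is installed and can be found on your PATH.

```python
import shutil
import subprocess


def webpage_command(output, *inputs):
    """Build the argument list for: htmldoc --webpage -f OUTPUT INPUT..."""
    if not inputs:
        raise ValueError("at least one input file or URL is required")
    return ["htmldoc", "--webpage", "-f", output, *inputs]


def convert(output, *inputs):
    """Run the conversion. Assumes htmldoc is installed and on PATH."""
    if shutil.which("htmldoc") is None:
        raise FileNotFoundError("htmldoc was not found on PATH")
    # check=True raises CalledProcessError if htmldoc exits with an error.
    subprocess.run(webpage_command(output, *inputs), check=True)


# The exercise above, expressed as a list of arguments.
print(webpage_command("mypdf.pdf", "myhtml.html"))
```

Calling convert("mypdf.pdf", "myhtml.html") would then run the same command that you typed by hand.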
To convert more than one web page with page breaks between each HTML file, type:
htmldoc --webpage -f output.pdf file1.html file2.html ENTER
All we are doing is adding another file. In this example we are converting two files: file1.html and file2.html.
Try this example: Convert one.html and two.html into a PDF file named 12pdf.pdf. Again, the answer is on the next line.
Your command line should look like this:
htmldoc --webpage -f 12pdf.pdf one.html two.html ENTER
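When there are many files, typing every name becomes tedious. The sketch below, in Python, gathers the HTML files in a directory and places them on the command line in sorted order, on the assumption that HTMLDOC processes input files in the order they are listed. The helper names html_inputs and webpage_command are hypothetical and are not part of HTMLDOC.

```python
from pathlib import Path


def html_inputs(directory="."):
    """Return the .html files in a directory, sorted by name."""
    return sorted(str(path) for path in Path(directory).glob("*.html"))


def webpage_command(output, inputs):
    """Build: htmldoc --webpage -f OUTPUT INPUT1 INPUT2 ..."""
    inputs = list(inputs)
    if not inputs:
        raise ValueError("no input files given")
    return ["htmldoc", "--webpage", "-f", output, *inputs]


# Explicit file names, as in the exercise above.
print(webpage_command("12pdf.pdf", ["one.html", "two.html"]))

# Every .html file in the current directory, if there are any.
found = html_inputs(".")
if found:
    print(webpage_command("output.pdf", found))
```

The script only prints the argument lists. To run one of them, pass it to subprocess.run from the Python standard library.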
We've been using HTML files, but you can also use URLs. For example:
htmldoc --webpage -f output.pdf http://slashdot.org/ ENTER
Type one of the following commands to generate a book from one or more HTML files:
htmldoc --book -f output.html file1.html file2.html ENTER
htmldoc --book -f output.pdf file1.html file2.html ENTER
htmldoc --book -f output.ps file1.html file2.html ENTER
htmldoc is the name of the software.
--book is a type of document that specifies that the input files are structured with headings.
-f output.html is the file that the converted output is saved to. In this case, we requested an HTML file. We could have made it a PDF (-f output.pdf) or PostScript (-f output.ps) file, too.
file1.html and file2.html are the files you want to convert.
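The three book commands differ only in the extension of the output file. As a minimal Python sketch (the function name book_commands is hypothetical, and htmldoc is assumed to be on your PATH if you go on to run the commands), the variants can be generated from one base name:

```python
def book_commands(basename, inputs, extensions=("html", "pdf", "ps")):
    """Build one htmldoc --book command per output extension."""
    inputs = list(inputs)
    return [
        ["htmldoc", "--book", "-f", f"{basename}.{ext}", *inputs]
        for ext in extensions
    ]


# Print the HTML, PDF, and PostScript variants shown above.
for command in book_commands("output", ["file1.html", "file2.html"]):
    print(" ".join(command))
```

Each printed line matches one of the commands above, without the ENTER key press.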
HTMLDOC will build a table of contents for the book using the heading elements (H1, H2, etc.) in your HTML files. It will also add a title page using the document TITLE text (you're going to learn about title files shortly) and other META information you supply in your HTML files. See Chapter 6 - HTML Reference for more information on the META variables that are supported.
Note:
When using book mode, HTMLDOC starts rendering with
the first |
The --titlefile option sets the HTML file or image to use on the title page:
htmldoc --titlefile filename.bmp ... ENTER
htmldoc --titlefile filename.gif ... ENTER
htmldoc --titlefile filename.jpg ... ENTER
htmldoc --titlefile filename.png ... ENTER
htmldoc --titlefile filename.html ... ENTER
HTMLDOC supports BMP, GIF, JPEG, and PNG images, as well as generic HTML text you supply for the title page(s).
htmldoc --book -f 12book.pdf 1book.html 2book.html --titlefile bookcover.jpg ENTER
Take a look at the entire command line and dissect the information. Can you see what the new filename is? What are the names of the files being converted? Do you see the title file? What kind of file is the title file?
Figure it out? The new file is 12book.pdf. The files converted were 1book.html and 2book.html. A title page was created using the JPEG image file bookcover.jpg.
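The same dissection can be written down as a small Python sketch that assembles a book command with an optional title file. The function name book_command is hypothetical. The extension check is a convenience of this sketch that mirrors the file types used in the examples above; it is not a description of how HTMLDOC itself validates title files.

```python
from pathlib import PurePath

# Extensions used in the --titlefile examples above (an assumption of
# this sketch; HTMLDOC may accept other spellings of these formats).
TITLE_EXTENSIONS = {".bmp", ".gif", ".jpg", ".png", ".html"}


def book_command(output, inputs, titlefile=None):
    """Build: htmldoc --book -f OUTPUT INPUT... [--titlefile TITLEFILE]"""
    command = ["htmldoc", "--book", "-f", output, *inputs]
    if titlefile is not None:
        if PurePath(titlefile).suffix.lower() not in TITLE_EXTENSIONS:
            raise ValueError(f"unexpected title file type: {titlefile}")
        command += ["--titlefile", titlefile]
    return command


# The command line dissected above.
print(book_command("12book.pdf", ["1book.html", "2book.html"], "bookcover.jpg"))
```

Leaving out the titlefile argument produces a plain book command with no --titlefile option.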
Chapter 8 - Command Line Reference digs deeper into what you can do from the command line.