http-analyze(8L)          IRIX 5.3 (Local Commands)          http-analyze(8L)

NAME
     http-analyze - a fast log analyzer for web servers

SYNOPSIS
     http-analyze [-{hdmV}] [-3aefnvxy] [-c cfgfile] [-o outdir] [-p privdir] [-s opt,...] [-t num,...] [-u time] [-w hits] [-F format] [-G suffix,...] [-H idxfile,...] [-I date] [-E date] [-O virtname,...] [-P prolog] [-R docroot] [-S srvname] [-T tldfile] [-U srvurl] [-W 3Dwin] [logfile[...]]

DESCRIPTION
     http-analyze analyzes the logfile of a web server and creates detailed statistics of the server's access load and its responses in graphical and tabular form. In auto-sense mode (default), http-analyze tries to recognize the logfile format automatically. Supported logfile formats are the common logfile format (CLF) and two forms of extended or combined logfile formats, which are basically the common format plus user-agent and referrer URL. These formats are used by most popular web servers such as the Netscape Enterprise server, the NCSA httpd, the Apache server, the Ximati server and many others.

     http-analyze has been highly optimized to process large logfiles at maximum speed. There are two modes of operation with different levels of detail in the logfile analysis:

     Short statistics ("daily" mode, option -d):
          http-analyze generates a short summary of server usage per day. In this mode, it uses a history file to skip entries which have already been processed.
          By avoiding detailed analysis of the logfile entries, http-analyze requires only a fraction of the time which would be required to generate a full statistics report.

     Full statistics ("monthly" mode, option -m):
          In this mode, the analyzer generates a full report for a whole month, which contains much more detail than the short statistics. The history file is used to produce a summary for the last 12 months without having to analyze the logfiles for those previous periods again. This is the default if no mode is specified explicitly.

     Usually, you run http-analyze in full statistics mode only, since this also includes the short statistics. However, if your logfiles are rather large and the analyzer causes significant load while generating the full statistics, you could run it in short statistics mode at very short update intervals (in the range of 30 minutes to a few hours) to keep the statistics up to date, and then run it in full statistics mode less often (for example once per day, per week or even per month) to generate a detailed report. The operation modes have been named after their maximum usable update intervals, namely "daily" and "monthly" mode for the short and full statistics respectively.

Page 1 (printed 5/31/98)

     http-analyze maintains a history of the results from previous periods. In short statistics mode, the history contains the daily summaries from the first day of the current month until the previous day (yesterday). In full statistics mode, the history contains only summaries for previous months, not for the current one.
     This means that you should rotate the logfile on the first day of a new month after having generated a full statistics report for the previous month. If disk space is a concern, you can set up a scheme where the logfiles are rotated once per week or even once per day. In this case, to generate a full statistics report you have to feed all logfiles for a month into http-analyze in ascending order of date. Once a detailed report for a previous month has been generated, you can save the corresponding logfile(s) somewhere and remove them from your production system.

LOGFILE FORMATS
     http-analyze understands the three most important logfile formats:

     Common Logfile Format (CLF)
          This format is supported by most web servers. The entries contain the following information:

               dns-name - auth-user [date] "clf-request" clf-status ct-length

          where the fields have the following meaning:

          dns-name
               The fully qualified domain name (FQDN) of the host accessing the server. If there is no FQDN available for the host, the IP number is logged by the server.

          -
               Unused.

          auth-user
               The user ID provided by the client to access documents which require the user to authenticate.

          [date]
               The date of the access as [DD/MMM/YYYY:HH:MM:SS +-ZZZZ].

          "clf-request"
               The request in the format "method URI proto", where method is one of GET, HEAD, POST, PUT, BROWSE, OPTIONS, DELETE or TRACE; URI is the Uniform Resource Identifier, and proto is the protocol parameter containing the HTTP version.

          clf-status
               The numerical response code from the server.

          ct-length
               Either the size of the document or the amount of data actually sent over the wire, depending on the server.
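The CLF field layout described above can be illustrated with a short parser. The following is a minimal sketch in Python, not the code used by http-analyze. The names CLF_RE and parse_clf are hypothetical, and treating a ct-length of "-" as zero is an assumption that this manual does not state.

```python
import re
from datetime import datetime

# One CLF entry: dns-name - auth-user [date] "clf-request" clf-status ct-length
CLF_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<request>.*)" '
    r'(?P<status>\d{3}) (?P<length>\d+|-)$'
)


def parse_clf(line):
    """Return the CLF fields as a dict, or None if the line does not match."""
    m = CLF_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    # The date field has the form DD/MMM/YYYY:HH:MM:SS +-ZZZZ.
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Assumption: a "-" in place of the length means that no data was sent.
    entry["length"] = 0 if entry["length"] == "-" else int(entry["length"])
    # The request is "method URI proto"; tolerate requests with missing parts.
    parts = entry["request"].split()
    entry["method"] = parts[0] if parts else ""
    entry["uri"] = parts[1] if len(parts) > 1 else ""
    entry["proto"] = parts[2] if len(parts) > 2 else ""
    return entry


entry = parse_clf(
    'host.example.com - - [31/May/1998:12:30:05 +0200] '
    '"GET /dir/index.html HTTP/1.0" 200 1043'
)
print(entry["method"], entry["uri"], entry["status"], entry["length"])
```

The pattern is anchored at both ends, so a line with extra trailing fields (such as a referrer URL or user-agent) is rejected rather than silently misparsed.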
     Combined Logfile Format (DLF)
          Some servers use the so-called Combined Logfile Format to add the referrer URL and user-agent (browser) to the logfile entries. It looks like the CLF format followed by the referrer URL and the user-agent, where the latter two fields are surrounded by double quotes:

               CLF "referrer_URL" "user_agent"

          Unfortunately, double quotes sometimes appear in broken referrer URLs as, for example, in

               "http://www.some.host/document.html TARGET=newwin""

          Since there are sometimes even referrer URLs which contain double quotes followed by blanks, those entries cannot be parsed unambiguously. Although http-analyze recognizes the Combined Logfile Format automatically and does its best to parse the referrer URL correctly, the following format, which avoids this ambiguity, should be preferred if possible.

     Extended Logfile Format (ELF)
          The Extended Logfile Format also contains the user-agent and the referrer URL, but in the opposite order and without the surrounding double quotes:

               CLF user_agent referrer_URL

          If this Extended Logfile Format is used, http-analyze searches backwards for the protocol specification of the referrer URL (to be precise, it looks for the colon in http:) and then for the preceding blank. This way, even broken referrer URLs which contain blanks are handled correctly in nearly every case.
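The backward search described for the Extended Logfile Format can be sketched as follows. This is an illustration in Python under stated assumptions, not the analyzer's actual implementation: the function name split_elf_tail is hypothetical, the input is assumed to be the text that follows the CLF part of an entry, and an entry without any "http:" is assumed to carry no referrer URL.

```python
def split_elf_tail(tail):
    """Split the text after the CLF part into (user_agent, referrer_URL)."""
    end = len(tail)
    while True:
        # Search backwards for the protocol specification "http:".
        proto = tail.rfind("http:", 0, end)
        if proto == -1:
            # Assumption: without "http:" the whole tail is the user-agent.
            return tail.strip(), ""
        if proto == 0:
            # The tail starts with the referrer URL; no user-agent was logged.
            return "", tail.strip()
        if tail[proto - 1] == " ":
            # The blank preceding the protocol separates the two fields.
            return tail[:proto - 1].strip(), tail[proto:].strip()
        # This "http:" is embedded in a longer token; keep searching backwards.
        end = proto


agent, referrer = split_elf_tail(
    'Mozilla/4.0 (compatible; MSIE 4.01; Windows 95) '
    'http://www.some.host/document.html TARGET=newwin"'
)
print(agent)
print(referrer)
```

Because the split point is found from the end of the entry, blanks inside the user-agent and blanks after the start of a broken referrer URL do not confuse the split. A referrer URL that itself contains a blank followed by "http:" would still be split in the wrong place.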
          To select this format, edit the configuration file of your web server manually and reverse the order of the user-agent and referrer URL fields in the logfile format specification (see the online documentation of http-analyze for more examples).

SUMMARY REPORTS
     Starting with version 2.0, http-analyze generates all reports for a year in separate subdirectories to reduce the number of files created for the reports. These subdirectories are named wwwYYYY, where YYYY is the year of the corresponding period. All filenames listed below are relative to the appropriate subdirectory unless stated otherwise. If you upgrade from version 1.9e, use the cvt_files script included in the distribution to convert old report files into the new directory structure (see the file INSTALL for an example of how to do this).

     There are two user interfaces to the reports: a conventional interface as in previous versions of the analyzer, and a frames-based interface. Although the frames-based interface is the preferred method to browse through the statistics report, the navigation in the non-frames version has been improved in http-analyze 2.0 by optionally using JavaScript for navigation control and different windows for the display of the reports:

     Main window
          This window is used by default for most reports such as the yearly, monthly, daily and weekly summaries, the Top N lists or the overviews. If there are hotlinks in the Top N lists, they are most often links to the corresponding pages or sites, which will show up in the Viewer window if those links are followed.
          The hotlinks in the overviews point to the detailed lists, which are displayed in the List window.

     Navigation window
          If JavaScript is enabled in your browser and a summary for a year or a month is loaded in the main window, a small window containing a navigation panel will pop up. If JavaScript is disabled, the navigation panel appears at the bottom of the full summary for the corresponding month in the Main window. In this case, use the Back button of your browser to go back to the navigation panel when browsing through the reports.

     List window
          This window is used for the complete lists of URLs, sites, browser types and referrer URLs. Displaying them in a separate window allows easy navigation and causes these rather large lists to be loaded only once when you navigate through them by following the links in the overviews.

     Viewer window
          This window is used for external pages which are referred to through the hotlinks in the statistics reports. This way, you can visit the pages shown in the reports without having to go back and forth between the summary and the pages listed there.

     3D window
          This window is used for the 3D (VRML) model of the statistics. If you have JavaScript enabled, the window's size will be set to the smallest possible size so that the 3D model fits. You can specify the dimensions of this window in the configuration file using the 3DWinSize directive.

     Do not close the windows when you switch reports; they will be re-used for the appropriate lists.

   Short statistics mode
     In short statistics ("daily") mode, http-analyze writes the summary into the output file stats.html and updates the daily summaries in the history file www-stats.hist.
     The short statistics report includes the following information (see the section Interpretation of results for an explanation of the numbers):

          - the number of hits per day,
          - the number of files per day,
          - the number of pageviews per day,
          - the number of sessions per day,
          - the amount of data sent (KBytes) per day.

   Full statistics mode
     In full statistics ("monthly") mode, http-analyze updates the short summary in stats.html and additionally creates the following files:

     index.html
          in a wwwYYYY directory is the main page for a given year and contains the total number of hits, files, pageviews, sessions and data sent per month in tabular and graphical form for the last 12 months. At the end of the year, this file reflects the values for the whole year, while the values for the last 12 months will be written into another index file in a new directory wwwYYYY. This page is displayed in the Main window.

     statsMMYY.html and totalsMMYY.html
          contain the total summary for the month MM of year YY in tabular form. The file totalsMMYY.html is the frames version of the report in statsMMYY.html. In the conventional interface, this page is shown in the Main window.

     jsnav.html and navMMYY.html
          Navigation panels for JavaScript-capable browsers, shown in the Navigation window.

     daysMMYY.html
          contains the number of hits, files, pageviews, sessions and data sent per day for the month MM of year YY. This report is shown in the Main window.
     avloadMMYY.html
          shows a graphical representation of the average hits per weekday/hour and the top seconds, minutes, hours, and days of the current period. This report is displayed in the Main window.

     countryMMYY.html
          contains the list of all countries the visitors of your web server came from. This information is determined by analyzing the top-level domain (TLD) of the hostname. If you have disabled domain name lookups in your web server to decrease its response times, or if a host isn't configured in the Domain Name System (DNS) for whatever reason, http-analyze cannot determine the country a visitor is coming from. All hosts without a name will show up as Unresolved in the country list. Note: Sometimes, systems are intentionally not configured in the DNS, so a share of up to 30% unresolved IP numbers is absolutely normal. The country report shows up in the Main window.

     3DstatsMMYY.html, 3DstatsMMYY.wrl.gz, 3DstatsYYYY.html, 3DstatsYYYY.wrl.gz
          are prerequisites for the 3D models of the statistics in the Virtual Reality Modeling Language (VRML). These models are created if the option -3 was given at the invocation of http-analyze. The yearly models are suitable only for graphics workstations; they are therefore created only if a prolog file is specified using either the option -P or the VRMLProlog directive in the configuration file. The file 3Dprolog.wrl is provided as an example of such a world.
          You can build your own worlds with embedded monthly models by creating your own prolog file. For the PC platform, the yearly model file is replaced by a stub file named 3Dlogo.wrl.gz, which is also included in the distribution. To view any of these models, you need a VRML2.0 compatible plug-in such as the free CosmoPlayer from Cosmo Software, which is currently available for Netscape Navigator and MSIE on IRIX and Win95/WinNT platforms. See http://cosmo.sgi.com/ for more information about Cosmo Software. 3D models show up in the 3D window so that they can be compared to the graphs in the conventional reports.

     topurlMMYY.html, topdomMMYY.html, topuagMMYY.html, toprefMMYY.html
          These files contain the Top Ten lists (actually Top N lists, where N is a configurable number) of the files requested, the domains, the browser types and the referrer URLs.

          The URLs shown in topurlMMYY.html are either the real URLs requested by the visitor or an item (arbitrary text) under which you chose to collect certain file names (see the HideURL directive in the configuration file).

          The domains shown in topdomMMYY.html are either the second-level domains of the hosts accessing your server, if the DNS name is available, or an item under which you chose to collect certain hostnames (see the HideSys directive in the configuration file). Unresolved IP numbers again show up as Unresolved.

          The browser types in topuagMMYY.html are the different browser types (user agents) which have been used to access your web site. If possible, http-analyze reduces the name of the browser in this list to the model including the first digit of the version number. Otherwise, the full name as sent by the browser is used.
          The referrer URLs are the URLs of web pages which contain a link to a page on your server and which the user visited before following that link to reach your site. If the user entered the URL manually in the browser, no referrer URL is logged. Also, a browser may choose not to send a referrer URL at all. The referrer URL reports are displayed in the Main window.

     filesMMYY.html, sitesMMYY.html, agentsMMYY.html, refersMMYY.html
          These files contain a complete overview of the files requested, the domains, the browser types and the referrer URLs, similar to the Top N lists.

     lfilesMMYY.html, lsitesMMYY.html, lagentsMMYY.html, lrefersMMYY.html
          These files contain the complete lists of all files requested, all domains, all browser types and all referrer URLs, similar to the previous reports, but sorted by item (if any) and hits. On frequently accessed sites, these lists can become rather large, so they are shown in the List window. If you follow the links in the overviews, the appropriate items will be shown in this window so that the lists have to be loaded just once.

     rfilesMMYY.html
          contains all invalid URLs which caused the server to respond with a Code 404 (Not found) status. If there is a large number of hits for certain files the server couldn't find, it's probably due to missing inline images or other HTML objects embedded in other pages. This report is displayed in the Main window.

     rsitesMMYY.html
          This file contains a list of reverse domains sorted by the top-level domain.
          This report is shown in the Main window.

     frames.html, header.html
          These two files are required for the frames-based user interface. All other files are shared with the non-frames UI. In the frames-based UI, the Main window is inside the frame, while the List window is still an external window. The 3D window may be inside the frame or an external window (see the 3DWindow directive).

     gr-icon.gif
          This is a small icon displayed on the main page under the root directory for the statistics reports (option -o or the OutputDir directive in the configuration file), which is used as a switchboard to the various statistics (currently WWW stats only; more coming soon).

     If no output directory has been specified when http-analyze was run, the reports are created in the current directory. The files containing the detailed lists of files, hosts, browser types, and referrer URLs may optionally be placed into a "private" subdirectory so that they can be protected by server authentication (see the option -p and the PrivateDir directive in the configuration file).
INTERPRETATION OF RESULTS
     The statistics report contains, among others, the following information:

          - the number of hits, 304's, files, pageviews, sessions, data sent (in KB)
          - the amount of data requested, transferred, and saved by cache (in KB)
          - the number of unique URLs, sites, and sessions per month
          - the number of all response codes other than 200 (OK)
          - the average hits per weekday and for last week
          - the maximum/average hits per day and per hour
          - the number of hits, files, 304's, sites, data sent by day
          - the top 5 days, 24 hours, 5 minutes and 5 seconds of the summary period
          - the top 30 most commonly accessed URLs (hits, 304's, data sent)
          - the 10 least frequently accessed URLs (hits, 304's, data sent)
          - the top 30 client domains accessing your server most often
          - the top 30 browser types
          - the top 30 referrer hosts
          - the overview/detailed list of all files requested
          - the overview/detailed list of all sites by domain and reverse domain
          - the overview/detailed list of all browser types
          - the overview/detailed list of all referrer URLs

     The following section describes the meaning of those numbers in the summary report which are not self-explanatory:

     Hits (color key: green)
          A hit is any response from the server on behalf of a request sent from a browser. This includes any response from the server, not only text files or documents. If, for example, an HTML page has two images embedded, the server generates three hits if this page is requested: one hit for the HTML page itself and two hits for the two inline images.
     Files (color key: blue)
          If the user requests a document and the server successfully sends back a file for this request, this is counted as a Code 200 (OK) response. Any such response is counted as a file. Again, "file" here means any kind of file.

     Code 304 (Not Modified) (color key: yellow)
          A Code 304 (Not Modified) response is generated by the server if a document hasn't been updated since the last time it was requested by the user, so that there was no need to actually send the files for this document. This happens if the browser (or a caching proxy server between the browser and your web server) still has an up-to-date copy of the page in its local storage (cache) and can therefore display the page without requesting the actual content. This technique is used to reduce network traffic, but it also causes an inaccuracy in the statistics reports regarding the number of visitors, because the browser or proxy usually sends only one such conditional request per user session if it still holds an up-to-date copy of the file. However, the ratio between "files" and "304's" reflects the efficiency of the overall caching mechanisms, at least for those hits which made their way to the server.

     Pageviews (color key: magenta)
          Pageviews are all files which either have a text file suffix (.html, .text) or which are directory index files. This number lets you estimate the number of "real" documents transmitted by your server. If defined correctly, the analyzer rates text files (documents) as pageviews. Pageviews do not include images, CGI scripts, Java applets or any other HTML objects, only files ending with one of the pre-defined pageview suffixes, such as .html or .text. See also the PageView directive.
     Other responses
          There are many more responses than just Code 200 (OK) and Code 304 (Not Modified), especially in the coming standard, the HTTP 1.1 protocol specification. For example, the server could generate a Code 302 (Redirected) response if a page has moved, a Code 401 (Unauthorized Request) response if access to the document is denied, or a Code 404 (Not Found) response if the requested page does not exist on this server. See the HTTP specification at http://www.w3.org/ for information about all valid responses from a web server. Note that http-analyze does recognize HTTP/1.1 responses according to RFC2068.

     KBytes transferred (color key: orange)
          This is the amount of data sent during the whole summary period as reported by the server. Note that some servers log the size of a document instead of the actual number of bytes transferred. In most cases this is the same. However, if a user interrupts the transmission by pressing the browser's stop button before the page has been received completely, some servers (for example all Netscape web servers) do not log the amount of data transferred but the amount of data which would have been transferred if the user had loaded the page completely.

     KBytes requested
          This is the amount of data requested during the whole summary period. http-analyze computes this number by summing up the values of KBytes transferred and KBytes saved by cache (see below).

     KBytes saved by cache
          The amount of data saved by various caching mechanisms such as in proxy servers or in browsers.
          This value is computed by multiplying the number of Code 304 (Not Modified) responses per file by the size of the corresponding file. Note: Because http-analyze can determine the size of a file only if the file has been requested at least once in the same summary period, the values for KBytes saved by cache and KBytes requested are just approximations of the real values.

     Unique URLs
          Unique URLs are the number of all different, valid URLs requested in a given summary period. This shows you the number of all different files requested at least once in the corresponding summary period.

     Unique sites
          This is the sum of all unique hosts accessing the server during a given time-window. The time-window is hardwired to the length of the current month. This means that if a host accesses your server very often, it gets counted only once during the whole month. Only the sum of the unique hosts per month is listed in the statistics report.

     Sessions (color key: red)
          Similar to unique sites, this is the number of unique hosts accessing the server during a given time-window. This time-window is one day by default for backward compatibility, but it can be changed with the option -u or the Session directive in the configuration file. For example, if the time-window is two hours, all accesses from a certain host less than 2 hours after the first access from this host are lumped together into one session. All following accesses more than 2 hours apart from the first access will be counted as a new session. This gives you an estimate of how many sessions were started from different sites to access your server.
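The session rule described above can be sketched in a few lines of Python. This is an illustration only, not the analyzer's implementation: the function name count_sessions and the (host, timestamp) input layout are hypothetical, and counting an access exactly one time-window after the first access as a new session is an assumption, since the text only covers "less than" and "more than".

```python
from datetime import datetime, timedelta


def count_sessions(accesses, window=timedelta(days=1)):
    """Count sessions in an iterable of (host, timestamp) pairs.

    Accesses from a host that fall within `window` of the first access of
    its current session belong to that session; a later access starts a
    new session for that host.
    """
    session_start = {}  # host -> time of the first access of its current session
    sessions = 0
    for host, when in sorted(accesses, key=lambda access: access[1]):
        start = session_start.get(host)
        if start is None or when - start >= window:
            session_start[host] = when
            sessions += 1
    return sessions


t0 = datetime(1998, 5, 31, 8, 0, 0)
log = [
    ("a.example", t0),
    ("b.example", t0 + timedelta(minutes=30)),
    ("a.example", t0 + timedelta(hours=1)),             # same session as 08:00
    ("a.example", t0 + timedelta(hours=3)),             # new session for a.example
    ("a.example", t0 + timedelta(hours=3, minutes=30)),
]
print(count_sessions(log, window=timedelta(hours=2)))
print(count_sessions(log))
```

With a two-hour time-window the sample yields three sessions (two for a.example, one for b.example); with the default of one day it yields two, which equals the number of unique hosts seen on that day.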
OPTIONS
     -h
          print a short help list explaining the usage of the options. Use -hh to print an even more detailed help.

     -d
          (daily mode) generate a short statistics report for the current month only. If a history file exists, the values for the previous days will be read from this history file and the corresponding logfile entries are skipped. If the history file does not exist, the whole logfile will be processed and a history file will be created (unless -n is also given).

     -m
          (monthly mode) generate a full statistics report for a whole month. Although the values from the history file are usually used to create a summary page for the last 12 months, the actual logfile entries always have precedence over any records in the history file unless the option -e is also given. The option -m includes -d.

     -V
          (version) print the version number of http-analyze and exit immediately.

     -3
          create a 3D (VRML) model of the statistics in addition to the tabular reports. To view such a model, you need a VRML2.0-compatible plug-in like CosmoPlayer from Cosmo Software, which is currently available for IRIX and Win95/WinNT platforms.

     -a
          ignore all URLs which required authentication. If your statistics report is available to the public, you probably do not want to have secret URLs listed in the report.

     -e
          use the history file even if it contains expired data. If this option is present and you analyze the log entries for several months at once (either in different files or in one single logfile), http-analyze uses the values recorded in the history file for those previous months which are present there and therefore skips all logfile entries up to the first day of a month not recorded in the history (usually the current month).
          This is useful if you rotate your logfile once per quarter and want the analyzer to skip all entries for a previous month which has already been processed completely.

     -f
          create a frames-based user interface for the reports in addition to the conventional (non-frames) interface. To view this frames-based version of the report, your browser must support JavaScript.

     -n
          (no update) do not update the history file. Useful to generate statistics for previous months (before the last month) without overwriting the current state of the history. Since the history is used to create the report for the last 12 months, this option keeps the current statistics report intact when you analyze an older period. On the other hand, the numbers for the last 12 months on the main page for the current year are then not updated, while the numbers on the main page of the year being analyzed are updated according to the logfile entries actually read by http-analyze.

     -v
          (verbose) comment on ongoing processing. Warnings are printed only in verbose mode. If your statistics report contains zero hits, try this option to see whether the logfile is corrupted or in an unrecognized format. If you double -v, http-analyze will print a dot for each new day discovered in the logfile.

     -x
          do not lump images together under the item "All images", but list each filename literally. Normally, http-analyze collects the values of all images (*.gif, *.jpg, *.ief, *.pcd, *.rgb, *.xbm, *.xpm, *.xwd, *.tif) under the item "All images" to avoid cluttering up the lists with lots of image URLs.
          If -x is given, each image URL is listed literally unless matched by an explicit HideURL directive in the configuration file.

     -c cfgfile
          use cfgfile as the configuration file. A configuration file allows you to pre-define frequently used options and to define the level of detail in the reports. See the section Configuration File for a description of the configuration file settings, which are called directives in the following text.

     -o outdir
          create the HTML files of the report in this directory. If no directory is specified, the files are created in the current directory. See also the OutputDir directive.

     -p privdir
          place the detailed lists of URLs, sites, browsers and referrer URLs into this directory. Useful if you grant public access to your statistics reports and want to restrict access to certain lists to your staff only. You need to define an authentication scheme in your web server for this to work correctly; http-analyze just places the output files in this subdirectory. If the name of the directory does not start with a `/', it is considered relative to the output directory specified with -o. See also the PrivateDir directive.

     -F format
          use this logfile format. Valid values for format are auto for auto-sensing the logfile format, clf for the Common Logfile Format, or dlf and elf for the two supported forms of the Combined/Extended Logfile Format. See the section Logfile Formats above for a description of the formats supported by http-analyze.

     -G suffix,...
rate files ending with suffix as pageviews (text files). The suffix .html is pre-defined. You can add 9 more suffixes here, for example .shtml, .text and .htm. The suffix must start with a `.' (dot). Note that each suffix requires another lookup in this table at an early stage of processing, so many suffixes will increase the processing time of http-analyze by a significant amount. To specify more than one suffix with a single -G option, use commas to separate them. See also the PageView directive.
-H idxfile,...  define an additional directory index filename other than index.html. http-analyze truncates the URLs containing an index filename so that they merge with `/' or their "base URL", respectively. For example, the "base URL" of /dir/index.html is /dir/. The index filename index.html is pre-defined. You can add 9 more names for directory index files here, for example Welcome.html, home.html or any other filename defined in the web server's configuration file. Note that each name requires another lookup in this table at an early stage of processing, so many names will increase the processing time of http-analyze. See also the IndexFiles directive.
-I date  skip all logfile entries until this day (exclusive). The date may be specified as DD/MM/YYYY or MM/YYYY, where MM is the number or the name of a month. Note that in full statistics mode, DD defaults to the first day of the month if absent. If you specify any other day in this mode, unpredictable results may occur. For example, -I Feb restricts the analysis to February of the current year.
-E date  skip all logfile entries starting from this day on (inclusive). The date format is the same as in -I.
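Taken together, the descriptions of -I and -E amount to a half-open date window: the -I day itself is kept, the -E day itself is skipped. The following Python sketch only illustrates that rule; it is not code from http-analyze, and the function name in_window and its parameters are hypothetical.

```python
from datetime import date

def in_window(day, start=None, end=None):
    """Return True if a logfile entry dated `day` would be kept.

    start models the -I date: entries before it are skipped,
    entries on that day are kept.
    end models the -E date: entries on or after it are skipped.
    Either bound may be None, meaning the option was not given.
    """
    if start is not None and day < start:
        return False
    if end is not None and day >= end:
        return False
    return True

# Models "-I Jan/98 -E Feb/98", assuming the day defaults to the
# first of the month as described for full statistics mode.
start, end = date(1998, 1, 1), date(1998, 2, 1)
for d in (date(1997, 12, 31), date(1998, 1, 1),
          date(1998, 1, 31), date(1998, 2, 1)):
    print(d.isoformat(), "kept" if in_window(d, start, end) else "skipped")
```

Because the end of the window is exclusive, naming the first day of the following month with -E covers a whole month without having to know how many days that month has.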
To restrict analysis to a certain period, specify the starting date using -I and the first date to be ignored using -E. For example, -I Jan/98 -E Feb/98 restricts the analysis to January 1998.
-O virtname,...  define additional virtual names for this server. http-analyze uses those names to hide certain referrer URLs (the self referrer URLs) in the statistics report. The server's primary name is pre-defined. See also the VirtualNames directive.
-P prolog  use prolog as the prolog file for a yearly VRML model (optional). The file 3Dprolog.wrl is included in the distribution as an example. Note that the resulting VRML model for a whole year is suitable only for viewing on a graphics workstation. The monthly VRML models do not need a prolog file and can be viewed on any platform without problems. See also the VRMLProlog directive.
-R docroot  restrict the logfile analysis to the given Document Root. Intended for use with (software-) virtual servers which have their own subdirectory as their document root. If docroot is prefixed by a `!', the analysis is restricted to all subdirectories of the "real" server except for docroot. Only one directory name may be given at any time. See also the DocRoot directive in the configuration file.
-S srvname  use srvname for the server name. Useful if the analyzer runs on a different system than the web server. If no server name is defined, http-analyze uses either the uname(2) or the gethostname(2) function to determine the name.
On most Unix System V platforms, uname returns the nodename only (for example, host), while gethostname usually returns the fully qualified domain name (FQDN, for example, host.my.domain) if the DNS is set up properly. See also the ServerName directive in the configuration file.
-T tldfile  use tldfile for the list of valid top-level domains (TLDs). This list currently includes all ISO two-letter country domains, the well-known domains .net, .int, .org, .com, .edu, .gov, .mil, .arpa, .nato, and the upcoming new CORE top-level domains .firm, .info, .shop, .arts, .web, .rec, and .nom. The length of a top-level domain in the TLD file may not exceed 6 characters (if you need to add longer domain names, use the AddDomain directive in the configuration file). http-analyze uses its built-in defaults if no TLD file is given. See also the TLDFile directive and the sample file TLD included in the source distribution.
-U srvurl  define srvurl as the server URL which should be used as a prefix for the hotlinks in the URL list. Useful for virtual hosts and if the statistics report is created on a different system than the one the server is running on. See also the ServerURL directive.
-W 3Dwin  define the window for the VRML model. The keyword 3Dwin may be either extern or intern for display of the VRML model in a new, external window or in the lower half of the main frame, respectively (meaningful only in the frames-based interface).
-s opt,...  suppress certain lists in the report.
opt may be one of:
    AVLoad     to suppress the average load report (top seconds/minutes/hours),
    URLs       to suppress the overview of URLs/items and the detailed lists,
    URLList    to suppress the detailed list of URLs,
    Code404    to suppress the detailed list of Code 404 Not Found responses,
    Sites      to suppress the overview of domains and the hostname lists,
    RSites     to suppress the overview of reverse domains,
    SiteList   to suppress the detailed list of hostnames,
    Agents     to suppress the overview of browser types and the detailed lists,
    Referrer   to suppress the overview of referrers and the detailed lists,
    Country    to suppress the country list,
    Pageviews  to suppress pageview rating (304's are shown instead),
    Graphics   to suppress all graphs and pie charts,
    Hotlinks   to suppress the hypertext links to the pages in the URL lists,
    Interpol   to suppress interpolation of the graphs.
You can specify more than one opt with a single -s option by separating them with a `,' as in: -s Agents,Referrer,Hotlinks. See also the Suppress directive.
-t num  define the size of certain lists. num is either a positive number or the value 0 to suppress the corresponding list. You specify the list by appending one of the following characters to the number, shown here as `#' (note that the characters are case-sensitive):
    #U  set the number of entries in the Top N URL list (default: 30),
    #L  set the number of entries in the least N URL list (default: 10),
    #S  set the number of entries in the Top N domain list (default: 30),
    #A  set the number of entries in the Top N agent/browser list (default: 30),
    #R  set the number of entries in the Top N referrer URL list (default: 30),
    #d  set the number of entries in the Top N days table (default: 5),
    #h  set the number of entries in the Top N hours table (default: 24),
    #m  set the number of entries in the Top N minutes table (default: 5),
    #s  set the number of entries in the Top N seconds table (default: 5),
    #F  set the font size for text in detailed lists (default: 2),
    #H  set the font size for headers in detailed lists (default: 3),
    #N  set the size of the navigation frame (default: 120).
The list of least frequently accessed URLs is generated only if the number of all unique URLs is greater than the sum of the entries in the top URL lists, regardless of the list size actually defined. Use commas to specify more than one num per -t option. See also the directives beginning with Top* in the configuration file.
-u time  define the time-window for counting sessions. See Sessions in the section Interpretation of results for an explanation of this term.
-w hits  set the noise-level to hits. If a noise-level is defined, all URLs, sites, agents and referrer URLs with hits below this level are collected under the item Noise in the Top N lists and overviews to avoid cluttering up those reports. See also the NoiseLevel directive.
logfile(s)  These are the name(s) of the logfile(s) to process. If more than one file is given, they are processed in the order in which their names appear on the command line.
http-analyze checks for the existence of all files before processing them. If a `-' is specified as the filename, standard input is read. If no file is given, the analyzer processes either the default logfile specified in the configuration file or the standard input.

CONFIGURATION FILE

The option -c lets you specify a configuration file which contains pre-defined, server-specific default settings for http-analyze. Command line options always take precedence over the definitions in this configuration file. The configuration file contains a single directive per line. Except for IndexFiles, PageView, AddDomain, Ign*, VirtualNames, and Hide*, each directive may appear only once in the configuration file. Following a directive field there are one or two value fields, which must be separated from the directive and from each other by one or more tabulators. Blanks are considered a part of the string for the third field only if there is such a field. All directive names are case-insensitive.
3DWinSize widthxheight  Defines the size of the 3D window. Useful for Netscape Navigator 3.X, which displays scrollbars in the 3D window with standard size (520x420 pixels).
    Example:
    3DWinSize    540x450
3DWindow keyword  Defines the 3D window the VRML model is displayed in (same as option -W). The keyword may be extern (default) or intern for display of the VRML model in a new, external window or in the lower half of the main frame, respectively.
    Example:
    3DWindow    intern
AddDomain domain string  Add names to the domain table, causing certain domains to be allocated to some "artificial" top-level domain string. Do not use wildcards here; they are ignored anyway.
This directive is useful to collect certain hostnames, for example the local network or the hosts of world-wide operating online services, under some arbitrary string (item) instead of under the country they seem to originate from.
    Example:
    AddDomain    .compuserve.com    CompuServe
CustLogoW image srvurl and CustLogoB image srvurl  Define images for use as customer logos in the statistics report. This feature is available only in the commercial version of the analyzer. You will have to create two logos, approx. 72x72 pixels in size, one for use on pages with white background (CustLogoW) and another one for use on pages with black background (CustLogoB).
    Example:
    CustLogoW    btn/mycompany_sw.gif    http://www.mycompany.com/
    CustLogoB    btn/mycompany_sb.gif    http://www.mycompany.com/
DefaultMode mode  The default operation mode of http-analyze. The value field contains either the keyword daily or monthly for short or full statistics summaries (see also options -d and -m). If left undefined, the default is now monthly.
    Example:
    DefaultMode    daily
DocRoot docroot  Restricts logfile analysis to the given Document Root (same as option -R). Intended for use with (software-) virtual servers which have their documents under a certain subdirectory. If docroot is prefixed by a `!', analysis is restricted to all directories except for this subdirectory. Only one directory name may be given at any time.
    Example:
    DocRoot    /customer/
FontSize size and HeadSize size  The font size for text (default: 2) and headers (default: 3) in detailed lists.
    Example:
    FontSize    3
    HeadSize    4
HTMLPrefix prefix and HTMLTrailer trailer  The HTML prefix and trailer to be printed after the header section and at the end of the page. If defined, the HTMLPrefix string must include the
tag. If a filename is given instead of the prefix or trailer, the HTML code is taken from this file.
    Example:
    HTMLPrefix
    HTMLTrailer    Back to the internal page.
HideAgent agent string  Hide certain browsers under an arbitrary string (item), for example to map a browser's full name to a shorter label. Since only the leading part of the browser type is compared against agent, there is no need to specify wildcards. In fact, a wildcard suffix is removed from the string, while a wildcard prefix is taken literally.
    Example:
    HideAgent    Mozilla/4.0 (compatible; MSIE 4.    MSIE 4.*
HideRefer referrer string  Hide certain referrer URLs under an arbitrary string (item). Useful to map different referrer URLs for a given host to a common name. Since only the leading part of the referrer URL is compared against referrer, there is no need to specify wildcards. As in HideAgent, a wildcard suffix is removed from the string, while a wildcard prefix is taken literally. If the second argument contains a string in square brackets, this defines the CGI parameter which specifies the search key for search engines. In this case, the search key is extracted from the argument list and prominently displayed after the name of the search engine/web server. See the file sample.conf for more examples on how to use HideRefer.
    Example:
    HideRefer    http://search.yahoo.com    Yahoo [p=]
    HideRefer    http://altavista.digital.com/    AltaVista [q=]
HideSys hostname string  Hide a hostname under an arbitrary string (item). The string may contain blanks.
If the first character of string is a `[', this item is suppressed in the Top N lists. Hidden items are accounted for separately, but in the summary they are collected under the description defined with this directive. You may use the wildcard character `*' as either a prefix or as a suffix of the hostname (as in *.host.com and 192.168.12.*). Hostnames are case-insensitive. When building the list of countries, http-analyze determines the country from the top-level domain given in hostname. If hostname is an IP number, you can add the country's top-level domain in square brackets to the string.
    Example:
    HideSys    *.mycompany.com    MY COMPANY
    HideSys    192.168.12.*    MY COMPANY [COM]
HideURL url string  Hide a URL under an arbitrary string (item). The string may contain blanks. If the first character of string is a `[', this item is suppressed in the Top N lists. Hidden items are accounted for separately, but in the summary they are collected under the description defined with this directive. You may use the wildcard character `*' as either a prefix or as a suffix of the URL (as in *.map and /subdir/*). URLs are case-sensitive. Note that images are hidden automatically under the term All images unless -x was specified. See the sample.conf file included in the distribution for more examples.
    Example:
    HideURL    /newsletter/*    MyCompany's Monthly Newsletter
    HideURL    /robots.txt    [Robot control file]
IgnURL url and IgnSys hostname  Ignore entries with a specific URL or accesses from a certain system. You may use the wildcard character `*' as either a prefix or as a suffix of the URL or the hostname (as in *.gif, /subdir/file* and *.host.com).
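The wildcard rule shared by HideSys, HideURL, IgnURL and IgnSys (a single `*' as either a prefix or a suffix, hostnames compared case-insensitively, URLs case-sensitively) can be pictured with the following Python sketch. It illustrates the documented rule only and is not taken from the analyzer's source; the function name wildcard_match and its ignore_case parameter are hypothetical, and patterns with a `*' at both ends are not covered by the description above, so the sketch makes no claim about them.

```python
def wildcard_match(pattern, value, ignore_case=False):
    """Match `value` against a pattern with an optional leading or
    trailing '*', as described for the Hide*/Ign* directives.

    ignore_case=True models hostname comparison,
    ignore_case=False models URL comparison.
    """
    if ignore_case:
        pattern, value = pattern.lower(), value.lower()
    if pattern.startswith("*"):
        # '*' as a prefix: the rest of the pattern must end the value
        return value.endswith(pattern[1:])
    if pattern.endswith("*"):
        # '*' as a suffix: the rest of the pattern must start the value
        return value.startswith(pattern[:-1])
    # no wildcard: exact comparison
    return pattern == value

# Hostnames: case-insensitive
print(wildcard_match("*.host.com", "WWW.Host.COM", ignore_case=True))
print(wildcard_match("192.168.12.*", "192.168.12.7", ignore_case=True))

# URLs: case-sensitive
print(wildcard_match("/subdir/*", "/subdir/page.html"))
print(wildcard_match("*.gif", "/img/logo.GIF"))
```

Restricting the wildcard to one end of the pattern keeps each comparison a plain prefix or suffix test, which matters for a table that is consulted for many entries.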
Note that all URLs/hostnames are compared against every entry in this list while http-analyze reads the logfile, as opposed to HideURL/HideSys, which are looked up later, after all URLs/hostnames have been reduced to the set of unique URLs/hostnames. Therefore, using IgnURL/IgnSys will increase the processing time of http-analyze by a significant amount.
    Example:
    IgnURL    *.gif,*.jpg,*.jpeg
IndexFiles idxfile[,idxfile...]  Defines additional directory index files (same as option -H). http-analyze truncates the URLs containing an index filename so that they merge with `/' or their "base URL", respectively. For example, the "base URL" of /dir/index.html is /dir/. You can define up to 9 more names in addition to the pre-defined name index.html; common names are Welcome.html and home.html, but any other name can be defined in the web server's configuration file. Note that each name requires a linear lookup in this table at a very early stage of processing, so many entries will increase the processing time.
    Example:
    IndexFiles    Welcome.html,home.html,index.htm
LogFile filename  The name of the server's logfile. If you define a default name for the logfile, this file is processed if no other filenames are explicitly specified on the command line. Without such a definition, http-analyze always reads stdin if no other filename is given.
    Example:
    LogFile    /usr/ns-home/www/logs/access
LogFormat format  use this logfile format.
Valid values for format are auto for auto-sensing the logfile format, clf for the Common Logfile Format, or dlf and elf for the two supported forms of the Combined/Extended Logfile Format. See the section Logfile Formats above for a description of the formats supported by http-analyze.
    Example:
    LogFormat    clf
NavWinSize widthxheight  Defines the size of the navigation window which pops up in the conventional interface if JavaScript is enabled. Useful if the browser displays scrollbars when the default size of 420x190 is used.
    Example:
    NavWinSize    440x200
NavigFrame size  Defines the size of the navigation frame in pixels. Useful if the browser displays scrollbars when the default size of 120 pixels is used.
    Example:
    NavigFrame    140
NoiseLevel hits  set the noise-level to hits. If a noise-level is defined, all URLs, sites, agents and referrer URLs with hits below this level are collected under the item Noise in the Top N lists and overviews to avoid cluttering up those reports.
    Example:
    NoiseLevel    7
OutputDir directory  The name of the directory where the output files should be created (same as option -o). If left undefined, output files are created in the current directory.
    Example:
    OutputDir    /usr/www/htdocs/stats
PageView suffix[,suffix...]  define additional pageview suffixes (same as option -G). All files with a certain suffix are rated as pageviews (text files, documents). You can define up to 9 more pageview suffixes in addition to the pre-defined suffix .html.
Common text file suffixes are .shtml, .text and .htm, but any other suffix can be defined in the web server's configuration file. Note that each suffix requires a linear lookup in this table at a very early stage of processing, so many entries will increase the processing time by a significant amount.
    Example:
    PageView    .shtml,.text,.htm
PrivateDir directory  The name of a private directory where the overviews and detailed lists of URLs, sites, browsers, and referrer URLs should be created (same as option -p). Access to this private directory may be restricted to staff only by using server authentication. Pathnames not beginning with a `/' are relative to OutputDir. Note that you also have to turn on authentication in the server's configuration file in order to secure this subdirectory.
    Example:
    PrivateDir    lists
RegInfo customer_name registration_ID  Defines the customer's name and the registration ID, which are both shown on the main page in the summary report.
    Example:
    RegInfo    MyCompany    3745JMJZ00000311300000682344
ReportTitle title  The document title and header to use in the statistics report. http-analyze appends the server's name to this string.
    Example:
    ReportTitle    WWW Access usage for
ServerName srvname  The official name of the server (same as option -S). Defaults to the current system name. Useful if the analyzer runs on a different system than the web server.
    Example:
    ServerName    www.mycompany.com
ServerURL srvurl  The URL of the server to be used for hot links in URL lists (same as option -U). Useful if the reports for your web server are published on another server, for example on an internal development machine. Also necessary for (software-) virtual servers to have http-analyze generate correct hypertext links in the reports.
    Example:
    ServerURL    http://www.mycompany.com
Session time  The time-window for counting sessions. See Sessions in the section Interpretation of results for an explanation of this term.
    Example:
    Session    4 hours
Suppress option(s)  Suppress certain lists in the reports (same as -s). option may be one of:
    AVLoad     to suppress the average load report (top seconds/minutes/hours),
    URLs       to suppress the overview of URLs/items and the detailed lists,
    URLList    to suppress the detailed list of URLs,
    Code404    to suppress the detailed list of Code 404 Not Found responses,
    Sites      to suppress the overview of domains and the hostname lists,
    RSites     to suppress the overview of reverse domains,
    SiteList   to suppress the detailed list of hostnames,
    Agents     to suppress the overview of browser types and the detailed lists,
    Referrer   to suppress the overview of referrers and the detailed lists,
    Country    to suppress the country list,
    Pageviews  to suppress pageview rating (304's are shown instead),
    Graphics   to suppress all graphs and pie charts,
    Hotlinks   to suppress the hypertext links to the pages in the URL lists,
    Interpol   to suppress interpolation of the graphs.
You may specify more than one argument to Suppress by separating them with a `,'.
    Example:
    Suppress    Country,Interpol
TLDFile filename  use filename for the list of top-level domains (same as option -T).
This list includes all ISO two-letter country domains, the well-known domains .net, .int, .org, .com, .edu, .gov, .mil, .arpa, .nato, and the upcoming new CORE top-level domains .firm, .info, .shop, .arts, .web, .rec, and .nom. The length of a domain in the TLD file may not exceed 6 characters. http-analyze uses its built-in defaults if no TLD file is given.
    Example:
    TLDFile    /usr/local/lib/http-analyze/TLD
Top{Days,Hours,Minutes,Seconds,URLs,Sites,Agents,Refers}, LeastURLs  Defines the size of certain Top N tables and lists. If set to zero, the corresponding list is suppressed.
    Example:
    TopURLs    20
    LeastURLs    0
    TopDays    7
    TopHours    12
VirtualNames name,...  The list of additional virtual names of this server. http-analyze uses those names to construct a list of self referrer URLs, which appear when an HTML page causes the browser to request inline images. Those self referrers are suppressed from the list of referrer URLs. The entries in the list of referrer URLs therefore give a good impression of the external pages referencing your site. The server's name is set as a self referrer by default (using the ServerURL or ServerName directive, whichever is specified). If you run virtual web servers, you would typically specify each virtual hostname here.
    Example:
    VirtualNames    www.customer.com,customer.com
    VirtualNames    www.othername.com,othername.com
VRMLProlog file  The name of a prolog file for a yearly VRML model (same as option -P). Pathnames not beginning with a `/' are relative to OutputDir.
    Example:
    VRMLProlog    3Dprolog.wrl

EXAMPLES

After successful compilation of http-analyze you can create a statistics report before you choose to install the program permanently. To do so, create a subdirectory for the output files to avoid cluttering up the directory and install the required files using the ha-setup utility:

    http-analyze setup
    ------------------
    1) Set up an analyzer configuration for a virtual web server
    2) Install the required files in a statistics output directory
    3) Brand your copy of http-analyze with the registration ID
    4) Exit

    Please select a function (1-4) [1]: 2

    Install required files for http-analyze
    ---------------------------------------
    This script copies the required files (3D*, btn/*) into the statistics
    ...
    Name of the HTML output directory: testd
    Directory testd doesn't exist, create it (y/n) [y]: