panix.user.html FAQ

Logs and Analysis

Webalizer man page

The Webalizer is a web server log file analysis program which produces usage statistics in HTML format for viewing with a browser. The results are presented in both columnar and graphical format, which facilitates interpretation. Yearly, monthly, daily and hourly usage statistics are presented, along with the ability to display usage by site, URL, referrer, user agent (browser) and country (user agent and referrer are only available if your web server produces Combined log format files).

The Webalizer supports CLF (common log format) log files, as well as Combined log formats as defined by NCSA and others, and variations of these which it attempts to handle intelligently.

RUNNING THE WEBALIZER

The Webalizer was designed to be run from a Unix command line prompt or as a cron job. Once executed, the general flow of the program is:

INCREMENTAL PROCESSING

Version 1.2x of The Webalizer adds incremental run capability. Simply put, this allows large log files to be processed by breaking them up into smaller pieces and processing those pieces instead. In real terms, you can now rotate your log files as often as you want and still produce monthly usage statistics without losing any detail. The Webalizer saves and restores all internal data in a file named webalizer.current. This allows the program to 'start where it left off', so to speak, and preserves detail from one run to the next. The data file is placed in the current output directory, and is a plain ASCII text file that can be viewed with any standard text editor. Its location and name may be changed using the IncrementalName configuration keyword.

Some special precautions need to be taken when using the incremental run capability of The Webalizer. Configuration options should not be changed between runs, as that could corrupt the stored internal data. For example, changing the MangleAgents level will cause different representations of user agents to be stored, producing invalid results in the user agents section of the report. If you need to change configuration options, do it at the end of the month, after normal processing of the previous month and before processing the current month. You may also want to delete the webalizer.current file at that point.

The Webalizer also attempts to prevent data duplication by keeping track of the timestamp of the last record processed. This timestamp is compared to the records currently being processed, and any records logged before that timestamp are ignored. In theory, this allows you to re-process logs that have already been processed, or to process logs that contain a mix of processed and not yet processed records, without duplicating statistics. The one case where this may break is when two separate log files contain duplicate timestamps: any records in the second log file that have the same timestamp as the last record of the previously processed log file will be discarded as if they had already been processed. There are many ways to prevent this; for example, stopping the web server before rotating logs avoids the situation. This scheme also requires that you always process logs in chronological order, otherwise data will be lost as a result of the timestamp comparison.
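
The timestamp comparison described above can be sketched roughly as follows. This is an illustration in Python, not The Webalizer's actual code: the names record_time and filter_new are hypothetical, and the sketch assumes CLF-style timestamps enclosed in square brackets and an English month abbreviation.

```python
from datetime import datetime

# Timestamp layout assumed for CLF/Combined logs, e.g. 10/Oct/2000:13:55:36 -0700
CLF_TIME = "%d/%b/%Y:%H:%M:%S %z"


def record_time(line):
    """Extract the bracketed timestamp from one log line."""
    start = line.index("[") + 1
    end = line.index("]", start)
    return datetime.strptime(line[start:end], CLF_TIME)


def filter_new(lines, previous_last=None):
    """Return (kept_lines, newest_timestamp) for one log file.

    previous_last is the timestamp saved by the previous run. Records stamped
    at or before it are dropped, which is why a record that shares the final
    timestamp of the previous log file is lost.
    """
    kept = []
    newest = previous_last
    for line in lines:
        stamp = record_time(line)
        if previous_last is not None and stamp <= previous_last:
            continue  # treated as already processed
        kept.append(line)
        if newest is None or stamp > newest:
            newest = stamp
    return kept, newest


first_log = [
    'host1 - - [10/Oct/2000:13:55:36 -0700] "GET /a HTTP/1.0" 200 10',
    'host2 - - [10/Oct/2000:13:55:37 -0700] "GET /b HTTP/1.0" 200 10',
]
second_log = [
    'host3 - - [10/Oct/2000:13:55:37 -0700] "GET /c HTTP/1.0" 200 10',
    'host4 - - [10/Oct/2000:13:55:38 -0700] "GET /d HTTP/1.0" 200 10',
]

kept_first, last_seen = filter_new(first_log)
kept_second, last_seen = filter_new(second_log, last_seen)
```

In this sketch the request for /c in the second log is discarded because it carries the same timestamp as the last record of the first log, which mirrors the duplicate timestamp caveat above. Feeding the logs in the wrong order would discard every record of the older file.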

COMMAND LINE OPTIONS

The Webalizer supports many different configuration options that will alter the way the program behaves and generates output. Most of these can be specified on the command line, while some can only be specified in a configuration file. The command line options are listed below, with references to the corresponding configuration file keywords.

General Options
       -h      Display  all  available  command  line options and
               exit program.

       -v -V   Display program version and exit program.

       -d      Debug.  Display debugging information  for  errors
               and warnings.

       -g      GMTTime.   Use  GMT  instead of local timezone for
               reports.

       -i      IgnoreHist.  Ignore history.   USE  WITH  CAUTION.
               This will cause The Webalizer to ignore any previ-
               ous monthly history file only.   Incremental  data
               (if present) is still processed.

       -p      Incremental.  Preserve internal data between runs.

       -q      Quiet.  Suppress informational messages.  Does not
               suppress warnings or errors.

       -Q      ReallyQuiet.  Suppress all messages including warn-
               ings and errors.

       -T      TimeMe.  Force display of  timing  information  at
               end of processing.

       -c file Use configuration file file.

       -n name Hostname.  Use the hostname name.

       -o dir  OutputDir.  Use output directory dir.

       -t name ReportTitle.  Use name for report title.

       -F      LogType.   Specify that the log being processed is
               an ftp log, instead of a web server log.  Log must
               be in standard xferlog format.

       -f      FoldSeqErr.  Fold out of sequence log records back
               into analysis, by treating as  if  they  were  the
               same date/time as the last good record.  Normally,
               out of sequence log records are simply ignored.

       -Y      CountryGraph.  Suppress country graph.

       -G      HourlyGraph.  Suppress hourly graph.

       -x name HTMLExtension.  Defines  HTML  file  extension  to
               use.   If not specified, defaults to html.  Do not
               include the leading period.

       -H      HourlyStats.  Suppress hourly statistics.

       -L      GraphLegend.  Suppress color coded graph legends.

       -l num  GraphLines.  Specify number of  background  lines.
               Default  is  2.   Use  zero  ('0')  to disable the
               lines.

       -P name PageType.  Specify file extensions that  are  con-
               sidered   pages.    Sometimes   referred   to   as
               pageviews.

       -m num  VisitTimeout.  Specify the Visit  timeout  period.
               Must  be  given  in  HHMMSS format.  Default is 30
               minutes (3000).

       -I name IndexAlias.  Use the filename  name  as  an  addi-
               tional alias for index.*.

       -M num  MangleAgents.   Mangle  user agent names according
               to the mangle level specified by num.  Mangle lev-
               els are:

               5   Browser name and major version.

               4   Browser name, major and minor version.

               3   Browser  name, major version, minor version to
                   two decimal places.

               2   Browser name, major  and  minor  versions  and
                   sub-version.

               1   Browser name, version and machine type if pos-
                   sible.

               0   All information (left unchanged).

Hide Options
       -a name HideAgent.  Hide user agents matching name.

       -r name HideReferrer.  Hide referrer matching name.

       -s name HideSite.  Hide site matching name.

       -u name HideURL.  Hide URL matching name.

Table Size Options
       -A num  TopAgents.  Display the top num user agents table.

       -R num  TopReferrers.    Display  the  top  num  referrers
               table.

       -S num  TopSites.  Display the top num sites table.

       -U num  TopURLs.  Display the top num URLs table.

       -C num  TopCountries.   Display  the  top  num   countries
               table.

       -e num  TopEntry.   Display the top num entry pages table.

       -E num  TopExit.  Display the top num exit pages table.

CONFIGURATION FILES

Configuration files are standard ASCII text files that may be created or edited using any standard editor. Blank lines and lines that begin with a pound sign ('#') are ignored. Any other lines are considered to be configuration lines, and have the form "Keyword Value", where 'Keyword' is one of the currently available configuration keywords defined below, and 'Value' is the value to assign to that particular option. Any text found after the keyword up to the end of the line is considered the keyword's value, so you should not include anything after the actual value on the line that is not part of the value being assigned. The file sample.conf provided with the distribution contains lots of useful documentation and examples as well.

General Configuration Keywords

       LogFile name
               Use log file named name.  If none specified, STDIN
               will be used.

       LogType name
               Specify log file  type  as  name.  Values  can  be
               either web or ftp, with the default being web.

       OutputDir dir
               Create output in the directory dir.  If none spec-
               ified, the current directory will be used.

       HistoryName name
               Filename to use for  history  file.   Relative  to
               output  directory  unless  absolute  name is given
               (ie:  starts  with  '/').  Defaults   to   'webal-
               izer.hist' in the standard output directory.

       ReportTitle name
               Use  the  title  string name for the report title.
               If none specified, use the default of (in English)
               "Usage Statistics for ".

       Hostname name
               Set  the hostname for the report as name.  If none
               specified, an attempt will be made to  gather  the
               hostname  via  a  uname(2)  system  call.  If that
               fails, localhost will be used.

       UseHTTPS [ yes | no ]
               Use https:// on links  to  URLs,  instead  of  the
               default http://, in the 'Top URLs' table.

       Quiet [ yes | no ]
               Suppress informational messages.  Warning and Error
               messages will not be suppressed.

       ReallyQuiet [ yes | no ]
               Suppress all messages, including Warning and Error
               messages.

       Debug [ yes | no ]
               Print  extra debugging information on Warnings and
               Errors.

       TimeMe [ yes | no ]
               Force timing information at end of processing.

       GMTTime [ yes | no ]
               Use GMT (UTC) time instead of local  timezone  for
               reports.

       IgnoreHist [ yes | no ]
               Ignore  previous  monthly  history file.  USE WITH
               CAUTION.  Does not prevent Incremental  file  pro-
               cessing.

       FoldSeqErr [ yes | no ]
               Fold  out of sequence log records back into analy-
               sis by treating them  as  if  they  had  the  same
               date/time  as the last good record.  Normally, out
               of sequence log records are ignored.

       CountryGraph [ yes | no ]
               Display Country Usage Graph in output report.

       HourlyGraph [ yes | no ]
               Display Hourly Graph in output report.

       HourlyStats [ yes | no ]
               Display Hourly Statistics in output report.

       PageType name
               Define the file extensions to consider as a  page.
               If  a  file is found to have the same extension as
               name, it will be  counted  as  a  page  (sometimes
               called a pageview).

       GraphLegend [ yes | no ]
               Allows   the  color  coded  graph  legends  to  be
               enabled/disabled.

       GraphLines num
               Specify the number of background  reference  lines
               displayed  on  the  graphs  produced.   Disable by
               using zero ('0'), default is 2.

       VisitTimeout num
               Specifies the visit timeout value.  Default is  30
               minutes.   A visit is determined by looking at the
               difference in time between the  current  and  last
               request  from  a specific site.  If the difference
               is greater than or equal to the timeout value, the
               request is counted as a new visit.

       IndexAlias name
               Use name as an additional alias for index.*.

       MangleAgents num
               Mangle user agent names based on mangle level num.
               See the -M command line switch for  mangle  levels
               and  their  meaning.   The  default  is  0,  which
               doesn't mangle user agents at all.

       Incremental [ yes | no ]
               Enable Incremental mode processing.

       IncrementalName name
               Filename to use for incremental data.  Relative to
               output  directory unless an absolute name is given
               (ie:  starts  with  '/').   Defaults  to   'webal-
               izer.current' in the standard output directory.
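
The visit rule given under VisitTimeout above can be illustrated with a short Python sketch. It is not taken from The Webalizer's source: the function name count_visits is hypothetical, requests are assumed to arrive in chronological order as (site, time) pairs, and the first request seen from a site is assumed to start a visit.

```python
from datetime import datetime, timedelta


def count_visits(requests, timeout=timedelta(minutes=30)):
    """Count visits from (site, time) pairs given in chronological order."""
    last_request = {}  # most recent request time seen for each site
    visits = 0
    for site, when in requests:
        previous = last_request.get(site)
        # A gap greater than or equal to the timeout starts a new visit.
        if previous is None or when - previous >= timeout:
            visits += 1
        last_request[site] = when
    return visits


start = datetime(2000, 1, 1, 12, 0, 0)
sample_requests = [
    ("site-a", start),
    ("site-a", start + timedelta(minutes=10)),  # same visit
    ("site-b", start + timedelta(minutes=15)),  # first request from site-b
    ("site-a", start + timedelta(minutes=40)),  # exactly 30 minutes later: new visit
]
total_visits = count_visits(sample_requests)
```

The gap is measured from the previous request of the same site, not from the start of the visit, so a site that keeps requesting pages every few minutes stays within one visit.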

Top Table Keywords
       TopAgents num
               Display the top num User Agents table. Use zero to
               disable.

       TopReferrers num
               Display the top num Referrers table. Use  zero  to
               disable.

       TopSites num
               Display  the top num Sites table. Use zero to dis-
               able.

       TopKSites num
               Display the top num Sites (by KByte)  table.   Use
               zero to disable.

       TopURLs num
               Display  the  top num URLs table. Use zero to dis-
               able.

       TopKURLs num
               Display the top num URLs (by  KByte)  table.   Use
               zero to disable.

       TopCountries num
               Display  the  top  num Countries in the table. Use
               zero to disable.

       TopEntry num
               Display the top num Entry Pages in the table.  Use
               zero to disable.

       TopExit num
               Display  the top num Exit Pages in the table.  Use
               zero to disable.

       TopSearch num
               Display the top num Search Strings in  the  table.
               Use zero to disable.

Hide/Ignore/Group/Include Keywords

       HideAgent name
               Hide User Agents that match name.

       HideReferrer name
               Hide Referrers that match name.

       HideSite name
               Hide Sites that match name.

       HideURL name
               Hide URLs that match name.

       IgnoreAgent name
               Ignore User Agents that match name.

       IgnoreReferrer name
               Ignore Referrers that match name.

       IgnoreSite name
               Ignore Sites that match name.

       IgnoreURL name
               Ignore URLs that match name.

       GroupAgent name [Label]
               Group  User Agents that match name.  Display Label
               in 'Top Agent' table if given (instead of name).

       GroupReferrer name [Label]
               Group Referrers that match name.  Display Label in
               'Top Referrer' table if given (instead of name).

       GroupSite name [Label]
               Group  Sites  that  match  name.  Display Label in
               'Top Site' table if given (instead of name).

       GroupURL name [Label]
               Group URLs that match  name.   Display  Label  in
               'Top URL' table if given (instead of name).

       IncludeSite name
               Force  inclusion  of sites that match name.  Takes
               precedence over Ignore* keywords.

       IncludeURL name
               Force inclusion of URLs that match  name.   Takes
               precedence over Ignore* keywords.

       IncludeReferrer name
               Force  inclusion  of  Referrers  that  match name.
               Takes precedence over Ignore* keywords.

       IncludeAgent name
               Force inclusion of User Agents  that  match  name.
               Takes precedence over Ignore* keywords.
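
The precedence of the Include keywords over the Ignore keywords can be pictured with the small Python sketch below. The matching rule is an assumption made only for illustration (a plain substring test), since this page does not define how names are matched, and the function names are hypothetical.

```python
def matches(value, names):
    """Assumed matching rule: a name matches if it occurs anywhere in value."""
    return any(name in value for name in names)


def is_counted(value, ignore_names, include_names):
    """Decide whether a record is kept, letting Include override Ignore."""
    if matches(value, include_names):
        return True  # Include* takes precedence over Ignore*
    return not matches(value, ignore_names)


ignore_urls = ["/images/"]
include_urls = ["/images/logo.gif"]

logo_counted = is_counted("/images/logo.gif", ignore_urls, include_urls)
background_counted = is_counted("/images/bg.gif", ignore_urls, include_urls)
page_counted = is_counted("/index.html", ignore_urls, include_urls)
```

Here the logo is counted even though its directory is ignored, while other files in that directory are dropped.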

HTML Generation Keywords
       HTMLExtension text
               Defines  the  HTML file extension to use.  Default
               is html.  Do not include the leading period!

       HTMLPre text
               Insert text at the very beginning of the generated
               HTML  file.   Defaults to a standard html 3.2 DOC-
               TYPE record.

       HTMLHead text
               Insert text within the <HEAD></HEAD> block of  the
               HTML file.

       HTMLBody text
               Insert text in HTML page, starting with the <BODY>
               tag.  If used, the first line must be a <BODY>
               tag.  Multiple lines may be specified.

       HTMLPost text
               Insert  text  at  top (before horiz. rule) of HTML
               pages.  Multiple lines may be specified.

       HTMLTail text
               Insert text at bottom of the HTML page.  The  text
               is  top and right aligned within a table column at
               the end of the report.

       HTMLEnd text
               Insert text at the very end of the HTML page.   If
               not specified, the default is to insert the ending
               </BODY> and </HTML> tags.  If used, you must  sup-
               ply these tags yourself.
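
As a rough illustration of the configuration file rules described at the start of this section (blank lines and '#' lines are ignored, and everything after the keyword is the value), the following Python sketch reads "Keyword Value" lines. It is not The Webalizer's parser: parse_config is a hypothetical name, the paths in the sample are made up, and stripping leading whitespace before testing for '#' is an assumption.

```python
def parse_config(text):
    """Return a list of (keyword, value) pairs from configuration text."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        parts = line.split(None, 1)  # keyword, then the rest of the line
        keyword = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        entries.append((keyword, value))
    return entries


SAMPLE = """\
# Sample settings (hypothetical paths)
LogFile access_log
OutputDir /var/www/usage

ReportTitle Usage Statistics for
Incremental yes
"""

settings = parse_config(SAMPLE)
```

A list of pairs is used instead of a dictionary so that a keyword appearing on several lines is not overwritten. Because the whole remainder of the line becomes the value, a trailing comment on a configuration line would end up inside the value.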

FILES

       webalizer.conf      Default    configuration   file.    Is
                           searched for in the current  directory
                           and  if not found, in the /etc/ direc-
                           tory.

       webalizer.hist      Monthly history file for  previous  12
                           months.  (can be changed)

       webalizer.current   Current  state  data file (Incremental
                           processing).  (can be changed)