Panix, New York's first
Internet Service Provider


pwstat 3.1.3


The purpose of pwstat3 is to produce a meaningful statistics report from the contents of an XCLF weblog, as presented by either getclogs or pwlog3. (Note: For POLF weblogs, use pwstat.) By default, the report is generated in HTML format, for web display. It includes the following information:

  1. Total traffic (in files and bytes delivered) from your site during the reporting period.
  2. Analysis of traffic by date.
  3. Analysis of traffic by hour of the day.
  4. Analysis of traffic by archive name; i.e., filename.
  5. Analysis of traffic by archive type; i.e., the filename extension. The archive type breakdown can be difficult to interpret if, for example, you make significant use of CGI scripts to deliver your site's content, since all such requests are listed under "CGI".
  6. Analysis of traffic by requesting top-level domain; e.g., gov, com, uk, etc. (Note: See the -r option.)
  7. Analysis of traffic by requesting continent, as determined by examination of the requesting domain. (Note: See the -r option.) The non-country top-level domains are handled as follows: com is treated as international and reported as "Commercial"; edu, although possibly international, is treated as US and reported as "North America"; gov and mil are reported as "North America"; net is treated as international and reported as "Network"; and org is treated as international and reported as "Noncommercial Organization".
  8. Analysis of traffic by requesting reversed sub-domain; e.g., requests from America Online users are reported as com.aol.*. (Note: See the -r option.)
  9. List of URLs most frequently referring visitors to your site.
  10. List of domains most frequently referring visitors to your site.
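
As a rough illustration of item 7, the labels for the non-country top-level domains can be written as a lookup table. This is only a sketch in Python, not pwstat3's own code; the function name is hypothetical, and country-code domains (which the report maps to continents) are left out because this page does not list them.

```python
# Labels copied from item 7 above. The names GENERIC_TLD_LABELS and
# label_for_generic_tld are hypothetical, chosen for this illustration.
GENERIC_TLD_LABELS = {
    "com": "Commercial",
    "edu": "North America",
    "gov": "North America",
    "mil": "North America",
    "net": "Network",
    "org": "Noncommercial Organization",
}


def label_for_generic_tld(hostname):
    """Return the report label for a non-country TLD, or None for anything else."""
    tld = hostname.rsplit(".", 1)[-1].lower()
    return GENERIC_TLD_LABELS.get(tld)
```

For example, label_for_generic_tld("gatekeeper.nytimes.com") gives "Commercial", while a country-code host or an unresolved IP number gives None.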

The information in these sections is fairly self-explanatory, but it is perhaps worth stating here that in all but the total-traffic section, the information is separated into several columns, including:

Requests
This many files were delivered serving the requests described on this line.
%Reqs
The information on this line constitutes so many percent of all the file requests made during the reporting period.
Bytes Sent
This many bytes were delivered serving the requests described on this line.
%Byte
The information on this line constitutes so many percent of the total bytes delivered during the reporting period.
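
The two percentage columns are each line's share of the period totals. A minimal Python sketch of that arithmetic follows, assuming each report line is a (label, requests, bytes sent) tuple; the function name and the tuple layout are made up for illustration and are not pwstat3's internals.

```python
def percent_columns(rows):
    """rows: list of (label, requests, bytes_sent) tuples for one report section.

    Returns (label, requests, pct_reqs, bytes_sent, pct_bytes) tuples, where
    the percentages are shares of the section totals.
    """
    total_reqs = sum(reqs for _, reqs, _ in rows)
    total_bytes = sum(sent for _, _, sent in rows)
    result = []
    for label, reqs, sent in rows:
        # Guard against an empty reporting period (totals of zero).
        pct_reqs = 100.0 * reqs / total_reqs if total_reqs else 0.0
        pct_bytes = 100.0 * sent / total_bytes if total_bytes else 0.0
        result.append((label, reqs, pct_reqs, sent, pct_bytes))
    return result
```

With 30 requests and 1000 bytes for html files and 10 requests and 3000 bytes for gif files, the html line would show 75% of requests but only 25% of bytes.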

Note: In the raw weblogs, the hosts requesting web pages are identified by IP number rather than by hostname. You can use the -r option to have pwstat3 try to resolve IP numbers to the matching hostnames. However, hostname resolution can be a lengthy process, even for moderately sized weblogs. Use at your own risk.

Running pwstat3

If you have already run getclogs or pwlog3, the procedure for obtaining webstats is to simply type:

pwstat3 logfilename > statfilename

If you do not specify an input file name, pwstat3 will automatically call getclogs for you. In other words, instead of typing

getclogs > logfilename
pwstat3 logfilename > statfilename

you can just type

pwstat3 > statfilename

You can also create a statistical extract from more than one input log file by typing

pwstat3 logfile1name logfile2name > statfilename

Options

The complete list of pwstat3's options is included in its help message, which you can obtain by typing

pwstat3 -h

Notable among these options are:

pwstat3 -A
Include in the report a list of the top N user agents (browsers/robots) (default is 25).
pwstat3 -b pattern
Include only requests from machines which include this pattern (a Perl regexp). Note: If the -r option is used, this test is made after the IP-to-hostname conversion is attempted.
pwstat3 -B pattern
Omit requests from machines which include this pattern (a Perl regexp). Note: If you specify any combination of the -b, -B, -m and -M options, only one of them will be evaluated. Preference is in the order just given (i.e., -b always wins).
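
The precedence rule can be pictured as a chain of early returns. The Python sketch below is an assumption-laden illustration, not pwstat3's implementation: the function and parameter names are hypothetical, Python's re module stands in for Perl regexps, and the panix.com/access.net test is a guess at how the -m and -M options (described further down) match hosts.

```python
import re

# Guessed matcher for hosts in the panix.com and access.net domains.
PANIX_RE = re.compile(r"(^|\.)(panix\.com|access\.net)$", re.IGNORECASE)


def keep_request(host, include_pat=None, omit_pat=None,
                 omit_panix=False, only_panix=False):
    """Decide whether a request from `host` stays in the report.

    include_pat ~ -b, omit_pat ~ -B, omit_panix ~ -m, only_panix ~ -M.
    Only the first option that is set is evaluated, in that order.
    """
    if include_pat is not None:          # -b always wins
        return re.search(include_pat, host) is not None
    if omit_pat is not None:             # -B is next
        return re.search(omit_pat, host) is None
    if omit_panix:                       # then -m
        return PANIX_RE.search(host) is None
    if only_panix:                       # and finally -M
        return PANIX_RE.search(host) is not None
    return True
```

Under this reading, combining -b aol with -B aol still keeps requests from aol hosts, because the -B test is never reached.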
pwstat3 -d somedate
Omit requests before the specified date. The format of the date must be YYYY:MM:DD; for example, to obtain a report limited to requests on or after August 15, 1995, you would replace somedate with 1995:08:15. Note: Remember that the only requests which will be checked against the specified date are those from the log file(s) you've specified.
pwstat3 -D somedate
Omit requests after the specified date.
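
Taken together, -d and -D define an inclusive date window. Here is a small Python sketch of that check, assuming -D takes the same YYYY:MM:DD format as -d (this page only states the format for -d); the function names are hypothetical.

```python
from datetime import date


def parse_pwstat_date(text):
    """Parse a YYYY:MM:DD string such as '1995:08:15' into a date."""
    year, month, day = (int(part) for part in text.split(":"))
    return date(year, month, day)


def in_date_window(request_date, not_before=None, not_after=None):
    """not_before ~ -d, not_after ~ -D; both bounds are inclusive."""
    if not_before is not None and request_date < parse_pwstat_date(not_before):
        return False   # before the -d date: omitted
    if not_after is not None and request_date > parse_pwstat_date(not_after):
        return False   # after the -D date: omitted
    return True
```

A request made on August 15, 1995 passes a not_before of 1995:08:15, while one made on August 14 does not.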
pwstat3 -f pattern
Include only requests for filenames which include this pattern (a Perl regexp).
pwstat3 -F pattern
Omit requests for filenames which include this pattern (a Perl regexp).
pwstat3 -g
"Smash" the filenames of graphics, reducing any filename with extension bmp, gif, jpg, jpeg or png to (gfx). This is handy if you have directories full of GIFs and JPEGs that you don't want to see listed individually in your stats.
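
The effect of -g can be sketched in a few lines of Python. The extension list comes from the description above; matching the extension case-insensitively is an assumption, and the function name is hypothetical.

```python
import re

# Extensions listed for the -g option; ignoring case is an assumption.
GRAPHICS_RE = re.compile(r"\.(bmp|gif|jpe?g|png)$", re.IGNORECASE)


def smash_graphics(filename):
    """Collapse any graphics filename to the single label '(gfx)'."""
    return "(gfx)" if GRAPHICS_RE.search(filename) else filename
```

Every image request then lands on one "(gfx)" line in the filename section, and other filenames pass through unchanged.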
pwstat3 -j N
In the list of URLs which most frequently referred visitors to your site, include only the N most frequent URLs. If this option is not specified, then the default is 25. If you do not want this section included in your pwstat3 report, then specify pwstat3 -j 0.
pwstat3 -J N
In the list of domains which most frequently referred visitors to your site, include only the N most frequent domains. If this option is not specified, then the default is 25. If you do not want this section included in your pwstat3 report, then specify pwstat3 -J 0.
pwstat3 -k pattern
In the list of URLs which most frequently referred visitors to your site, exclude URLs which match this pattern (a Perl regexp). This option is most useful when you want to exclude referrals from within your own domain. For example, if your domain were www.skatecity.com, you could exclude self-referrals by specifying pwstat3 -k 'www\.skatecity\.com'.
pwstat3 -K pattern
In the list of domains which most frequently referred visitors to your site, exclude domains which match this pattern (a Perl regexp).
pwstat3 -l
Execute getclogs -o and use the result as input for pwstat3. This results in pwstat3 output based on the previous getclogs reporting period. This option is ignored if you specify an input log filename.
pwstat3 -L
Trim CGI parameters from referring URLs.
pwstat3 -m
Omit any request coming from any *.panix.com or *.access.net host.
pwstat3 -M
Omit any request coming from outside the *.panix.com and *.access.net domains.
pwstat3 -o
In the reversed sub-domain section of the report, the last portion of a computer name is normally lopped off; e.g., gatekeeper.nytimes.com would just be reported as com.nytimes.* as would all requests from everyone else in the nytimes.com domain. To force hostnames to be completely reported, invoke the -o option.
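
The reversal described here can be sketched in Python as follows. The default behavior follows the gatekeeper.nytimes.com example above; the fully reversed form used for -o is an assumption about the output format, and the function name is hypothetical.

```python
def reversed_subdomain(hostname, full=False):
    """Reverse a hostname's labels for the reversed sub-domain section.

    By default the host part (the first label) is dropped and replaced
    with '*'; full=True mimics -o and keeps every label (assumed format).
    """
    labels = hostname.lower().split(".")
    if len(labels) < 2:
        return hostname  # nothing sensible to reverse
    if full:
        return ".".join(reversed(labels))
    return ".".join(reversed(labels[1:])) + ".*"
```

Under these assumptions, gatekeeper.nytimes.com becomes com.nytimes.* by default and com.nytimes.gatekeeper when the full form is requested.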
pwstat3 -P
If no log file is specified, read from STDIN rather than calling getclogs.
pwstat3 -q list
Filter log entries by usage type, where "list" can be one or more of c, u, or f. If c, then we want corporate web hits included; if u, include personal web hits; and if f, include ftp transfers. Note: Most Panix users do not have both corporate and personal web traffic, but corporate users may want to use this option to generate separate reports for their web and ftp traffic.
pwstat3 -r
Turns on IP-to-hostname resolving. In the raw weblogs, the machines requesting your webpages are normally identified by IP number, and to turn that number into a computer name, a host lookup must be performed.

See the pwstat page for more information on why you should not use this option unless you really need to know the domains and subdomains of the computers visiting your site.
pwstat3 -s N
Execute getclogs -sN and use the result as input for pwstat3. The N is an integral value indicating the number of bytes at the beginning of the getclogs report to ignore/skip. This option is ignored if you specify an input log filename.
pwstat3 -t
Generate a text-only report. The default is an HTML report.
pwstat3 -u
Normally, unresolved IP numbers are listed in the domain and reversed sub-domain sections of the pwstat3 report as simply "Unresolved". To force all IP numbers to be individually reported in the reversed sub-domain section, invoke this option.
pwstat3 -U
The -u option will likely result in more data than you want, but perhaps you still want some sort of guess-timate of the number of different sites visiting your webpages. The -U option will force partial reporting of unresolved IP numbers, ignoring the last number in the four-number sequence. For example, the IP number 166.84.197.198 would be listed as 166.84.197.*, as would all other machines in the 166.84.197.* network that happened to visit your site.
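
The masking that -U performs is easy to show in Python. This is an illustrative sketch only; the function name is hypothetical, and strings that are not dotted-quad IP numbers are simply returned unchanged.

```python
def mask_last_octet(ip):
    """Replace the last number of a dotted-quad IP number with '*'."""
    parts = ip.split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        return ip  # not an unresolved IP number; leave it alone
    return ".".join(parts[:3]) + ".*"
```

Every host in the 166.84.197.* network, including 166.84.197.198, collapses to the single line 166.84.197.*.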
pwstat3 -v
Verbose display; i.e., announce file openings, errors, etc. on STDERR.
pwstat3 -W
"Fix" short webserver names in URLs (e.g., foo.com becomes www.foo.com).
pwstat3 -y scheme
The pwstat3 output includes near the top a line that says "Approx. Cost of External Transmissions $12.34". This cost is by default calculated using the formula for personal web service. The various levels of corporate web service have different cost formulas, however, and pwstat3 has no way of knowing which to use unless you tell it. Thus, you may specify one of the following schemes: personal, corporate, basic, standard or deluxe. (Note: Panix assesses monthly charges on your total traffic. If you have invoked any pwstat3 options which cause it to skip log entries, then the value calculated will not correspond with what you are actually charged.)

Last modified: Thursday, 02-Sep-2004 21:07:07 EDT
rbs, askanas