The goals of pwlog include reducing the length and mystery of a typical getlogs output.
The default action of pwlog is to abbreviate the directory path in filenames and to drop the referral information. As of this writing, a "hit" as reported by getlogs includes "/htdocs/userdirs/userid" at the start of a file in personal webspace and "/htdocs/userid" for a file in corporate webspace. pwlog abbreviates these as "(u)" and "(c)", respectively. Thus, using the same example log information as in the getlogs description, invoking pwlog without any of its options set would result in:
3819 WWW 182 1998:09:01:01:24:58 (u)/Skate 209.240.199.53 301 www1
3819 WWW 203 1998:09:01:03:25:47 (u)/Skate 204.244.93.232 301 www2
3819 FTP 9600 1998:09:01:04:16:05 (f)/pub/incoming/newfile.txt 166.84.197.198 200 ftp
3819 WWW 15130 1998:09:01:05:23:41 (c)/blur/index.cgi 24.112.48.33 200 web4
3819 WWW 43 1998:09:01:05:23:43 (c)/blur/gfx/spacer.GIF 24.112.48.33 200 web4
3819 WWW 43 1998:09:01:05:23:43 (c)/blur/gfx/spacer.GIF 24.112.48.33 200 web4
3819 WWW 191 1998:09:01:05:23:44 (c)/blur/banner.cgi 24.112.48.33 302 web4
3819 WWW 43 1998:09:01:05:23:44 (c)/blur/gfx/spacer.GIF 24.112.48.33 200 web4
3819 WWW 162 1998:09:01:06:04:56 (c)/skatecity/robots.txt 204.123.9.47 200 web4
3819 WWW 4301 1998:09:01:06:18:17 (c)/blur/article.cgi 193.13.129.79 200 web4
3819 WWW 6665 1998:09:01:07:11:49 (c)/skatecity/ah/ 195.133.10.89 200 web4
3819 WWW 1343 1998:09:01:07:11:52 (c)/skatecity/ah/gfx/uchronia.sml.GIF 195.133.10.89 200 web4
3819 WWW 911 1998:09:01:07:11:54 (c)/skatecity/ah/gfx/intro.GIF 195.133.10.89 200 web4
3819 WWW 9723 1998:09:01:08:08:51 (c)/blur/resources/reviews.cgi 155.78.124.187 200 web4
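The abbreviation rule described above can be illustrated with a short Python sketch. This is not pwlog's actual code: the function name abbreviate_path and the sample userid "alice" are hypothetical, and the sketch assumes that "userid" in the directory prefixes stands for the account's login name. It covers only the "(u)" and "(c)" prefixes explained in the text.

```python
def abbreviate_path(path: str, userid: str) -> str:
    """Shorten a getlogs-style file path the way the text describes.

    Assumption: `userid` is the login name that appears in the prefix.
    """
    personal = f"/htdocs/userdirs/{userid}"   # personal webspace prefix
    corporate = f"/htdocs/{userid}"           # corporate webspace prefix
    # Check the longer, more specific prefix first.
    for prefix, tag in ((personal, "(u)"), (corporate, "(c)")):
        if path == prefix or path.startswith(prefix + "/"):
            return tag + path[len(prefix):]
    # Anything else is left untouched in this sketch.
    return path
```

Under these assumptions, abbreviate_path("/htdocs/userdirs/alice/Skate", "alice") gives "(u)/Skate", and abbreviate_path("/htdocs/alice/blur/index.cgi", "alice") gives "(c)/blur/index.cgi".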
If you have already run getlogs, the simplest way to produce output like the example above is to type:
pwlog logfilename > newlogfilename
However, you can get pwlog to call getlogs for you. In fact, this happens automatically if you specify no input log file name. In other words, instead of typing
getlogs > logfilename
pwlog logfilename > newlogfilename
you can instead just type
pwlog > newlogfilename
Most of the more useful features of pwlog are only available via its option switches. A complete list and a short help message can be obtained by typing
pwlog -h
Notable among these options are:
WWW 182 1998:09:01:01:24:58 (u)/Skate 209.240.199.53 301
WWW 203 1998:09:01:03:25:47 (u)/Skate 204.244.93.232 301
FTP 9600 1998:09:01:04:16:05 (f)/pub/incoming/newfile.txt 166.84.197.198 200
WWW 15130 1998:09:01:05:23:41 (c)/blur/index.cgi 24.112.48.33 200
WWW 43 1998:09:01:05:23:43 (c)/blur/gfx/spacer.GIF 24.112.48.33 200
WWW 43 1998:09:01:05:23:43 (c)/blur/gfx/spacer.GIF 24.112.48.33 200
WWW 191 1998:09:01:05:23:44 (c)/blur/banner.cgi 24.112.48.33 302
WWW 43 1998:09:01:05:23:44 (c)/blur/gfx/spacer.GIF 24.112.48.33 200
WWW 162 1998:09:01:06:04:56 (c)/skatecity/robots.txt 204.123.9.47 200
WWW 4301 1998:09:01:06:18:17 (c)/blur/article.cgi 193.13.129.79 200
WWW 6665 1998:09:01:07:11:49 (c)/skatecity/ah/ 195.133.10.89 200
WWW 1343 1998:09:01:07:11:52 (c)/skatecity/ah/gfx/uchronia.sml.GIF 195.133.10.89 200
WWW 911 1998:09:01:07:11:54 (c)/skatecity/ah/gfx/intro.GIF 195.133.10.89 200
WWW 9723 1998:09:01:08:08:51 (c)/blur/resources/reviews.cgi 155.78.124.187 200

WWW 182 1998:09:01:01:24:58 (u)/Skate proxy-226.iap.bryant.webtv.net 301
WWW 203 1998:09:01:03:25:47 (u)/Skate kam1d40.dial.uniserve.ca 301
FTP 9600 1998:09:01:04:16:05 (f)/pub/incoming/newfile.txt rbs.dialup.access.net 200
WWW 15130 1998:09:01:05:23:41 (c)/blur/index.cgi pc-403.on.rogers.wave.ca 200
WWW 43 1998:09:01:05:23:43 (c)/blur/gfx/spacer.GIF pc-403.on.rogers.wave.ca 200
WWW 43 1998:09:01:05:23:43 (c)/blur/gfx/spacer.GIF pc-403.on.rogers.wave.ca 200
WWW 191 1998:09:01:05:23:44 (c)/blur/banner.cgi pc-403.on.rogers.wave.ca 302
WWW 43 1998:09:01:05:23:44 (c)/blur/gfx/spacer.GIF pc-403.on.rogers.wave.ca 200
WWW 162 1998:09:01:06:04:56 (c)/skatecity/robots.txt vscooter.av.pa-x.dec.com 200
WWW 4301 1998:09:01:06:18:17 (c)/blur/article.cgi 193.13.129.79 200
WWW 6665 1998:09:01:07:11:49 (c)/skatecity/ah/ 89.10.133.195.dynamic.dialup.ru 200
WWW 1343 1998:09:01:07:11:52 (c)/skatecity/ah/gfx/uchronia.sml.GIF 89.10.133.195.dynamic.dialup.ru 200
WWW 911 1998:09:01:07:11:54 (c)/skatecity/ah/gfx/intro.GIF 89.10.133.195.dynamic.dialup.ru 200
WWW 9723 1998:09:01:08:08:51 (c)/blur/resources/reviews.cgi 155.78.124.187 200
NOTE. This section is obsolete. If you need your logs in Common Log Format, use getclogs to obtain them.
It may be that you have obtained some handy-dandy third-party stats program which you'd like to use, but you can't because the output from getlogs and the above described output from pwlog aren't in "common log format", which most such programs require. If so, there are two additional pwlog options which you will find of use:
209.240.199.53 - - [01/Sep/1998:01:24:58 -0500] "HEAD (u)/Skate HTTP/1.0" 301 182
204.244.93.232 - - [01/Sep/1998:03:25:47 -0500] "HEAD (u)/Skate HTTP/1.0" 301 203
166.84.197.198 - - [01/Sep/1998:04:16:05 -0500] "FTP (f)/pub/incoming/newfile.txt FTP/X.X" 200 9600
24.112.48.33 - - [01/Sep/1998:05:23:41 -0500] "GET (c)/blur/index.cgi HTTP/1.0" 200 15130
24.112.48.33 - - [01/Sep/1998:05:23:43 -0500] "GET (c)/blur/gfx/spacer.GIF HTTP/1.0" 200 43
24.112.48.33 - - [01/Sep/1998:05:23:43 -0500] "GET (c)/blur/gfx/spacer.GIF HTTP/1.0" 200 43
24.112.48.33 - - [01/Sep/1998:05:23:44 -0500] "HEAD (c)/blur/banner.cgi HTTP/1.0" 302 191
24.112.48.33 - - [01/Sep/1998:05:23:44 -0500] "GET (c)/blur/gfx/spacer.GIF HTTP/1.0" 200 43
204.123.9.47 - - [01/Sep/1998:06:04:56 -0500] "GET (c)/skatecity/robots.txt HTTP/1.0" 200 162
193.13.129.79 - - [01/Sep/1998:06:18:17 -0500] "GET (c)/blur/article.cgi HTTP/1.0" 200 4301
195.133.10.89 - - [01/Sep/1998:07:11:49 -0500] "GET (c)/skatecity/ah/ HTTP/1.0" 200 6665
195.133.10.89 - - [01/Sep/1998:07:11:52 -0500] "GET (c)/skatecity/ah/gfx/uchronia.sml.GIF HTTP/1.0" 200 1343
195.133.10.89 - - [01/Sep/1998:07:11:54 -0500] "GET (c)/skatecity/ah/gfx/intro.GIF HTTP/1.0" 200 911
155.78.124.187 - - [01/Sep/1998:08:08:51 -0500] "GET (c)/blur/resources/reviews.cgi HTTP/1.0" 200 9723

209.240.199.53 - - [01/Sep/1998:01:24:58 -0500] "HEAD (u)/Skate HTTP/1.0" 301 182 "http://www.xs4all.nl:80/~lowlevel/skate/linx.html" "UNKNOWN"
204.244.93.232 - - [01/Sep/1998:03:25:47 -0500] "HEAD (u)/Skate HTTP/1.0" 301 203 "-" "UNKNOWN"
166.84.197.198 - - [01/Sep/1998:04:16:05 -0500] "FTP (f)/pub/incoming/newfile.txt FTP/X.X" 200 9600 "-" "UNKNOWN"
24.112.48.33 - - [01/Sep/1998:05:23:41 -0500] "GET (c)/blur/index.cgi HTTP/1.0" 200 15130 "http://www.yahoo.ca/Recreation/Sports/Skating/Inline_Skating/Magazines/" "UNKNOWN"
24.112.48.33 - - [01/Sep/1998:05:23:43 -0500] "GET (c)/blur/gfx/spacer.GIF HTTP/1.0" 200 43 "http://www.skating.com/" "UNKNOWN"
24.112.48.33 - - [01/Sep/1998:05:23:43 -0500] "GET (c)/blur/gfx/spacer.GIF HTTP/1.0" 200 43 "http://www.skating.com/" "UNKNOWN"
24.112.48.33 - - [01/Sep/1998:05:23:44 -0500] "HEAD (c)/blur/banner.cgi HTTP/1.0" 302 191 "http://www.skating.com/" "UNKNOWN"
24.112.48.33 - - [01/Sep/1998:05:23:44 -0500] "GET (c)/blur/gfx/spacer.GIF HTTP/1.0" 200 43 "http://www.skating.com/" "UNKNOWN"
204.123.9.47 - - [01/Sep/1998:06:04:56 -0500] "GET (c)/skatecity/robots.txt HTTP/1.0" 200 162 "-" "UNKNOWN"
193.13.129.79 - - [01/Sep/1998:06:18:17 -0500] "GET (c)/blur/article.cgi HTTP/1.0" 200 4301 "http://altavista.digital.com/cgi-bin/query?pg=q&kl=XX&q=%22Salomon+inline%22" "UNKNOWN"
195.133.10.89 - - [01/Sep/1998:07:11:49 -0500] "GET (c)/skatecity/ah/ HTTP/1.0" 200 6665 "http://www.yahoo.com/Arts/Humanities/Literature/Genres/" "UNKNOWN"
195.133.10.89 - - [01/Sep/1998:07:11:52 -0500] "GET (c)/skatecity/ah/gfx/uchronia.sml.GIF HTTP/1.0" 200 1343 "http://www.skatecity.com/ah/" "UNKNOWN"
195.133.10.89 - - [01/Sep/1998:07:11:54 -0500] "GET (c)/skatecity/ah/gfx/intro.GIF HTTP/1.0" 200 911 "http://www.skatecity.com/ah/" "UNKNOWN"
155.78.124.187 - - [01/Sep/1998:08:08:51 -0500] "GET (c)/blur/resources/reviews.cgi HTTP/1.0" 200 9723 "http://www.hotbot.com/?SW=web&SM=MC&MT=Rollerblade%2bReviews&DC=10&DE=2&RG=NA&_v=2" "UNKNOWN"
One warning about this conversion process: besides lacking user agent information, getlogs output also does not include the request method (GET, POST or HEAD), so pwlog makes an educated guess when converting to common log format. Basically, it assumes that every web request is a GET unless the return code is in the 300s, in which case pwlog decides that it's a HEAD. It never assigns the POST method to any entry in the log, which is of course quite wrong if you have a lot of CGI scripts running. This should not be a problem when you are running stats, but we include the warning here just so that you know.
Also, pwstat does not recognize Common Log format.
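The method guess and the reordering of fields can be sketched in Python as follows. This is only an illustration, not pwlog's implementation. The function names are hypothetical, the meaning of each input field (service, bytes, timestamp, path, host, status) is inferred from the sample listings, the "-0500" offset is simply copied from the sample output, and the "FTP" and "FTP/X.X" placeholders mirror what the samples show for FTP entries.

```python
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def guess_method(service: str, status: int) -> str:
    """Guess the request method, since the input log does not record it."""
    if service == "FTP":
        return "FTP"  # placeholder used for FTP entries in the samples
    # Rule from the text: GET unless the return code is in the 300s.
    return "HEAD" if 300 <= status < 400 else "GET"


def to_common_log(line: str, tz_offset: str = "-0500") -> str:
    """Convert one six-field line (assumed layout) to common log format.

    Assumption: the path contains no spaces, so a plain split() is enough.
    """
    service, size, stamp, path, host, status = line.split()
    year, month, day, hh, mm, ss = stamp.split(":")
    when = f"{day}/{MONTHS[int(month) - 1]}/{year}:{hh}:{mm}:{ss}"
    method = guess_method(service, int(status))
    protocol = "FTP/X.X" if service == "FTP" else "HTTP/1.0"
    return (f'{host} - - [{when} {tz_offset}] '
            f'"{method} {path} {protocol}" {status} {size}')
```

Feeding it the first abbreviated sample line, "WWW 182 1998:09:01:01:24:58 (u)/Skate 209.240.199.53 301", reproduces the first common log format line shown above. Note that a POST can never come out of guess_method, which is exactly the limitation the warning describes.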
The -r option in the pwlog and pwstat programs determines the machine names corresponding to the IP numbers in the weblogs by doing a host lookup for each number. However, since most people who hit a good website hit it more than once, doing a lookup for every single entry in a log file would be needlessly repetitious. Thus, the pwlog and pwstat programs maintain a file of matching IP numbers and hostnames, and they check this file for a match before actually executing an IP lookup. At present, each user who runs pwlog or pwstat gets a separate copy of this file; there is no Panix-wide common file which all pwlog and pwstat users can access. The name of the hostfile is .pwhosts, and you will find your copy in your home (login) directory.
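The check-the-file-first idea can be sketched in Python like this. It is a minimal illustration under stated assumptions, not the actual pwlog or pwstat code: the real layout of .pwhosts is not documented here, so the sketch assumes one "IP hostname" pair per line, and all function names are hypothetical.

```python
import socket
from pathlib import Path


def load_cache(path: Path) -> dict:
    """Read previously resolved names (assumed format: 'IP hostname' per line)."""
    cache = {}
    if path.exists():
        for line in path.read_text().splitlines():
            parts = line.split()
            if len(parts) == 2:
                cache[parts[0]] = parts[1]
    return cache


def reverse_dns(ip: str) -> str:
    """Do the slow network lookup; fall back to the bare IP on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


def resolve(ip: str, cache: dict, lookup=reverse_dns) -> str:
    """Return the hostname for ip, consulting the cache before the network."""
    if ip not in cache:
        cache[ip] = lookup(ip)
    return cache[ip]


def save_cache(path: Path, cache: dict) -> None:
    """Write the cache back so the next run can reuse it."""
    lines = (f"{ip} {host}\n" for ip, host in sorted(cache.items()))
    path.write_text("".join(lines))
```

A run would load the cache from the hostfile in the home directory, call resolve once per log entry, and save the cache at the end. Repeat visitors then cost a dictionary lookup instead of a network query, but every new address still needs a real lookup, and the file grows with each one.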
The process of converting IP numbers to hostnames can be incredibly slow, whether it occurs in pwlog or in pwstat. In fact, it can be downright maddening if you have a popular site. Looking up just a couple of days' worth of hits on my own pages can take over an hour. clay once reported that it took about 10 hours to resolve the new hostnames seen during a week of traffic to his site, and that was back in late 1995, when web traffic was a fraction of what it is now.
Persons with popular sites will also find that their .pwhosts file can get pretty large. Mine, for example, was up to 475 kb by the summer of 1995, after only a few months of traffic to my pages. If you have a popular set of pages, it won't be long before your copy of .pwhosts is into the megabytes. At that point, it's time to ask if you really need to know the names of all the machines visiting your site.
All this said, you may understand why your time is better spent (and less computing time and disk space is wasted) if you do not invoke the -r option in either pwlog or pwstat.