The purpose of pwlog3 is to make XCLF weblogs, as produced by the getclogs program, easier to read and manage.
When invoked without any command-line options, pwlog3 simply copies its input to its output, possibly making the URL data slightly more uniform along the way. The usefulness of pwlog3 comes from the transformations it performs in response to the various command-line options.
If you have already run getclogs and saved the output to a file, you can use this file as input to pwlog3 by typing:
pwlog3 logfilename > newlogfilename
However, if you do not specify an input log file name, pwlog3 will automatically call getclogs for you. In other words, instead of typing
getclogs > logfilename
pwlog3 logfilename > newlogfilename
you can simply type
pwlog3 > newlogfilename
A complete list of options and a short help message can be obtained by typing
pwlog3 -h
The options are as follows:
Invoking this option on the example XCLF log would result in the following output:
ip68-2-201-90.ph.ph.cox.net - - [01/Aug/2004:00:29:43 -0400] "GET /www.speedskating.com/wl/show.rp HTTP/1.1" 200 17617 "http://www.google.com/search?hl=en&ie=UTF-8&q=speed+skating+in+phoenix&btnG=Google+Search" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.1) Gecko/20040707"
ip68-2-201-90.ph.ph.cox.net - - [01/Aug/2004:00:29:43 -0400] "GET /www.speedskating.com/css/doz.css HTTP/1.1" 200 3553 "http://www.speedskating.com/wl/show.rp?id=inline_clubs/united_states" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.1) Gecko/20040707"
ip68-2-201-90.ph.ph.cox.net - - [01/Aug/2004:00:29:43 -0400] "GET /www.speedskating.com/gfx/logo/ssk030130a.gif HTTP/1.1" 200 830 "http://www.speedskating.com/wl/show.rp?id=inline_clubs/united_states" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.1) Gecko/20040707"
yahoobb219018208079.bbtec.net - guest [01/Aug/2004:00:29:46 -0400] "GET /www.skatecity.com/ HTTP/1.1" 200 1148 "http://www.google.com/search?q=SKATE&btnG=Google+%E6%A4%9C%E7%B4%A2&hl=ja&ie=UTF-8&c2coff=1" "Opera/7.23 (Windows 98; U) [ja]"
yahoobb219018208079.bbtec.net - guest [01/Aug/2004:00:29:46 -0400] "GET /www.skatecity.com/gfx/rhs_infobahn.gif HTTP/1.1" 200 739 "http://www.skatecity.com/" "Opera/7.23 (Windows 98; U) [ja]"
yahoobb219018208079.bbtec.net - guest [01/Aug/2004:00:29:46 -0400] "GET /www.skatecity.com/gfx/rhs_speedskating.gif HTTP/1.1" 200 666 "http://www.skatecity.com/" "Opera/7.23 (Windows 98; U) [ja]"
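If you want to pick such entries apart yourself, the sample lines above look like the familiar "combined" log layout with the site name prefixed to the request path. The following is a minimal Python sketch under that assumption; the field names, the regular expression, and the parse_entry helper are illustrative only and are not part of pwlog3 or getclogs.

```python
import re

# Assumed layout, inferred only from the sample entries:
# host ident user [time] "METHOD /site/resource PROTO" status size "referer" "agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entry(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Assumption: the first path component is the site name.
    site, _, rest = fields["path"].lstrip("/").partition("/")
    fields["site"] = site
    fields["resource"] = "/" + rest
    return fields
```

Calling parse_entry on one of the entries above gives a dictionary from which the host, timestamp, site, and referer can be read by name; entries that do not fit the assumed layout come back as None rather than raising an error.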
With the -r option, the pwlog and pwstat programs determine the machine names corresponding to the IP numbers in the weblogs by doing a host lookup for each number. However, since most people who hit a good website hit it more than once, doing a lookup for every single entry in a log file would be needlessly repetitious. Thus, the pwlog and pwstat programs maintain a file of matching IP numbers and hostnames, and they check this file for a match before actually executing an IP lookup. At present, each user who runs pwlog or pwstat gets a separate copy of this file; there is no Panix-wide common file which all pwlog and pwstat users can access. The name of the hostfile is .pwhosts, and you will find your copy in your home (login) directory.
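The check-the-file-first approach described above can be sketched roughly as follows. This is only an illustration in Python, not the actual pwlog or pwstat code. The one-pair-per-line file layout, the function names load_cache, dns_lookup and resolve, and the choice to fall back to the bare IP number when a lookup fails are all assumptions.

```python
import socket


def load_cache(path):
    """Read 'ip hostname' pairs, one per line (an assumed layout)."""
    cache = {}
    try:
        with open(path) as fh:
            for line in fh:
                parts = line.split()
                if len(parts) == 2:
                    cache[parts[0]] = parts[1]
    except FileNotFoundError:
        pass  # no cache file yet: start with an empty cache
    return cache


def dns_lookup(ip):
    """Reverse lookup; return the IP number itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


def resolve(ip, cache, path, lookup=dns_lookup):
    """Return the hostname for ip, consulting the cache before the network."""
    if ip in cache:
        return cache[ip]
    name = lookup(ip)
    cache[ip] = name
    # Append the new pair so later runs can skip this lookup.
    with open(path, "a") as fh:
        fh.write(f"{ip} {name}\n")
    return name
```

Each IP number costs at most one real lookup, however often it appears in the log. The flip side is that the cache file gains a line for every new visitor and so grows without bound.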
The process of converting IP numbers to hostnames can be incredibly slow, whether it occurs in pwlog or in pwstat. In fact, it can be downright maddening if you have a popular site. Looking up just a couple of days' worth of hits on my own pages can take over an hour. clay once reported that it took about 10 hours to resolve the new hostnames seen during a week of traffic to his site, and that was back in late 1995, when web traffic was a fraction of what it is now.
People with popular sites will also find that their .pwhosts file can get pretty large. Mine, for example, was up to 475 KB by the summer of 1995, after only a few months of traffic to my pages. If you have a popular set of pages, it won't be long before your copy of .pwhosts runs into the megabytes. At that point, it's time to ask if you really need to know the names of all the machines visiting your site.
All this said, you may understand why your time is better spent (and less computing time and disk space wasted) if you do not invoke the -r option in either pwlog or pwstat.