panix.user.html FAQ
Logs and Analysis
getlogs, written at Panix, is a simple log-extracting utility
for POLF (Panix Oldstyle Log Format) weblogs.
The original author was Liz Reynolds. The basic function of the script
is to print a listing of traffic on the user's webpages during
the current calendar month.
The following is a short
example of some actual getlogs output.
Notice that the output mixes together data pertaining to HTML and text pages,
GIF graphics, CGI calls, and personal, corporate and FTP pages.
3819 WWW 182 1998:09:01:01:24:58 /export/httpd/htdocs/userdirs/rbs/Skate 209.240.199.53 301 http://www.xs4all.nl:80/~lowlevel/skate/linx.html www1
3819 WWW 203 1998:09:01:03:25:47 /export/httpd/htdocs/userdirs/rbs/Skate 204.244.93.232 301 - www2
3819 FTP 9600 1998:09:01:04:16:05 /pub/incoming/newfile.txt 166.84.197.198 200 - ftp
3819 WWW 15130 1998:09:01:05:23:41 /export/httpd/htdocs/rbs/blur/index.cgi 24.112.48.33 200 http://www.yahoo.ca/Recreation/Sports/Skating/Inline_Skating/Magazines/ web4
3819 WWW 43 1998:09:01:05:23:43 /export/httpd/htdocs/rbs/blur/gfx/spacer.GIF 24.112.48.33 200 http://www.skating.com/ web4
3819 WWW 43 1998:09:01:05:23:43 /export/httpd/htdocs/rbs/blur/gfx/spacer.GIF 24.112.48.33 200 http://www.skating.com/ web4
3819 WWW 191 1998:09:01:05:23:44 /export/httpd/htdocs/rbs/blur/banner.cgi 24.112.48.33 302 http://www.skating.com/ web4
3819 WWW 43 1998:09:01:05:23:44 /export/httpd/htdocs/rbs/blur/gfx/spacer.GIF 24.112.48.33 200 http://www.skating.com/ web4
3819 WWW 162 1998:09:01:06:04:56 /export/httpd/htdocs/rbs/skatecity/robots.txt 204.123.9.47 200 - web4
3819 WWW 4301 1998:09:01:06:18:17 /export/httpd/htdocs/rbs/blur/article.cgi 193.13.129.79 200 http://altavista.digital.com/cgi-bin/query?pg=q&kl=XX&q=%22Salomon+inline%22 web4
3819 WWW 6665 1998:09:01:07:11:49 /export/httpd/htdocs/rbs/skatecity/ah/index.html 195.133.10.89 200 http://www.yahoo.com/Arts/Humanities/Literature/Genres/ web4
3819 WWW 1343 1998:09:01:07:11:52 /export/httpd/htdocs/rbs/skatecity/ah/gfx/uchronia.sml.GIF 195.133.10.89 200 http://www.skatecity.com/ah/ web4
3819 WWW 911 1998:09:01:07:11:54 /export/httpd/htdocs/rbs/skatecity/ah/gfx/intro.GIF 195.133.10.89 200 http://www.skatecity.com/ah/ web4
3819 WWW 9723 1998:09:01:08:08:51 /export/httpd/htdocs/rbs/blur/resources/reviews.cgi 155.78.124.187 200 http://www.hotbot.com/?SW=web&SM=MC&MT=Rollerblade%2bReviews&DC=10&DE=2&RG=NA&_v=2 web4
What's being reported here?
- ownerid:
- The userid of the owner of the pages listed in the report,
i.e., the person who just executed getlogs.
In this example, the userid is 3819, which corresponds to the login "rbs".
- usage:
- Method of downloading. In the example above, note that an "FTP"
entry has snuck into the list. This will happen if you have a
corporate account and have arranged for anonymous ftp service.
- bytes:
- Number of bytes transferred. Note that if getlogs was run
without the -a flag,
the output will not reflect the bytes transferred by way of the
Squids (the web accelerators).
- timestamp:
- The time at which the access took place, in NYC local time.
- filename:
- Name of file downloaded. This is the path from the HTML document
root directory to the location of the file. The entry
containing "/htdocs/userdirs/rbs/Skate/"
indicates a hit on the page
http://www.panix.com/~rbs/Skate/. Similarly,
"/htdocs/rbs/" indicates a hit on user rbs's corporate webspace.
- host:
- The ID of the machine which requested the webpage. This information
tells you only the machine; you cannot find out, for example,
the e-mail address of the person who made the request.
You'll note that the machine ID is a sequence of numbers, what is
referred to as an IP number. In most cases, the IP number can be
translated into a hostname; you can use
pwlog with the -r flag
to have this translation performed on the output from getlogs.
At some organizations especially concerned about security, the IP number
(and the corresponding hostname)
may refer to an intermediary gatekeeper computer
rather than the actual computer which the requesting person is using.
As an example, the IP number 199.181.175.201 would mean a hit by
the machine gatekeeper.nytimes.com; since that is a firewall machine,
the actual requester could be anywhere in the nytimes.com domain.
- status:
- A code which reflects the completion status of handling the file request,
with 200 meaning no error - i.e. the document was served properly.
Other codes which you may see are:
- 301/302 Redirected Request:
This most often
happens when someone requests a directory index file but doesn't
specify the URL completely. The server sends back the correct URL,
and the browser then requests that URL. In the example
above, this happens when a directory index request is missing the
trailing slash. Another possible cause is a CGI script which
returns a redirection URL ("Location: http://www.foo.com\n\n")
instead of HTML or other content.
- 304 Not Modified Request:
The browser asked whether the file had changed since a previous
request and, on learning that it hadn't, did not download another copy.
- 400 Bad Requests
- 401 Unauthorized Requests
- 403 Forbidden Requests
- 404 Not Found Requests
- 4xx Client Errors: There was an error serving this request, the
error resulting from something on the client end. Client errors outside
the 400-404 range are rare.
- 5xx Server Errors: There was an error serving this request, the
error resulting from something on the server end. The most likely cause of
such an error is a buggy CGI script, but there are a number of other
possible reasons.
- referrer:
- The URL of the webpage which pointed the browser to the given file on your
site. Note that this datum is often not available, and sometimes may
be incorrect. The latter case is most likely to arise when someone
is viewing a page, and then manually types your URL in. This may result
in the page they were looking at before being logged as the referring
URL, even if it contains no links to the page on your site.
- server:
- The name of the actual Panix machine that served the request. In
general, for personal websites this name will be www1 or www2.
For corporate websites it should remain mostly constant, though it
may change when Panix staff even out the load on the webservers
or for other technical reasons.
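The field list above can be turned into a small parser. The following Python sketch is not part of getlogs. It assumes that each record consists of nine whitespace-separated fields in the order described, that no field contains an embedded space, and that a referrer of "-" means none was logged. The names LogEntry, parse_line and status_class are invented for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LogEntry:
    ownerid: int
    usage: str                 # "WWW" or "FTP" in the example output
    bytes_sent: int
    timestamp: datetime        # NYC local time, no timezone attached
    filename: str
    host: str                  # IP number of the requesting machine
    status: int
    referrer: Optional[str]    # None when the log shows "-"
    server: str


def parse_line(line: str) -> LogEntry:
    """Split one record into its nine fields (assumed layout)."""
    parts = line.split()
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}")
    ownerid, usage, nbytes, stamp, filename, host, status, referrer, server = parts
    return LogEntry(
        ownerid=int(ownerid),
        usage=usage,
        bytes_sent=int(nbytes),
        timestamp=datetime.strptime(stamp, "%Y:%m:%d:%H:%M:%S"),
        filename=filename,
        host=host,
        status=int(status),
        referrer=None if referrer == "-" else referrer,
        server=server,
    )


def status_class(status: int) -> str:
    """Group a status code the way the list above describes them."""
    if status == 200:
        return "ok"
    if status in (301, 302):
        return "redirect"
    if status == 304:
        return "not modified"
    if 400 <= status <= 499:
        return "client error"
    if 500 <= status <= 599:
        return "server error"
    return "other"


if __name__ == "__main__":
    sample = ("3819 FTP 9600 1998:09:01:04:16:05 "
              "/pub/incoming/newfile.txt 166.84.197.198 200 - ftp")
    entry = parse_line(sample)
    print(entry.usage, entry.bytes_sent, status_class(entry.status))
```

A record with a different number of fields raises ValueError rather than being silently misread, which is the safer choice if a filename or referrer ever turns out to contain a space.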
The last line of any getlogs output is the total number of bytes
contained in the weblog output you received. This is useful if you want
to later invoke getlogs with the -s flag, so that the new run
continues where the previous one left off.
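As a rough illustration of how that trailing total might be picked out of saved output, here is a Python sketch. It assumes the last non-blank line holds nothing but the integer total; the exact layout of that line is not documented here, so check it against your own output. The function name split_total is hypothetical.

```python
from typing import List, Tuple


def split_total(output: str) -> Tuple[List[str], int]:
    """Separate the log records from the trailing byte total.

    Assumes the final non-blank line is a bare integer; int() raises
    ValueError if it is not.
    """
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty output")
    *records, last = lines
    return records, int(last.strip())


if __name__ == "__main__":
    saved = (
        "3819 WWW 203 1998:09:01:03:25:47 "
        "/export/httpd/htdocs/userdirs/rbs/Skate 204.244.93.232 301 - www2\n"
        "1234\n"   # made-up total, for illustration only
    )
    records, total = split_total(saved)
    print(len(records), total)
```

The returned total is the value you would pass as N to getlogs -s N on the next run.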
getlogs output is sent to "standard output", i.e., your
terminal screen. To request that it be sent to a file, you need to use the
redirection operator, ">"; e.g.,
getlogs > logfilename
getlogs, without any command-line parameters, returns data only
for the current month - i.e. all the transfers that occurred between
just after midnight of the first day of the current month and the time when the
most recent hourly
weblog-processing occurred. (It does not return just a report
of all traffic since the last time you ran getlogs.) The -o
flag causes getlogs to return data for the previous month.
Options
The following options are available with getlogs:
- getlogs -o
- Retrieve the logs for the preceding reporting period; i.e., last month.
Very useful at the beginning of the month when the logs have just been
reset but you need to get a report including the last day or two of
the preceding month.
- getlogs -c
- Retrieve the logs only from the web accelerators
(Squids), rather than the "main"
web servers.
- getlogs -a
- Retrieve logs from the web accelerators
(Squids) and the web servers. This
option is useful for obtaining the count of bytes transferred
and verifying it against the billing records, but will not give
an accurate hit count.
- getlogs -s N
- Omits the first N characters of the weblogs.
Useful if you only want to look at data which have accumulated
since the last time you ran getlogs.
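If you have saved the output of getlogs -a to a file, a byte count can be tallied from the third field of each record. The Python sketch below is only a rough cross-check, not the method Panix uses for billing. It assumes nine whitespace-separated fields per record and skips any line that does not fit, such as the trailing total. The function name bytes_by_usage is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def bytes_by_usage(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the bytes field per usage type (e.g. WWW, FTP)."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 9:
            continue  # not a log record (e.g. the final total line)
        totals[parts[1]] += int(parts[2])
    return dict(totals)


if __name__ == "__main__":
    sample = [
        "3819 WWW 203 1998:09:01:03:25:47 "
        "/export/httpd/htdocs/userdirs/rbs/Skate 204.244.93.232 301 - www2",
        "3819 FTP 9600 1998:09:01:04:16:05 "
        "/pub/incoming/newfile.txt 166.84.197.198 200 - ftp",
        "3819 WWW 162 1998:09:01:06:04:56 "
        "/export/httpd/htdocs/rbs/skatecity/robots.txt 204.123.9.47 200 - web4",
    ]
    print(bytes_by_usage(sample))
```

To run it over a saved file, pass the open file object: bytes_by_usage(open("logfilename")).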
Deficiencies:
Note that getlogs only returns IP numbers (not hostnames)
of the requestors' machines. You can use pwlog
with the -r flag to attempt to resolve those IP numbers into
hostnames.
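For readers curious about what such a translation involves, the Python sketch below performs a reverse lookup with the standard socket.gethostbyaddr call and falls back to the IP number when no hostname is found. This is not how pwlog is implemented; it is only an illustration, and the function name resolve and its lookup parameter are hypothetical.

```python
import socket
from typing import Callable


def resolve(ip: str, lookup: Callable = socket.gethostbyaddr) -> str:
    """Return the hostname for an IP number, or the IP itself on failure.

    `lookup` defaults to the system resolver; it is a parameter only so
    that the function can be exercised without network access.
    """
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        return lookup(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        return ip


if __name__ == "__main__":
    # Performs a real DNS query; the result depends on current DNS data.
    print(resolve("166.84.197.198"))
```

As noted earlier, a resolved hostname may belong to a firewall or gatekeeper machine rather than to the requester's own computer.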
Last modified:
Thursday, 02-Sep-2004 21:02:27 EDT
rbs, askanas