Danny's Weblog
Introduction
This weblog exists for the following reasons:
- Vanity
- Experimenting with website design features such as optimizing Google searches
- It provides an annotated system of favorites links that I can access from
  anywhere
I hope you find it useful. You should be aware of the following hints
on navigation:
- Postings are presented in REVERSE chronological order
- For a clickable list of topics, refer to the "Site map" in the left-hand
  navigation pane
- If you click an upper-level topic, the system will *also* display lower-level
  topics, up to some limit
- Changes in my site setup are listed under the "Chrome" topic.
I don't want people to download my entire site or hit the site frequently
for any other reason. Do not set a newsreader, or any other robot,
to hit the site more than once per day. Please consider the date-range
links and subject links in the sidebar: you can probably get all you
want in a single hit. If you are blocked you will probably not be
re-enabled.
I have now put in various kludges so that the screen
display and printed output are reasonably
clean in both Firefox and IE. I decided to make the banner image
centered as that continues to look OK when the body text is
resized, but I could not find a way to do that in .css and had
to put in an HTML align=center; oh well.
Now that the print output via css is so good I will remove
the link to create a printed version of each page, although I
will probably leave the code undisturbed.
I am also making good progress on the download limit feature.
I had never been quite satisfied with some aspects of the layout so over
the last couple of days I have dinked with it quite a lot using css.
Most elements are now controlled only by css, although I have not tried
to replace the basic table structure.
Although it is still not perfect, the pages now seem to respond much
better to resizing (except that for some reason under IE6 changing the
text size seems to have no effect – wtf?), given that I do not like
text to display with too many characters per line. For that reason, if
you try to increase page width, you just get more space on the right
margin.
Additionally, I have set up a separate print css which makes a big
improvement in the print output. As well as getting rid of the sidebar,
it again responds better to resizing, at least under Firefox.
Also, it finally implements a feature I have always wanted: while
the screen display shows a nice short version of a link (so that
the text display is not messed up), the printed version prints the
full url (because it does no good to hover the mouse over a printed
page). At least, it prints the full url up to the page width.
– Oops – it doesn't work in IE; the URL is completely missing. Oh
well; I can't address that tonight.
Several features are probably not implemented very cleanly yet. I
really need to re-read the Blosxom docs on how to implement plugins,
as I have scattered some of the code needed for the recent
modifications all over the place (which makes it hard to install
updated plugins or indeed Blosxom itself).
I heard recently that Google thinks that all pages with the same
title must be the same page. Perhaps this explains why Google
only shows two hits for my site. So I resolved to make the titles
vary.
Now, if you view a single posting, the title is the path to the
posting, plus the title of that posting.
If you view a folder, the title is just the path. It's pointless
or misleading to title the whole string of postings by the title
of a single one. Anyway, I tell Google not to log those pages.
If you view a date range, the title is blank – oh well.
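The title rules above can be written down as one small function. This is a minimal sketch in Python, not the actual Blosxom template code: the function name, the view labels and the ": " separator are all assumptions made for illustration.

```python
def page_title(view, path="", posting_title=""):
    """Return the <title> text for a page (hypothetical helper, not Blosxom code)."""
    if view == "posting":
        # Single posting: the path to the posting plus that posting's own title.
        # The ": " separator is an assumption.
        return f"{path}: {posting_title}"
    if view == "folder":
        # Folder view: just the path, since no single posting's title fits.
        return path
    # Date-range view: no title at present.
    return ""
```

For example, page_title("folder", "Chrome") gives just "Chrome", while a date-range view gives an empty string.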
Also, I have made considerable progress towards a much less lame
download limit scheme: ie, I can at least track downloads on the
fly, and will put in the limit when I can figure out something
that makes sense.
I made some more improvements. Last I checked, my main page shows
0 errors and my page for the whole of 2006 shows 4 errors. I
probably won't do any more on this, unless I miraculously figure
out a clean fix for my quote-handling problems. (If you see
any pages with obvious quote problems please let me know.)
I fixed a "div inside p" problem with horizontal rules, and also
cleaned up some display formatting which didn't actually work by
converting to css.
Additionally, I checked pretty much the whole site for mistakes in
quote handling, and fixed the original files (backdating them
with touch -t so that they continue to show up on the same date)
so that they don't trigger errors in my lame quote handling code.
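The edit-then-backdate step can also be done in one go. The following is a rough Python equivalent of fixing a file and then running touch -t on it: it remembers the timestamps, rewrites the file, and puts the timestamps back. The function name and the fix callback are hypothetical.

```python
import os


def edit_keeping_date(path, fix):
    """Rewrite a file with fix(text) but keep its original timestamps."""
    before = os.stat(path)
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(fix(text))
    # Restore access and modification times, much as touch -t would.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
```

Since the posting date comes from the file's modification time, restoring it keeps the story on its original date.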
Additionally, I had made a mistake yesterday in formatting my
story headings just using styles. Doing so means that the pages
could not be parsed for semantic content (if anyone cares) so
I went back to using an h3 tag.
...and I just fixed the color of links, which I noticed had gone
to black. I wonder when that started? I put it back to blue in
the .css.
I've been intending to fix some of the bugs shown by the W3C validator:
validator.w3.org
[http://validator.w3.org/]
for *years*, and today I finally had a hack attack.
The number of errors detected has fallen from over a hundred on some
pages down to a handful. The following lists what I figured out so
far:
1. Most of my effort was on handling the quotes problem. For a long
time my quotes were not properly nested either inside or around
paragraph tags. I've fixed most appearances of this bug but the
fix is very lame and several conditions still cause it.
2. Additionally, I was just not aware that some tags are not allowed
to be inside other tags. For instance, div inside p. The start of the
div block causes an implicit close of the p block and then the
dangling close-p causes an error. Maybe this is why a lot of people
never close their p blocks.
Incidentally, this bug also afflicts code which I did not write. For
instance, the plugin which changes a bunch of hyphens to a horizontal
rule does so by inserting a div block, but that block gets wrapped in
paragraph tags like everything else.
3. One of my Blosxom plugins (categories) was buggy and was inserting
unnecessary ul tags. I installed a new version.
4. While initially setting up the templates I had thrown in a lot
of tag parameters which W3C, not to mention the browser, does not
like.
5. Also, the story template had numerous div/p problems.
The appearance of some pages has changed slightly, especially around
quoted text.
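The "div inside p" problem from item 2 is easy to hunt for mechanically. Here is a minimal sketch using Python's standard html.parser; it is not the W3C validator, the tag list is deliberately incomplete, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser

# A few block-level tags that may not appear inside <p> (not a complete list).
BLOCK_TAGS = {"div", "ul", "ol", "table", "blockquote", "hr", "h1", "h2", "h3", "pre"}


class ParagraphNestingChecker(HTMLParser):
    """Record block tags opened inside <p> and close-p tags with no open <p>."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.problems = []  # list of (tag, line number)

    def handle_starttag(self, tag, attrs):
        if self.in_p and tag in BLOCK_TAGS:
            # The block tag implicitly closes the paragraph at this point.
            self.problems.append((tag, self.getpos()[0]))
            self.in_p = False
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            if not self.in_p:
                # Dangling close-p: its paragraph was already closed.
                self.problems.append(("/p", self.getpos()[0]))
            self.in_p = False


def check(html):
    checker = ParagraphNestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

Feeding it something like "<p>text<div>rule</div></p>" reports both the div and the dangling close-p, which is the pair of events described in item 2.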
In addition to the above, I have changed the title tag so that it
varies from page to page. This apparently helps Google to realize
that your website has more than one page...
I have also changed my shell (setting "ignoreeof") so that I can
use ^D (end of file) to set the end of expected parameters to a
CGI program. (That took a surprising length of time to figure out;
the tcsh does not seem to respond to ^D according to spec but
actually swallows it unless you are at an empty prompt. The workaround
if you don't want to set ignoreeof is to put in an empty string
on the command line.)
Trackbacks are a feature of most blogging software; when someone puts
a link to your site on his site, he can click a trackback link
that you provide on that page which somewhat automagically informs
your software.
Unfortunately spammers like to use trackbacks as a way to make you
host a link to their site, thus increasing their Google PageRank.
Although I had never implemented the feature of automatically adding
trackbacks to my blog pages, I have been noticing a huge number of
hits to my trackback pages (huge relative at least to my pitiful
number of real hits). I assumed it would die down when spammers
realized that not only does my software not publish trackbacks, but
anyway my site has zero PageRank and therefore is useless to
them, but then I noticed that the attempted comment spam to my
writeback pages was random text; in other words, for whatever reason
(and the intelligence agencies have a good motive) someone is
just attempting to destroy the trackback feature.
Since as far as I can tell I have never received a *single* valid
trackback, I have lazily decided to just disable them and I see
no reason to ever re-enable them.
Wikipedia article on trackbacks:
en.wikipedia.org
[http://en.wikipedia.org/wiki/Trackback]
Discussion of trackback spam. Some of the posters make the point that
the spammer may insert random text just to check whether the site
is running a moderation filter, but I like the paranoid conspiracy
idea much better:
photomatt.net
[http://photomatt.net/2005/01/05/trackback-spam/]
A long time ago I was experimenting to try and fix long lines by
setting the size of table elements. (It turns out browsers don't
do what I wanted).
Anyway, I absent-mindedly left a "width" spec in that sometimes caused
problems. Fixed – I hope that doesn't break anything.
Incidentally, I also changed the top banner as according to my logs
people were clicking on it too much. Now it doesn't look so
clickable.
I'm happy to be getting a bunch of hits to my Thai-language folder,
but when I checked what people were seeing I realized there was a
layout problem: a couple of the files had wide lines, and the browser
dutifully forces a wide screen display for the entire page.
I fixed the guilty pages and I hope the people who saw the wide
version weren't too put off.
I've had a link that says it produces all the articles for the
current month for some time.
A couple of days ago I noticed it was actually producing the 50 most
recent articles. I've fixed that now (probably).
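One way a link like this can pick its articles is by comparing the year and month of each posting against today's, rather than taking a fixed count of recent postings. A minimal sketch in Python, with a hypothetical function name; the real fix lives in my Blosxom setup and is not shown here.

```python
from datetime import date


def in_current_month(posted, today):
    """True if posted falls in the same calendar month as today."""
    return (posted.year, posted.month) == (today.year, today.month)
```

With today set to 2 March 2006, a posting from 1 March passes and one from 20 February does not, even though it is less than two weeks old.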
Note that "current month" means "dated this calendar month" not
"dated over the last 30 days".
Up till now I have displayed the full URL for every link.
This tends to mess up the screen display when a link is
very long: because it has no whitespace, the browser
refuses to wrap it and forces a very wide text column.
I did this for three reasons:
1. It allows you to read the link directly off a printed page
2. I was not sure how to implement sensing whether the page was
being displayed in print mode
3. Even when you are viewing the page onscreen, it can be
nice to see all the links without having to mouse over them
(eg Lynx).
I finally decided to change the display because the excessively
wide text column was really bugging me and probably very few
people ever print pages out. I would still like to implement
a feature where it would sense print mode, but actually I'm not sure
what to do then even if I can sense it. Really I would like to
implement print mode in css anyway.
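For the screen display, the shortening itself is simple. Below is a minimal sketch in Python of one possible rule: show short URLs whole and long ones as just the host name. The 30-character cut-off and the function name are guesses for illustration, not what my plugin actually does.

```python
from urllib.parse import urlsplit


def short_link_text(url, limit=30):
    """Return display text for a link: the bare URL if short, else just the host."""
    # Drop the scheme and any trailing slash for display purposes.
    bare = url.split("://", 1)[-1].rstrip("/")
    if len(bare) <= limit:
        return bare
    return urlsplit(url).netloc
```

The full URL still goes in the href, so nothing is lost on screen; only the visible text gets shorter and stops forcing a wide column.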
Someone who has evidently not checked my Google PageRank decided to
post some spam. Little did he know I would immediately detect it.
He hit the following writeback pages, with the comments below.
Oh well. I suppose I'd better disable writebacks for a while.
Notes:
1. Clearly his comments are completely generic. They are also partly
nonsensical and ungrammatical.
2. They include various links – the whole point of the spam. My
system presents links from writebacks as plain text anyway, so having
these nonfunctional links on my pages wouldn't help him even if
my PageRank were ten times better than it is.
3. The really interesting thing, as you can see, is that he links to
*multiple websites* and posts from *multiple ips*. At a guess, he is
a script kiddie who has used a vulnerability to install his software
on those websites, and has used a vulnerability to take over
multiple user machines. However he is actually clueless about the internet,
as his attempt to use my absolutely negligible PageRank clearly
shows!
4. His pattern of posting is a little strange. I don't understand why
he hits some pages over and over again. I also don't understand why
the user agent is different each time. Maybe his software automatically
flips it for each new posting, in order to make the pattern harder
to see.
5. Indeed, on a more heavily trafficked site (a site with *any* traffic)
his attempted defacement would have been hard to spot.
6. *Do not follow* any of his links. Although this type of spam is
usually used to increase the PageRank of the linked sites, he could
well have installed exploits on those pages which will turn your browser
into his bitch.
Asia/Cambodia/Miscellaneous/visitor01.wb
Asia/Cambodia/Miscellaneous/wetbathrooms01.wb
Asia/Cambodia/Miscellaneous/ppareas01.wb
Asia/Cambodia/Miscellaneous/oldvc01.wb
Asia/Cambodia/Miscellaneous/capital02.wb
name: Jordan Chapman
skys.jp/blog/archives/200504/06-1228.php
title: Jordan Chapman
comment: I really liked your comments here. I hope you're going to
update your site soon. bring heavy cream just to a boil:
www.snowhill.org/weblog/Jason/000940.html , I finished the 6th ball
excerpt:
blog_name:
name: Christian Jones
www.cosmicbuddha.com/blog/archives/001169.html
title: Christian Jones
comment: Excellent! I enjoyed reading your material. hours drive
from where: www.hookt-up.com/wordpress/?p=567 , Small brain blog
excerpt:
blog_name:
name: austin adams
www.wnyprogressreport.wnymedia.net/?p=2
title: benjamin armstrong
comment: very interesting! i liked it! amazing 3d effect:
skys.jp/blog/archives/200504/06-1228.php , port abuayar
excerpt:
blog_name:
name: Adam Baumann
www.allucher.com/sato_blog/archives/2005/04/post_110.html
title: Sean Cole
comment: It's been a long time since I so enjoyed reading posts
in the net. Two thumbs up! So without further delays:
www.allucher.com/sato_blog/archives/2005/04/post_110.html ,
Small brain blog
excerpt:
blog_name:
name: Zachary Jones
www.hookt-up.com/wordpress/?p=567
title: Christian Adams
comment: Just letting you know - your site is fantastic! bring
heavy cream just to a boil: mooshoopork.net/pork/index.php?p=154 ,
So without further delays
excerpt:
blog_name:
2005-10-08| 02:04:00| 222.107.19.143| Mozilla/4.0
(compatible; MSIE 5.01; Windows NT 5.0)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/visitor01.writeback|
www.panix.com/~dannyw/weblog/Asia/Cambodia/Miscellaneous/
2005-10-08| 02:04:13| 222.107.19.143| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.1; Hotbar 3.0)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/visitor01.writeback|
2005-10-08| 02:06:59| 219.250.217.228| Mozilla/4.0
(compatible; MSIE 5.5; Windows 98)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/wetbathrooms01.writeback|
www.panix.com/~dannyw/weblog/Asia/Cambodia/Miscellaneous/
2005-10-08| 02:07:03| 219.250.217.228| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.1)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/wetbathrooms01.writeback|
2005-10-08| 02:17:29| 219.93.174.106| Mozilla/4.0
(compatible; MSIE 6.0; Windows 98)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/ppareas01.writeback|
2005-10-08| 02:17:41| 198.20.55.71| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1)|
/%7Edannyw/weblog/Asia/Cambodia/Miscellaneous/ppareas01.writeback|
www.panix.com/~dannyw/weblog/Asia/Cambodia/Miscellaneous/
2005-10-08| 02:17:47| 198.20.55.71| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.0)|
/%7Edannyw/weblog/Asia/Cambodia/Miscellaneous/ppareas01.writeback|
2005-10-08| 02:35:03| 216.75.82.242| Mozilla/4.0
(compatible; MSIE 6.0; Windows 98)|
/%7Edannyw/weblog/Asia/Cambodia/Miscellaneous/oldvc01.writeback|
2005-10-08| 02:35:06| 210.0.200.2| Mozilla/4.0
(compatible; MSIE 5.5; Windows NT 5.0)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/oldvc01.writeback|
www.panix.com/~dannyw/weblog/Asia/Cambodia/Miscellaneous/
2005-10-08| 02:35:08| 210.0.200.2| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.0)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/oldvc01.writeback|
2005-10-08| 02:42:46| 80.58.4.107| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.0)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/capital02.writeback|
2005-10-08| 02:43:14| 204.196.142.41| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.0)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/capital02.writeback|
2005-10-08| 02:43:24| 80.58.11.42| Mozilla/4.0
(compatible; MSIE 5.01; Windows NT 5.0)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/capital02.writeback|
www.panix.com/~dannyw/weblog/Asia/Cambodia/Miscellaneous/
2005-10-08| 02:43:57| 211.116.211.86| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.0)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/capital02.writeback|
2005-10-08| 02:44:01| 203.83.75.26| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.0)|
/%7Edannyw/weblog/Asia/Cambodia/Miscellaneous/capital02.writeback|
2005-10-08| 03:04:33| 222.119.57.208| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.0)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/barsigns02.writeback|
www.panix.com/~dannyw/weblog/Asia/Cambodia/Miscellaneous/
2005-10-08| 03:04:36| 222.119.57.208| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/barsigns02.writeback|
2005-10-08| 03:12:12| 162.40.91.34| Mozilla/4.0
(compatible; MSIE 6.0; Windows 98)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/bags01.writeback|
2005-10-08| 03:12:19| 220.70.4.93| Mozilla/4.0
(compatible; MSIE 6.0; Windows 98)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/bags01.writeback|
2005-10-08| 03:12:33| 216.75.82.242| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1)|
/%7Edannyw/weblog/Asia/Cambodia/Miscellaneous/bags01.writeback|
www.panix.com/~dannyw/weblog/Asia/Cambodia/Miscellaneous/
2005-10-08| 03:13:21| 216.75.82.242| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.1)|
/%7Edannyw/weblog/Asia/Cambodia/Miscellaneous/bags01.writeback|
2005-10-08| 03:14:26| 220.73.107.241| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.1)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/bags01.writeback|
2005-10-08| 03:15:19| 207.248.240.118| Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.1)|
/~dannyw/weblog/Asia/Cambodia/Miscellaneous/bags01.writeback|
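The pattern in the log above (several IPs and several user agents per page) is easier to see with a tally. Here is a minimal sketch in Python, assuming each record sits on one line with fields separated by "|" in the order date, time, IP, user agent, path, referer. The listing above is wrapped for display, and the function name is hypothetical.

```python
from collections import defaultdict
from urllib.parse import unquote


def summarize(log_lines):
    """Map each requested path to the sets of IPs and user agents that hit it."""
    summary = defaultdict(lambda: {"ips": set(), "agents": set()})
    for line in log_lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        _date, _time, ip, agent, path = fields[:5]
        # Treat /%7Edannyw/... and /~dannyw/... as the same page.
        path = unquote(path)
        summary[path]["ips"].add(ip)
        summary[path]["agents"].add(agent)
    return summary
```

A page with more user agents than IPs is a fair hint that the agent string is being rotated by software rather than coming from real browsers.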
A few weeks ago I hacked my main blosxom.cgi file so that the
rss story template code included a call to the "foreshortened"
plugin, instead of the main $body variable – so that the .rss
"description" field contained just the start of the story, not
the entire body.
Today I used wget from the command line, and the full body was
still there!
Musing about time travel and aliens, I then realized that if you
get to the .../index.rss page from the main .html page,
the .html is actually a link to "...//index.rss" (note the
two forward slashes). Normally my .htaccess file intercepts a
call to the main page and transfers it to a special "short"
version (for the latest 15 stories, instead of 50),
but the regex does *not* match *two* slashes. So when
I tested .../index.rss from the main page, it always went to
the ordinary blosxom.cgi, which was fixed, not the "short"
version, which was unchanged...
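The two-slash trap is easy to reproduce. The sketch below uses Python's re module rather than Apache's mod_rewrite, and both patterns are hypothetical since the real .htaccess rule is not shown here, but the matching behaviour is the same in spirit.

```python
import re

# Hypothetical pattern for the main page's feed, written with exactly one slash.
SINGLE_SLASH = re.compile(r"^/~dannyw/weblog/index\.rss$")

# Tolerant variant: one or more slashes before the file name.
ANY_SLASHES = re.compile(r"^/~dannyw/weblog/+index\.rss$")


def matches(pattern, request_path):
    """True if the request path would be caught by the given pattern."""
    return pattern.match(request_path) is not None
```

With these, "/~dannyw/weblog//index.rss" slips past SINGLE_SLASH but is caught by ANY_SLASHES, which is the difference between reaching the ordinary script and the "short" one.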
Anyhow I imagine most people will have been receiving the long
(full-text) version of my .rss up till now – but now they will
get the nice short version (just 5k).
A couple of days ago I absent-mindedly deleted my links page:
www.panix.com
[http://www.panix.com/~dannyw/weblog/nolist/links01.html]
In my filesystem it's actually called links01.txt, and I had
been editing a *separate* file *also* called links01.txt.
Bad Danny.
Panix has a file server with the "snapshot" feature, so when I
realized the next day what had happened I thought I could find
it easily – but I couldn't find it!
When I begged Panix for help, they promptly explained that the
snapshot system stores the symbolic link to public_html as *a
link to the current version of public_html*, not as a link to the
snapshot of public_html! They were able to cd to the right directory
(somehow... as I think about it I'm less and less sure how that
actually works...) and get me back the file.
It reminds me of my old web host where if you were in public_html
and went up a directory, you landed in the users directory of
the webserver, not your own home directory. Confusion ensued.
Since I set up this site, I have left the format of my rss feed as
the default provided by Blosxom: ie the contents of the "description"
field were *the entire contents of the posting*. To me, this never
seemed like the way rss is really supposed to work, but I let it go
(I wasn't confident that I knew what people *wanted* as a feed,
as for instance I have actually never set up an rss *client*, partly
because I think polling is so stupid).
When Beth complained about getting error messages from my RSS a couple
of days ago, I decided to clean up the RSS output. I installed the
"foreshortened" plugin, which creates a variable containing the first
sentence of the story, and put in a call to that plugin in blosxom.cgi
(not blosxom-short.cgi, which handles requests for the blog
homepage only, because it does not handle *.rss requests).
It now seems to be working – although I did have to munge the
foreshortened plugin slightly, because I use a plugin which outputs
some HTML character entities in the stories, so I just added
a regex which deleted them.
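Roughly, the trimming works like this. The real "foreshortened" plugin is Perl and I have not reproduced it; this is a minimal Python sketch of the idea, and both the tag stripping and the exact regexes are assumptions.

```python
import re

ENTITY = re.compile(r"&#?\w+;")   # HTML character entities such as &#8220; or &amp;
TAG = re.compile(r"<[^>]+>")      # any HTML tag (assumed; keeps the description plain text)


def foreshorten(body):
    """Return the first sentence of a story body as plain text."""
    text = TAG.sub("", body)
    text = ENTITY.sub("", text)          # simply delete entities
    text = " ".join(text.split())        # collapse whitespace and newlines
    match = re.search(r"^(.*?[.!?])(\s|$)", text)
    return match.group(1) if match else text
```

Deleting entities outright is crude (curly quotes just vanish) but it keeps stray ampersands out of the XML.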
I am hoping the cleaned-up XML will fix Beth's problem. I am also
hoping it will save bandwidth both for me and for my users (both
of you).
Incidentally, I'm guessing that most people who grabbed my RSS
feed had *no idea* they actually *already* held the full text
of my articles. Their clients probably trim the text in each
description field *anyway*. They may have noticed that the
feed could be as much as 100 kB, and just figured "oh well,
XML is a bloated format" – that's what I always say.
I have now put in the html for the link to my privacy policy, so
that it should show up on all my .html pages. I still haven't
checked it under IE, but we'll see...
I still do not really understand how a "privacy policy" is supposed
to work, except that it is intended to make it harder for the
average user to implement effective security. Still, I went over to
http://www.p3pwiz.com
Now I have installed the files they generated at (hopefully) the
right places:
Human readable policy:
www.panix.com
[http://www.panix.com/~dannyw/privacy.html]
Machine-readable policy:
www.panix.com
[http://www.panix.com/~dannyw/weblog/w3c/p3p.xml]
I think I probably have to add a header to each page, along these
lines, but have not done so yet:
<link rel="P3Pv1" href="http://www.panix.com/~dannyw/weblog//w3c/p3p.xml">
The validator said there were no format errors:
www.w3.org
[http://www.w3.org/P3P/validator.html]
I haven't actually tried it with IE yet. I have a nasty feeling that
this privacy policy stuff (like robots.txt) was intended to cover
entire domains, not picayune user webpages like mine... but we'll
see. Hopefully that link in the header will get people where they
need to go.
Since I am not in fact a huge e-commerce site which sells all my users'
info to the highest bidder, I don't think you have to worry too much. On
the other hand, in order to cover my very minor efforts to play with
cookies and track users I had to assent to privacy clauses which basically
covered me for selling everything up to the spleen of your
unborn child. It confirms
my impression that the privacy policy scheme was intended to confuse
and stupefy regular users – it certainly did me.
Incidentally, when I just rechecked the cookies stored in my
browser, I see that "p3pprivacy.com", which sent me to p3pwiz,
actually set a bunch of cookies with *no hostname set*. I wonder
if they cover that in *their* privacy policy? Bwahaha.
I had left the Blosxom cookie module enabled in the plugins directory
although it did not seem to be doing anything.
Today I realized that if I navigated around the website something was
causing the browser to store cookies with the name of the current
directory and a strange hex hash as data. After suspecting my own
cookie generating code for a long time, it occurred to me to disable
the cookies module, and the strange excess cookies stopped.
I might have left it in, except that those strange cookies would
*never go away*. A user who extensively browsed the site might well
exceed the maximum number of cookies.
Additionally – not that this was very important – the "path" was
set to "/", meaning any *other* Panix user website could access my
cookies off the user's browser – suboptimal.
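Both problems (cookies that never expire, and a path of "/") are a matter of two cookie attributes. A minimal sketch using Python's standard http.cookies module; my own cookie script is Perl, and the cookie name, the 30-day lifetime and the function name are assumptions for illustration.

```python
from http.cookies import SimpleCookie


def make_cookie(name, value, path="/~dannyw/", days=30):
    """Build a Set-Cookie header limited to one user's part of the server."""
    cookie = SimpleCookie()
    cookie[name] = value
    # Restrict the cookie to this site's directory instead of the whole host.
    cookie[name]["path"] = path
    # Give the cookie a finite lifetime so it eventually goes away.
    cookie[name]["max-age"] = days * 24 * 3600
    return cookie.output()
```

With the path narrowed like this, the browser only sends the cookie back for requests under /~dannyw/, not for other users' pages on the same host.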
In an attempt to figure out how cookies work I have just gone ahead and
added them to this site, *without* figuring out IE's "privacy policy"
thing, which delayed me for a long time:
www.panix.com
[http://www.panix.com/~dannyw/weblog/Chrome/cookies01.html]
I have not figured out how to use Blosxom's cookies plugin. It looks
right now as if panix's Apache setup does not allow you to rewrite
the header – which contains the cookies – without naming the
script nph something. At any rate, any attempt to actually call the
cookies plugin gets me a 500 error. So instead, I am using a
completely separate Perl script that just gets called from any
.html page.
I don't actually *use* them for anything much, yet. You can certainly
still browse the site with cookies turned off – but your IE may
complain about the absence of a privacy policy.
As well as fixing yet another .htaccess bug (you can't have multiple
RewriteRules depending from a single RewriteCond!!) I tried to
clean up the main template page to use css for font sizes and
general colors, instead of inline definitions.
The pages now look slightly different but it is not significant.
I have not checked the pages in IE yet so they may look a little odd...
My previous attempts to restrict robots using .htaccess did not work: I guess I had
gotten the syntax wrong so everything I tried made the server give a 500 error. I
kept thinking I could fix it but after a while it was just too embarrassing, so
I never admitted it before.
I recently found a much more helpful page than anything I had seen before:
httpd.apache.org
[http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html]
I had previously been searching for terms like "htaccess" because it seemed
to me I was looking for how to use variables inside htaccess. I guess that
such things are actually *only* used inside htaccess blocks relating
to mod_rewrite. I may have glanced at the page before but there seems
to be a lot of forbidding stuff before you get to the meat of the syntax,
so I may have skipped right by it!
Anyhow, there are a lot of essential details, eg the difference between
%1 and $1 variables, and how to do ORs and ANDs. At present my changes
to .htaccess seem to be working OK (although there are a few more
unwanted access modes that I still haven't nailed down). It does seem
to be disallowing things called ".*bot.*" from viewing .*\.prn pages,
for instance.
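The logic of the rule is just two pattern tests joined by AND. Here it is as a minimal Python sketch rather than actual mod_rewrite syntax; the case-insensitive match and the list of blocked extensions are assumptions, and the function name is made up.

```python
import re

# Anything calling itself a bot (case-insensitive match is an assumption).
BOT_AGENT = re.compile(r"bot", re.IGNORECASE)

# Page types that robots should not fetch (assumed list).
BLOCKED_PATH = re.compile(r"\.(prn|writeback|trackback)$")


def should_block(user_agent, path):
    """True when a bot-like user agent asks for one of the restricted page types."""
    return bool(BOT_AGENT.search(user_agent) and BLOCKED_PATH.search(path))
```

Both conditions must hold: an ordinary browser can still fetch a .prn page, and a bot can still fetch ordinary .html pages.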
Since I never understood it, and as far as I could tell nobody
ever used it, I have disabled the trackbacks feature, which
hopefully will cut down on Google hits by about 25%.
My original article on this topic:
www.panix.com
[http://www.panix.com/~dannyw/weblog/Chrome/limit12.html]
I checked out the referer on one of the hits:
icq.hot-news.org/fromwmv.html
This brought up a page where the text of my article had been scrambled
together with many random words, apparently so it would be impossible for
me to find the text by a Google search. It did include a link
to my original page at this site. Then that page closed itself.
It took me to a page which displayed in a window with *no controls*. I couldn't
resize it, or get to any other IE features. To give you some idea, the title
of the window was: "Original amateur and swinger sex photos and videos".
When I hover over the tab in the taskbar, IE reveals:
connect2cash.biz/new2/hta.php?account=adv367
(I intend this link to be non-clickable)
The page says eg "Les membres de Adult Friendfinder proche de Phom Penh" and
"Find a real sex partner in Phnom Penh now!" There are many pictures that
show a *lot* of skintone... none of it, as far as I can tell, Asian. I'm
assuming that whoever did this probably added my text to many, many similar
websites.
This is the sort of thing I feared. Some twerp uses my text to entice the
hapless to his stupid sex website. In one way, it's cool that his site
detects that this workstation is in Phnom Penh. On the other hand..
what kind of lamer lives in Phnom Penh and has to look for *internet* porn??
Also, he is very *unlikely* to be interested in my site when he haplessly
arrives at it (even more than most people).
Incidentally, I wish Google would clamp down on this sort of thing. I suppose
it tries: by checking the sample text surrounding the search term, you can
usually be warned by text like:
gangbangers ferrari hot action two-on-one Windows XP SP2 problems freesex babes
Still, many search terms these days produce *dozens* of hits like that before
you get to anything useful.
Later: when I closed the window with no controls (other than a close button)
it brought me back to the original. For your delight, here is the text
which somehow enticed hundreds of hits to my site:
My laptop how to remove drm from wmv seems how to remove drm from wmv underpowered and glitchy for video. I had a green day - boulevard of broken dreams lyrics badexperience with how to remove drm from wmv the Quicktime player which came how to remove drm from wmv with my Minoltadigital how to remove drm from wmv camera: it how to remove drm from wmv was amazingly intrusive, inappropriate lyrics search and when I triedto remove how to remove drm from wmv it, it kept reinstalling how to remove drm from wmv itself how to remove drm from wmv like how to remove drm from wmv malware.When how to remove drm from wmv I finally got how to remove drm from wmv rid of how to remove drm from wmv it, it had how to remove drm from wmv left so ask jeeves google agreement much poop in theregistry that I was toxic instrumental britney spears unable to play my camera's how to remove drm from wmv Quicktime how to remove drm from wmv recordingsfor how to remove drm from wmv several months. (I'm not even sure how to remove drm from wmv what how to remove drm from wmv fixed that problem how to remove drm from wmv . nursing jokes I how to remove drm from wmv thinkafter I installed a ebay auction builder service pack, something mapquest new zealand which had not fixedthe problem when I tried bittorrent sites it before finally worked.)I *still* don't how to remove drm from wmv have how to remove drm from wmv a way how to remove drm from wmv of converting Quicktime format how to remove drm from wmv toanything that my music lyrics editing how to remove drm from wmv
I guess a *lot* of people want to "remove drm from wmv".
As I promised, I am (lackadaisically) getting around to splitting out
a lot of the features from the normal page format, such as the list
of links which used to appear at the left of every page:
www.panix.com
[http://www.panix.com/~dannyw/weblog/nolist/links01.html]
My original intention in providing all this stuff on every page was
to make it evident, for someone who reaches one of my pages via a
search, that there is a lot more stuff on the site. As far as I can
tell, however, these links are not used often enough to be worth the
extra download time on every single page.
Eventually I intend to get rid of the left column entirely and just have
a single clickable image for site navigation. However, I want to retain
the feature that the site can be navigated in a text browser like
lynx, so it may not be as clean as that.
Over the last few days the overload limit has triggered two or three
times. Unusually, this seems to have been because of, not the usual
Googlebot scanning, but actual people interested in pppics02 for some
reason. I haven't figured out why because there seem to be many
different referers. Also, I can't see why there would be a lot of
interest in it. All I can think is that people are in general more
interested in images than text – especially my opinions – and maybe
I absent-mindedly named an image "cuteyoungcambodianboys.jpg" or
something. Judge for yourself:
www.panix.com
[http://www.panix.com/~dannyw/weblog/Asia/Cambodia/Miscellaneous/pppics02.html]
I just got an email from panix saying they've increased the free
download limit, so I'm going to go in and enable it.
I've noticed that Google does not seem to have a record of all my pages,
presumably because it always gets cut off before it spiders my entire
tree. Maybe the new limit will help.
This plugin makes it easy to mark certain files and directories so
that they are not displayed in the normal chronological order,
although they are accessible directly. They also show up in the
tree view of my site.
I want to use this feature for files which I need to display in
Blosxom for the sake of consistency, but where their *date* may
be misleading: eg files which I continuously *update* rather than
filing a single time under a certain date. I want
to move a lot of the sidebars which I currently display as part
of *every* page onto separate pages, to save download time;
in particular I want to move the external links section to a
separate page so that I can arrange and describe them better.
Over the last few days the Googlebot has downloaded enough pages to
exceed my server's capacity limit twice. This is irritating enough,
but as it happens most of the downloads were not to real information
pages (well, *I* like to think of them as real information) but to
ancillary stuff like the .trackback and .writeback features. Although
these are marked "do not index", by the time Google sees that it's
already done the download.
While fuming about that (and the way that Google, like the phone
company, never responds to complaints), it occurred to me that there
is a partial fix for that using .htaccess.
The .htaccess file is checked by Apache every time it gets a file request,
and controls the response in many interesting ways. (I have referred to
this before.)
I have now (attempted to) set up my .htaccess file so that whenever
a request arrives [for a page which I do not want Google to index,
such as the writeback and trackback features – 2005-02-27]
and the "user-agent" is set to "*bot*", Apache just
sends back a *very short* page saying that this page is not for
bots. This should cut down on the bandwidth load considerably (not
to mention the CPU time; although I am not charged for this, I do
actually feel guilty about it, because if this server were all mine
I would put a lot more effort into minimizing CPU load).
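As a rough illustration, the logic of such a rule can be sketched in Python. This is not the actual .htaccess configuration; the function name, the suffix list and the wording of the short reply are assumptions made for the example.

```python
# Hypothetical model of the rule described above; this is NOT Apache config.
BLOCKED_SUFFIXES = (".writeback", ".trackback")  # ancillary pages (assumed list)
SHORT_REPLY = "This page is not for bots."       # assumed wording of the reply


def respond(path, user_agent):
    """Return a very short reply for bots requesting ancillary pages.

    Returns None when the request should be served normally.
    """
    is_bot = "bot" in user_agent.lower()  # mirrors the "*bot*" user-agent match
    is_ancillary = path.lower().endswith(BLOCKED_SUFFIXES)
    if is_bot and is_ancillary:
        return SHORT_REPLY
    return None
```

Ordinary browsers, and bots asking for regular .html pages, fall through to the normal response.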
Good overview of .htaccess:
apache-server.com
[http://apache-server.com/tutorials/ATusing-htaccess.html]
Some hints on handy uses:
www.edevcafe.com
[http://www.edevcafe.com/viewdoc.php?eid=92]
I may well need to tinker with the .htaccess setup a bit so I
apologize if anyone tries to use a feature which I have accidentally
disabled.
When I checked my (very short) server logfile today I noticed
that someone had come in to this page at my site:
www.panix.com
[http://www.panix.com/~dannyw/weblog/2004/08/07#wxpkey01]
from the following page:
www.ccdigest.com
[http://www.ccdigest.com/news/53930.html]
I took a look at the page. The page doesn't actually *show*
that the material is ripped from my site; they just include
a link to it. The page has a link to a Google AD url: so
they want people to find "their" page with *my* info and
get Google ad money for it.
I suppose I would be more agitated about this if most of *my*
page didn't consist of something which *I* had ripped out
of Slashdot.
It's quite interesting that Google does *not* reveal the
existence of this link:
Your search - link:www.panix.com
[http://www.panix.com/~dannyw/weblog/2004/08/07#wxpkey01] - did not match any documents.
(I also tried the search without the "#wxpkey01" – same answer.)
I don't see why Google *wouldn't* return this page: after all
the ccdigest.com site makes its money from people who come in via
Google, so surely they would make the page searchable. Hmmm.
According to my logs I would probably have lost only
7.12 hits during the period, but for about a day the
entire panix.com domain was hijacked by some sort of
bad guy.
Link to Slashdot discussion:
it.slashdot.org
[http://it.slashdot.org/article.pl?sid=05/01/19/017229]
Link to a webpage produced by Panix to explain the situation:
www.panix.com
[http://www.panix.com/hijack-faq.html]
In general I was impressed by how many commentators made the
point that panix is highly respected.
Someone on Slashdot correctly pointed out that since the
bad guys owned the domain, they could have set up dummy
mailservers which did nothing but record the username and
password of people who attempted to download mail from them.
It is certainly possible to set your email client to do
encrypted logins, but actually I don't know the details of
what happens there and it may well be possible for a
server which just is pointed to by the current fraudulent
dns to grab your account info (ie, the encryption scheme may
only encrypt the data on the link *between* your client
and the mailserver).
Personally I would like to get *multiple* logins with any account,
some of which can *only* be used for email, some for shell,
etc.
In my own case I don't think I was compromised because I read
email via ssh and *pay attention* to the warning strings that
give the server ID. Hmmm... I wonder if an attacker can somehow
replay those? Gulp...
For a long time I have had a link to Slashdot, the forum website for
computer geeks, on my blog pages. Recently it stopped working
for me. Instead of opening up a regular browser window, it opens up
a grey warning box for "File Download", saying "Some files can harm
your computer..." and offering to download the filename "slashdot"
from domain "slashdot.org".
I wondered if I had been tightening security settings on the client
too much but experimenting got nowhere. I figured out however that
I *could* reach eg apple.slashdot.org correctly.
Then I noticed that this current machine shows slashdot.org as a
*trusted* site. When I checked, there were a bunch of obvious trojan
and adware sites in the "trusted" group. Even after I deleted them
(and Slashdot was not among them) it still was shown as "trusted"!
I changed the settings under "trusted" to my usual suspicious level,
but I think this machine (in an internet cafe) is hosed. However,
I have been noticing this problem on several *other* machines. I
saw nothing obviously wrong in the process list. "netstat -an"
also showed nothing suspicious.
I changed the link to "main.slashdot.org" just for my own
convenience. I'm guessing that on this machine, all page requests
are going through some sort of redirector which is not correctly
programmed to handle urls which have no machine name in front
of the domain name.
My faithful fan may be relieved to know that I was not enjoying an
expenses-paid stay at a government institution. The explanation is
this: two private projects – stuff I don't talk about on the weblog
because even *I* think it's too dull to post – have been taking
up all my attention.
I am starting to wind down on both these projects and hope to be
back in a week or two posting as frequently as before. With a bit of
luck, I may feel energized enough to do a major redesign of the
website, too – I plan to make the pages faster to load, and cut down
unnecessary hits by webcrawlers.
Among my "favorite links" is now a link which tells the w3c
to validate my webpage.
validator.w3.org
[http://validator.w3.org/check?uri=http%3A%2F%2Fwww.panix.com/~dannyw/weblog/]
The first time I tried it I got an eye-popping 174 errors – more
than Slashdot! A lot of them were mismatched p elements, which
I already knew about, but most of them were just flubs which I
hope to hack away. It's really amazing what IE manages to
display apparently cleanly. Right now I'm at about 130 errors...
The issue now is actually one which has puzzled me from the beginning:
exactly what sets the displayed right margin?
It seems that IE refuses to wrap a word if it cannot find whitespace.
So because I display very long URLs occasionally, they may force the
right margin to extend farther than I thought I was setting, and once
the right margin has been pushed out all the rest of the text expands
to match, which allows too many words per line, and damages readability,
as well as sometimes making it necessary to do a horizontal scroll.
(For purists who feel that the user/browser should control layout
parameters: I am struggling to do this in *stylesheets*, so it can be
overridden when desired.)
The problem can also be triggered by "pre": this HTML keyword seems to
prevent IE from breaking the line even at spaces.
The problem is worse in the printable format, because while IE still refuses
to break the line, it simply *throws away* text outside the right
(physical) margin of the paper, with no error messages. As I often use
"pre" to display code, this can be disastrous.
I am currently trying to find this issue addressed on the web, with no success,
despite much tinkering with .css files. I apologize.
I tried to create a simple table today using the "pre" tag, which my .css
file defines as monospaced, but it just would not work.
For some reason the machine I have been using in this internet cafe displays
monospaced using some sort of OCR font, not Courier. I wondered if this was
because it's Thai Win98, and tried defining the language as "en-US" in the
metatags, but that was no help.
Eventually I gave up and used a table, but of course this was no fun
because it interacts with the code which automatically creates
paragraphs out of plain text. Eventually it sorta worked. I need to see
what happens with non-Thai setups.
A couple of days ago I realized Google was indexing a lot of my writeback
pages, and even causing users to prefer them (because they are smaller).
I don't like this because the writeback pages have no navigation. So I've
added a gadget to my (now bloated) .htaccess file to rewrite incoming
hits via Google from .writeback to .trackback. I haven't added a rule
for ".trackback" yet: I'll wait and see if the rule seems to work for
".writeback".
While reading a Slashdot article about a book called "Google: The Missing
Manual" it occurred to me that it would be useful to do a Google search
which returned *all* Google's entries for my site.
The search term I used was this: "dannyw site:www.panix.com" (because Google
does not seem to recognize the "/~dannyw" part with the "site:" operator).
When I ran it today it returned 713 hits when there are 699 separate
articles. That might be OK – I want Google not to index compilation pages;
but unfortunately Google are including ".writeback" pages in that total.
Worse, I now see that at least one of those ".writeback" pages has
"noindex,nofollow". Presumably that means I have the syntax wrong
somehow.
Also, there are plenty of compilation and daterange pages in the list.
I think what I need to do is use Apache to automatically convert people
coming in from Google to a .writeback page to go to the .html version.
The heck with what Google says.
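The redirect idea can be sketched in Python as a plain decision function. This only models the logic; it is not Apache rewrite syntax, and the function name and the way the referer is tested are assumptions for illustration.

```python
from urllib.parse import urlsplit


def redirect_target(path, referer):
    """Return the .html path to send the visitor to, or None for no redirect.

    Only visitors arriving from a Google host at a .writeback page are moved.
    """
    host = urlsplit(referer).hostname or ""
    from_google = host == "google.com" or host.endswith(".google.com")
    if from_google and path.endswith(".writeback"):
        return path[: -len(".writeback")] + ".html"
    return None
```

Requests with no referer, or from any other site, are left alone, so the writeback form itself keeps working for readers who click through from the article.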
Since non-".html" pages are included in the total of 713, that means Google
*still* has not indexed my entire site. Sheesh. I suppose it doesn't
help that every time they decide to index the site it triggers the overload
limit, but you'd think their recovery algorithm would handle that better.
Looking at the logs though, I can't make out any plan to what pages they
hit. In particular, they hit the same damn page over and over again.
F'petesake.
A couple of days ago I noticed that some twerp had added half a dozen spam
messages to this site. The messages were all essentially meaningless and
thus did not have any relationship to the article they purported to comment
on (or so I like to think). The common factor was that they all contained
the url of a commercial site (all the same one).
This type of spam is usually done to add links back to the spamming website:
Google thinks that that means this site is saying that site is interesting,
so it (infinitesimally in my case) improves the search ranking of that site.
Here is some of my log output:
7/27/2004|19:57:35|151.38.236.213|libwww-perl/5.800|/~dannyw/weblog/Opinions/Soc
iety/mmoore01.writeback|
7/27/2004|19:57:48|151.38.236.213|libwww-perl/5.800|/~dannyw/weblog/Opinions/Soc
iety/mmoore01.writeback|
7/27/2004|20:2:28|151.38.236.213|libwww-perl/5.800|/~dannyw/weblog/Opinions/Poli
tics/Iraq/saddam01.writeback|
7/27/2004|20:2:35|151.38.236.213|libwww-perl/5.800|/~dannyw/weblog/Opinions/Poli
tics/Iraq/saddam01.writeback|
7/27/2004|20:3:1|151.38.236.213|libwww-perl/5.800|/~dannyw/weblog/Opinions/Polit
ics/Iraq/thewarinusa02.writeback|
7/27/2004|20:3:4|151.38.236.213|libwww-perl/5.800|/~dannyw/weblog/Opinions/Polit
ics/Iraq/thewarinusa02.writeback|
7/27/2004|20:3:37|151.38.236.213|libwww-perl/5.800|/~dannyw/weblog/Computers/Int
ernet/sitecert01.writeback|
7/27/2004|20:3:40|151.38.236.213|libwww-perl/5.800|/~dannyw/weblog/Computers/Int
ernet/sitecert01.writeback|
7/27/2004|20:4:5|151.38.236.213|libwww-perl/5.800|/~dannyw/weblog/Computers/Inte
rnet/png03.writeback|
7/27/2004|20:4:9|151.38.236.213|libwww-perl/5.800|/~dannyw/weblog/Computers/Inte
rnet/png03.writeback|
I remember being worried as soon as I saw "libwww-perl".
The ip number looks up as "adsl-213-236.38-151.net24.it". I'm not going to
quote the url he wanted to spam because a) I don't want to give him any
more publicity and b) it could be a joe job, but maybe I just have a suspicious
mind.
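A log in this pipe-delimited shape is easy to summarize with a few lines of Python. The sketch below uses three of the entries quoted above, each joined back onto one line. The field names (date, time, ip, agent, path, referer) are assumptions read off the sample, not a documented format.

```python
from collections import Counter

# Three entries from the log excerpt above, one entry per line.
SAMPLE_LOG = """\
7/27/2004|19:57:35|151.38.236.213|libwww-perl/5.800|/~dannyw/weblog/Opinions/Society/mmoore01.writeback|
7/27/2004|19:57:48|151.38.236.213|libwww-perl/5.800|/~dannyw/weblog/Opinions/Society/mmoore01.writeback|
7/27/2004|20:2:28|151.38.236.213|libwww-perl/5.800|/~dannyw/weblog/Opinions/Politics/Iraq/saddam01.writeback|
"""


def parse_line(line):
    """Split one log line into named fields (field names are assumed)."""
    date, time, ip, agent, path, referer = line.split("|", 5)
    return {"date": date, "time": time, "ip": ip,
            "agent": agent, "path": path, "referer": referer}


def writeback_hits_by_client(log_text):
    """Count .writeback requests per (ip, user-agent) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = parse_line(line)
        if entry["path"].endswith(".writeback"):
            counts[(entry["ip"], entry["agent"])] += 1
    return counts
```

Sorting the result with `most_common()` puts a client that hammers the writeback pages at the top of the list.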
I don't know why he bothered because my site displays writebacks without any
html: ie, the page does not have an active link to the url, just the *text*
of that url. Presumably Google does not rate that very highly. Maybe he
added the msgs before checking how they would be displayed.
It occurs to me that the perp probably used Google to search for "writeback"
in order to find victims. I should probably rename the feature somehow to make
it harder to search for.
He was clever enough to add writebacks to older pages that would not display
on the current start page, so that I would not detect it, but as it happens
I had written a little batch file to easily check for recent writebacks
(in the vague hope that anyone was actually using it as intended) so I
spotted the misuse as soon as I logged in. I left it for a while as the
result was not much of a problem. Today I wrote a little batch file to
snip out the spam and store it in a zip file (basically "zip -rtmT" –
"zip" has a lot of handy features – although it took me a while to figure
out you have to use "unzip -l" to list the contents).
It occurs to me that his bot could be set up to create spam on *every single*
page that Google has indexed with writeback. Wow! On a low-volume site like mine
that's easy to clean up but it would be a huge mess on an active site. Maybe
I need to add a field to a posted writeback which contains the incoming ip, to
make it easier to filter if necessary.
Again, it seems to me that this sort of issue needs to be addressed in some
sort of overview documentation for the writeback feature.
Although I recently set my writeback pages to "noindex", Google
already indexed plenty of my writeback pages prior to that. So
I've noticed in my logs that plenty of people are coming in
directly to the writeback page.
The writeback page includes a simple text version of the article,
so the user is probably happy. I think the user prefers to click the
writeback version because Google seems to have a rule of displaying
the writeback link as the main link and the regular page as the
inset link. Presumably that's because the writeback link, being
simple text, is about half the size in kB.
Of course the problem for me is that the writeback page has no
navigation links to the rest of my site, so users who come in to
the writeback page never check out anything else.
Hopefully this problem will subside as the indexed writeback pages
slowly age out of Google. If only there were some automated way
to *set* stuff like this in Google.
A few days ago I implemented a fix for the print-format "flavor"
feature. Mindful of my previous problems with this, I decided then
not to say it was fixed until I'd tried it for a few days. It
looks like it is OK now. I think I'll leave the link to it saying
"testing" for a while yet though!
Yesterday I changed the template for this site so that the single-story
pages will include the following line:
<meta name="robots" content="index,nofollow">
Obviously the reason for this is to prevent robots like the Googlebot from
following links on the page. The reason I need to do that is because I've
added links to trackback/writeback features, and of course for Google
to index those is a waste of time for me and Google.
This fix is not perfect, because *compilation pages* also include the
writeback/trackback links, and I don't want to add "nofollow" to them
because then Google would never reach my single-story pages.
The only clean way to do it would be to have no stories on the front page,
but include a link to a page listing links to *all* the stories (maybe
visible only to robots). Everything else would have "nofollow,noindex"
except the individual stories which would be "nofollow,index". But
I dislike home pages which are effectively just a "splash" page. And
I'm pretty sure they discourage people from going further into the site.
Incidentally *most* of the tweaks I have applied to the site have been directed
at robots. I haven't bothered listing them as they are (hopefully) invisible to
users. I'm listing this one because I'm irritated that this issue isn't
mentioned in the docs for the trackback/writeback module of Blosxom. For
instance, I just realized as I was writing this that I *also* need to
set "noindex,nofollow" in the writeback pages themselves.
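The scheme described above boils down to a small lookup from page type to robots directive. A minimal Python sketch follows; the page-kind names ("story", "writeback") are made up for the example and are not Blosxom terms.

```python
# Hypothetical page kinds mapped to the robots directives described above.
ROBOTS_BY_KIND = {
    "story": "index,nofollow",        # single-story pages
    "writeback": "noindex,nofollow",  # writeback pages themselves
}


def robots_meta(page_kind):
    """Return the robots meta tag for a page kind, or "" if none applies."""
    content = ROBOTS_BY_KIND.get(page_kind)
    if content is None:
        return ""  # e.g. compilation pages, which must stay followable
    return '<meta name="robots" content="%s">' % content
```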
My code does not grab the current path and filename properly, so the
link provided when you click on "formatted for printing" is
incorrect, unless you are looking at a topic rather than a single
file or date range. (Guess what I originally tested it with.)
I took a shot at fixing it last night but I haven't sorted it out
yet. Meanwhile, you can get the basic feature if you want just
by changing the file extension on the normal URL to ".prn". For
instance, if your current page is
www.panix.com
[http://www.panix.com/~dannyw/weblog/Computers/Opsystems/Windows/filext01.html]
you can manually edit the end of that URL to make:
www.panix.com
[http://www.panix.com/~dannyw/weblog/Computers/Opsystems/Windows/filext01.prn]
If there is no filename at the end, you can use "index.prn", eg:
www.panix.com
[http://www.panix.com/~dannyw/weblog/2004/02/index.prn]
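The manual edit above is mechanical enough to sketch in Python. The function name is hypothetical, and the sketch assumes that a path not ending in "/" ends in a filename.

```python
import posixpath
from urllib.parse import urlsplit


def print_url(url):
    """Rewrite a page URL to its ".prn" (print flavour) equivalent."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    if filename:
        stem, _ext = posixpath.splitext(filename)
        filename = stem + ".prn"   # swap the extension
    else:
        filename = "index.prn"     # no filename at the end of the URL
    return parts._replace(path=posixpath.join(directory, filename)).geturl()
```

For example, `print_url("http://www.panix.com/~dannyw/weblog/2004/02/")` gives the "index.prn" address shown above.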
A few weeks ago I noticed that for some reason IE was truncating the
right margin occasionally when I printed out pages from this site.
It seemed to be a problem inside IE because the point of truncation was
still within the printable area judging by the printed header/footer.
I have now created a ".prn" flavour for the pages. There is a link
to this "flavour" in the lefthand sidebar. It seems to print out
better than the standard format; at least it doesn't waste time on
navigation elements.
It's not super-clever. In particular, I would like the code which
expands URLs to know whether it is being called inside a regular
(.html) page or within a .prn page and adjust appropriately.
Also, there's a slight bug. Because of the way I detect whether
the displayed page is the home page or a different page, the home
page in print mode displays differently from the home page in
normal mode – at present the main difference is it shows 50
articles instead of 15.
Because I never used or read blogs before I set up this one, I don't
understand a lot of things about them. In particular I don't really
get trackbacks.
Google brought up the following links:
Movable Type has a nicely-formatted explanation which is unfortunately
very much aimed at their own product, so I found it quite opaque:
www.movabletype.org
[http://www.movabletype.org/trackback/beginners/]
More readable explanation, regrettably also oriented to Movable Type:
www.cruftbox.com
[http://www.cruftbox.com/cruft/docs/trackback.html]
More technical but far more informative explanation of how someone
programmed it by himself (otoh, he says at the top dated 2003-03 that
his system no longer works and he hasn't figured it out yet):
www.hitormiss.org
[http://www.hitormiss.org/projects/trackback/]
It would seem from the above link that one could send a ping to my
weblog using an URL such as the following (warning: my basic
display code automatically formats URLs to be clickable, so I
have had to change http to dttp here):
[2005-10-25: added whitespace to avoid long-line issues]
dttp://www.panix.com/~dannyw/weblog/Asia/Cambodia/Miscellaneous/ twobros01.trackback?dttp://www.panix.com/~dannyw/weblog/ &blog_name=danny+test&title=My+first+trackback+test
It returned the following XML page, which does not have an explicit
error msg but is otherwise not very encouraging:
<?xml version="1.0" encoding="iso-8859-1" ?>
<response>
<error />
<message />
</response>
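For anyone who wants to poke at this from a script, here is a small Python sketch that builds the form-encoded parameters of a ping and reads a response like the one above. It does no network access. The "title" and "blog_name" parameters appear in the test URL; "url" is the parameter name as I understand the TrackBack spec, and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_ping_body(url, title, blog_name):
    """Form-encode the ping parameters ("url" is assumed from the spec)."""
    return urlencode({"url": url, "title": title, "blog_name": blog_name})


def read_ping_response(xml_bytes):
    """Return (error, message) text from a trackback response document."""
    root = ET.fromstring(xml_bytes)
    error = (root.findtext("error") or "").strip()
    message = (root.findtext("message") or "").strip()
    return error, message


# The response quoted above: both elements are present but empty.
EMPTY_RESPONSE = (b'<?xml version="1.0" encoding="iso-8859-1" ?>\n'
                  b'<response>\n<error />\n<message />\n</response>')
```

Reading `EMPTY_RESPONSE` yields two empty strings, which matches the unhelpful result described above: neither a clear success code nor an error message.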
...Hmmm. It appears the creator of Blosxom has a page on this:
www.raelity.org
[http://www.raelity.org/archives/2002/09/06#computers/internet/weblogs/blosxom/trackbacks_in_blosxom]
Surprisingly, it appears to advise you to download code from Movable
Type for this! And it only supports *receiving* trackback pings.
Hmmm.
He says: MMmmm... you just gotta love that simplicity of integration.
I can't tell if he's joking or not. And how is it supposed to work with the
existing trackback/writeback module??
This guy grumbles that trackbacks are *easy* to understand. Ten people
provide comments that they're difficult, but he refuses to believe
them:
nslog.com
[http://nslog.com/archives/2003/03/31/trackbacks_tough_to_understand.php]
The "hitormiss" link above includes the following link to a technical
spec at Movable Type which makes far more sense than their overview above:
www.movabletype.org
[http://www.movabletype.org/docs/mttrackback.html]
"Writebacks" are the name of the feature in the Blosxom software which runs
this weblog which allows readers to add comments. Previously I ran into
some problems, as well as security concerns, so I had left the feature
disabled.
I took another shot at enabling writebacks – mainly because the feature is
intertwined with trackbacks – and it seems to be working OK. However I
am very nervous about allowing random twerps to add junk to the site when
I'm perfectly happy with *my own* junk. So I may well disable it again.
Btw, when you click on "View/add responses" it shows the entire text of
the article you're responding to, along with a form to fill in with
your comment. You need to fill in the form and then click the "Post"
button (way at the bottom). The system will send back the same page
with your posting at the bottom of any previous postings.
I'm actually very vague on how the trackback feature works *at all*, so
if you think it's not working, you're probably right. Hopefully you
can now let me know.
I realized recently that the top right-hand graphic of my weblog page –
it reads "Danny's Weblog" with an arrow pointing to "Opinions, Languages,
Links, Computing, Asia, Reviews" – may mislead people into thinking it's
clickable, and when they try clicking on it and nothing happens, they may
conclude the site is broken.
I considered adding a click action that just says "Please *don't* click here"
but (unusually for me) reconsidered.
So now clicking that graphic brings up this: an intro to the site.
I have always provided the "Chrome" subtopic for info about the site:
www.panix.com
[http://www.panix.com/~dannyw/weblog/Chrome/]
and you should look there also.
I didn't originally intend to make the graphic clickable because I
mildly disapprove of using graphics to provide basic site navigation.
(I am also too lazy to keep updating the graphic with new topics.) That's
why I use the Blosxom plugin which provides a clickable text tree
showing all the subtopics on the site (see "Site Map" in the left column).
Note that when you click on a subtopic, Blosxom also displays postings from
*sub-sub-topics*, in reverse date order, up to some limit – currently fifty.
The number of articles in each subtopic *and its sub-sub-topics* is displayed
in parentheses next to the topic on the site map.
Many of my opinions may appear paranoid. My basic justification for that is
the following thought experiment: If you have to choose one of 3
possible explanations A, B and C for something, and you assign
probabilities of 15%, 10% and 5% to them, and the government is
pushing "B", what is the most productive
(minimax) response? It seems to me that most people choose one of two
strategies: they actually *believe* the government explanation (because it
causes least worry and trouble – I suppose the psychological process
involved is rather like the interrogation sequence in Orwell's "1984"),
or they choose no explanation at all.
I happen to think – in my paranoid way – that the government itself
creates most of possibilities A and C, and D-K. So people who choose
*no* explanation allow the government to continue to rule via a
pyramid of lies.
"Our so-called leaders speak."
It turns out that my regex at least sometimes fails, apparently because
the main regex which detects a para and wraps a para format around it
*actually triggers at the end of the para*.
Grumble grumble... too lazy to fix this today.
A few days ago I noticed that Mozilla was not displaying quoted text
from my site. It transpired that was my fault: the code which wraps
paragraphs in a style was also wrapping a bare quote command,
which meant that it did not nest properly: as usual, that caused
Mozilla to ignore the quote command completely, but IE tried to
DWIM, and succeeded.
After some head-scratching, I've succeeded in patching the Perl.
Most of my confusion was due to forgetting a /g at the end of the
regex: at that point the entire file is in a single variable, so
my regex would only work on the first quote command! Sheesh. I
hope recruiters never read blogs.
I also hope my patch doesn't ding something else...
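The missing-/g mistake is easy to reproduce. The sketch below is in Python rather than Perl, and the pattern and markup are invented for illustration (only the "story_para" class name comes from the site); it is not the actual patch. In Python, `re.sub` replaces every match by default, so `count=1` is used to mimic a Perl substitution without /g.

```python
import re

# Hypothetical pattern: a bare quote tag that got wrapped in a paragraph.
WRAPPED_QUOTE = re.compile(r"<p class=story_para>\s*(</?q>)\s*</p>")

# The whole page held in one string, as in the situation described above.
PAGE = ("<p class=story_para><q></p>\n"
        "<p class=story_para>quoted text</p>\n"
        "<p class=story_para></q></p>")


def unwrap_first(page):
    """Like Perl s/// without /g: only the first match is fixed."""
    return WRAPPED_QUOTE.sub(r"\1", page, count=1)


def unwrap_all(page):
    """Like Perl s///g: every match in the string is fixed."""
    return WRAPPED_QUOTE.sub(r"\1", page)
```

With `unwrap_first` the opening quote tag is repaired but the closing one stays wrapped, which is the symptom of the forgotten /g.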
1. I still haven't fixed the busted quote problem in Mozilla. I
found the right place to fix it in the code, but my regexes
didn't work right – *blush* – will get back to this when I feel
energetic (if ever).
2. I found why the "writeback" stuff was showing up in the HTML
source, and in Mozilla – because I'd absent-mindedly left it in
the story flavor file, duh.
3. On the other hand I'd like to add trackback features (not that I
incestuously quote other blogs much, but it's neat conceptually).
And they're intertwined with the "writeback" feature. So I may
get back to that.
Although I am happy that people want to read my many insights of genius,
it sometimes bugs me when people download my *entire* site. I have to wonder
if they're just using my text to set up a plausible-looking fake website
so that they can fill it with spamlinks. (Although everything on my site is
of interest to *me*, there is very probably no other person in the solar
system with the same *spectrum* of interests.) Additionally, I have to pay for
traffic, so I don't like it when people hit the site too often.
Accordingly I have started to set up some triggers which may cause
you to receive a "blocked warning" under some circumstances, such as:
1. You set your newsreader to check the site more than once per day
2. You hit the site more than ten times in 24 hours (under some circumstances)
For the latter condition, I intend to set up a rule which will actually allow
normal browsing, under most circumstances, to proceed, but which will catch
robots. I don't intend to describe this rule in detail.
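Purely as an illustration of the second condition as stated (more than ten hits in 24 hours), a sliding-window counter might look like the Python sketch below. It is not the rule actually used on this site, which is deliberately not described; the class name and the choice to count blocked hits too are assumptions.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 60 * 60  # 24 hours
MAX_HITS = 10                  # "more than ten times" triggers the block


class HitLimiter:
    """Track hit timestamps per client and flag clients over the limit."""

    def __init__(self):
        self._hits = defaultdict(deque)

    def allow(self, client, now):
        """Record a hit at time `now` (seconds); return False if over limit."""
        hits = self._hits[client]
        # Forget hits that have aged out of the 24-hour window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        hits.append(now)  # blocked hits are recorded as well
        return len(hits) <= MAX_HITS
```

Because blocked hits are recorded, a robot that keeps hammering the site stays blocked until it has been quiet for a full window.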
Back in February I quoted a Slashdot poster who was talking about a
security problem in Windows: because Windows by default hides file
extensions, someone will eagerly click on boobies.jpg when it in fact
is boobies.jpg.exe.
Now I find the following in my log file:
5/7/2004|10:6:20|12.36.152.153|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; FunWebProducts; .NET CLR 1.1.4322)|/~dannyw/weblog/Computers/Opsystems/Windows/filext01.html|http://www.google.com/search?q=boobies+.jpg&hl=en&lr=&ie=UTF-8&oe=UTF-8&start=40&sa=N
which tends to suggest that the Slashdot poster was very correct. How desperate
would someone have to be to click on Google's listing for *my* site for
boobies? It must list around 612,497.
Btw, since *this* file has "boobies" in it 5 times, maybe it will attract
more hapless victims. (Dr Evil laugh)
I must admit I hadn't bothered to check out whether my site looks OK
in anything other than Lynx and IE 5/6. I assumed that since I was
using rather basic formatting not much could go wrong.
It turns out that Blosxom handles paragraph formatting in a way
which does not allow my own formatting to nest. Specifically, at the
beginning of each paragraph, Blosxom puts in a "<p class=story_para>"
even though my own HTML may come after this. At the end of the para,
Blosxom closes the <p>, violating the nesting rule for my own
formatting!
I only use the "<q>" element for formatting more than a single line,
so this bug only causes a problem with quoted text. And IE doesn't
show the problem: as is generally the case, IE takes a shot at
displaying defective code, whereas Netscape descendants like
Mozilla etc just ignore the offending tags. Anyhow, if you use Linux,
or a Mac, or you just prefer Mozilla, that's why quoted text always
ends after a single para. At least until I figure out a fix. (I would
much rather not have to screw with the source files, but I'm not
looking forward to trying to get Blosxom's para formatting to safely
intertwine with my own.)
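The nesting problem can be reproduced in a few lines of Python. This is a simplified model, not Blosxom's code: it assumes paragraphs are separated by blank lines and wraps each one as described above, then checks tag nesting with a small stack.

```python
import re


def wrap_paragraphs(text):
    """Wrap each blank-line-separated paragraph in a story_para <p>."""
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join("<p class=story_para>%s</p>" % p for p in paras)


def is_well_nested(html):
    """Return True if every closing tag matches the most recent open tag."""
    stack = []
    for closing, name in re.findall(r"<(/?)([a-z]+)[^>]*>", html):
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack
```

A quote that fits in one paragraph nests correctly after wrapping. A quote that spans two paragraphs ends up with the first paragraph closed while the quote is still open, which is exactly the case that strict browsers reject.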
Btw, when I looked at the Mozilla output, it also shows an empty
<q class=writeback></q> pair. I don't know why that's in there:
I thought I'd disabled this.
I hope this information was useful. There may be a great deal more
information on this site that is relevant to what you need.
Take a look at the "site map" display at left; you
can click on a topic to see many recent items on that topic.