

PARSE.CGI

An overview

Of the poorly thought-out, hitherto buggily implemented, possibly insecure script PARSE.CGI. It's a start, anyway.

Yet another stupid formatting scheme? Why, when there's already HTML? Sounds pretty tedious, you say? Well, it's not strictly a formatting scheme; think of it more as preprocessor-enabled html. Parse.cgi parses pages in a slightly "custom" format (.par files) into standard html. Among other features, this permits random, client-dependent, or cookie-dependent text substitutions on the fly, footnotes, and easy indexing. (The random text substitutions mean that the pages are dynamic; that's why the text is sometimes different at different times.) The .par files are basically html, but with certain minor enhancements. I don't really expect anyone else to use this, but who knows.

Why?

Why? Well. Because I like to hand-edit html, but I resolved that I had typed my last <p> or </hx> tag by hand. This site is quite small, so anything else would be overkill. My pages are always flat and text-based anyway; I don't hate blind people. I don't use much javascript, so the characters "[]{}" are pretty useless and could be redefined. (If this is a problem, one could easily rehack the scripts to use something like <* ... > instead.) I also thought a simple way of specifying 'dynamic' content that changes stochastically with each loading of the page, integrated footnotes, and automatic indexing could be sort of cool.

(Isn't HTML getting too tedious? Don't you agree with the old-timers, and sometimes feel HTML 2.0 was pretty OK? Couldn't they just have left well enough alone? Do you really want to mess with style sheets? Wouldn't you like to be able to tweak something, like adding set-cookie-read-cookie behavior, a background or something else, globally across all your files with one edit? Would you like to be able to do stuff like transforming every other word of the whole site into pig latin, but only for users of the aol browser, with a few lines of perl? Or change all occurrences of "OTOH" into "on the other hand" across the whole site, but non-destructively? Or depending on what time of day it is? In one line?)

Well, if you know cgi perl and are able to use it on your site, this is one way. To go back to plain html, one can of course "freeze" the output of parse.cgi into .html files with some BS like:

  # untested
  foreach ( <*.par> ) {
    /(.*)\.par$/ or next; $base = $1;
    @output = `parse.cgi < $base.par`; $saw_header = 0; $page = "";
      foreach (@output) { # strip the headers: skip everything up to the first blank line
        if (/^$/) {$saw_header = 1;}
        $saw_header || next;
        $page .= $_;
      }
    $page =~ s/\.par"/.html"/g; # point the links at the frozen .html files
    open(OUT, ">$base.html") or die "can't write $base.html: $!";
    print OUT $page; close(OUT);
  }

How-to basics

From within another .par page, I link to .par files with links such as

<a href="page.par">

When the link is selected, the page will be generated on the fly.

(The above gets converted on the fly to:

<a href="parse.cgi/page.par">

which means I didn't have to mess with any of the server configuration files. From within a normal html page, the latter is what you have to use.)
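The link rewriting above can be sketched roughly as follows. This is Python for illustration, not the Perl that parse.cgi is written in; the function name and the exact pattern are made up for the sketch, and it assumes links are written with double quotes.

```python
import re

# Assumed pattern: a relative href ending in .par, not already prefixed.
# The ':' exclusion skips absolute URLs such as http://...
PAR_LINK = re.compile(r'href="(?!parse\.cgi/)([^":]+\.par)"')

def rewrite_par_links(html):
    """Turn href="page.par" into href="parse.cgi/page.par"."""
    return PAR_LINK.sub(lambda m: 'href="parse.cgi/' + m.group(1) + '"', html)

print(rewrite_par_links('<a href="page.par">'))
```

Links that already go through parse.cgi are left alone, so running the rewrite twice does no harm.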

Here is some info, anyway. I very often add features that don't get documented here, so consult the source.

Hardwired stuff

The file

footer.par

gets included in every page parsed, before the </body> tag.

Headings

A very quickly hacked page such as

/// PAGE_TITLE
fsdfs
// abcd
dsa
// defg
asd

results in parsed output like:

<html><head><title>PAGE_TITLE</title></head><body>
<h2>PAGE_TITLE</h2>
fsdfs
<h3>abcd</h3>
dsa
<h3>defg</h3>
asd
<!-- footer.par included here! -->
</body></html>

(The 2/3 in <h2> <h3> is set globally at the top of the parse.cgi script. All this is easily hackable, of course, if you know perl)
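As a rough illustration of the heading conversion (in Python rather than the script's own Perl; the function name and its two level parameters are invented here, and the footer and closing tags are left out):

```python
def convert_headings(lines, title_level=2, section_level=3):
    """Expand '/// title' and '// heading' lines; pass other lines through."""
    out = []
    for line in lines:
        if line.startswith("/// "):
            title = line[4:].strip()
            out.append("<html><head><title>%s</title></head><body>" % title)
            out.append("<h%d>%s</h%d>" % (title_level, title, title_level))
        elif line.startswith("// "):
            heading = line[3:].strip()
            out.append("<h%d>%s</h%d>" % (section_level, heading, section_level))
        else:
            out.append(line)
    return out

for line in convert_headings(["/// PAGE_TITLE", "fsdfs", "// abcd", "dsa"]):
    print(line)
```

The two defaults stand in for the global 2/3 setting mentioned above.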

footnotes

are invoked with the sequence

[* footnote text]

The default footnote handling method is to make a link to a new, separate page with a javascript popup window, but this behavior can be modified with the cookie "foot". Recognized values are "normal" (or no cookie at all), "href", which omits the popup window, "inline", which inlines the footnote in the text, "bottom", which puts the footnote at the bottom of the page, and "off", which deletes the footnote completely.
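A sketch of three of these modes, in Python and not taken from the parse.cgi source. The output markup (parentheses, <sup> markers) is invented for illustration, and the two link-based modes are left out because the markup they produce isn't shown here.

```python
import re

FOOTNOTE = re.compile(r"\[\*\s*([^\]]*)\]")

def render_footnotes(text, mode="inline"):
    """Handle [* ...] for the 'inline', 'bottom' and 'off' values of the foot cookie."""
    notes = []

    def repl(match):
        note = match.group(1).strip()
        if mode == "off":
            return ""                      # drop the footnote completely
        if mode == "inline":
            return "(" + note + ")"        # keep it in the running text
        if mode == "bottom":
            notes.append(note)             # collect it for the end of the page
            return "<sup>%d</sup>" % len(notes)
        raise ValueError("mode not covered by this sketch: " + mode)

    body = FOOTNOTE.sub(repl, text)
    for number, note in enumerate(notes, 1):
        body += "\n<p><sup>%d</sup> %s</p>" % (number, note)
    return body

print(render_footnotes("HTML 2.0[* the good old days] was fine.", "bottom"))
```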

include

To include the text of the file 'file':

<!--#include file-->
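Something along these lines would do the include step. This is a Python sketch under assumptions: the function name and the base_dir parameter are hypothetical, and it does no checking of the file name at all, which a real script facing the web would want.

```python
import re
from pathlib import Path

INCLUDE = re.compile(r"<!--#include\s+(\S+?)\s*-->")

def expand_includes(text, base_dir="."):
    """Replace each <!--#include file--> with the contents of that file."""
    def repl(match):
        # No sanitizing here: a name like ../../etc/passwd would be read too.
        return (Path(base_dir) / match.group(1)).read_text()
    return INCLUDE.sub(repl, text)
```

Called as expand_includes(page_text, "/path/to/site"), it returns the page with every include line replaced by the named file's text.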

stochastic and conditional text generation

Enables different text each time the page is accessed.

To output string a, b, c or d with equal probability:

{a|b|c|d}

To output a with probability 0.1, b with probability 0.3, and c or d with the remaining probability split equally, p(c) = p(d) = (1-(0.1+0.3))/2 = 0.3:

{?0.1a|?0.3b|c|d}

To output string a about one time every hundred accesses:

{?0.01a}
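The selection rule can be sketched like this, in Python rather than the Perl of parse.cgi. The function name is made up, it takes only the text between the braces, and it assumes the string after a ?probability prefix doesn't itself start with a digit.

```python
import random
import re

def choose(spec, rng=random):
    """Pick one alternative from e.g. 'a|b|c|d' or '?0.1a|?0.3b|c|d'."""
    options = []
    for alt in spec.split("|"):
        m = re.match(r"\?(\d*\.?\d+)(.*)", alt, re.S)
        options.append((float(m.group(1)), m.group(2)) if m else (None, alt))
    explicit = sum(p for p, _ in options if p is not None)
    plain = [text for p, text in options if p is None]
    # Alternatives without a ?p prefix share whatever probability is left.
    share = max(0.0, 1.0 - explicit) / len(plain) if plain else 0.0
    r, acc = rng.random(), 0.0
    for p, text in options:
        acc += share if p is None else p
        if r < acc:
            return text
    # Nothing selected: with no plain alternative this is the "output nothing"
    # case; otherwise it is float rounding, so use the last plain alternative.
    return plain[-1] if plain else ""

print(choose("?0.1a|?0.3b|c|d"))
```

Passing a different rng (anything with a random() method) makes the choice repeatable for testing.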

To output a only if string is contained in remote_host:

{@string a}

Same, but otherwise output b:

{@string a|b}
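A minimal Python sketch of the remote_host test, not lifted from the script. The function name is hypothetical and it takes the text between the braces without the leading @; in a CGI setting the host would normally come from the REMOTE_HOST environment variable.

```python
def host_conditional(spec, remote_host):
    """spec is e.g. 'string a|b': show a if 'string' occurs in remote_host, else b."""
    needle, _, rest = spec.partition(" ")
    shown, _, fallback = rest.partition("|")   # fallback is "" when there is no |b part
    return shown if needle in remote_host else fallback

print(host_conditional("aol.com Hello, AOL user|Hello", "dialup-12.aol.com"))
```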

cookie-dependent content

Enables persistent state changes for the site.

To output string only if cookie_key has cookie_value:

{=cookie_key cookie_value string}

To output string1 if cookie_key has cookie_value, string2 if not:

{=cookie_key cookie_value string1|string2}
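In Python (again just a sketch, not the actual Perl), the plain key/value case might look like this. The function name is invented, the cookies are assumed to have been parsed into a dict already, and the special timestamp keys are not handled.

```python
def cookie_conditional(spec, cookies):
    """spec is e.g. 'key value string1|string2', the text after the '='."""
    key, value, rest = spec.split(" ", 2)
    shown, _, fallback = rest.partition("|")   # fallback is "" without a |string2 part
    return shown if cookies.get(key) == value else fallback

print(cookie_conditional("back black Dark mode is on|Dark mode is off", {"back": "black"}))
```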

This is pretty esoteric... the special 'timestamp' cookie-substitution-syntax will output string if the user's cookie with key 'timestamp' is lower than the supplied time()-value:

{=timestamp time()-value string1|string2}

This syntax inserts the timestamp cookie as a text string, but only if the user has cookie key 'lastvisit' set to value 'on':

{=gettimestamp ignored_string1 ignored_string2}

You set and change the cookies by specifying them after a caret ("^") character appended to parse.cgi's filename argument, i.e. with urls such as:

page-to-generate-while-cookies-are-set.par^key1_value1^key2_value2

You set the 'special' timestamp cookie to 'now' by merely accessing it like

page-to-generate-while-timestamp-is-set.par^timestamp
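Splitting such an argument apart could go roughly like this (a Python sketch with a made-up function name). It assumes the key ends at the first underscore, and it marks a bare key such as timestamp with None so the caller can fill in the current time.

```python
def split_cookie_args(path_arg):
    """'page.par^key1_value1^timestamp' -> ('page.par', {'key1': 'value1', 'timestamp': None})"""
    filename, *settings = path_arg.split("^")
    cookies = {}
    for setting in settings:
        key, sep, value = setting.partition("_")
        cookies[key] = value if sep else None   # None: bare key, no value given
    return filename, cookies

print(split_cookie_args("page.par^key1_value1^key2_value2"))
```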

The special syntax "page.par^HILITE_word" will mark all occurrences of the word in the document.

Other special cookies change the formatting of the document. If the cookie is not present, the value "normal" is assumed. Some of the recognized ones are marg, back, showcookies, sick, smiley, dash, and lastvisit.

See the parse.cgi source or perhaps the "about" page for more details.

Directives

A few strings of the form
## directive ##
on a line of their own have special meaning to parse.cgi. Presently implemented directives:

To insert "string" into the BODY tag of the page:
## body string ##

To switch on/off literal mode (no parsing of the lines in between)
## literal ##
## /literal ##

To delete lines (lines are completely disregarded)
## omit ##
## /omit ##

To insert "fake" cookies (overrides real ones) into the page to control output
## fakecookies cookie_cookievalue ##
## fakecookies c1_val1^c2_val2^etc ##
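The literal and omit blocks can be sketched as a small state machine. This is Python, not the Perl of parse.cgi; the names are invented, and it assumes that inside a block only the matching closing directive is recognized. The body and fakecookies directives are not handled and simply fall through to the transform.

```python
import re

DIRECTIVE = re.compile(r"^##\s*(/?\w+)(?:\s+(.*?))?\s*##\s*$")

def apply_block_directives(lines, transform):
    """Run transform on ordinary lines, honoring ## literal ## and ## omit ## blocks."""
    out, mode = [], None            # mode is None, "literal" or "omit"
    for line in lines:
        m = DIRECTIVE.match(line)
        name = m.group(1) if m else None
        if mode is None and name in ("literal", "omit"):
            mode = name             # block starts; the directive line itself is dropped
        elif mode is not None and name == "/" + mode:
            mode = None             # block ends
        elif mode == "omit":
            continue                # line is completely disregarded
        elif mode == "literal":
            out.append(line)        # passed through unparsed
        else:
            out.append(transform(line))
    return out

page = ["a", "## literal ##", "b", "## /literal ##", "## omit ##", "c", "## /omit ##", "d"]
print(apply_block_directives(page, str.upper))
```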

other transformations

\ at the beginning of a line

is transformed into <p>

\\ at the beginning of a line

is transformed into <br>

If any of these (\\ or \) are directly followed by a TAB character the line will be indented on graphical browsers.

<g> , :) , ;) are possibly transformed into something else (configurable)

# comment

transformed to <!-- comment -->

## hidden comment

transformed into a blank line
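The line-start rules could be sketched as follows, in Python rather than the script's own Perl. The function name is hypothetical; it assumes the ## directive ## lines have already been dealt with, and it leaves out the TAB indentation and the smiley substitutions.

```python
def transform_line(line):
    """Apply the line-start rules for \\, \\\\, # and ## to a single line."""
    if line.startswith("##"):
        return ""                                   # hidden comment: blank line
    if line.startswith("#"):
        return "<!-- " + line[1:].strip() + " -->"  # visible-in-source comment
    if line.startswith("\\\\"):
        return "<br>" + line[2:]                    # two backslashes: line break
    if line.startswith("\\"):
        return "<p>" + line[1:]                     # one backslash: new paragraph
    return line

print(transform_line("# comment"))
```

The order of the checks matters: the two-character prefixes have to be tested before their one-character versions.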

Indexing

Enclosing a string in [brackets] will generate a link to an index, and an entry in the index.

If you want to use the brackets cosmetically, leave spaces like
[ this ]
and it won't get processed.
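A rough Python sketch of picking out the bracketed terms. The function name and the link target (word-index.par with an anchor) are assumptions for illustration, not what parse.cgi necessarily emits. Brackets starting with a space, * or ? are skipped so that cosmetic brackets, footnotes and search links are left alone.

```python
import re

# First character must not be whitespace, '*' (footnote) or '?' (search link).
INDEX_TERM = re.compile(r"\[([^\s\[\]*?][^\[\]]*)\]")

def link_index_terms(text, index_page="word-index.par"):
    """Return (html, terms): bracketed terms become links and are collected."""
    terms = []

    def repl(match):
        term = match.group(1)
        terms.append(term)
        return '<a href="%s#%s">%s</a>' % (index_page, term, term)

    return INDEX_TERM.sub(repl, text), terms

html, terms = link_index_terms("I like [perl] and [ this ] stays.")
print(html)
print(terms)
```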

The program generate_index.pl extracts info from all the *.par files it finds in the present directory. It creates the following files:

sect_index_list — a list of all the pages / sections found, and in what files, and what their modification date is

word_index — a file used by parse.cgi to index words in the site that you 'forgot' to enclose in brackets (this was not useful — I disabled it in the end.)

word_index_list — a list of all index words with descriptions etc

sect-index.par — a nicer looking .par/html formatted section index for browsers

word-index.par — a nicer looking .par/html formatted word index for browsers

search pages

This was an idea I had to make links to search pages easy. The engines constantly change their URL specifics; this way you only have to reedit in one place.

Strings of the form

[?KEY dejanews amazon]

will generate a link to a search page. That page in turn expands KEY/dejanews/amazon according to rules set globally, in just one place (generate-web-search.cgi).

parse.cgi would convert to something like:

<a href="generate-web-search.cgi/KEY+dejanews+amazon"> Search the web for KEY </a>

generate-web-search.cgi in its turn generates a page with the necessary links, correctly formatted.

It also allows for higher-level classes, say 'web', implying 'lycos', 'infoseek' and 'altavista' for instance.
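Both halves can be sketched in Python (not the Perl of the real scripts). The function names and the CLASSES table are made up for illustration, and the per-engine URL rules are left out entirely, since those are exactly the specifics that keep changing.

```python
import re

SEARCH = re.compile(r"\[\?([^\s\]][^\]]*)\]")

def search_link(text, script="generate-web-search.cgi"):
    """Turn [?KEY engine1 engine2] into a link to the search-page generator."""
    def repl(match):
        key, *engines = match.group(1).split()
        target = "+".join([key] + engines)
        return '<a href="%s/%s"> Search the web for %s </a>' % (script, target, key)
    return SEARCH.sub(repl, text)

# Hypothetical class table: one name standing for several engines.
CLASSES = {"web": ["lycos", "infoseek", "altavista"]}

def expand_engines(names, classes=CLASSES):
    """Replace class names by their member engines; keep other names as they are."""
    out = []
    for name in names:
        out.extend(classes.get(name, [name]))
    return out

print(search_link("[?KEY dejanews amazon]"))
print(expand_engines(["web", "amazon"]))
```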


Page updated Mar 10, 2000 at 13:15 • Email: jens@panix.com

All content copyright © Jens Johansson 2024. No unauthorized duplication, copying, mirroring, archival, or redistribution/retransmission allowed! Any offensively categorical statements passed off as facts herein should only be construed as my very opinionated opinions.