Yet another stupid formatting scheme? Why, when there's already HTML? Sounds pretty tedious, you say? Well, it's not strictly a formatting scheme; think of it more as preprocessor-enabled html. Parse.cgi parses pages in a slightly "custom" format (.par files) into standard html. Among other features, this permits random, client-dependent and cookie-dependent text substitutions on the fly, footnotes and easy indexing. (The random text substitutions mean that the pages are dynamic, which is why the text is sometimes different at different times.) The .par files are basically html, but with certain minor enhancements. I don't really expect anyone else to use this, but who knows.
(Isn't HTML getting too tedious? Don't you agree with the old-timers, and sometimes feel HTML 2.0 was pretty OK, and couldn't they just have left well enough alone? Do you really want to mess with style sheets? Wouldn't you like to be able to tweak something, such as adding set-cookie/read-cookie behavior, a background or something else, globally across all your files with one edit? Would you like to be able to do stuff like transforming every other word of the whole site into pig latin, but only for users of the aol browser, with a few lines of perl? Or change all occurrences of "OTOH" into "on the other hand" across the whole site, but non-destructively? Or depending on what time of day it is? In one line?)
Well, if you know cgi perl and are able to use it on your site, this is one way. To go back to plain html, one can of course "freeze" the output of parse.cgi into .html files with some BS like:
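# untested
foreach my $file (<*.par>) {
    # remember the file name without its .par extension
    $file =~ /(.*)\.par$/ or next;
    my $base = $1;
    my @output = `parse.cgi < $base.par`;
    my $saw_header = 0;
    my $page = "";
    foreach (@output) {
        # strip the CGI (http) headers: skip everything before the first blank line
        if (/^$/) { $saw_header = 1; }
        $saw_header || next;
        $page .= $_;
    }
    # make the links point at the frozen .html files
    $page =~ s/\.par"/.html"/g;
    open(OUT, ">$base.html") or die "can't write $base.html: $!";
    print OUT $page;
    close(OUT);
}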
Within a .par page, you link to another .par page the ordinary way:
<a href="page.par">
When the link is selected, the page will be generated on the fly.
(The above gets converted on the fly to:
<a href="parse.cgi/page.par">
which means I didn't have to mess with any of the server configuration files. From within a normal html page, the latter is what you have to use.)
Here is some info, anyway. I very often add features that don't get documented here, so consult the source.
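A rough sketch of how such a link rewrite could be done in perl. This is not the actual parse.cgi code: the sub name rewrite_par_links is made up for illustration, and it assumes relative links written exactly as <a href="...">.

```perl
use strict;
use warnings;

# Hypothetical helper: put "parse.cgi/" in front of relative .par links,
# so the server runs parse.cgi with the page name as extra path info.
sub rewrite_par_links {
    my ($html) = @_;
    # Skip links that already go through parse.cgi or that carry a
    # scheme such as http: or ftp:.
    $html =~ s{(<a\s+href=")(?!parse\.cgi/)(?![a-z]+:)([^"]+\.par)(?=")}{${1}parse.cgi/$2}gi;
    return $html;
}

print rewrite_par_links(qq{<a href="page.par">}), "\n";
# prints: <a href="parse.cgi/page.par">
```

Because the page name travels as path info after the script name, nothing in the server configuration has to know about .par files.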
footer.par
gets included in every page parsed, before the </body> tag.
/// PAGE_TITLE
fsdfs
// abcd
dsa
// defg
asd
results in parsed output like:
<html><head><title>PAGE_TITLE</title></head><body>
<h2>PAGE_TITLE</h2>
fsdfs
<h3>abcd</h3>
dsa
<h3>defg</h3>
asd
<!-- footer.par included here! -->
</body></html>
(The heading levels 2 and 3 in <h2> and <h3> are set globally at the top of the parse.cgi script. All this is easily hackable, of course, if you know perl.)
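For the curious, the heading conversion shown above can be sketched in a few lines of perl. This is only an illustration under assumptions: the names $TITLE_LEVEL, $SECTION_LEVEL and convert_headings are invented here, and the footer inclusion and the closing </body></html> are left out.

```perl
use strict;
use warnings;

# Assumed names for the globally set heading levels.
my $TITLE_LEVEL   = 2;
my $SECTION_LEVEL = 3;

# Hypothetical helper: turn "/// title" and "// section" lines into html.
sub convert_headings {
    my (@lines) = @_;
    my @out;
    foreach my $line (@lines) {
        if ($line =~ m{^///\s*(.+?)\s*$}) {
            # the page title becomes both the <title> and the top heading
            push @out, "<html><head><title>$1</title></head><body>",
                       "<h$TITLE_LEVEL>$1</h$TITLE_LEVEL>";
        }
        elsif ($line =~ m{^//\s*(.+?)\s*$}) {
            push @out, "<h$SECTION_LEVEL>$1</h$SECTION_LEVEL>";
        }
        else {
            push @out, $line;    # ordinary text passes through unchanged
        }
    }
    return @out;
}

print "$_\n" for convert_headings("/// PAGE_TITLE", "fsdfs", "// abcd", "dsa");
```

The three-slash test has to come first, since a title line would also match the two-slash pattern.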
[* footnote text]
The default footnote handling method is to make a link to a new, separate page, with a javascript popup window, but this behavior can be modified with the cookie "foot". Recognized values are "normal" (or no cookie at all), "href" which omits the popup window, "inline" which inlines the footnote in the text, "bottom" which puts the footnote at the bottom of the page, and "off" which deletes the footnote completely.
To output string a, b, c or d with equal probability:
{a|b|c|d}
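To output a with probability 0.1 and b with probability 0.3, while c and d share the rest equally, i.e. p(a) = 0.1, p(b) = 0.3, p(c) = p(d) = (1-(0.1+0.3))/2 = 0.3: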
{?0.1a|?0.3b|c|d}
To output string a about one time every hundred accesses:
{?0.01a}
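The probability rules above can be sketched in perl roughly like this. It is a guess at the logic, not the code in parse.cgi: the sub names are made up, and it assumes the number after "?" is a plain decimal directly followed by the string.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "?0.1a|?0.3b|c|d" into [string, probability] pairs.
sub parse_alternatives {
    my ($body) = @_;
    my @alts;
    my ($explicit, $unweighted) = (0, 0);
    foreach my $alt (split /\|/, $body, -1) {
        if ($alt =~ /^\?(\d*\.?\d+)(.*)$/s) {
            push @alts, [$2, $1];
            $explicit += $1;
        }
        else {
            push @alts, [$alt, undef];
            $unweighted++;
        }
    }
    # alternatives without "?p" share the remaining probability equally
    my $share = $unweighted ? (1 - $explicit) / $unweighted : 0;
    $share = 0 if $share < 0;
    $_->[1] //= $share for @alts;
    return @alts;
}

# Pick one alternative; $r is a number in [0,1), rand() unless supplied.
sub pick {
    my ($body, $r) = @_;
    $r = rand() unless defined $r;
    foreach my $alt (parse_alternatives($body)) {
        return $alt->[0] if $r < $alt->[1];
        $r -= $alt->[1];
    }
    return "";    # leftover probability means no output, as in {?0.01a}
}

# Expand every {...} group that is not an "@" or "=" test.
sub expand_random {
    my ($text) = @_;
    $text =~ s/\{(?![@=])([^{}]*)\}/pick($1)/ge;
    return $text;
}

print expand_random("Today's letter is {?0.1a|?0.3b|c|d}."), "\n";
```

Passing the random number in as an optional argument is only there to make the sub testable; a real page would always use rand().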
To output a only if string is contained in remote_host:
{@string a}
Same, but otherwise output b:
To output string only if cookie_key has cookie_value:
{=cookie_key cookie_value string}
To output string1 if cookie_key has cookie_value, string2 if not:
{=cookie_key cookie_value string1|string2}
This is pretty esoteric... the special 'timestamp' cookie-substitution syntax will output string1 if the user's cookie with key 'timestamp' is lower than the supplied time()-value, and string2 otherwise:
{=timestamp time()-value string1|string2}
This syntax inserts the timestamp cookie as a text string, but only if the user has cookie key 'lastvisit' set to value 'on':
{=gettimestamp ignored_string1 ignored_string2}
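As an illustration of the plain cookie test {=cookie_key cookie_value string1|string2}, here is a minimal perl sketch. It is an assumption-laden stand-in, not the real code: the sub names are invented, the cookies are taken from a hash, and the special timestamp forms are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: decide which string a single cookie test yields.
sub choose_by_cookie {
    my ($cookies, $key, $want, $yes, $no) = @_;
    my $have = $cookies->{$key};
    return $yes if defined $have && $have eq $want;
    return defined $no ? $no : "";    # no "|string2" part means no output
}

# Expand every {=key value string1|string2} group in a piece of text.
sub expand_cookie_tests {
    my ($text, $cookies) = @_;
    $text =~ s/\{=(\S+)\s+(\S+)\s+([^{}|]*)(?:\|([^{}]*))?\}/choose_by_cookie($cookies, $1, $2, $3, $4)/ge;
    return $text;
}

my %demo_cookies = (foot => "inline");
print expand_cookie_tests("Footnotes are {=foot inline inlined|linked}.", \%demo_cookies), "\n";
# prints: Footnotes are inlined.
```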
You set and change the cookies by specifying them after a caret ("^") character appended to parse.cgi's filename argument, i.e. with urls such as:
page-to-generate-while-cookies-are-set.par^key1_value1^key2_value2
You set the 'special' timestamp cookie to 'now' by merely accessing it like
page-to-generate-while-timestamp-is-set.par^timestamp
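Taking such a filename argument apart could look roughly like the perl sketch below. The sub name split_cookie_args is made up, and it assumes that the key ends at the first underscore; the real script may split differently.

```perl
use strict;
use warnings;

# Hypothetical helper: split "page.par^key1_value1^key2_value2" into the
# page name and a hash of the cookies to set.
sub split_cookie_args {
    my ($arg, $now) = @_;
    $now = time() unless defined $now;
    my ($page, @pairs) = split /\^/, $arg;
    my %cookies;
    foreach my $pair (@pairs) {
        if ($pair eq 'timestamp') {
            $cookies{timestamp} = $now;    # a bare "timestamp" means "now"
            next;
        }
        # assumption: the key is everything before the first underscore
        my ($key, $value) = split /_/, $pair, 2;
        $cookies{$key} = defined $value ? $value : '';
    }
    return ($page, \%cookies);
}

my ($demo_page, $demo_cookies) = split_cookie_args("page.par^foot_inline^marg_normal");
print "$demo_page: $_=$demo_cookies->{$_}\n" for sort keys %$demo_cookies;
```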
The special syntax "page.par^HILITE_word" will mark all occurrences of the word in the document.
Other special cookies change the formatting of the document. If the cookie is not present, the value "normal" is assumed. Some of the recognized ones are marg, back, showcookies, sick, smiley, dash and lastvisit.
See the parse.cgi source or perhaps the "about" page for more details.
To insert "string" into the BODY tag of the page:
## body string ##
To switch literal mode on/off (no parsing of the lines in between):
## literal ##
## /literal ##
To delete lines (the lines in between are completely disregarded):
## omit ##
## /omit ##
To insert "fake" cookies (overriding real ones) into the page to control output:
## fakecookies cookie_cookievalue ##
## fakecookies c1_val1^c2_val2^etc ##
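The literal and omit directives above amount to two on/off switches while reading the file line by line. A minimal perl sketch, with an invented sub name and a code reference standing in for the normal per-line parsing (how the real script treats directives nested inside each other is not covered here):

```perl
use strict;
use warnings;

# Hypothetical helper: $parse_line stands in for the normal line parsing.
sub apply_blocks {
    my ($parse_line, @lines) = @_;
    my ($literal, $omit) = (0, 0);
    my @out;
    foreach my $line (@lines) {
        if ($line =~ m{^##\s*(/?)(literal|omit)\s*##\s*$}) {
            my $on = $1 eq '/' ? 0 : 1;
            if ($2 eq 'literal') { $literal = $on } else { $omit = $on }
            next;                 # the directive line itself is dropped
        }
        next if $omit;            # omitted lines vanish completely
        push @out, $literal ? $line : $parse_line->($line);
    }
    return @out;
}

# Demo: upper-casing stands in for "parsing" a line.
print join("\n", apply_blocks(sub { uc $_[0] },
    "parsed", "## literal ##", "kept as is", "## /literal ##",
    "## omit ##", "dropped", "## /omit ##", "parsed again")), "\n";
```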
\ in the beginning of a line
is transformed into <p>
\\ in the beginning of a line
is transformed into <br>
If any of these (\\ or \) are directly followed by a TAB character the line will be indented on graphical browsers.
<g>, :) and ;) may be transformed into something else (configurable)
# comment
transformed to <!-- comment -->
## hidden comment
If you want to use the brackets cosmetically, leave spaces like
[ this ]
and it won't get processed.
The program generate_index.pl extracts info from all the *.par files it finds in the present directory. It creates the following files:
sect_index_list — a list of all the pages / sections found, and in what files, and what their modification date is
word_index — a file used by parse.cgi to index words in the site that you 'forgot' to enclose in brackets (this was not useful — I disabled it in the end.)
word_index_list — a list of all index words with descriptions etc
sect-index.par — a nicer looking .par/html formatted section index for browsers
word-index.par — a nicer looking .par/html formatted word index for browsers
Strings of the form
[?KEY dejanews amazon]
will generate a search link. The linked page in turn expands KEY/dejanews/amazon according to rules set globally in just one place (generate-web-search.cgi).
parse.cgi would convert this to something like:
<a href="generate-web-search.cgi/KEY+dejanews+amazon"> Search the web for KEY </a>
generate-web-search.cgi in its turn generates a page with the necessary links, correctly formatted.
It also allows for higher-level classes, say 'web', implying 'lycos', 'infoseek' and 'altavista' for instance.
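The parse.cgi side of this, turning [?KEY dejanews amazon] into the link, can be sketched in perl as follows. The sub names are made up for illustration, and the sketch does not url-escape the key, which real code would probably want to do.

```perl
use strict;
use warnings;

# Hypothetical helper: build the link for one key and its search engines.
sub search_link {
    my ($key, $rest) = @_;
    my @engines = split ' ', $rest;          # engine names after the key
    my $path = join '+', $key, @engines;
    return qq{<a href="generate-web-search.cgi/$path"> Search the web for $key </a>};
}

# Expand every [?KEY engine ...] group; other brackets are left alone.
sub expand_search_links {
    my ($text) = @_;
    $text =~ s/\[\?([^\]\s]+)([^\]]*)\]/search_link($1, $2)/ge;
    return $text;
}

print expand_search_links("[?KEY dejanews amazon]"), "\n";
# prints: <a href="generate-web-search.cgi/KEY+dejanews+amazon"> Search the web for KEY </a>
```

Keeping the engine names symbolic in the link is what lets generate-web-search.cgi hold all the engine-specific url rules in one place.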