Building a website using AsciiDoc

Contents

First site
Custom site
RSS
Dependencies

Note: This is a work in progress.

I want to build a simple website that I can host via my email provider, which doesn’t support features like CGI, PHP or databases. I have some experience with DocBook and with AsciiDoc, which lets you write plain text documents with a simple markup that can be rendered in various formats, one of which is XHTML. This article documents the process I went through to create a basic website using AsciiDoc.

I am using Slackware 13.1 which includes AsciiDoc 8.5.2 in the linuxdoc-tools package. See Dependencies below.

First site

As a first attempt, I created the site using vanilla AsciiDoc, with only a custom stylesheet to modify the appearance. I browsed /etc/asciidoc/xhtml11.conf to see how AsciiDoc produces the XHTML, in particular the [header] template, which is where the CSS <link> and JavaScript <script> elements are created.

Using the standard xhtml11.conf, the AsciiDoc command I used was:

/usr/bin/asciidoc --backend=xhtml11 \
    -a max-width=1024px \
    -a linkcss \
    -a stylesdir=style \
    -a stylesheet=custom.css \
    -a disable-javascript \
    -a badges \
    -a icons \
    --out-file site/index.html index.txt

The max-width attribute restricts the width of the <body> element:

<body style="max-width: 1024px;">

The linkcss attribute means the HTML is created with <link> elements to the xhtml11.css and xhtml11-quirks.css files, rather than embedding the CSS in the HTML. This allows me to customise the CSS by copying the default stylesheets from /etc/asciidoc/stylesheets/. The stylesdir attribute means the CSS links will point into the style/ sub-directory. I use the stylesheet attribute to add a custom stylesheet, instead of adding everything to my copy of xhtml11.css.

The disable-javascript attribute excludes the JavaScript <script> elements that I don’t need.

The badges and icons attributes enable the icons in the footer.

I don’t want to mix my source files (*.txt) with my output files (*.html) so I use --out-file site/… to place the generated HTML in the site/ sub-directory which also contains the style/ sub-directory for CSS, the images/ sub-directory for images, and any other non-generated linked resources.

For a single-purpose website, the standard AsciiDoc output would be quite adequate.

Custom site

I wanted a number of enhancements to the pages, such as a standard banner with links to the site home page, and a custom footer. I created an xhtml11.conf file in the conf/ directory and added [header] and [footer] sections, originally copied from the default templates in /etc/asciidoc/xhtml11.conf.

Mostly I just cut out stuff I didn’t need (relating to JavaScript, manpage output, etc). Instead of embedding my header navigation links in the [header] template, I put them in a separate _banner.html HTML fragment which I include in the header using the include::_banner.html[] macro.

In the banner, I want to set the class of a link if I am on the page being linked. To do this I used the regex conditional attribute:

<a class="{infile@.*index.txt:currentpage:otherpage}" ...>Home</a>

which says: if the file being processed ({infile}) ends with index.txt then set the class to "currentpage", else to "otherpage". I do this with every link in the banner, changing the regex as appropriate. I can then put some CSS in my custom.css stylesheet to indicate when I am on the current page.
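
Since every banner link follows the same pattern, the lines can be generated rather than typed. The following is a minimal sketch, not part of my actual setup: the banner_link function name, the about page and the href attribute are assumptions made for illustration (the example above elides the link's other attributes).

```shell
#!/bin/sh
# Emit one banner link whose class depends on the page being processed.
#   $1 = page base name (index, about, ...)
#   $2 = link text
# The href value is an assumption; the original example elides it.
banner_link() {
    printf '<a class="{infile@.*%s.txt:currentpage:otherpage}" href="%s.html">%s</a>\n' \
        "$1" "$1" "$2"
}

# Two hypothetical banner entries.
banner_link index Home
banner_link about About
```

Redirecting the output to the _banner.html fragment would give one conditional link per page, each with its own regex.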

To further separate input, output and configuration stuff, I structured the project directory as follows:

Project directory

website/
  |
  - conf/           (xhtml11.conf, etc)
  |
  - pages/          (index.txt, about.txt, etc)
  |   |
  |   - articles/   (other *.txt documents)
  |
  - site/           (site resources and AsciiDoc output)

My complete Bash script for creating the website is:

build.sh

#!/bin/bash

# Common AsciiDoc invocation, kept in an array so each option stays one word.
ASCIIDOC=(/usr/bin/asciidoc --backend=xhtml11 -f conf/xhtml11.conf
    -a linkcss
    -a stylesdir=style
    -a stylesheet=custom.css
    -a max-width=1024px)

process_asciidoc() {
    local INPUT OUTPUT
    for INPUT in "$@" ; do
        # If a file ending in ".txt" then process with AsciiDoc.
        if [ -f "$INPUT" ] && [[ "$INPUT" == *.txt ]] ; then
            echo "Processing $INPUT"
            OUTPUT=$(basename "$INPUT" .txt)
            "${ASCIIDOC[@]}" --out-file "site/$OUTPUT.html" "$INPUT"

        # Else if a directory, process its contents.
        elif [ -d "$INPUT" ] ; then
            echo "Processing directory $INPUT"
            process_asciidoc "$INPUT"/*
        fi
    done
}

process_asciidoc "$@"

Because I have customised the header and footer, I don’t need the badges, icons or disable-javascript attributes anymore.

I invoke it as build.sh pages/* to rebuild the entire site (the script only processes *.txt files and descends into directories) or as build.sh pages/file.txt to update just one file. While the inputs can be separated into directories, the output is flattened so that all generated HTML goes in the site/ directory.
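
One consequence of flattening the output is that two source files with the same base name in different directories would write to the same HTML file. A minimal sketch of a check for that, assuming the pages/ layout shown above (the find_output_collisions name is made up for illustration):

```shell
#!/bin/sh
# List base names that occur more than once under the given directory.
# Any name printed here would map to a single file in site/.
find_output_collisions() {
    find "$1" -type f -name '*.txt' |
        sed 's|.*/||' |     # strip the directory part
        sort |
        uniq -d             # keep only names that appear more than once
}

# Example: find_output_collisions pages
```

No output means every source file maps to its own file in site/.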

RSS

I have a vague intention to periodically write some articles. I want to produce an RSS feed to publish these articles as written. Rather than manually editing the RSS resource, I have automated the process using AsciiDoc, XSLT and, of course, Bash.

Article info from AsciiDoc source

I am using RSS 2.0 so for each article, I need to produce an <item> entry in the form:

    <item>
        <title>...</title>
        <link>...</link>
        <guid>...</guid>
        <pubDate>...</pubDate>
        <description>...</description>
    </item>

I can obtain some of this information (the title, publication date and description) from the article’s AsciiDoc source; the <link> and <guid> values depend on the article file name, so they need to be obtained elsewhere. To get the <description> value, I mark an early paragraph (probably the first) as the "summary" by setting the paragraph’s block id. If there is no "summary" paragraph, there will be no <description> element.

Summary paragraph

[[summary]]
Summary paragraph text goes here.
...

RSS 2.0 expects the <pubDate> value to be RFC 822 compliant. I have been using the AsciiDoc revdate document attribute, the standard date value used by AsciiDoc, to show the article date in the HTML. I write it in the format dd-MMM-yyyy, which isn’t RFC 822 compatible. DocBook also has a <pubdate> element, so I carry the RFC 822 date in a separate pubdate attribute. To set non-standard AsciiDoc document attributes, you have to use explicit "Attribute Entries", so the heading section of the AsciiDoc article becomes:

Article header with explicit attributes

Article Title
=============
:author: Geoff Lewis
:revdate: 20-Dec-2010
:pubdate: Mon, 20 Dec 2010 12:00:00 +1100

So now I have the publication date in RFC 822 format as part of the article. This pubdate document attribute isn’t used by the AsciiDoc XHTML backend so it is not going to upset my HTML output.
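
The RFC 822 string does not have to be typed by hand. A minimal sketch, assuming GNU date (its -d option is not portable to other implementations); the rfc822_date function name is made up for illustration:

```shell
#!/bin/sh
# Convert a revdate-style date (dd-MMM-yyyy) plus a time of day into the
# RFC 822 form used for pubdate. Relies on GNU date's -d option.
# LC_ALL=C keeps the day and month names in English.
rfc822_date() {
    LC_ALL=C date -d "$1" '+%a, %d %b %Y %H:%M:%S %z'
}

# The numeric zone offset comes from the machine's local time zone.
rfc822_date '20-Dec-2010 12:00'
```

On a machine whose local time zone is UTC+11, this should reproduce the pubdate value shown in the header above.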

Making the article’s RSS <item>

With the "summary"-annotated paragraph and pubdate document attribute, I now have enough information in the article to build its RSS <item>. I don’t want to parse the AsciiDoc source myself. Instead I convert the AsciiDoc to DocBook XML which I feed to an XSL transformer equipped with a stylesheet to form the <item> element.

Firstly, note the default AsciiDoc DocBook backend does not handle our pubdate attribute so I need to use a custom docbook.conf where I override the [docinfo] template. I copy the default [docinfo] template from /etc/asciidoc/docbook.conf to my own conf/docbook.conf and append the following:

{pubdate#}<pubdate>{pubdate}</pubdate>

which says that if the pubdate attribute is set in the source article, then add a <pubdate> element to the document’s <articleinfo>.

So the command to generate my DocBook output of the source file "article.txt" on stdout is:

asciidoc -f conf/docbook.conf --backend=docbook -o - article.txt

I then pipe this DocBook XML to xsltproc which is equipped with the following stylesheet:

docbook2rss-item.xsl

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<!-- Create XML but don't output the <?xml ...?> declaration. -->
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

<!-- We need to pass in the values to make the <link>/<guid> elements. -->
<xsl:param name="url">http://gslsrc.net/</xsl:param>
<xsl:param name="file">unknown.txt</xsl:param>

<!-- We are only interested in the <articleinfo> element. -->
<xsl:template match="article">
    <xsl:apply-templates select="articleinfo"/>
</xsl:template>

<xsl:template match="articleinfo">
    <item>
        <title><xsl:value-of select="title"/></title>
        <link><xsl:value-of select="$url"/><xsl:value-of select="$file"/></link>
        <guid><xsl:value-of select="$url"/><xsl:value-of select="$file"/></guid>
        <pubDate><xsl:value-of select="pubdate"/></pubDate>
        <xsl:apply-templates select="/article/simpara[@id='summary']"/>
    </item>
</xsl:template>

<!-- If there is no "summary" paragraph, there won't be a <description>. -->
<xsl:template match="simpara[@id='summary']">
    <description>
        <xsl:apply-templates/>
    </description>
</xsl:template>

</xsl:stylesheet>

And the command to process the DocBook XML is:

xsltproc --stringparam url http://gslsrc.net/ --stringparam file article.html \
    conf/docbook2rss-item.xsl -

The url and file parameters are passed to the stylesheet so that it can form the <link> and <guid> elements.
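
The two commands can be combined into one function that prints the <item> for a given article. This is a sketch under assumptions rather than my actual script: the rss_item name and the SITE_URL variable are made up for illustration, and the file parameter is derived from the base name because the output in site/ is flattened.

```shell
#!/bin/sh
# Base URL of the site; the same value is the default in the stylesheet.
SITE_URL=${SITE_URL:-http://gslsrc.net/}

# Print the RSS <item> for one AsciiDoc article on stdout.
rss_item() {
    src=$1
    # Output is flattened into site/, so the link only needs the base name.
    html=$(basename "$src" .txt).html
    asciidoc -f conf/docbook.conf --backend=docbook -o - "$src" |
        xsltproc --stringparam url "$SITE_URL" --stringparam file "$html" \
            conf/docbook2rss-item.xsl -
}

# Example: rss_item pages/articles/article.txt
```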

Generating index_rss.xml

All that remains is to put it all together in a Bash script that wraps the <item> elements in the rest of the <rss> XML. I only want my feed to include my 10 most recent articles (if I ever write that many). My article naming scheme uses the prefix lXXX_ where XXX is a zero-padded number. (I could conceivably write 100 articles, given the time: I doubt I shall ever write more than 999.) To find the articles to process, I use:

INPUTS=`find pages/articles -regextype posix-egrep \
    -iregex "pages/articles/l[0-9]{3}.*\.txt" \
    | sort -nr \
    | head -n 10`

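The sort -nr puts the file names in descending order. Because the names do not start with a digit, the numeric comparison treats them all as equal and GNU sort falls back to a reversed byte-wise comparison of the whole line. Thanks to the zero padding this matches numeric order, and a plain sort -r would give the same result. The head -n 10 truncates the list to 10 items.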
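
To try the selection on a scratch directory, the pipeline can be wrapped in a function. A minimal sketch, assuming GNU find (which spells the option -regextype, with a single dash) and the lXXX_ naming scheme; the recent_articles name and its optional count argument are made up for illustration:

```shell
#!/bin/sh
# Print the most recent articles, newest first.
#   $1 = how many to list (defaults to 10)
# Must be run from the project directory, because -iregex matches the
# whole path as printed by find.
recent_articles() {
    find pages/articles -regextype posix-egrep \
        -iregex "pages/articles/l[0-9]{3}.*\.txt" |
        sort -r |
        head -n "${1:-10}"
}

# Example: INPUTS=$(recent_articles 10)
```

Files that do not follow the naming scheme, such as a notes.txt in the same directory, are ignored.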

I have a Bash script to create site/index_rss.xml (a bit too long to include here) using the AsciiDoc and XSLT steps described above. The remaining step is to attach the RSS feed to the index page using an HTML <link type="application/rss+xml">. As it belongs in the <head> section, the <link> has to be included from the [header] template in conf/xhtml11.conf. As I only want to include it on the index page, I use:

{infile$.*index.txt:<link rel="alternate" type="application/rss+xml" title="RSS" href="index_rss.xml" />}

which says that if the file being processed (infile) ends with "index.txt" then add the <link>, otherwise leave it out.
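
The script that creates site/index_rss.xml is not reproduced in this article. As a rough idea of the wrapping step only, here is a minimal sketch that reads <item> elements on stdin and surrounds them with the <rss> and <channel> elements. The rss_wrap name is made up, and the channel title and description are placeholders rather than the real values.

```shell
#!/bin/sh
# Read <item> elements on stdin and wrap them in a minimal RSS 2.0 document.
#   $1 = site URL for the channel <link>
# The channel title and description below are placeholders.
rss_wrap() {
    url=$1
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
    <title>Articles</title>
    <link>$url</link>
    <description>Recent articles</description>
EOF
    cat     # copy the <item> elements through unchanged
    cat <<EOF
</channel>
</rss>
EOF
}

# Example: some-command-printing-items | rss_wrap http://gslsrc.net/ > site/index_rss.xml
```

RSS 2.0 requires the channel to have a title, a link and a description, which is why the sketch includes all three.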

Dependencies

I had to install GNU source-highlight in order for the AsciiDoc source filter to work. It seems support for the Pygments highlighter was only added in AsciiDoc 8.6.0, so I needed source-highlight to use the filter.

There was no SlackBuild for source-highlight so I modified another GNU SlackBuild (gsl) to make my own. It’s not submission-quality for SlackBuilds.org, but it works. Download here.

GNU source-highlight doesn’t come with a Vim highlight config so I had a go at writing my own, which you can get here. It covers maybe 0.1% of the Vim syntax but 90% of the stuff I use. In particular, its handling of Vim comments is likely to be dodgy, given that a Vim comment starts with a double-quote so that you can have lines like:

let s:varname = "text string"   " trailing comment

You can run the source highlighter on Vim files by hand using:

source-highlight --lang-def vim.lang --src-lang vim -i test.vim -o test_vim.html

However, to get it to work with AsciiDoc, I had to copy my vim.lang to the source-highlight datadir (in my case, /usr/share/source-highlight) and edit /usr/share/source-highlight/lang.map to map the vim language to the vim.lang language definition file.

/usr/share/source-highlight/lang.map

...
vim = vim.lang

If I didn’t want to mess with the system-wide settings, I could make a local copy of the datadir, say ~/.source-highlight/data, and apply my changes there. As far as I have tried, it’s all or nothing: you can’t put only your own customisations in the local datadir and have source-highlight fall back on the system datadir for other resources.
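
Making the local copy could look something like the sketch below. It is untested against a real installation, the make_local_datadir name is made up for illustration, and it assumes the language definition file is called vim.lang.

```shell
#!/bin/sh
# Create a private source-highlight datadir with the Vim definition added.
#   $1 = system datadir (e.g. /usr/share/source-highlight)
#   $2 = local datadir  (e.g. ~/.source-highlight/data)
#   $3 = path to vim.lang
make_local_datadir() {
    system_dir=$1
    local_dir=$2
    lang_file=$3

    mkdir -p "$local_dir" &&
    # Copy everything, since the local datadir has to be complete.
    cp -R "$system_dir"/. "$local_dir"/ &&
    cp "$lang_file" "$local_dir"/ &&
    # Add the mapping unless the copied lang.map already has it.
    { grep -q '^vim = vim\.lang$' "$local_dir/lang.map" ||
      echo 'vim = vim.lang' >> "$local_dir/lang.map"; }
}

# Example:
#   make_local_datadir /usr/share/source-highlight ~/.source-highlight/data vim.lang
```

When running the highlighter by hand, the copy can then be selected with source-highlight's --data-dir option.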