Imagine I want to turn an XML Schema into an HTML document. Schemas embed documentation using the <xsd:documentation> element inside an <xsd:annotation> element. The documentation is usually plain text, with blank lines separating paragraphs. If that plain text is copied into HTML as-is, the paragraph breaks are lost, because HTML treats newlines as ordinary whitespace. This article presents a couple of XSLT templates that retain the plain-text paragraphs in the HTML output.

<xsd:annotation>
    <xsd:documentation>
        text goes here

        another paragraph of text

        a final paragraph
    </xsd:documentation>
</xsd:annotation>
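To see why the paragraphs are lost, the short Python sketch below mimics how a browser renders this text in ordinary HTML flow: every run of whitespace, blank lines included, collapses to a single space. It is a simplification of the CSS `white-space: normal` behaviour, and `collapse_whitespace` is a hypothetical helper written only for this illustration.

```python
# Text content of the sample <xsd:documentation> element above.
doc_text = """
        text goes here

        another paragraph of text

        a final paragraph
    """


def collapse_whitespace(text: str) -> str:
    """Approximate HTML rendering: collapse whitespace runs, trim the ends."""
    return " ".join(text.split())


rendered = collapse_whitespace(doc_text)
print(rendered)
# prints: text goes here another paragraph of text a final paragraph
```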

If the documentation text contains characters that would otherwise have to be escaped, such as < or &, it is wrapped in a <![CDATA[…]]> section.
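The CDATA wrapper does not change what the templates see, because XML parsers report a CDATA section as ordinary character data. The Python sketch below checks this with the standard-library `xml.etree.ElementTree` as a stand-in for the XSLT processor's parser; the sample documentation text is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical documentation whose text contains < and &, so it is
# wrapped in a CDATA section.
SCHEMA_SNIPPET = """<xsd:annotation xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:documentation><![CDATA[
        Use a < b && b < c here.

        A second paragraph.
    ]]></xsd:documentation>
</xsd:annotation>"""

# ElementTree expands namespace prefixes into {uri}localname form.
XSD = "{http://www.w3.org/2001/XMLSchema}"

root = ET.fromstring(SCHEMA_SNIPPET)
documentation = root.find(XSD + "documentation")

# The CDATA content arrives as plain character data: the < and &
# are ordinary characters and the blank line is preserved.
text = documentation.text
print(repr(text))
```

This matches the XPath data model used by the templates, where a CDATA section is simply part of a text node, so text() and the string functions see the same characters either way.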

An XSLT template that simply outputs the text of the <xsd:documentation> element (or the built-in template rule for text nodes) copies the text into the HTML as-is. The browser then collapses the newlines and blank lines into single spaces, so the plain-text paragraphs run together. To retain the formatting of the plain text, we have two options:

  1. Appending a <br/> tag to the end of each line, or

  2. Wrapping each plain text paragraph in <p> tags.

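Both methods process the block of text recursively, line by line. XSLT 1.0 has no function for splitting a string, and its variables cannot be reassigned, so each template takes the first line with substring-before() and then calls itself on the rest of the text, obtained with substring-after(). The templates described below are not specific to XML Schemas: they can be used to format any block of plain text as HTML.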
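The line-splitting step both templates rely on can be sketched in Python as a port for experimentation (it is not part of the XSLT): `str.partition("\n")` plays the role of substring-before() and substring-after() on the first newline, and a recursive call handles the remainder. The function name `split_lines` is hypothetical.

```python
from typing import List


def split_lines(text: str) -> List[str]:
    """Split text into lines by peeling off the first line and recursing."""
    if "\n" not in text:
        # No newline left: the whole remainder is the last line. This also
        # covers text that does not end with a newline.
        return [text]
    # Equivalent of substring-before($text, '&#10;') and
    # substring-after($text, '&#10;').
    line, _, remainder = text.partition("\n")
    return [line] + split_lines(remainder)


print(split_lines("first\nsecond\nthird"))
# prints: ['first', 'second', 'third']
```

As in the XSLT, each line costs one level of recursion, so a very long block of text may run into the language's or processor's recursion limit. The real templates also stop early once only whitespace remains, which this sketch leaves out.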

Note

The templates have been tested with xsltproc (libxml 20706, libxslt 10126 and libexslt 815).

The templates have been updated, courtesy of Sjef Bosman, to handle text content that doesn’t end with a newline.

Appending line breaks

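Preserving the plain-text line breaks in the HTML output keeps the rendered text close to the original layout. This matters most when the plain text contains code fragments or other content where line breaks are significant. Note, however, that the template below applies normalize-space() to each line, so leading indentation is not preserved; wrapping the text in a <pre> element is an alternative when indentation matters.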

To append a <br/> tag to each line:

<xsl:template match="xsd:documentation">
    <xsl:call-template name="print-lines"/>
</xsl:template>

<xsl:template name="print-lines">
    <!-- If we are not passed text as a param, use the node's text. -->
    <xsl:param name="text" select="text()"/>

    <!-- If there is no (more) text, we are finished. -->
    <xsl:if test="string-length(normalize-space($text)) > 0">
        <xsl:choose>
            <!-- If the text contains a newline... -->
            <xsl:when test="contains($text, '&#10;')">
                <!-- Split text into the first line and the remainder.  We search
                for the newline using the '&#10;' character reference, because
                XPath 1.0 string literals do not support escapes such as '\n'. -->
                <xsl:variable name="line" select="substring-before($text, '&#10;')"/>
                <xsl:variable name="remainder" select="substring-after($text, '&#10;')"/>

                <!-- Output the line, an HTML <br/> tag and a newline. -->
                <xsl:value-of select="normalize-space($line)"/>
                <xsl:element name="br"/>
                <xsl:text>&#10;</xsl:text>

                <!-- Recurse using the remaining text. -->
                <xsl:call-template name="print-lines">
                    <xsl:with-param name="text" select="$remainder"/>
                </xsl:call-template>
            </xsl:when>
            <!-- Otherwise no more newlines, output the remaining text. -->
            <xsl:otherwise>
                <xsl:value-of select="normalize-space($text)"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:if>
</xsl:template>
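To experiment with the print-lines logic outside an XSLT processor, here is a rough Python port. It assumes that XPath's normalize-space() strips and collapses spaces, tabs, carriage returns and newlines, and it uses `html.escape` to approximate the escaping the HTML serializer applies to text; the Python function names mirror the template names but are otherwise hypothetical.

```python
import html
import re

# XPath 1.0 normalize-space(): collapse runs of space, tab, CR and LF
# into one space and strip leading/trailing whitespace.
_XPATH_SPACE = re.compile(r"[ \t\r\n]+")


def normalize_space(s: str) -> str:
    return _XPATH_SPACE.sub(" ", s).strip(" \t\r\n")


def print_lines(text: str) -> str:
    """Python port of the print-lines template: one <br/> per line."""
    # If there is no (more) non-whitespace text, we are finished.
    if not normalize_space(text):
        return ""
    if "\n" in text:
        # Split into the first line and the remainder.
        line, _, remainder = text.partition("\n")
        # Output the line, a <br/> tag and a newline, then recurse.
        return (html.escape(normalize_space(line), quote=False)
                + "<br/>\n" + print_lines(remainder))
    # No more newlines: output the remaining text without a <br/>.
    return html.escape(normalize_space(text), quote=False)


# Text content of the sample <xsd:documentation> element.
doc_text = "\n        text goes here\n\n        another paragraph of text\n\n        a final paragraph\n    "
print(print_lines(doc_text))
```

Running it on the sample text shows one quirk shared with the template: the newline right after the opening <xsd:documentation> tag produces an empty first line, so the output begins with a <br/>, and each blank separator line becomes a <br/> of its own.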

Creating paragraphs

To wrap plain-text paragraphs (text separated by blank lines) in HTML <p> tags, we need to assemble each paragraph before outputting it; the lines of a paragraph are joined with single spaces. We use two named templates: print-paras to process the text block and assemble the paragraphs, and output-para to print each assembled paragraph.

<xsl:template match="xsd:documentation">
    <xsl:call-template name="print-paras"/>
</xsl:template>

<xsl:template name="print-paras">
    <!-- If we are not passed text as a param, use the node's text. -->
    <xsl:param name="text" select="text()"/>
    <!-- If we are not passed a paragraph block, start a new paragraph. -->
    <xsl:param name="para"></xsl:param>

    <xsl:choose>
        <!-- If there is more text to format... -->
        <xsl:when test="string-length(normalize-space($text)) > 0">
            <!-- Split text into the first line and the remainder.  We search
            for the newline using the '&#10;' character reference, because
            XPath 1.0 string literals do not support escapes such as '\n'.
            Note that if there is no newline in the remaining text, all the
            text becomes the 'line' and the remainder is empty. -->

            <!-- Get the line (all text if no newline). -->
            <xsl:variable name="line">
                <xsl:choose>
                    <xsl:when test="contains($text, '&#10;')">
                        <xsl:value-of select="substring-before($text, '&#10;')"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:value-of select="$text"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:variable>

            <!-- Get the remainder (empty if no newline). -->
            <xsl:variable name="remainder">
                <xsl:choose>
                    <xsl:when test="contains($text, '&#10;')">
                        <xsl:value-of select="substring-after($text, '&#10;')"/>
                    </xsl:when>
                    <xsl:otherwise></xsl:otherwise>
                </xsl:choose>
            </xsl:variable>


            <xsl:choose>
                <!-- If the line is not blank, append the line to the
                paragraph and continue processing the remainder. -->
                <xsl:when test="string-length(normalize-space($line)) > 0">
                    <xsl:call-template name="print-paras">
                        <xsl:with-param name="text" select="$remainder"/>
                        <xsl:with-param name="para" select="normalize-space(concat($para, ' ', $line))"/>
                    </xsl:call-template>
                </xsl:when>

                <!-- The line is blank.  Print what we have so far in a
                paragraph and continue processing the remainder with a new
                empty paragraph. -->
                <xsl:otherwise>
                    <xsl:call-template name="output-para">
                        <xsl:with-param name="text" select="$para"/>
                    </xsl:call-template>
                    <xsl:call-template name="print-paras">
                        <xsl:with-param name="text" select="$remainder"/>
                    </xsl:call-template>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:when>

        <!-- No more text: print what we have so far, and finish. -->
        <xsl:otherwise>
            <xsl:call-template name="output-para">
                <xsl:with-param name="text" select="$para"/>
            </xsl:call-template>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

<!-- Prints text in an HTML <p> block if not empty. -->
<xsl:template name="output-para">
    <xsl:param name="text"></xsl:param>

    <xsl:if test="string-length(normalize-space($text)) > 0">
        <p><xsl:value-of select="$text"/></p>
    </xsl:if>
</xsl:template>
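The same approach can be checked for the paragraph templates. The Python port below follows print-paras and output-para step by step: non-blank lines are appended to the current paragraph, and a blank line (or the end of the text) flushes the paragraph inside <p> tags. As before, it is a sketch that assumes XPath normalize-space() semantics and approximates the HTML serializer's escaping with `html.escape`; the function names are hypothetical.

```python
import html
import re

# XPath 1.0 normalize-space() on space, tab, CR and LF.
_XPATH_SPACE = re.compile(r"[ \t\r\n]+")


def normalize_space(s: str) -> str:
    return _XPATH_SPACE.sub(" ", s).strip(" \t\r\n")


def output_para(text: str) -> str:
    """Port of output-para: wrap text in <p> if it is not blank."""
    if normalize_space(text):
        return "<p>" + html.escape(text, quote=False) + "</p>"
    return ""


def print_paras(text: str, para: str = "") -> str:
    """Port of print-paras: assemble paragraphs separated by blank lines."""
    if normalize_space(text):
        # Get the line (all text if no newline) and the remainder.
        if "\n" in text:
            line, _, remainder = text.partition("\n")
        else:
            line, remainder = text, ""
        if normalize_space(line):
            # Non-blank line: append it to the paragraph and continue.
            return print_paras(remainder, normalize_space(para + " " + line))
        # Blank line: print the paragraph so far, then start a new one.
        return output_para(para) + print_paras(remainder)
    # No more text: print what we have so far, and finish.
    return output_para(para)


doc_text = "\n        text goes here\n\n        another paragraph of text\n\n        a final paragraph\n    "
print(print_paras(doc_text))
# prints: <p>text goes here</p><p>another paragraph of text</p><p>a final paragraph</p>
```

Unlike print-lines, this version produces no stray output for the leading newline or for runs of blank lines, because output-para skips empty paragraphs.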