Intro to DocBook Publishing

If you're considering DocBook as a publishing solution, this article presents an overview of the concepts, strengths, and weaknesses of DocBook publishing.

Applications

DocBook was designed for technical publications, particularly for computer software, but it suits any application where structured documents must be displayed in HTML and/or other formats. What is a structured document? A simple example of a structured document is a book. A book has a more-or-less standard structure, for example: cover page, table of contents, one or more chapters, and an index. Magazine articles, web pages, and many other documents we see every day have a standard structure. When you author documents in the DocBook format, you work within a mental model of the structure of the document. Putting your documents into a formal structure requires some additional work and some new concepts, but it allows you to easily reformat the document while preserving the meaning of the information within it. (See the Concepts section for more detail.)

Requirements

  • Staff with an appropriate background to use, or learn to use, the tools

  • Free or commercial DocBook tools installed on your computer systems

Note

This section needs more detail.

Concepts

Example 1. Section Structure

Using a word processor you could create a portion of a document with the following process:

  1. Enter text for the section title.

  2. Use the enter/return key to end the current paragraph and start a new one.

  3. Enter the text for the first paragraph of the section.

  4. Enter the text for the second paragraph of the section.

  5. Select the text from the section title and format it in bold using a larger font.

The result would be something that looks like this:

Title Text

A body paragraph.

Another body paragraph.

Using DocBook XML markup you would create a (portion of a) document that looks like this:

<section>
  <title>Title Text</title>
  <para>A body paragraph.</para>
  <para>Another body paragraph.</para>
</section>

The problem with the word processor approach is that when you save the document file, nothing records that the heading you just formatted as bold applies to the following two paragraphs. What if there is another paragraph after that? Is it in the section or is it something else altogether? After you save the file (and forget your original intent), you can't tell whether Title Text is a section title for those two paragraphs or a bridgehead. (A bridgehead is a heading, typically used in fiction or journalistic works, that doesn't necessarily apply to all following paragraphs.) This makes it hard for document publishing software to convert the file into alternate formats.

For example, when converting a document to a collection of web pages, you may want everything in one section to be on a single web page (because the content is logically related), but you may not want to break pages at a bridgehead. You may want sections to appear in the table of contents, but not bridgeheads. If all the word processor saves is the font and whether the text is bold, this information about the structure of your document is lost. If you've used MS-Word styles, you already understand the issue. If you are consistent in how you apply and name MS-Word styles, and you use names like "section", "abstract", "epigraph", etc., you have created the equivalent of a structured document using MS-Word.
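
To make the distinction concrete, here is a minimal sketch of a DocBook section that contains a bridgehead. The titles and paragraph text are invented for illustration; only the element names (section, title, para, bridgehead) come from DocBook itself.

```xml
<section>
  <title>Installing the Software</title>
  <para>This paragraph belongs to the section.</para>
  <!-- A bridgehead is a free-floating heading: it opens no new section. -->
  <bridgehead>A Word of Caution</bridgehead>
  <para>This paragraph still belongs to the same section.</para>
</section>
```

Because the markup records which heading is a section title and which is a bridgehead, publishing software can decide to break pages or build table-of-contents entries only at sections.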

The DocBook format forces you to think about your document in this way. (Just like your English teacher once made you outline your essays.) In addition to learning tools and tags, you must learn to think of your documents this way. If you are converting existing documents you need to be able to create this type of structure from the original document. Technical writers and lawyers usually create documents this way, no matter what tools they are using. If you aren't comfortable with this approach or your documents are too free-form, then DocBook is probably not for you.

The payoff for this extra thought during the authoring process is fivefold:

  1. Authors create better organized content (just as your old English teacher wanted).

  2. Authors don't have to worry about formatting while authoring the document (or ever, if someone else created the stylesheets!).

  3. Document publishers can change formatting throughout the entire document long after the content is authored.

  4. Documents can be published into a wide variety of formats: PDF, HTML, RTF, MS-Word, WAP (for cell phones), Braille, etc.

  5. Documents take up less space in computer storage systems.
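
The separation of content from formatting can be sketched with a toy XSLT stylesheet. This is an illustration written for this article under simplifying assumptions, not an excerpt from the standard DocBook XSL stylesheets, and it handles only the three elements used in Example 1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Toy stylesheet for illustration only; not part of the DocBook XSL distribution. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Each section becomes a div so related content stays together. -->
  <xsl:template match="section">
    <div class="section">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Section titles become headings; edit this one rule to restyle every section. -->
  <xsl:template match="section/title">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <!-- Paragraphs map directly to HTML paragraphs. -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the section in Example 1, a stylesheet like this would emit a div containing one h2 heading and two p elements. Changing how every section title looks means editing a single template, while the authored content stays untouched.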

DocBook Software

There are four major software components (categories) in the DocBook publishing process:

  1. DocBook DTD - The DTD is an open standard and is freely available. It defines the elements used to structure a document.

  2. DocBook XSL Stylesheets - These stylesheets are used to convert from DocBook/XML to PDF, HTML, etc. They are standard and freely available.

  3. Editing/Authoring Software - You can use any text editor, but there are specialized editors that give you WYSIWYG views of your content and/or have XML-specific editing capabilities.

  4. Publishing Software - Not required to author documents, but needed to "run" the XSL stylesheets, etc. to produce the required output formats.
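
As a sketch of how the first component shows up in practice, a DocBook XML file usually names the DTD in its document type declaration. The example below assumes DocBook XML V4.5, since this article does not say which version it targets, and the article title is a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumes DocBook XML V4.5; adjust the identifiers for other versions. -->
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<article>
  <title>Placeholder Article Title</title>
  <section>
    <title>Title Text</title>
    <para>A body paragraph.</para>
    <para>Another body paragraph.</para>
  </section>
</article>
```

A validating editor or parser can use this declaration to check that the elements in the file are nested the way the DTD allows.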

Note

This section needs specific examples, etc.

Strengths & Weaknesses

Note

This section is "To Be Done".


This work is licensed under a Creative Commons License.