Customizations


Adding paragraph styles

When converting existing documents with rtftohtml you often get a lot of warning messages telling you that some paragraph styles are unknown. Now you can either

To add a new paragraph style, simply go to the .PMatch table contained in the file html-trans and add an entry to the end. Put the name of the paragraph style (quoted), the nesting level (usually zero) and the name of the .PTag entry that should be used.
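If you have many styles to add, the step above can be scripted. The following is only a sketch, not part of rtftohtml: the helper name add_pmatch_entry is made up for this example, and it assumes that a table ends at the next line starting with a dot or at the end of the file.

```python
def add_pmatch_entry(text, style, ptag_name, nesting_level=0):
    """Return html-trans text with a new entry at the end of the .PMatch table.

    Hypothetical helper for illustration; rtftohtml itself offers no such call.
    """
    lines = text.splitlines()
    # Locate the line that starts the .PMatch table (name in column one).
    start = next((i for i, line in enumerate(lines)
                  if line.startswith(".PMatch")), None)
    if start is None:
        raise ValueError("no .PMatch table found")
    # Assumption: the table runs until the next table name or the end of file.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("."):
            end = i
            break
    # Skip back over trailing blank lines so the entry joins the table body.
    while end > start + 1 and not lines[end - 1].strip():
        end -= 1
    # Quoted style name, nesting level, quoted .PTag entry name.
    entry = '"%s",%d,"%s"' % (style, nesting_level, ptag_name)
    lines.insert(end, entry)
    return "\n".join(lines) + "\n"
```

Calling add_pmatch_entry(text, "My Style", "Normal") produces the entry "My Style",0,"Normal" as the last record of the table.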

html-trans File Format

The file html-trans is needed by rtftohtml to map character and paragraph styles contained in the RTF-file to corresponding HTML-tags. It must be readable either from rtftohtml's library directory (as set in the file makefile.rtftoweb) or from the directory contained in the environment variable RTFLIBDIR.
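The lookup described above could be imitated like this. It is a sketch under assumptions: the function name find_library_file and the library_dir parameter are invented for the example, and the text does not say which of the two directories wins when both contain the file, so trying RTFLIBDIR first is a guess.

```python
import os

def find_library_file(name, library_dir, environ=None):
    """Return the path of a readable support file such as html-trans, or None.

    library_dir stands for the compiled-in library directory (hypothetical
    parameter); environ defaults to the real process environment.
    """
    environ = os.environ if environ is None else environ
    candidates = []
    # Assumption: a directory named by RTFLIBDIR is tried first.
    env_dir = environ.get("RTFLIBDIR")
    if env_dir:
        candidates.append(os.path.join(env_dir, name))
    candidates.append(os.path.join(library_dir, name))
    for path in candidates:
        # The file must exist and be readable.
        if os.path.isfile(path) and os.access(path, os.R_OK):
            return path
    return None
```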

In html-trans there are four tables. They are labelled .PTag, .TTag, .TMatch and .PMatch. Each table begins with its name (in column one) and continues until the next table starts. All blank lines and lines beginning with a '#' are discarded. '#' lines are typically used for comments. The tables themselves are composed of records containing a fixed number of fields, which are separated by commas. The fields are either strings (which should be quoted), integers or bitmasks.

.PTag Table

Each entry in the .PTag table describes an HTML paragraph markup. The format is:

.PTag

#"name","starttag","endtag","col2mark","tabmark","parmark",allowtext,cannest,DeleteCol1,fold,TocStyl

name
A unique name for this entry. These names are referenced in the .PMatch table.
starttag
This string will be output once at the beginning of any text for this markup.
endtag
This string will be output once at the end of any text for this markup.
col2mark
This string will be output in place of the first tab in every paragraph (used for lists).
tabmark
This string will be output in place of every other tab in a paragraph.
parmark
This string will be output in place of each paragraph mark. (usually <br> or <p>)
allowtext
If 0, no text markup will be allowed within this markup (for example, <pre> or <h1> don't format well if they contain additional markup).
cannest
If 1, other paragraph markup will be allowed to nest within this markup. (used for nesting lists)
DeleteCol1
If 1, all text up to the first tab in a paragraph will be deleted (used to strip out bullets when converting to unordered lists (<ul>)).
fold
If 1, the filter will add newlines to the HTML to keep the number of characters in a line to less than 80. For <pre> or <listing> elements, this should be set to 0.
TocStyl
The TOC level. If greater than 0, the filter will create a Table of contents entry for every paragraph using this markup.
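The tab-related fields are easiest to understand by example. The function below is a guess at the behaviour, not the filter's actual code: apply_tab_fields is an invented name, it assumes that tabmark replaces every tab after the first, that DeleteCol1 removes the text before the first tab but leaves the tab itself to be replaced by col2mark, and that a paragraph without any tab is left alone.

```python
def apply_tab_fields(paragraph, col2mark, tabmark, delete_col1):
    """Apply col2mark, tabmark and DeleteCol1 to the text of one paragraph."""
    head, sep, rest = paragraph.partition("\t")
    if not sep:
        # No tab at all: nothing to replace or delete (assumption).
        return paragraph
    if delete_col1:
        # Drop everything up to the first tab, e.g. a bullet character.
        head = ""
    # First tab becomes col2mark, all later tabs become tabmark.
    return head + col2mark + rest.replace("\t", tabmark)
```

With delete_col1 set, apply_tab_fields("*\tFirst item", "", "\t", True) yields just the item text, which is what the "ul-d" style of entry is meant for.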

Sample .PTag Entries

"h1","<h1>\n","</h1>\n","\t","\t","<br>\n",0,0,0,1
This is a level 1 heading. The "\n" in the start and end-tag fields forces a newline in the HTML markup. Since newlines are ignored in HTML (except in <pre>) its only effect is to make the HTML output more readable. There is no difference between the first tab and any other. They both translate to a tab mark. Paragraph marks generate "<br>" followed by a newline (just for looks). Text markup (like <b>) is not allowed within <h1> text, because we leave that up to the HTML client. No nesting is allowed - (see the discussion on nested styles). No text is deleted. Every paragraph using this markup will also generate a level-1 table of contents entry.

"Normal","","\n","\t","\t","<p>\n",1,0,0,0
This is the default for normal text. Regular text in HTML has no required start and end-tags. The "\n" in the end-tag field forces a newline in the HTML markup. Since newlines are ignored in HTML (except in <pre>) its only effect is to make the HTML output more readable. There is no difference between the first tab and any other. They both translate to a tab mark. Paragraph marks generate "<p>" followed by a newline (just for looks). Text markup (like <b>) is allowed within Normal text. No nesting is allowed - (see the discussion on nested styles). No text is deleted.

"ul","<ul>\n<li>","</ul>","\t","\t","\n<li>",1,1,0,0
This is the entry for unordered lists. It generates "<ul>\n<li>" at the start of the list and "</ul>" at the end. There is no difference between the first tab and any other. They both translate to a tab mark. Paragraph marks generate "<li>" preceded by a newline (just for looks). Text markup (like <b>) is allowed, and this entry may be nested and allows others to be nested within it. This allows nested lists. No text is deleted.

"ul-d","<ul>\n<li>","</ul>","\t","\t","\n<li>",1,1,1,0
This entry is identical to the previous except that the DeleteCol1 field is set to 1. This is used to remove bullets (which really appear in the RTF) because we don't want to see them in the HTML.

.TTag Table

Each entry in the .TTag table describes an HTML text markup. The format is:

.TTag

"name","starttag","endtag"

name
A unique name for this entry. These names are referenced in the .TMatch table.
starttag
This string will be output once at the beginning of any text for this markup.
endtag
This string will be output once at the end of any text for this markup.
Note that unlike the .PTag table, no text markup should appear more than once. (Of course there is no good reason that it should appear.) If you have two entries with <b></b> start and end tags, it would be possible to get HTML of the form <b><b> text</b></b>. I don't know if this is invalid markup, but it sure is ugly.

.TMatch Table

Each entry in the .TMatch table describes processing for text styles. The format is:

.TMatch

"Font",FontSize,Match,Mask,"TextStyleName"
Font
The name of a Font, or "" if all fonts match this entry.
FontSize
The point-size of the font, or 0 if all point sizes match this entry.
Match
A bit-mask, where each bit represents a text attribute. These bits are compared to the attributes of the style being output. They must match for this entry to be matched. A one in a bit position means that the text attribute is set; a zero means it is not set.
Mask
A bit-mask, where each bit represents a text attribute. When the style of the text being processed is compared to the Match bit-mask, this field is used to select the bits that matter. If a zero appears in a bit position, then that style attribute is ignored (for the purpose of matching this entry). Only bit positions holding a 1 are used in the above comparison.
TextStyleName
This is either the name of an entry in the .TTag table indicating the HTML markup to use, or it is one of "_Discard", "_Name", "_HRef", "_Hot", or "_Literal".
The order of bits in the Match and Mask bit-masks is:
#    v^bDWUHACSOTIB - Bold
#    v^bDWUHACSOTI - Italic
#    v^bDWUHACSOT - StrikeThrough
#    v^bDWUHACSO - Outline
#    v^bDWUHACS - Shadow
#    v^bDWUHAC - SmallCaps
#    v^bDWUHA - AllCaps
#    v^bDWUH - Hidden
#    v^bDWU - Underline
#    v^bDW - Word Underline
#    v^bD - Dotted Underline
#    v^b - Double Underline
#    v^ - SuperScript
#    v - SubScript
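Counting zeros by hand is error-prone, so a helper that builds the 14-character bit string from attribute names may be handy. This is a sketch only: the function name bits and the attribute spellings used as keys are choices made for this example, taken from the list above with SubScript leftmost and Bold rightmost.

```python
# Attribute order from left to right, matching the header v^bDWUHACSOTIB.
ATTRIBUTES = [
    "SubScript", "SuperScript", "Double Underline", "Dotted Underline",
    "Word Underline", "Underline", "Hidden", "AllCaps", "SmallCaps",
    "Shadow", "Outline", "StrikeThrough", "Italic", "Bold",
]

def bits(*names):
    """Return the Match/Mask bit string with a 1 for each named attribute."""
    wanted = set(names)
    unknown = wanted - set(ATTRIBUTES)
    if unknown:
        raise ValueError("unknown attributes: %s" % sorted(unknown))
    return "".join("1" if attr in wanted else "0" for attr in ATTRIBUTES)
```

For instance, bits("Bold") gives 00000000000001 and bits("Double Underline", "Hidden") gives 00100010000000.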

Sample .TMatch Entries

# double-underline/not hidden -> hot text
# double-underline/hidden -> href
#    v^bDWUHACSOTIB,v^bDWUHACSOTIB
"",0,00100000000000,00100010000000,"_Hot"
"",0,00100010000000,00100010000000,"_HRef"
The first entry will match any text formatted with double underline EXCEPT if it is hidden text. This is accomplished by using those two bits to compare (the MASK field) and having a 1 in the double underline bit and a zero for the hidden text bit. The second entry will match any text formatted with BOTH double underline and hidden text. Any text that matches the first will be treated as the hot text of a link. Any text that matches the second will be taken as the href itself. (The filter requires that the HRef text immediately precede the Hot text.)

# Regular matches - You can have multiple of these active
# monospace fonts -> tt
"Courier",0,00000000000000,00000000000000,"tt"
This will match any text that uses the Courier font and mark it using the HTML text markup appearing in the .TTag table with the entry name "tt".

# bold -> bold
#    v^bDWUHACSOTIB,v^bDWUHACSOTIB
"",0,00000000000001,00000000000001,"b"
This will match any text that has bold attributes and will mark it using the HTML text markup appearing in the .TTag table with the entry name "b". Note that bold text using the Courier font would match both this entry and the previous. This will yield markup of the form <b><tt>hi</tt></b>. Note that "b" is the name of an entry in the .TTag table, not the HTML markup that is used!
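The Match/Mask comparison used in these samples can be written down in a few lines. The following is a sketch of the rule as described, not the filter's source: the function names are invented, entries are plain tuples in the field order of the .TMatch table, and font names are compared exactly, which is an assumption.

```python
def entry_matches(entry, font, size, style_bits):
    """Check one .TMatch entry against a font, point size and attribute bits."""
    entry_font, entry_size, match, mask, _name = entry
    # "" matches all fonts, 0 matches all point sizes.
    if entry_font and entry_font != font:
        return False
    if entry_size and entry_size != size:
        return False
    # Only the bits selected by Mask take part in the comparison.
    m = int(mask, 2)
    return (int(style_bits, 2) & m) == (int(match, 2) & m)

def matching_styles(entries, font, size, style_bits):
    """Return the TextStyleName of every entry that matches."""
    return [e[4] for e in entries if entry_matches(e, font, size, style_bits)]

# The four sample entries from this section.
SAMPLE_ENTRIES = [
    ("", 0, "00100000000000", "00100010000000", "_Hot"),
    ("", 0, "00100010000000", "00100010000000", "_HRef"),
    ("Courier", 0, "00000000000000", "00000000000000", "tt"),
    ("", 0, "00000000000001", "00000000000001", "b"),
]
```

Bold Courier text (bits 00000000000001) matches both "tt" and "b", while double-underlined text that is not hidden matches only "_Hot".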

.PMatch Table

Each entry in the .PMatch table correlates a paragraph style name to some entry in the .PTag table. The format is:

.PMatch

"Paragraph Style",nesting_level,"PTagName"

Paragraph Style
The paragraph style name that appears in the RTF input.
nesting_level
The nesting level. This should be zero except for nested list entries.
PTagName
The name of the .PTag entry that should be used for paragraphs with this paragraph style.

Sample .PMatch Entries

"heading 1",0,"h1"
This is a level 1 heading. Any paragraphs with this paragraph style will be mapped to the entry in the .PTag table named "h1".

"numbered list",0,"ol-d"
This is used for numbered lists. Any paragraphs with this paragraph style will be mapped to the entry in the .PTag table named "ol-d".

"numbered list 2",2,"ol-d"
This is an entry for a nested paragraph style. The nesting level of two is used to indicate that this paragraph should appear in the HTML nested within two levels of paragraph markups. The paragraph marked with this style may only appear after a paragraph style that has a nesting level of 1 or greater.
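The rule for nested styles can be checked mechanically before you run the filter. The sketch below generalizes the statement above (level 2 needs a preceding level of 1 or greater) to "a level may exceed the previous paragraph's level by at most one"; that generalization, the function name check_nesting, and the starting level of 0 are all assumptions of this example.

```python
def check_nesting(levels):
    """Return the indexes of paragraphs whose nesting level jumps too far.

    levels is the sequence of nesting_level values of consecutive paragraphs,
    as looked up in the .PMatch table.
    """
    problems = []
    previous = 0  # assumption: the document starts at nesting level 0
    for i, level in enumerate(levels):
        # A level-2 paragraph is only acceptable after level 1 or greater.
        if level > previous + 1:
            problems.append(i)
        previous = level
    return problems
```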

Navigation panels and Netscape support

If you want the navigation panels produced by rtftohtml (see section Headings) to look more spiffy, e.g. with images as panel buttons, or if you want the generated HTML documents to use images as their background or another text color, this section is for you.

By using the -N command line option when invoking rtftohtml, you can tell rtftohtml exactly how you want the created navigation panels to look. The same configuration file can be used to add a few funny Netscapisms to the generated documents. If no -N option is given, but rtftohtml finds a file named nav-panel in its library directory or in the directory named by the environment variable RTFLIBDIR, it will use this file as the layout customization file. This way you can avoid having to add the -N command line option whenever you use rtftohtml.

An example of such a customization file is the file nav-panel, which was also used when this guide was converted to HTML. By looking at this file you should easily see how the layout of your documents can be adjusted to your taste.

Each line of such a customization file contains the definition of a layout element, as long as the first character is not the hash-character (#), which introduces comments. Everything that follows the first colon (:) in each line will be literally inserted into the HTML-files when needed.

The following elements may be configured:

previous
What to insert into the navigation panel when the "previous" element is to be created.
next
The same for the "next" element.
up
The same for the "up" element.
title
The same for the "title" element.
contents
The same for the "contents" element.
index
The same for the "index" element.
delimiter
What to use as the delimiter between the elements of navigation panels.
hr
What HTML-code to use when it's time to insert a horizontal line beneath or above navigation panels.
bgimage
Specifies an optional background (GIF) image that should be used as the document background (requires Netscape).
bgcolor
Specifies an optional background color that should be used in the document background (requires Netscape). Syntax: #rrggbb (hexadecimal values for red, green, blue).
textcolor
The color to use for normal text. Same syntax as for bgcolor.
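The line format of the customization file (element name, first colon, literal text) can be read with a few lines of code. This is a sketch for checking your own file, not what rtftohtml does internally: the name parse_nav_panel is invented, and ignoring lines with unknown element names or without a colon is an assumption.

```python
ELEMENTS = ("previous", "next", "up", "title", "contents", "index",
            "delimiter", "hr", "bgimage", "bgcolor", "textcolor")

def parse_nav_panel(text):
    """Map each configured layout element to the text after its first colon."""
    layout = {}
    for line in text.splitlines():
        # Lines starting with '#' are comments.
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split at the FIRST colon only; the rest is inserted literally,
        # so it is not stripped or otherwise changed.
        name, _, value = line.partition(":")
        name = name.strip()
        if name in ELEMENTS:
            layout[name] = value
    return layout
```

Note that a value such as #ffffff for bgcolor is not taken for a comment, because only the first character of the line is checked.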