What is rtftohtml?
rtftohtml is a tool that turns your documents (Word documents, for example) into documents that can be read
on the World Wide Web. The format of these documents is called
HyperText Markup Language (HTML).
rtftohtml automatically converts documents stored in RTF (Rich Text Format)
to HTML. Most word processors in use on UNIX, Macintosh, PC or NeXT systems can
export their documents in RTF format (hint: have a look at the "Save as..."
dialog box of your favorite word processor).
The author of rtftohtml is Chris Hector. Have a look at his
pages at Cray.
In processing text, rtftohtml chooses HTML markup based on three
characteristics. These are:
- The destination of the text.
Example destinations are header, footer, footnote, picture.
- The paragraph style.
Paragraph styles are user-definable entities, but some are pre-defined by the
word processing package. For Microsoft Word (on the Macintosh) examples are
"Normal" and "heading 1" (or "Überschrift 1" when using a German version).
- The text attributes.
Examples of text attributes are bold, Courier, 12 point.
The filter has built-in rules for dealing with destinations. For paragraph and
text styles, the rules for translation are contained in a file called html-trans.
By modifying this file, you can train rtftohtml to perform the correct
translations for your documents. The most common change that you will need to
make is to add your own paragraph styles to html-trans.
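
The lookup from paragraph style to markup can be pictured with a small sketch. This is only an illustration: the dictionary below is a hypothetical in-memory table, not the html-trans file format, and the function name and the "Normal" to <p> mapping are assumptions. It shows a style lookup that falls back to unmarked text with a warning when the style is unknown.

```python
import warnings

# Hypothetical mapping from paragraph style name to an HTML tag.
# This is NOT the html-trans syntax; it only illustrates the lookup.
PARAGRAPH_STYLES = {
    "Normal": "p",        # assumed mapping, for illustration only
    "heading 1": "h1",
    "heading 2": "h2",
}

def markup_paragraph(style, text, styles=PARAGRAPH_STYLES):
    """Wrap text in the tag mapped to its paragraph style, if there is one."""
    tag = styles.get(style)
    if tag is None:
        # Unknown style: warn and emit the text with no special markup.
        warnings.warn(f"paragraph style not found: {style!r}")
        return text
    return f"<{tag}>{text}</{tag}>"
```

Adding your own style then amounts to adding one entry to the table, for example mapping "Überschrift 1" to the same tag as "heading 1".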
rtftohtml should produce reasonable HTML output for most documents. Here is
what you can expect:
- Your output should appear in a file called "xx.html", where "xx" or
"xx.rtf" was your input file name.
- Bold, italic and underlined text should appear with <b>, <i>
and <u> markup.
- Courier font text should appear with <tt> markup.
- Tables will be formatted using <pre> markup (only plain text is
supported in tables).
- Footnotes will appear in a separate document with hypertext links to them.
- Tables of contents, indexes, headers and footers are discarded.
- Table of Contents entries and paragraphs with the style "heading 1..6"
will generate a hypertext Table of Contents in a separate file. Each table of
contents entry will link to the correct location in the main document.
- All paragraph styles used in your document should appear in the file
"html-trans". This allows you to create a mapping from any paragraph style to
any HTML markup. There are many pre-defined styles in html-trans, including
"heading 1..6". (If a paragraph style is not found, a warning will be generated
and the text will be written to the HTML file with no special markup.)
- Each graphic in your file will be written out to a separate file. The
filename will be "xxn.ext" where "xx" or "xx.rtf" was your input, "n" is a
unique number and "ext" will be either "pict" for Macintosh PICT format
graphics or "wmf" for Windows Metafile format graphics. The HTML file will
link to these files, using either "<A HREF=" or "<IMG SRC="
links. Since most WWW browsers do not understand "wmf" or "pict" format
files, the link will be to "xxn.gif". This presumes that you will run some
other filter to translate your graphic files to GIF.
- Text that is connected with copy/paste-link constructs, or tagged with
some special text attributes, will generate hypertext links.
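
The file naming described above can be sketched as follows. This is a minimal illustration, not rtftohtml's own code: the function names are hypothetical, the suffix check is assumed to be case-sensitive, and the starting value of the graphic number "n" is not stated in this text.

```python
def base_name(input_name):
    """Return "xx" for an input of "xx" or "xx.rtf"."""
    if input_name.endswith(".rtf"):
        return input_name[:-len(".rtf")]
    return input_name

def html_name(input_name):
    """Name of the HTML output file: "xx.html"."""
    return base_name(input_name) + ".html"

def graphic_names(input_name, n, ext):
    """Return (file written, file linked) for the n-th graphic.

    ext is "pict" or "wmf"; the link always points at the ".gif" name,
    which a separate conversion step is expected to produce.
    """
    stem = f"{base_name(input_name)}{n}"
    return f"{stem}.{ext}", f"{stem}.gif"
```

For an input named "report.rtf", the sketch gives "report.html" as the output file, and a Windows Metafile graphic numbered 1 is written as "report1.wmf" and linked as "report1.gif".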