Longtime users of SCHEMA ST4 DocuManager may remember the “Word import”
function introduced in 2005. Back then, a VBA macro (Visual Basic for
Applications) attempted to change the structure and markup of any given Word
document so that a version marked up according to DocuManager requirements was
created. Technically, this meant that every character of the Word document was
examined and, on the basis of the implemented algorithm, a decision was made as
to which markup the character, word or paragraph should receive. The only nice
thing about this implementation: because editing Word documents via VBA is
inefficient and computers were considerably slower at the time, it was possible
to watch the algorithm do its job. As if by magic, the original document slowly
transformed into a DocuManager document which could then be imported.
Admittedly, the suspense flattened noticeably after watching the transformation
of a few pages.
We accepted rather quickly that the “Word import” function required a complete
revision with regard to efficiency, customizability and extensibility to the
import of other document types. The result was first available in DocuManager
2.0.1 in 2007. Along with the implementation, the name of the function changed
as well: since then it has been called “Document Import”. As of ST4
DocuManager 2012, Adobe FrameMaker documents can for the first time be imported
via the same function as Word documents. This also applies to unstructured FM
files.
The Document Import is technically based on XML structures: a very simply
structured XML file is expected as input. The interactive mapping dialogue in the
DocuManager loads this file and determines the paragraph and character formats
used (from the elements used). In doing so, context information is evaluated as
well (e.g. paragraph in table, listing). The user can then assign these paragraph
and character formats to the formats configured in the DocuManager. Finally, the
Document Import uses the mapping information to transform the input XML file
into an XML representation that conforms to DocuManager.
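The two steps described above (collecting the formats used together with their context, then applying a mapping) can be illustrated with a minimal sketch. It is not the DocuManager implementation: the input schema is not specified here, so the element names in `SAMPLE`, the `STRUCTURAL` set, the two context labels and the target names in the mapping are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input file: each element name stands for a paragraph or
# character format. The real input schema is not described in this text.
SAMPLE = """<doc>
  <Heading1>Safety</Heading1>
  <Body>Wear <Emphasis>gloves</Emphasis>.</Body>
  <table><row><cell><Body>Torque</Body></cell></row></table>
</doc>"""

# Assumed container elements that are not formats themselves.
STRUCTURAL = {"doc", "table", "row", "cell"}


def collect_formats(root):
    """Return the set of (context, format) pairs used in the input."""
    found = set()

    def walk(elem, in_table):
        for child in elem:
            child_in_table = in_table or child.tag == "table"
            if child.tag not in STRUCTURAL:
                found.add(("table" if in_table else "body", child.tag))
            walk(child, child_in_table)

    walk(root, False)
    return found


def apply_mapping(root, mapping):
    """Rename format elements in place using a (context, format) -> name map."""

    def walk(elem, in_table):
        for child in elem:
            # Decide the context before the element is renamed.
            child_in_table = in_table or child.tag == "table"
            key = ("table" if in_table else "body", child.tag)
            if key in mapping:
                child.tag = mapping[key]
            walk(child, child_in_table)

    walk(root, False)
    return root


# Example mapping as a user might define it in a mapping dialogue
# (target names are invented for this sketch).
MAPPING = {
    ("body", "Heading1"): "title",
    ("body", "Body"): "p",
    ("body", "Emphasis"): "em",
    ("table", "Body"): "cell_p",
}
```

Because the context is part of the key, the same source format `Body` can be mapped to different targets depending on whether it occurs inside a table. Formats without a mapping entry are left unchanged in this sketch.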
To create this XML input file, the DocuManager plug-in in Adobe FrameMaker
initially uses the integrated FrameMaker function “Save As XML”. Contrary to how
the standard function is meant to be used, however, FrameMaker's conversion
table for assigning the paragraph and character formats to the expected XML
elements is not used. One reason is that applying the conversion table requires
special FrameMaker know-how; another is that the
completeness of the conversion table would have to be newly checked for each
Further disadvantages of the standard function “Save As XML”:
- Tables lose some of their properties. For example, the information on table
and column width is lost.
- All graphics are converted to a format defined by FrameMaker (GIF by default),
which is usually of worse quality than the original. This could only be
changed if a “Structured Application” were created especially for this purpose
and integrated into the XML export process.
The DocuManager plug-in avoids these problems by taking the following measures:
- The table properties are collected by the ST4 FrameMaker import plug-in
before the XML export and stored in such a way that they are preserved
during the XML import.
- The image size and the path to the original graphic are stored during XML
export in such a way that they are preserved during the XML import.
Where possible, the graphic reference is reset to the path of the original
file and the unintentionally converted graphic file is ignored.
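The handling of graphic references can be sketched as follows. This is only an illustration under stated assumptions, not the plug-in's actual code: `GraphicInfo`, `restore_graphics`, the dictionary layout and the millimetre unit are hypothetical, and falling back to the converted file when the original cannot be found is one possible reading of “where possible”.

```python
import os
from dataclasses import dataclass


@dataclass
class GraphicInfo:
    """Data recorded per graphic before the export (hypothetical structure)."""
    original_path: str
    width_mm: float
    height_mm: float


def restore_graphics(exported, recorded, exists=os.path.exists):
    """Reset graphic references to the original files where possible.

    exported: graphic id -> path of the file written by the XML export
    recorded: graphic id -> GraphicInfo collected before the export
    exists:   predicate used to check whether the original file is available
    """
    result = {}
    for graphic_id, converted_path in exported.items():
        info = recorded.get(graphic_id)
        if info is None:
            # Nothing was recorded: keep the converted file as it is.
            result[graphic_id] = {"path": converted_path}
            continue
        # Prefer the original file; fall back to the converted one (assumption).
        path = info.original_path if exists(info.original_path) else converted_path
        result[graphic_id] = {
            "path": path,
            "width_mm": info.width_mm,
            "height_mm": info.height_mm,
        }
    return result
```

Passing the `exists` predicate as a parameter keeps the sketch testable without touching the file system. The recorded image size is carried over in both cases, so it survives even when the original file is missing.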
The paragraph and character formats are mapped, as usual in ST4, via the
mapping dialogue of the Document Import. Should the source format change,
usually only minor adaptations to the mapping are necessary. Once saved,
mappings can be reused again and again and changed via drag-and-drop.
Unfortunately, all this happens so fast that even with larger documents the
algorithm can no longer be watched doing its job.