Document Adapters
One of the core elements of GetPublished is the Document Adapter, or DocAdapter. DocAdapter is a set of web services and .NET assemblies for converting rich documents into standard HTML articles that can be displayed on the CodeGear Developer Network sites. By using DocAdapter, we can accept articles in multiple formats, and ensure we get valid HTML that works with the site’s overall look and feel.
DocAdapter is accessible in two ways: as a set of web services and as a set of .NET assemblies/Delphi packages. The web services and assemblies reference each other (the web service calls the assemblies to perform its tasks, and the assemblies can reference the web service to perform their tasks remotely). Because the web service and the assemblies know each other, we don’t have to worry about type matching (making sure the types used by the web service match those used by the assemblies).
Client applications can use either technology, or both. GetPublished, for example, uses the DocAdapter web service to perform the conversion from the source format to XHTML, and the CDN.Documents assembly to generate HTML, thumbnails, and other document elements.
Format Conversion
The CDN.DocumentConverters assembly (and its web service wrapper, cleverly titled DocAdapterService) contains conversion classes capable of reading files in several formats. The classes convert document text to XHTML, which is a convenient format for additional processing. Depending on the format, they can also extract additional data. For example, the Word conversion class extracts embedded images from Word documents, stores them as separate files, and creates <img> elements in the XHTML that refer to these files. A conversion class is any implementation of the IDocumentImport interface:
    IDocumentImport = interface
      procedure Import(inputStream: Stream; docAdapter: DocumentAdapter; extractFields: Boolean);
    end;
Because we’re using an interface, the DocumentAdapter class doesn’t need to know anything about the conversion class other than the fact it implements the interface. This means we can implement converters without recompiling the CDN.Documents assembly, and add them as plug-ins to the calling application.
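To make the plug-in idea concrete, here is a minimal sketch of a converter that wraps each non-empty line of a plain-text file in a <p> element. It is not the actual DocAdapter code: TDocumentAdapter is a hypothetical stand-in with a single Xhtml field, TStream from the Classes unit replaces the .NET Stream type so the program compiles as an ordinary console application, and TPlainTextImport and EscapeXml are names invented for this illustration.

```pascal
program ConverterPluginSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical stand-in for the real DocumentAdapter class.
  TDocumentAdapter = class
  public
    Xhtml: string;
  end;

  // Same shape as IDocumentImport above; TStream stands in for the .NET Stream.
  IDocumentImport = interface
    procedure Import(inputStream: TStream; docAdapter: TDocumentAdapter;
      extractFields: Boolean);
  end;

  // Hypothetical plug-in: turns each non-empty line of text into a paragraph.
  TPlainTextImport = class(TInterfacedObject, IDocumentImport)
  public
    procedure Import(inputStream: TStream; docAdapter: TDocumentAdapter;
      extractFields: Boolean);
  end;

// Escapes the characters that would otherwise break the XHTML markup.
function EscapeXml(const s: string): string;
begin
  Result := StringReplace(s, '&', '&amp;', [rfReplaceAll]);
  Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
  Result := StringReplace(Result, '>', '&gt;', [rfReplaceAll]);
end;

procedure TPlainTextImport.Import(inputStream: TStream;
  docAdapter: TDocumentAdapter; extractFields: Boolean);
var
  lines: TStringList;
  i: Integer;
begin
  // extractFields is ignored here: plain text carries no extra fields.
  lines := TStringList.Create;
  try
    lines.LoadFromStream(inputStream);
    docAdapter.Xhtml := '';
    for i := 0 to lines.Count - 1 do
      if Trim(lines[i]) <> '' then
        docAdapter.Xhtml := docAdapter.Xhtml +
          '<p>' + EscapeXml(lines[i]) + '</p>';
  finally
    lines.Free;
  end;
end;

var
  converter: IDocumentImport;
  adapter: TDocumentAdapter;
  source: TStringStream;
begin
  // The caller only ever sees the interface, never the concrete class.
  converter := TPlainTextImport.Create;
  adapter := TDocumentAdapter.Create;
  source := TStringStream.Create('Hello' + sLineBreak + 'a < b');
  try
    converter.Import(source, adapter, False);
    WriteLn(adapter.Xhtml);
  finally
    source.Free;
    adapter.Free;
  end;
end.
```

The main block refers to the converter only through IDocumentImport, so a different implementation could be swapped in without changing the calling code.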
The WebServiceConversion class is a special implementation of the IDocumentImport interface that calls DocAdapter web services. All DocAdapter services are based on the same definition, expressed in WSDL. All a client application needs in order to convert a document to XHTML is to pass the URL of such a web service to the WebServiceConversion class. GetPublished stores the URLs of the DocAdapter services in the database, so new formats can be supported by simply deploying a web service and adding a single record to GetPublished’s database.
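The "one record per format" idea can be sketched with a simple lookup. This is only an illustration: the real GetPublished schema is not shown here, so keying services by file extension, the FindServiceUrl helper, and the example.com URLs are all assumptions, and a TStringList of name=value pairs plays the role of the database table.

```pascal
program ServiceRegistrySketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: returns the service URL registered for a file
// extension, or an empty string when no service is registered.
function FindServiceUrl(registry: TStrings; const extension: string): string;
begin
  Result := registry.Values[LowerCase(extension)];
end;

var
  registry: TStringList;
begin
  registry := TStringList.Create;
  try
    // Each entry stands in for one database record: extension=service URL.
    // The URLs are placeholders, not real DocAdapter endpoints.
    registry.Add('.doc=http://example.com/WordAdapter.asmx');
    registry.Add('.odt=http://example.com/OdtAdapter.asmx');

    WriteLn(FindServiceUrl(registry, '.DOC'));
    if FindServiceUrl(registry, '.pdf') = '' then
      WriteLn('No DocAdapter service registered for .pdf');
  finally
    registry.Free;
  end;
end.
```

In this sketch, supporting a new format means adding one more entry to the registry. The URL found would then be handed to a class like WebServiceConversion.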
Document Processing
Once the text and images are extracted, DocAdapter can create the HTML and necessary supporting files that can be displayed on a web site. This is done by calling a single method, CreateDocumentArchive, which returns a DocumentArchive object. The DocumentArchive object contains all the necessary information, such as the final HTML (including syntax highlighting), all referenced images, thumbnails, a table of contents, keywords, and other information that may be useful. The generation of these elements is controlled by parameters passed to the CreateDocumentArchive method. Some of these parameters are:
- Maximum image width. Images wider than this value will be replaced by thumbnails that link to the full-size image.
- Maximum image height. Images taller than this value will be replaced by thumbnails that link to the full-size image.
- The width of thumbnail images generated for images wider than the maximum specified width or taller than the maximum specified height.
- Whether to keep embedded images in the final document (external image references remain unmodified).
- The depth of the table of contents to generate.
- Whether to produce printer-friendly output, which doesn’t include JavaScript elements for dynamically hiding and showing images and sections.
In GetPublished, most of these parameters are associated with specific content types configurable by system administrators.
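The image size rules listed above can be sketched as follows. The actual signature of CreateDocumentArchive is not shown in this article, so the TImageLimits record and both functions are hypothetical. The sketch also assumes that thumbnails keep the aspect ratio of the original image, which the parameter list does not state.

```pascal
program ThumbnailRuleSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical grouping of the three image-related parameters.
  TImageLimits = record
    MaxWidth: Integer;
    MaxHeight: Integer;
    ThumbnailWidth: Integer;
  end;

// An image is replaced by a thumbnail when it exceeds either limit.
function NeedsThumbnail(const limits: TImageLimits;
  width, height: Integer): Boolean;
begin
  Result := (width > limits.MaxWidth) or (height > limits.MaxHeight);
end;

// Assumption: the thumbnail keeps the original aspect ratio.
// The caller must pass a width greater than zero.
function ThumbnailHeight(const limits: TImageLimits;
  width, height: Integer): Integer;
begin
  Result := (height * limits.ThumbnailWidth) div width;
  if Result < 1 then
    Result := 1;
end;

var
  limits: TImageLimits;
begin
  limits.MaxWidth := 600;
  limits.MaxHeight := 800;
  limits.ThumbnailWidth := 200;

  // A 1200x900 image exceeds both limits, so it gets a thumbnail.
  if NeedsThumbnail(limits, 1200, 900) then
    WriteLn('thumbnail: ', limits.ThumbnailWidth, 'x',
      ThumbnailHeight(limits, 1200, 900))
  else
    WriteLn('keep original');
end.
```

With the sample limits, the 1200x900 image is scaled to a 200x150 thumbnail. A 500x400 image would be left as it is.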
Posted by Yorai Aminov on May 2nd, 2008 under CDN, GetPublished

