Documenting large XML Schemas and WSDL with FlexDoc/XML
Overview
FlexDoc/XML is a tool for developing high-performance, high-quality documentation and report generators for any data stored in
XML files.
Unlike many other tools in this field, it is not focused on generating short documents (like invoices) or typical business reports, which
are basically a table or histogram representation of rows and cells returned by an SQL query.
Rather, our tool focuses on mining multi-file XML data sources for complex relationships and presenting them
in a form that humans can read and easily navigate.
The original idea was inspired by XSL Transformations: to extend the XSLT-like
approach to any kind of Java-based data sources supplied by various APIs.
If you think a bit about what all those APIs provide, you may notice that it essentially comes down to a network of primitive data items involved
in various association or aggregation relations. All of that can be mapped onto XML elements and attributes, which gives you a
DOM that can be processed with the various technologies invented around
XML.
As a result, we have developed a programming framework (now called FlexDoc.XYZ),
in which a particular Java API data source is represented through a special driver as a virtual XML document, the Datasource Model.
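The mapping idea can be illustrated with a minimal Python sketch. It is not FlexDoc.XYZ code: the `api_data` structure, the element names and the `to_virtual_xml` function are all hypothetical, chosen only to show how aggregation can become element nesting and how an association can become an attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for data exposed by some API:
# classes that aggregate methods and refer to a superclass by name.
api_data = {
    "classes": [
        {"name": "Shape", "extends": None, "methods": ["area"]},
        {"name": "Circle", "extends": "Shape", "methods": ["area", "radius"]},
    ]
}

def to_virtual_xml(data):
    """Map aggregation to child elements and associations to attributes."""
    root = ET.Element("api")
    for cls in data["classes"]:
        cls_el = ET.SubElement(root, "class", name=cls["name"])
        if cls["extends"]:
            # Association: an attribute that refers to another element by name.
            cls_el.set("extends", cls["extends"])
        for method in cls["methods"]:
            # Aggregation: the method is nested inside its owning class.
            ET.SubElement(cls_el, "method", name=method)
    return root

model = to_virtual_xml(api_data)
# Once the data is a tree, XPath-like queries can be run against it.
subclasses = [c.get("name") for c in model.findall("class[@extends='Shape']")]
print(subclasses)  # ['Circle']
```

A real driver would expose the data lazily rather than building the whole tree up front; the sketch builds it eagerly only to stay short.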
The transformations are performed by special templates that play a role similar to XSLT scripts.
However, in addition to collecting data and iterating over it, our templates also focus on the output
they generate: complete HTML and printable documentation with rich formatting, hyperlinks and, possibly, special images or diagrams
generated from the obtained data. All of that is programmed with various template components and their properties.
The templates are developed using a graphical Template Designer, which represents the template components visually in a form resembling the output they generate.
The templates are then interpreted by the Template Processor, which takes the Datasource Model as input and produces the resulting documentation.
Based on that framework, we developed a tool called FlexDoc/Javadoc
that mimics the standard Javadoc.
It takes the data from the Doclet API, represents it as a virtual XML document and transforms it into Java API documentation.
But then we realized that XML itself is also a good field of application for our technology, full of heavy-lifting tasks for which XSLT seems too lightweight,
for instance, generating a single, easily navigable documentation from lots of XSD and WSDL files.
So, FlexDoc/XML was born, complemented with two template sets: XSDDoc and WSDLDoc.
XSDDoc
In the age of Big Data and IoT, the information processed by computers is characterized not only by large volumes but also by great diversity.
Diversity means complex relationships between particular data items, which translate into complex data structures.
Currently, XML provides the most advanced means to formally describe complex data structures:
the W3C XML Schema language (XSD).
XML schemas may be large and need to be documented so that people can navigate them easily.
There are quite a lot of XML schema documentation generators. Every XML IDE has one.
But they are typically lightweight and suited to documenting rather small XML schemas (mostly the XSD file open in the editor).
Even if they are able to pull in and document all referenced XSD files together, the documentation they produce tends to be one huge file of gibberish,
hardly more navigable than the raw XSD files themselves.
When you need to document an XML schema made of thousands of components, you may find yourself in a territory almost devoid of tools.
FlexDoc/XML XSDDoc provides a solution for exactly that case.
It is a template set for FlexDoc/XML that implements a high-performance universal XML schema documentation generator
with the following key features:
- It takes any number of W3C XML Schema (XSD) files as input and generates a single documentation from them in one of three formats:
  - Multi-file framed (Javadoc-like) HTML
  - Single-file HTML
  - Single RTF document
- Automatically loads and documents all referenced XML schemas. Basically, you only need to specify your schema driver (the root schema) as input.
  Everything else will be pulled in automatically. If some referenced schema files are located somewhere other than where the schemas say,
  you can redirect to them using XML catalogs.
- Ability to process and document together hundreds of XSD files with thousands of components. With framed HTML,
  you get highly navigable documentation that lets you quickly find anything you need within such an enormous heap of data.
- Joint documenting of conflicting XML schemas.
  Conflicting XML schemas are those that define the same components (that is, with the same local name and in the same namespace).
  For instance, different versions of the same XML schema project will likely contain conflicting XML schemas.
- The entire documentation is interconnected with a dense network of cross-reference links.
  Everything that has a logical connection is hyperlinked: from the reproduced XML source of each XML schema file to the corresponding component detail,
  and on to anything that component depends on or that depends on it. In RTF, hyperlinks are supplemented with page number references.
- A unique method of documenting local element components (by extending their names and unifying them by type) allows a single approach
  to any XML schema design pattern, which ensures clear-cut documentation in virtually any case.
- With XHTML tags in XML schema annotations, you can format descriptions (as well as insert images), which will be rendered in both HTML and RTF.
- Diagrams can be generated for all components with complex content and inserted in the documentation (both HTML and RTF).
  In HTML, diagrams come with hyperlink maps connecting everything depicted on a diagram to the corresponding documentation detail.
  FlexDoc/XML includes its own diagramming engine (called DiagramKit)
  that generates XSD diagrams similar to those produced by XMLSpy.
  Additionally, there are integrations with XMLSpy and
  Oxygen XML that allow using either of those XML IDEs as an alternative,
  dynamically linked diagramming engine.
- Possibility for unlimited customizations:
  - The final doc generator (XSDDoc + template processor) is controlled by hundreds of parameters and options that let you compose the resulting documentation
    from a huge range of included details. Using these settings, you can prepare several profiles for different types of documentation.
  - In HTML, you can also apply your custom CSS stylesheets to change how the documentation looks.
  - Finally, you can modify XSDDoc itself, as it is just a template set.
- FlexDoc/XML is a pure Java application.
  So, it runs equally anywhere Java SE works, in particular on Windows, Linux and macOS.
- Seamless integration with Apache Ant and Maven.
  The included FlexDoc/XML Maven plugin allows running XSDDoc from Maven.
  This, in effect, makes it the world's only Maven plugin able to generate XML Schema documentation with diagrams!
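To make the automatic loading of referenced schemas from the feature list above more concrete, here is a minimal Python sketch of the general technique: follow every `schemaLocation` transitively, apply a catalog-like redirect, and visit each file once. It does not show how XSDDoc is implemented. The `FILES` and `CATALOG` dictionaries and the `collect_schemas` function are hypothetical, the catalog is a plain dictionary rather than a real XML catalog, and relative-path resolution is left out.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
REF_TAGS = [f"{{{XS}}}{t}" for t in ("import", "include", "redefine")]

# Hypothetical in-memory "file system": location -> schema text.
FILES = {
    "root.xsd": f'<xs:schema xmlns:xs="{XS}">'
                '<xs:include schemaLocation="types.xsd"/>'
                '<xs:import namespace="urn:ext"'
                ' schemaLocation="http://example.com/ext.xsd"/>'
                '</xs:schema>',
    # types.xsd refers back to root.xsd to show that cycles are harmless.
    "types.xsd": f'<xs:schema xmlns:xs="{XS}">'
                 '<xs:include schemaLocation="root.xsd"/>'
                 '</xs:schema>',
    "local/ext.xsd": f'<xs:schema xmlns:xs="{XS}" targetNamespace="urn:ext"/>',
}

# Catalog-like redirect table (an assumption; not the XML catalog format).
CATALOG = {"http://example.com/ext.xsd": "local/ext.xsd"}

def collect_schemas(root_location):
    """Return every schema location reachable from the root, each listed once."""
    loaded = []
    pending = [root_location]
    while pending:
        location = pending.pop(0)
        location = CATALOG.get(location, location)  # redirect if the catalog says so
        if location in loaded:
            continue  # already processed; also guards against circular references
        loaded.append(location)
        schema = ET.fromstring(FILES[location])
        for tag in REF_TAGS:
            for ref in schema.findall(tag):
                target = ref.get("schemaLocation")
                if target:
                    pending.append(target)
    return loaded

print(sorted(collect_schemas("root.xsd")))
# ['local/ext.xsd', 'root.xsd', 'types.xsd']
```

Only the root schema is named by the caller; the other two files are discovered from the references, and the remote location is served from its local copy.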
WSDLDoc
WSDL is an XML-based language to describe web services.
By itself, it is a rather simple language. But it uses XSD to describe its data types, that is, the structure of the pieces of XML with which the web service communicates.
So, each WSDL file is also a container of one or several XML schemas,
which themselves may import or include a bunch of other XML schema files.
That immediately makes WSDL very difficult to document, because a proper WSDL documentation generator must:
- Document WSDL definitions themselves
- Be a full-blown XSD documentation generator at the same time, able to handle multiple XML schemas at once
- Handle conflicting XML schemas, if it supports documenting multiple WSDL files together,
  because different WSDL files may contain schemas targeting the same namespace as well as import the same external schemas, which should not be documented
  multiple times.
That, of course, is difficult to implement. So, for a rather long time throughout the existence of WSDL, there was no meaningful WSDL documentation generator at all.
Those that appeared later either document only some refined web-service features (typically provided by some standard API handling WSDL,
without much detail about the XML schemas involved) or are lightweight ones built into XML IDEs, able to document just the single WSDL file open in the editor
(the same as they do with XSD, by producing gibberish HTML that is difficult to navigate).
FlexDoc/XML WSDLDoc is probably the only WSDL documentation generator able to do everything listed above, that is, to document together
any number of WSDL files along with all contained or referenced XML schemas.
It is currently the second large template application for FlexDoc/XML and was derived from XSDDoc.
So, it inherits all features implemented in XSDDoc as well.
For completeness, here is the list of all important features of WSDLDoc:
- Generation of a single documentation from any number of WSDL/XSD files, in one of three formats:
  - Multi-file framed (Javadoc-like) HTML, including navigation bar and index
  - Single-file HTML
  - Single RTF document
- If you have too many input files, you can pick them all using an Ant-like pathname pattern.
- Processing and documenting of any number of XML schemas (along with WSDL), including:
  - XML schemas in the form of separate XSD files
  - XML schemas embedded in WSDL (within the <wsdl:definitions>/<wsdl:types> element)
- Processing of any referenced WSDL files and XML schemas, in particular:
  - Correct processing of all <wsdl:import>, <xs:import>, <xs:include> and <xs:redefine> elements found across all involved WSDL/XSD files.
  - Automatic loading and processing (i.e. inclusion in the documentation scope) of all directly or indirectly referenced WSDL/XSD files.
  - Support of XML catalogs to redirect the referenced WSDL/XSD files when they are located somewhere other than specified in the references.
- Sophisticated documenting of XSD components (XML schema documentation):
  - A unique method of documenting local element components via extending their names.
  - Support of any XML schema design patterns.
  - Possibility of automatic inclusion of XSD diagrams generated by the FlexDoc/XML native diagramming engine, with the support of all diagram hyperlinks.
- Separate documenting of all WSDL definitions (services, ports, port types, operations, messages and bindings), highlighted with special icons.
- Documenting of all interconnections between WSDL definitions and XSD components:
  - Hyperlinks from WSDL messages to the details of XSD elements/types describing the message data.
  - In XSD element/type details, the list of all WSDL definitions where they are used.
  - Copying of the annotations of XSD elements/types to the documentation of those WSDL messages (and even operations) where they are used.
- Using XHTML tags specified in both <wsdl:documentation> and <xs:documentation> elements
  to format corresponding descriptions in the documentation (as well as inserting images) in both HTML and RTF.
- Possibility for unlimited customizations:
  - Through numerous WSDLDoc parameters and output format options.
  - In HTML, you can also apply your custom CSS stylesheets to change the documentation look & feel.
  - Finally, you can modify WSDLDoc itself.
- Since FlexDoc/XML is a pure Java application,
  you can run WSDLDoc anywhere Java SE works, e.g. on Windows, Linux and macOS.
- Seamless integration with Apache Ant and Maven.
  The included FlexDoc/XML Maven plugin allows running WSDLDoc from Maven.
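The problem of conflicting schemas embedded in several WSDL files can be illustrated with a short Python sketch. It is an assumption-laden illustration, not WSDLDoc code: the two WSDL documents, the `wsdl_text` helper and the `find_conflicts` function are hypothetical, and only global `xs:element` declarations are examined (types, groups and attributes are ignored to keep it short).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

WSDL = "http://schemas.xmlsoap.org/wsdl/"   # WSDL 1.1 namespace
XS = "http://www.w3.org/2001/XMLSchema"

def wsdl_text(element_names):
    """Build a tiny hypothetical WSDL with one embedded schema in urn:orders."""
    elements = "".join(
        f'<xs:element name="{n}" type="xs:string"/>' for n in element_names
    )
    return (
        f'<wsdl:definitions xmlns:wsdl="{WSDL}" xmlns:xs="{XS}">'
        f'<wsdl:types><xs:schema targetNamespace="urn:orders">{elements}'
        '</xs:schema></wsdl:types>'
        '</wsdl:definitions>'
    )

WSDL_FILES = {
    "orders-v1.wsdl": wsdl_text(["Order", "Customer"]),
    "orders-v2.wsdl": wsdl_text(["Order", "Invoice"]),
}

def find_conflicts(wsdl_files):
    """Map (namespace, local name) to the files defining it, when more than one does."""
    defined_in = defaultdict(list)
    for filename, text in wsdl_files.items():
        definitions = ET.fromstring(text)
        # Embedded schemas live under <wsdl:definitions>/<wsdl:types>.
        for schema in definitions.findall(f"{{{WSDL}}}types/{{{XS}}}schema"):
            namespace = schema.get("targetNamespace", "")
            for element in schema.findall(f"{{{XS}}}element"):
                defined_in[(namespace, element.get("name"))].append(filename)
    return {key: files for key, files in defined_in.items() if len(files) > 1}

print(find_conflicts(WSDL_FILES))
# {('urn:orders', 'Order'): ['orders-v1.wsdl', 'orders-v2.wsdl']}
```

A generator that documents both files together has to decide what to do with `Order`, which both versions define in the same namespace; the sketch only detects the situation.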
What are templates?
In typical programming parlance, templates stand for a kind of lightweight programming, a notch above patterns.
Serious work is supposed to be coded in a universal language like
C++ or
Java.
But once a particular application programming field (like GUI)
becomes complicated enough, you won't get far with a universal language alone and your bare hands. So, some kind of framework typically arises:
a library with specific classes and procedures. It brings in new concepts and its own programming layer. Although you nominally keep
coding in your universal language, almost all you actually do with it is create specific control
structures and handle events required by that application-field framework. All of that is really programming
at a completely different level of semantics, merely packed in universal-language form.
A disadvantage of that approach is that although from the framework's point of view you may be doing quite mundane things (like programming
a GUI dialog with a few labels and input fields), the work is almost impossible to automate with a specialized programming tool
(e.g. a graphical designer), because the coding constructs related to the particular framework's components cannot be reliably
extracted or modified automatically within the universal-language code. (A universal language is Turing-complete,
which implies that no algorithm can determine in general what an arbitrary program written in it means.)
For GUI programming that may be OK, because a typical separately working unit of a GUI (e.g. a window or dialog) is built of no more
than a few dozen components. But when the application field is such that any serious autonomous unit (e.g. a generator of complete HTML
documents) must be made of hundreds or thousands of specialized components, creating and coordinating all of them with universal-language
classes and coding may be close to impossible.
In that case, the only way to take the framework any further is to get rid of the universal-language coding altogether and to replace it
with a specialized programming medium that naturally expresses all the framework's concepts and components.
What might that programming medium be? We see only two possibilities here:
- A human-oriented scripting language with powerful operators, designed to be as expressive as possible so as to minimize coding.
  That, however, would mean driving everything to Turing-completeness again, thereby making it difficult to automate.
- A more primitive and verbose language (which we call templates) designed primarily to be processed (“understood”) by
  software, with the goal of visualizing the program semantics as much as possible, which makes automating the programming much easier.
The latter approach is the one taken by FlexDoc.XYZ templates.
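The difference between the two possibilities can be sketched in a few lines of Python. This is only an illustration of the general idea of a template as inspectable data plus a separate interpreter; the `TEMPLATE` structure, the component names and both functions are hypothetical and do not reflect the actual FlexDoc.XYZ template format.

```python
# A hypothetical template: plain nested data with no executable code inside.
TEMPLATE = {
    "component": "section",
    "title": "Elements",
    "children": [
        {"component": "iterator", "over": "elements", "children": [
            {"component": "text", "field": "name"},
        ]},
    ],
}

def list_components(node):
    """A tool (e.g. a visual designer) can enumerate components without running anything."""
    yield node["component"]
    for child in node.get("children", []):
        yield from list_components(child)

def render(node, data):
    """A minimal interpreter that turns the template plus data into output lines."""
    kind = node["component"]
    children = node.get("children", [])
    if kind == "section":
        lines = [f"== {node['title']} =="]
        for child in children:
            lines.extend(render(child, data))
        return lines
    if kind == "iterator":
        lines = []
        for item in data[node["over"]]:
            for child in children:
                lines.extend(render(child, item))
        return lines
    if kind == "text":
        return [str(data[node["field"]])]
    raise ValueError(f"unknown component: {kind}")

data = {"elements": [{"name": "order"}, {"name": "customer"}]}
print(list(list_components(TEMPLATE)))  # ['section', 'iterator', 'text']
print(render(TEMPLATE, data))           # ['== Elements ==', 'order', 'customer']
```

Because the template is data rather than code, `list_components` can analyze it completely, which is the property that makes a graphical designer feasible. The same structure written as ordinary Python functions could not be reliably inspected or edited by a tool.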