
XML: Defining Documents on the 'Net

February 1998
Catalog printers, pay close attention: XML—an enhancement of HTML or a redefined, simplified version of SGML, depending on how you view it—is one document manipulation language you need to know, and know well.

XML will be your friend. Why?

Extensible Markup Language (XML) allows users to define their own structure and tags, completely tailored to a particular document. XML is considered a subset of Standard Generalized Markup Language (SGML), the 20-year-old document language that is far too complex, yet far too vital to lose an ounce of respect for.
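To make the idea of user-defined tags concrete, here is a minimal sketch in Python. The tag names (catalog, item, name, price), the sku attribute and the sample products are all invented for illustration; XML itself prescribes none of them, which is the point.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog data. Every tag and attribute name here is an
# assumption made for this example, not part of any standard vocabulary.
CATALOG_XML = """
<catalog>
  <item sku="A-100">
    <name>Wool Sweater</name>
    <price currency="USD">49.95</price>
  </item>
  <item sku="B-200">
    <name>Canvas Tote</name>
    <price currency="USD">18.50</price>
  </item>
</catalog>
"""


def read_items(xml_text):
    """Return one dictionary per <item>, keyed by the publisher's own tag names."""
    root = ET.fromstring(xml_text.strip())
    items = []
    for item in root.findall("item"):
        items.append({
            "sku": item.get("sku"),
            "name": item.findtext("name"),
            "price": float(item.findtext("price")),
        })
    return items


items = read_items(CATALOG_XML)
for entry in items:
    print(entry["sku"], entry["name"], entry["price"])
```

Because the tags name what the data is (a price, a product name) rather than how it should look, a program can pick out exactly the pieces it needs.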

HTML is an application of SGML.

For commercial printers, notably high-end catalog printers that repurpose wares onto glitzy Internet pages for continued sales appeal, XML, a nonproprietary standard, will become quite a considerable player in everyday operations.

"A catalog publisher that is repurposing data for the World Wide Web can use XML to mix and match information in ways not possible with HTML," reports Frank Gilbane, a director at CAP Ventures, a Norwell, MA-based research firm. "XML is designed to solve the main limitations of HTML without the complexities that a full SGML would carry with it."

So, if you were wondering why Microsoft god Bill Gates trumpeted XML during Seybold San Francisco this past fall—tagging the new language as a key to the future of publishing—now you have a clue.

Beyond the Internet
While Internet publishers, naturally, will be enthused by and entrenched in the document manipulation flexibility afforded by this new kid on the block, it's critical that printers, especially catalog printers, welcome this new face to the publishing neighborhood.

"XML will aid in the creation of custom catalogs, allowing commercial printers to enhance their services, thus offering more bits of information within the data they transform from paper to the World Wide Web," CAP's Gilbane explains.

How will XML empower the innovative graphic arts operation? Paul Trevithick, vice president of marketing at Bitstream—better known throughout the industry as the founder of Archetype, recently acquired by Bitstream—offers some direction.

"Since the philosophy behind XML (and SGML) is to separate content from presentation (format, color, font, geometry), it allows writers, editors, illustrators and database personnel to create pure and structured content, free from issues—like how the information will be used and presented," Trevithick says. "More and more, we need to be able to develop information content independently because increasingly these same chunks of content will be reused and reconfigured into many different pages in different media."

Workflows that are oriented around the central management of these digital chunks, or digital assets, and that pour this liquid digital content into any one of a number of output contexts, yield tremendous efficiencies and new possibilities.

The efficiencies come, Trevithick explains, from only having to update this shared, digital content once, and having the output pages automatically generated from Internet, as well as print, templates. The new possibilities come from the fact that the publishing process is moving away from static print or Internet pages towards a dynamic, new publishing model.
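The update-once workflow described above can be sketched roughly as follows. This is an illustration only: the two templates, the record fields and the price change are hypothetical, and a production system would use far richer templates than Python's built-in string.Template.

```python
from string import Template

# One shared content record (hypothetical fields), maintained in a single place.
content = {"name": "Wool Sweater", "price": "49.95"}

# Two output templates, both invented for this sketch: one for a Web page,
# one for a line in a printed price list.
WEB_TEMPLATE = Template("<h2>$name</h2>\n<p>USD $price</p>")
PRINT_TEMPLATE = Template("$name ..... USD $price")


def render_all(record):
    """Generate every output format from the same shared record."""
    return {
        "web": WEB_TEMPLATE.substitute(record),
        "print": PRINT_TEMPLATE.substitute(record),
    }


pages = render_all(content)

# Update the shared content once; both outputs are regenerated from it.
content["price"] = "44.95"
updated = render_all(content)

print(updated["web"])
print(updated["print"])
```

The price is edited in one place, and the Web and print versions cannot drift apart because neither is maintained by hand.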

Enter Dynamic Publishing
In dynamic publishing, pages are generated from a template on-the-fly, making the personalization or customization of content to a particular target audience segment—as select as an audience of one—much easier to achieve.

Cory Klatt, chief technology officer of ImageX, is very excited about XML and anticipates its widespread popularity by the end of 1998.

"We see XML as an opportunity to more closely integrate with our vendors without having to deal with the complexities of the underlying data structure of their systems," he explains.

ImageX offers a Web-based solution to its customers as well as a Web-based interface for its vendors. ImageX relies on HTML and PDF to transmit documents to and from its vendors. However, HTML and PDF, Klatt argues, do not provide a structure to transmit information regarding the data that makes up the document.

"XML provides the ability to define a common markup language that can be extended to accommodate the information needs of our vendors, thereby increasing our ability to communicate and share data," Klatt contends.

What is his projection for XML in the graphic arts? In short, XML provides the much-needed ability for the printing, publishing and graphic arts industries to define a more extensible way to share documents and data.

"While HTML and Adobe's PDF specification provide portability of documents, they currently do not describe the data that makes up the document," Klatt reports. "XML allows us to define a markup language that describes the underlying data elements of a document."

Empowering Variable Data
Bitstream's Trevithick trumpets XML, not just for its applications in print-on-demand, but for its impending role in empowering the more valued sister idea of on-demand variable data printing.

As Trevithick illustrates: Quark is the pen, PostScript is the ink and XML is the message, or the story, being delivered.

Still, don't expect HTML to drop off the face of catalog publishing. HTML is a proven, valid page description language that will continue to be used, especially for applications that are not complex.

"HTML will continue to exist," CAP's Gilbane forecasts. "But, for high-end, serious publishing applications, XML will be more important."

Gilbane stresses that no conflict is being waged between HTML and XML. Rather, it is largely a partnership. "Creating HTML out of XML is as easy as the push of a button," he promises. "It's as easy to create HTML out of XML as it is to create PDF out of Word."
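The kind of conversion Gilbane describes can be approximated in a few lines. The sketch below assumes a hypothetical catalog vocabulary (catalog, item, name, price) and a simple bulleted-list layout; it is not any particular vendor's tool.

```python
import xml.etree.ElementTree as ET
from html import escape


def catalog_to_html(xml_text):
    """Turn hypothetical <catalog>/<item> XML into an HTML bulleted list."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("item"):
        # Escape the text so characters such as & and < stay valid in HTML.
        name = escape(item.findtext("name", default=""))
        price = escape(item.findtext("price", default=""))
        rows.append(f"<li>{name}: {price}</li>")
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


# Sample input; the product is invented for illustration.
SAMPLE = (
    "<catalog>"
    "<item><name>Canvas Tote</name><price>18.50</price></item>"
    "</catalog>"
)

html_page = catalog_to_html(SAMPLE)
print(html_page)
```

The XML stays the master copy; the HTML is a disposable rendering that can be regenerated whenever the data or the layout changes.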

One thing, though, is certain. "Catalog printers that repurpose complex digital content from print applications to extensive pages on the World Wide Web, and even those that don't, will be using XML every day," Gilbane projects. "XML is where publishing is going."

Likewise, P.G. Bartlett, president of ArborText, sees tremendous opportunities for XML in online catalog ordering applications and related advertising and public relations applications currently using HTML.

"I predict the rapid adoption and widespread use of XML for more automated, more flexible manipulation of data on the Internet, as well as in document databases and other applications," Bartlett projects. "HTML is for documents; XML is for data."

—Marie Ranoia Alonso

A Talk With Paul Trevithick

Q: Why is XML rapidly being selected as the technology of the future for online applications that benefit from text stored in a vendor-neutral, platform-independent way, and what is the future of XML?

A: Web pages can also be dynamically generated and personalized from shared XML. This overcomes the limitations of static, handmade Web pages, which are expensive to update. In an XML-based workflow, HTML is treated as just another page description language—another possible output format, just like PostScript.

Looking to the future for XML from an Internet perspective, XML's infinite tag set and rich syntax allow rigorously defined information to flow between computers on the Web. As corporations of all sizes struggle to become "e-businesses" or interactive corporations, Web sites will become the central, defining face of a company.

The consequence of this is that the printed page, eventually, will be driven by the Internet site.

Since the Web site must, by definition, have the most up-to-date, accurate information, printed collateral, direct mail and documentation must be driven from the information repositories behind those Web servers.

If the Web servers are talking XML, prepress providers have a tremendous opportunity to offer XML-to-PostScript services.
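As a rough illustration of what an XML-to-PostScript step might involve, the sketch below reads a hypothetical catalog and writes a bare-bones PostScript page, one text line per item. The tag names, font, margins and line spacing are all assumptions; real prepress output would be far more elaborate.

```python
import xml.etree.ElementTree as ET


def ps_escape(text):
    """Escape backslashes and parentheses, which are special in PostScript strings."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def catalog_to_postscript(xml_text, left=72, top=720, leading=16):
    """Emit a minimal one-page PostScript program listing each <item>."""
    root = ET.fromstring(xml_text)
    lines = ["%!PS", "/Helvetica findfont 12 scalefont setfont"]
    y = top
    for item in root.findall("item"):
        name = item.findtext("name", default="")
        price = item.findtext("price", default="")
        label = ps_escape(f"{name}  {price}")
        # Move to the start of the line, then paint the text.
        lines.append(f"{left} {y} moveto ({label}) show")
        y -= leading  # step down the page for the next item
    lines.append("showpage")
    return "\n".join(lines)


# Sample input; the product is invented for illustration.
SAMPLE = (
    "<catalog>"
    "<item><name>Canvas Tote (large)</name><price>18.50</price></item>"
    "</catalog>"
)

ps = catalog_to_postscript(SAMPLE)
print(ps)
```

The same XML that feeds a Web page could feed this step unchanged, so the printed piece is driven by the same repository as the site.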

X-tenuating Circumstances:
What's So Great About XML?

  • XML is a subset of SGML and was designed especially for Web display. It should be viewed as an extremely easy dialect of SGML.

  • XML is engineered for interoperability with both SGML and HTML.

  • The digitization of documents is not a new task, but XML offers a newer, better alternative to solving many problems, thanks to a document grammar that is sufficient for capturing the structure of regular document images.

  • XML allows for a variation of presentation among document types, so that a document may be automatically classified and digitized based on analysis of the document image. The result: A tagged document that is much more consistent, much richer and more readily usable on the Web.

