PDF to EPUB Conversion Tools Score Low on Accuracy in VIGC Test
TURNHOUT, BELGIUM—June 6, 2012—Publishers wanting to convert their libraries of books to eBooks may find it problematic, given a test run by the Flemish Innovation Center for Graphic Communication (VIGC) that found tools for converting PDFs to EPUB files scored an average of 30 percent accuracy, with the lowest-performing tool scoring just 10 percent. And even more significant: the four EPUB validation tools that VIGC used produced different results.
“There’s no doubt that the popularity of eBooks is soaring,” says Eddy Hagen, general manager at VIGC. “This trend leaves publishers facing a big challenge: they have to convert their whole back catalog to make it available as eBooks.
“An obvious approach is to take the print-ready PDF files and convert them to EPUB files, the standard file format for ebooks. For printers, this presents a potential new service they can offer their customers. On the internet you can find a lot of tools for converting PDFs to EPUB files—unfortunately, however, it’s not that straightforward.”
Conversion tools struggle to make the grade
The VIGC assessed 13 tools. The test began with a perfect PDF/X-4 file, which was converted to an EPUB file with each of the tools. Each resulting EPUB file was then validated with four different validation tools to check whether it conformed to the EPUB specifications. Next, all files were checked visually in five different EPUB viewers. In some cases there was a final conversion from EPUB to the Amazon Kindle or Apple iBooks format.
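To give a sense of what checking conformance involves, the sketch below tests two container rules from the EPUB specifications: the ZIP archive must begin with an uncompressed `mimetype` entry containing `application/epub+zip`, and it must include `META-INF/container.xml`. It is an illustration only, not one of the validation tools used by the VIGC, and a real validator checks far more. The function name `container_problems` and the in-memory sample archive are assumptions made for this example.

```python
import io
import zipfile


def container_problems(epub_file):
    """Return a list of basic container problems (empty if none were found).

    `epub_file` may be a path or a binary file-like object.
    Only two container rules are checked; this is not a full validation.
    """
    problems = []
    with zipfile.ZipFile(epub_file) as archive:
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("first archive entry is not 'mimetype'")
        else:
            if entries[0].compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' entry is compressed")
            if archive.read("mimetype") != b"application/epub+zip":
                problems.append("'mimetype' does not contain application/epub+zip")
        if "META-INF/container.xml" not in archive.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems


# Build a tiny in-memory archive that satisfies both rules (illustrative only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # ZipFile stores entries uncompressed by default.
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("META-INF/container.xml", "<container/>")

print(container_problems(buffer))
```

A file that passes these two checks can still fail a full validation, so a check like this is only useful as a quick first filter.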
“We used a very challenging print-ready test file that contained text and images,” explains Hagen. “And from a typographic point of view, we added all kinds of tricks. Eventually we ended up with a book covering nearly 30 pages. In total, we tested 65 different elements—from a simple italic, to OpenType functions like ligatures, through mathematical formulas. We didn’t expect any tool to register a perfect score, but at the same time we didn’t expect some tools to score as low as 10 percent.”
Ligatures prove a challenge
An important issue highlighted by the VIGC test is the difficulty in converting ligatures. A ligature is a “combined glyph”—two or sometimes three letters that have been joined together for aesthetic reasons.
“A good example is the combination of the letters ‘f’ and ‘i’,” continues Hagen. “The text we wrote included the word ‘profiles’, in which the ‘f’ and ‘i’ were replaced by a single glyph, the ligature. But in some EPUB files we found ‘profles’ instead of ‘profiles’: the ‘i’ had been dropped.
“Some conversion tools didn’t recognize the ligature and made an incorrect conversion. The automatic use of ligatures is the default in Adobe InDesign—i.e. InDesign will automatically replace certain combinations with the ligature—so you can imagine how often this happens in a document, and how often this will go wrong with some tools. You need to scan the converted files manually to check for missing letters, which simply isn’t feasible.”
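Where the original text is available alongside the converted EPUB text, part of this check could in principle be automated. The sketch below is one possible approach, not a method from the VIGC test: for every word in the source that contains a common ligature pair, it looks for the damaged spelling (such as ‘profles’) in the converted text. The ligature list and the function names are assumptions made for this example, and the check only catches the case where the trailing letters of a ligature were dropped.

```python
import re
import unicodedata

# Letter groups commonly set as ligatures in Latin fonts (an assumed list).
LIGATURES = ("ffi", "ffl", "fi", "fl", "ff")


def words(text):
    """Return lowercased words, expanding Unicode ligature characters.

    NFKC normalization turns a ligature code point such as U+FB01
    into its separate letters ('f' + 'i').
    """
    return re.findall(r"[a-z]+", unicodedata.normalize("NFKC", text).lower())


def damaged_forms(word):
    """Spellings a word would have if a ligature kept only its first letter."""
    forms = set()
    for lig in LIGATURES:
        start = word.find(lig)
        while start != -1:
            # Keep the first letter of the ligature and drop the rest.
            forms.add(word[:start + 1] + word[start + len(lig):])
            start = word.find(lig, start + 1)
    return forms


def find_dropped_ligatures(source_text, converted_text):
    """Map each suspicious word in the converted text to its likely original."""
    source_words = set(words(source_text))
    converted_words = set(words(converted_text))
    suspects = {}
    for word in source_words:
        for form in damaged_forms(word):
            # Ignore forms that are legitimate words in the source itself.
            if form in converted_words and form not in source_words:
                suspects[form] = word
    return suspects


print(find_dropped_ligatures("We compared the profiles.",
                             "We compared the profles."))
```

A check like this still depends on having a reliable copy of the source text, and it will not notice other kinds of damage, so it narrows the manual review rather than replacing it.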
Validation misses the mark
According to Hagen, validation tools pose an even bigger problem. “What amazed us the most was the validation of the EPUB files. Validation was a standard practice in our test, a simple way to partially check the quality of the conversion. We used four different validation tools and got different results.
“One tool would validate an EPUB file that another tool rejected. And the differences were inconsistent too: it wasn’t a case of one tool always being different from the other three. Based on our results, publishers face a big challenge in ensuring that EPUB files, and subsequently the ebooks themselves, have been converted accurately.”
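The kind of disagreement Hagen describes can be made visible by tabulating the verdicts per file. The sketch below assumes each validator's outcome has already been reduced to a simple pass or fail. The file names, validator names, and verdicts are made up for illustration and are not data from the VIGC report.

```python
def find_disagreements(results):
    """Return the files on which the validators did not all agree.

    `results` maps file name -> {validator name -> True (valid) / False}.
    """
    disagreements = {}
    for file_name, verdicts in results.items():
        # More than one distinct verdict means at least two tools disagree.
        if len(set(verdicts.values())) > 1:
            disagreements[file_name] = {
                "passed": sorted(v for v, ok in verdicts.items() if ok),
                "failed": sorted(v for v, ok in verdicts.items() if not ok),
            }
    return disagreements


# Hypothetical verdicts; placeholders only, not results from the test.
sample_results = {
    "tool_01.epub": {"validator_a": True, "validator_b": True,
                     "validator_c": False, "validator_d": True},
    "tool_02.epub": {"validator_a": False, "validator_b": True,
                     "validator_c": True, "validator_d": False},
    "tool_03.epub": {"validator_a": True, "validator_b": True,
                     "validator_c": True, "validator_d": True},
}

for file_name, split in find_disagreements(sample_results).items():
    print(file_name, "passed:", split["passed"], "failed:", split["failed"])
```

In this made-up sample the odd one out changes from file to file, which mirrors the pattern described above: no single validator is consistently the exception.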
The complete report, with detailed test results, can be purchased from the VIGC. The VIGC will also continue its search for good EPUB creation tools, to further support publishers and printers.