Document Engineering for a Digital Library: PDF recompression using JBIG2 and other optimization of PDF documents
| Authors | |
|---|---|
| Year of publication | 2010 |
| Type | Article in Proceedings |
| Conference | Proceedings of DocEng 2010 conference |
| MU Faculty or unit | |
| Citation | |
| web | |
| Doi | https://doi.org/10.1145/1860559.1860563 |
| Field | Informatics |
| Keywords | Authoring tools and systems; Categorization; Classification; Document presentation; Representations/Standards; Character recognition; Digital mathematical library; Digitisation workflow |
| Description | Several innovative document transformations and tools developed in the process of building the Digital Mathematical Library DML-CZ (http://dml.cz) are described. The main result is our new PDF re-compression tool, developed using an enhanced jbig2enc library. Together with pdfsizeopt.py by Péter Szabó, it decreased PDF storage size and transmission needs by 62%: using both programs, we reduced the original, already compressed PDFs to 38% of their size. We briefly describe the workflow and tools developed for creating the digital library. The batch digital signature stamper, document similarity metrics that use four different methods, a [meta]data validation process, and math OCR tools are some of the main [by]products. Such document engineering, together with Google Scholar indexing optimization, has led to the success of serving digitized and born-digital scientific math documents to the public in DML-CZ, and is also being employed in the European Digital Mathematics Library, EuDML. |