DjVu

DjVu ( / ˌ of eɪ ʒ ɑ v u / DAY -zhah- VOO , like English ” seen ” [3] ) is a computer file format designed Primarily to store scanned document , Especially Those Containing a combination of text, line drawings , indexed color images, and photographs. It is used as a background image, progressive loading , arithmetic coding , and lossy compression for bitonal ( monochrome) images. This allows you to get high-quality, readable images to be stored in a minimum of space, so they can be made available on the web .

DjVu has been promoted as an alternative to PDF , supporting smaller files than PDF for most scanned documents. [4] The DjVu developers report that color magazine pages compress to 40-70 kB, black-and-white technical papers compress to 15-40 kB, and ancient manuscripts compress to around 100 kB; a satisfactory JPEG image typically requires 500 kB. [5] Like PDF, DjVu can contain an OCR text layer, making it easy to perform and to copy and paste .

Free browser plug-ins and desktop viewers from different developers are available from the djvu.org website. DjVu is supported by a number of multi-format document viewers and e-book reader software on Linux ( Okular , Evince ) and Windows ( SumatraPDF ).

Despite its advantages, DjVu is not widely supported by scanning and viewing software.

History

The DjVu technology was originally developed [5] by Yann LeCun , Léon Bottou , Patrick Haffner, and Paul G. Howard at AT & T Labs from 1996 to 2001.

Due to its declared higher compression ratio (and thus smaller file size) and the ease of converting large volumes of text in DjVu format, it is an open file format , it has been considered superior to PDF . Independent technologist Brewster Kahle in a 2004 talk on IT Conversations discussed the benefits of acquiring easier access to DjVu files. [6] [7]

The DjVu library distributed as part of the open-source DjVuLibre package has become the reference implementation for the DjVu format. DjVuLibre has been maintained by the original developers of DjVu since 2002. [8]

The DjVu file format specification has gone through a number of revisions:

Revision history
Support status Version Release date Notes
Unsupported 1-19 [1] 1996-1999 Developmental versions by AT & T labs preceding the sale of the format to LizardTech .
Unsupported Version 20 [1] April 1999 DjVu version 3. DjVu changed from a single-page format to a multipage format.
Older, still supported Version 21 [1] September 1999 Indirect storage format replaced. The searchable text layer was added.
Older, still supported Version 22 [1] April 2001 Orientation page, color JB2
Unsupported Version 23 [1] July 2002 CID chunk
Unsupported Version 24 [1] February 2003 LTAnno chunk
Older, still supported Version 25 [1] May 2003 NAVM chunk. Support for DjVu bookmarks (outlines) was added. Changes made by Versions 23 and 24 were made obsolete.
Current Version 26 [1] April 2005 Text / line annotations

Technical overview

File structure

The DjVu file format is based on the Interchange File Format and is composed of hierarchically organized chunks. The IFF structure is preceded by a 4-byte AT&T magic number . Following is a single FORMchunk with a secondary identify Either of DJVUgold DJVMfor a single-page or a multi-page document respectivement.

Chunk types

Chunk identify Contained by Description
FORM: DJVU FORM: DJVM Describes a single page. Can be the root of a document and be a single-page document or referred to from a DIRMchunk.
FORM: DJVM N / A Describes a multi-page document. Is the document’s root chunk.
FORM: DJVI FORM: DJVM Contains data shared by multiple pages.
FORM: THUM FORM: DJVM Contains thumbnails.
INFO FORM: DJVU Must be the first chunk. Describes the page width, height, version format, resolution , gamma , and rotation.
DIRM FORM: DJVM Must be the first chunk. References other FORMchunks. These chunks can follow this chunk inside the FORM:DJVMchunk or be contained in external files. These types of documents are referred to as bundled or indirectly , respectively.
NAVM FORM: DJVM If present, must immediately follow the DIRMchunk. Contains a BZZ-compressed outline of the document.

Compression

DjVu divides a single image into many different images, then compresses them separately. To create a DjVu file, the initial image is a few words: a background image, a foreground image, and a mask image. The background and foreground images are typically lower-resolution color images (eg, 100 dpi); the mask image is a high-resolution bilevel image (eg, 300 dpi) and is typically where the text is stored. The background and foreground images are then compressed using a wavelet-based compression algorithm named IW44. [5] The mask image is compressed using a method called JB2 (similar to JBIG2). The JB2 encoding method identifies the same pattern on the page, such as multiple occurrences of a particular character in a given font, style, and size. It compresses the bitmap of each unique shape, and then encodes the locations where each shape appears on the page. Thus, instead of compressing a letter “e” in a multiple font, it compresses the letter “e” ounce (as a compressed bit image) and then records every place on the page it occurs.

OPTIONALLY, thesis shapes May be mapped to UTF-8 code (Either by hand or Potentially by a text recognition system ) and Stored in the DjVu file. If this mapping exists, it is possible to select and copy text.

Since JBIG2 has been based on JB2, both compression methods have the same problems when performing lossy compression. Numbers may be substituted with similarly looking numbers (such as replacing 6 with 8) if the text was scanned at a low resolution prior to lossy compression.

Licensing format

DjVu is an open file format with patents. [4] The file format specification is published, the source code for the reference library. [4] The original authors distribute an open-source implementation named ” DjVuLibre ” under the GNU General Public License . The AT & T Corporation , LizardTech , Celartem and Cuminas .

Support

Despite its advantages, DjVu is not widely supported by scanning and viewing software. [9] While viewers can be downloaded, DjVu files is not implemented in most operating systems by default. [10]

SumatraPDF (Windows) among others can manipulate DjVu files.

In 2002 the DjVu file format Was Chosen by the Internet Archive have a size in qui icts Million Book Project Provides scanned public-domain books online (along with TIFF and PDF). [11]

Wikimedia Commons , a media repository used by Wikipedia , others conditionals PDF and DjVu media files. [12]

See also

  • JBIG2
  • Comparison of e-book formats

References

  1. ^ Jump up to:i DjVu File Version Format , By Jim Rile, Posted: Fri Feb 23, 2007 1:08 am, PlanetDjVu
  2. Jump up^ “DjVu Licensing” . DjVu Sourceforge page . Sourceforge.net. 2011-08-17 . Retrieved 2011-09-21 .
  3. Jump up^ “DjVu.org – the first menu for djvu resources” . djvu.org . Retrieved 2017-07-02 .
  4. ^ Jump up to:c “What is DjVu – DjVu.org” . DjVu.org . Retrieved 2009-03-05 .
  5. ^ Jump up to:c Leon Bottou; Patrick Haffner; Paul G. Howard; Patrice Simard; Yoshua Bengio; Yann Le Cun (1998). “High Quality Document Image Compression with DjVu, 7 (3): 410-425” (PDF) . Journal of Electronic Imaging.
  6. Jump up^ Brewster Kahle (December 16, 2004). “Universal Access to All Knowledge” (Audio, Speech at 1h: 31m: 20s) . Network Conversations.
  7. Jump up^ “LizardTech To Open Source To DjVu Java Viewer” . ECM Connection . December 7, 2004 . Retrieved 18 August 2017 .
  8. Jump up^ http://djvu.sourceforge.net/
  9. Jump up^ Manual for Xerox / Visioneer OneTouch, widely used scanning software for business and home use, showing support for multiple file formats, but not DjVu.
  10. Jump up^ A DjVu file test. Click on the image in the file to open a file with the support for the .djvu format.
  11. Jump up^ “Image file formats – OLPC” . Wiki.laptop.org . Retrieved 2008-09-09 .
  12. Jump up^ Wikimedia Commons. Project scope: PDF and DjVu.