[Techtalk] Gimp tutorials

Miriam English mim at miriam-english.org
Fri Oct 18 03:19:22 UTC 2013


A. Mani wrote:

> djvu is not paper oriented... it is primarily a format for the web and
> electronic publishing with strong focus on size.
> It beats jpeg in image handling for web and can be optimized for
> hand-held devices
> Specs: http://djvu.org/resources/

I have to admit I don't have much experience with djvu. I dabbled with 
it and found it was yet another one-way format: it is near-impossible to 
get content back out of it. You are right that it is less bloated than 
pdf, but as you noted, the viewer is not commonly included in computer 
installations.

All the djvu ebooks I have (I don't have many) are merely scanned images 
of paper pages, not styled text. Some seem to contain OCR'ed text as 
meta-data as well, but it lacks any markup. Styled text will always be 
much more efficient than images of text. Do djvu documents exist that 
consist of simply styled text? I haven't seen any. Agreed that for a 
scanned document djvu would be the hands-down winner in filesize.

I've found a Wikisource discussion of djvu at:
http://en.wikisource.org/wiki/Help:DjVu_files
I am grateful to you for bringing up the topic, as I now have a way of 
OCR'ing and extracting images from a scanned djvu document.

> epub is ahead for screen reading.

The newish epub format is certainly a step in the right direction. It is 
basically a bunch of html documents and images all zipped up into a 
single file. You can see this if you rename an epub document to a zip 
file then unzip it.
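
The same thing can be seen without renaming anything. Here is a minimal 
Python sketch, assuming only the standard zipfile module. The function 
names and the tiny in-memory sample archive are made up for 
illustration; a real epub contains more files than this.

```python
import io
import zipfile


def list_epub_members(epub):
    """Return the names of the files stored inside an epub (a zip archive).

    `epub` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as archive:
        return archive.namelist()


def build_sample_epub():
    """Build a tiny, hypothetical epub-like archive in memory for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/toc.ncx", "<ncx/>")
        archive.writestr("OEBPS/chapter1.html",
                         "<html><body><p>Hi</p></body></html>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Print every file name held in the sample archive.
    for name in list_epub_members(build_sample_epub()):
        print(name)
```

To look inside a real book, pass its path instead, for example 
list_epub_members("book.epub"), where "book.epub" is a placeholder name.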

It is two steps forward and one step back however, because it has 
creeping committee-itis. They keep adding further requirements to the 
format, complicating the already complex process of making an epub 
file. I was publishing an epub format ebook for a friend and I 
hand-edited a few last-minute changes. I found that making a tiny 
mistake in the additionally required and ridiculously convoluted 
"content.opf" file or the "toc.ncx" file can screw up the whole book.
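
A quick check after hand-editing can catch that kind of slip before the 
book is zipped up again. The following is only a sketch: the function 
name check_well_formed is made up for illustration, and it tests XML 
well-formedness only, not the extra rules the epub format places on 
"content.opf" or "toc.ncx".

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_source):
    """Return None if xml_source parses as XML, else the parser's message.

    This only detects syntax slips such as an unclosed tag. It does not
    validate the document against any epub schema.
    """
    try:
        ET.fromstring(xml_source)
    except ET.ParseError as error:
        return str(error)
    return None


if __name__ == "__main__":
    good = "<package><manifest></manifest></package>"
    bad = "<package><manifest></package>"  # <manifest> is never closed
    print("good:", check_well_formed(good))
    print("bad: ", check_well_formed(bad))
```

For a file on disk, read it in binary mode and pass the bytes, so that 
any encoding declaration in the file is honoured by the parser.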

One of the nicest things about simple html is that a browser happily 
skips over any markup it doesn't know how to interpret and displays the 
rest anyway -- it is fault-tolerant, bless Tim Berners-Lee's heart. Many 
newer formats are insanely fault-intolerant. For example, if one byte in 
a multi-megabyte pdf document is wrong, a viewer can refuse to even open 
it.
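
That tolerance is easy to demonstrate. The sketch below assumes Python's 
standard html.parser module standing in for a browser (real browsers 
recover from errors in their own ways); the class and function names are 
made up for illustration. It feeds the parser an unknown tag, a 
mismatched closing tag, and unclosed elements, and still gets the text 
out.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text content, ignoring whatever tags surround it."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        # Called for runs of text between tags, known or unknown.
        self.pieces.append(data)


def extract_text(markup):
    """Return the text of possibly malformed html without raising."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end of input
    return "".join(parser.pieces)


if __name__ == "__main__":
    sloppy = "<p>Hello <blink>world</unknown> <b>again"
    print(extract_text(sloppy))
```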

Best wishes,

	- Miriam

-- 
If you don't have any failures then you're not trying hard enough.
  - Dr. Charles Elachi, director of NASA's Jet Propulsion Laboratory
-----
Website: http://miriam-english.org
Blogs:   http://miriam-e.dreamwidth.org
          http://miriam-e.livejournal.com



