[Courses] [python] No lesson -- open discussion

Akkana Peck akkana at shallowsky.com
Sun Sep 18 19:12:57 UTC 2011


Ehud Kaldor writes:
>    I know some time went by, but this seems like a good forum to ask:
>    is there a good library for 'reading' images? OCR is what comes to
>    mind, but I am not sure if this is the right technical term for taking
>    an image of text and reading the text off it.
>    Thank you, Ehud

Yes, OCR is the right term (Optical Character Recognition).
I haven't done any OCR myself, but from what little I've read,
Google's Tesseract engine seems to be highly regarded. A web search
for "python ocr" turned up http://code.google.com/p/pytesser/ and
that's where I'd start. Debian seems to have the tesseract binary
in its repositories, but not the Python wrapper, so you'd install
tesseract and PIL (the Python Imaging Library, package python-imaging
on this Debian machine), then download pytesser from the URL above.
It looks like it should be fairly easy to use.
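
If you would rather not depend on a wrapper at all, here is a minimal
sketch of calling the tesseract binary directly from Python. It is not
pytesser's code, just an illustration. It assumes a tesseract executable
is on your PATH and relies on the command-line form
"tesseract imagename outputbase", which writes the recognized text to
outputbase.txt. The function names and the file name scan.png are made
up for the example.

```python
import os
import shutil
import subprocess
import tempfile


def build_tesseract_command(image_path, output_base):
    """Return the argument list for `tesseract <image> <output_base>`."""
    return ["tesseract", image_path, output_base]


def image_to_text(image_path):
    """Run the tesseract binary on an image file and return the text it finds."""
    # Assumption: the tesseract executable is installed and on PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")

    with tempfile.TemporaryDirectory() as tmpdir:
        output_base = os.path.join(tmpdir, "ocr_result")
        subprocess.run(
            build_tesseract_command(image_path, output_base),
            check=True,            # raise CalledProcessError if tesseract fails
            capture_output=True,   # keep tesseract's own messages off the terminal
        )
        # tesseract appends ".txt" to the output base name it is given.
        with open(output_base + ".txt", encoding="utf-8") as result_file:
            return result_file.read()


# Hypothetical usage, with scan.png standing in for your own image:
#     print(image_to_text("scan.png"))
```

Writing to a temporary directory keeps the intermediate .txt file from
cluttering the working directory. Expect the quality of the result to
depend a lot on how clean the input image is.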

	...Akkana

