[Techtalk] PDF to txt

Alvin Goats agoats at compuserve.com
Sat Oct 23 13:36:24 EST 2004

> I've been trying to convert a PDF file to plain text.
> $ pdf2ps filename.pdf output.ps
> $ ps2ascii filename.ps output.txt
> Output: txt file.
> But the problem is there's nothing inside the file. It's all garbage like 
> lots of @@@@. 

Some PDFs are just graphics: scans of text pages saved as PDF. Lots of
the early US Mil-Specs in Acrobat format are that way. The kind of
output you are getting sounds like it came from a graphic image.

There is NO way to convert these directly to text. As for outputting to a
graphic and running OCR on it, I've never been successful doing so. Most of
these files use strange graphic formats with incompatible pixel resolutions.


Where you can, you might also consider trying the command:

ps2ascii filename.pdf output.txt

Later versions of Ghostscript will take either PS or PDF input and convert
it to ASCII, saving you a step.
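
If it helps, here is a rough sketch of how you might run the one-step
conversion and then check whether the result is real text or the kind of
symbol soup you get from a scanned page. The function names
(looks_like_text, pdf_to_txt) and the 50% threshold are my own assumptions,
not anything that ships with Ghostscript, and it only works if your
ps2ascii accepts PDF input as described above.

```shell
#!/bin/sh

# Hypothetical helper: succeeds if at least half of the non-whitespace
# bytes in the file are letters or digits. The 50% cutoff is an arbitrary
# assumption; tune it for your documents.
looks_like_text() {
    total=$(( $(LC_ALL=C tr -d '[:space:]' < "$1" | wc -c) ))
    alnum=$(( $(LC_ALL=C tr -cd '[:alnum:]' < "$1" | wc -c) ))
    [ "$total" -gt 0 ] && [ $((alnum * 2)) -ge "$total" ]
}

# Hypothetical wrapper: convert the PDF in one step, then report whether
# the output looks usable.
pdf_to_txt() {
    ps2ascii "$1" "$2" || return 1
    if looks_like_text "$2"; then
        echo "$2: looks like real text"
    else
        echo "$2: empty or mostly symbols; the PDF is probably a scanned image"
    fi
}
```

After sourcing the file, you would call it as:

pdf_to_txt filename.pdf output.txt

An empty or mostly-symbol result only suggests the PDF is a scan; it does
not prove it.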

