[Techtalk] PDF to txt
agoats at compuserve.com
Sat Oct 23 13:36:24 EST 2004
> I been trying to convert PDF file to plaintext.
> $ pdf2ps filename.pdf output.ps
> $ ps2ascii output.ps output.txt
> Output: txt file.
> But the problem is there's nothing inside the file. It's all garbage like
> lots of @@@@.
Some PDFs are pure graphics: scans of text saved as PDF. Lots of
the early US Mil-Specs in Acrobat format are that way. The kind of
output you are getting sounds like it came from a graphic image.
There is NO way to convert these directly to text. As for outputting to a
graphic and running OCR on it, I've never been successful doing so. Most use
strange graphic formats with incompatible pixel resolutions.
Where you can, you might also try the single command:
ps2ascii filename.pdf output.txt
Later versions of Ghostscript accept either PS or PDF input and convert
it to ASCII, saving you a step.
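
To tell quickly whether a conversion produced real text or the "@@@@" kind of garbage, something like the following might help. It is only a rough sketch: looks_like_text is a made-up helper name, and the "more than half letters or digits" threshold is an arbitrary assumption, not anything Ghostscript provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ghostscript): succeeds when more than
# half of the non-whitespace characters in a file are letters or digits.
# The 50% threshold is an assumption chosen for illustration.
looks_like_text() {
    # $(( ... )) also strips the padding some wc implementations add
    total=$(( $(LC_ALL=C tr -d '[:space:]' < "$1" | wc -c) ))
    alnum=$(( $(LC_ALL=C tr -cd '[:alnum:]' < "$1" | wc -c) ))
    # An empty file counts as "not text"
    [ "$total" -gt 0 ] && [ $((alnum * 2)) -gt "$total" ]
}
```

After running ps2ascii filename.pdf output.txt, calling looks_like_text output.txt returns success for ordinary prose and failure for a file full of @ signs, which would suggest the PDF is a scanned image.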