Heyo! Can anyone recommend a free (as in beer) option for converting image-only PDFs into OCR'd PDFs [1]? French support + macOS required, FLOSS preferred.

[1]: I'm not sure if I'm very clear 😕. Here's my use case: I have an app on my phone that scans documents to PDFs but it doesn't do any OCR. I also have a bunch of digital documents for which I don't have a paper version anymore. I'd like to OCR these documents to make them searchable and allow copy/paste.


I'd suggest checking out ImageMagick (the convert command) for preprocessing the original, then using Tesseract for OCR.

@Jase thanks for your suggestion! pdfsandwich was mentioned before and is basically a toolchain wrapping enhancement tools and tesseract.

I guess your suggestion is better suited to raw images? I don't know whether ImageMagick can be used on PDFs.


Yes, it can, although you will need Ghostscript installed too.

Check out the ImageMagick docs / forums for PDF OCR preprocessing.
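
For anyone who wants to try the ImageMagick + Tesseract route, here is a rough sketch of how the two steps could be chained from Python. It assumes the convert and tesseract binaries are on your PATH (newer ImageMagick releases call the binary magick instead), that Ghostscript is installed so ImageMagick can read PDFs, and that the French language data (fra) for Tesseract is present. The function names, the 300 dpi setting, the grayscale step, and the page-%03d.png naming are illustrative choices of mine, not something recommended in this thread.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(pdf, out_dir, dpi=300):
    """Build an ImageMagick command that renders each PDF page to a grayscale PNG.

    The dpi value and the grayscale conversion are assumptions; tune them for your scans.
    """
    pattern = str(Path(out_dir) / "page-%03d.png")
    return ["convert", "-density", str(dpi), str(pdf), "-colorspace", "Gray", pattern]


def ocr_cmd(image, lang="fra"):
    """Build a Tesseract command that writes a searchable PDF next to the image."""
    image = Path(image)
    # Tesseract appends ".pdf" to the output base name itself.
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang, "pdf"]


def ocr_pdf(pdf, out_dir, lang="fra"):
    """Rasterize the PDF, then OCR every page image (one output PDF per page)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_cmd(pdf, out_dir), check=True)
    for image in sorted(Path(out_dir).glob("page-*.png")):
        subprocess.run(ocr_cmd(image, lang), check=True)
```

Calling ocr_pdf("scan.pdf", "out") would leave one searchable PDF per page in out/; merging them back into a single file is a separate step that this sketch does not cover.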