I'm not sure if I'm very clear 😕. Here's my use case: I have an app on my phone that scans documents to PDFs but it doesn't do any OCR. I also have a bunch of digital documents for which I don't have a paper version anymore. I'd like to OCR these documents to make them searchable and allow copy/paste.
@Jase thanks for your suggestion! pdfsandwich was mentioned before and is basically a toolchain wrapping enhancement tools and tesseract.
I guess your suggestion is more adapted to raw images? I don’t know if imagemagick can be used on PDFs.
@ben @themactep Thank you both for your suggestions! pdfsandwich seems like a cool wrapper around tesseract and other tools.
It seems to work OK for French documents.
I'd be happy to have GUI suggestions as well!
At least I have a nice CLI tool in my toolbelt now 👍
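For anyone who wants to run it over a whole folder of scans, here is a minimal sketch in Python. It assumes pdfsandwich is installed and on the PATH, and relies on its `-lang` and `-o` options; `fra` is the tesseract language code for French. The function names, the `_ocr` suffix check and the folder path are made up for illustration, so adapt them to your setup.

```python
from pathlib import Path
import shutil
import subprocess


def build_command(pdf: Path, lang: str = "fra") -> list[str]:
    """Return the pdfsandwich invocation for one scanned PDF.

    The output is written next to the input as <name>_ocr.pdf.
    """
    out = pdf.with_name(pdf.stem + "_ocr.pdf")
    return ["pdfsandwich", "-lang", lang, "-o", str(out), str(pdf)]


def ocr_folder(folder: Path, lang: str = "fra") -> list[Path]:
    """OCR every PDF in `folder` that has no *_ocr.pdf sibling yet."""
    if shutil.which("pdfsandwich") is None:
        raise RuntimeError("pdfsandwich is not installed or not on PATH")
    done = []
    for pdf in sorted(folder.glob("*.pdf")):
        if pdf.stem.endswith("_ocr"):
            continue  # skip files produced by an earlier run
        out = pdf.with_name(pdf.stem + "_ocr.pdf")
        if out.exists():
            continue  # already processed
        subprocess.run(build_command(pdf, lang), check=True)
        done.append(out)
    return done


if __name__ == "__main__":
    # Hypothetical folder; point this at wherever your scans live.
    for result in ocr_folder(Path.home() / "Scans"):
        print("wrote", result)
```

Running it twice is harmless: files that already have an `_ocr.pdf` sibling are skipped, and the originals are never modified.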
@kettcar64 @mike Thanks for your suggestion, but I don’t feel safe uploading my PDFs to an online service I don’t have control over. Plus, among the ones I need to process, there are some confidential documents I’m not allowed to upload anywhere 😊
I admit that it’s a really simple and easy solution though, and it might be sufficient for some!
@Crocmagnon if you don't need them to be directly stored on your phone, you can self-host paperless-ng! It's a great app I self-host at home and it has mobile apps that allow you to upload scans directly from your phone. The machine hosting it will then do OCR and the web interface lets you search through tags or OCR content.
@iconvacation it looks awesome! I’ll definitely check it out, thanks for the suggestion!
Do you know if it can be configured to push the final OCR’d document to a specific NextCloud folder? That would complete the loop nicely 👌🏻
@Crocmagnon This might be quite a bit of overkill, but I had paperless running for a couple of years and it did its job wonderfully. It's a server architecture that ingests everything in a folder, OCRs it and files it for you. While the original isn't maintained, there is https://github.com/jonaswinkler/paperless-ng nowadays, though I have not tried this fork.
Fosstodon is an English speaking Mastodon instance that is open to anyone who is interested in technology; particularly free & open source software.