phiresky.github.io/blog/2019/r

Recursive search in all manner of files, including pdf, compressed archives, subtitles of .mkv files (!!!)

Just amazing


I tested it for a while, and searching the subtitles isn't super fast, a couple of seconds per episode (I guess it must do some pretty heavy operations), but it's pretty cool that it works at all.

@Matter What's the file size of each srt and how many srt files are there to search through?

@brandon that's just the thing: the subtitles are in the video containers! (mkv in this case)

@Matter Oh wow! That's...not too bad!

Is the intention to leave them in the container?

@brandon yes, this was just to test out rga. I don't plan on searching a string in the subtitles of my series very often 😆

@Matter It's actually really cool that it can do that without having to perform an extraction. Would make it easier for YouTubers to search their past videos for that one thing that they said :P

@brandon maybe it does do an extraction. It uses different adapters for different files, like ffmpeg for mkv, mp4 and avi.
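
For the curious, here's a rough Python sketch of what "an adapter per file type" could look like. This is not rga's actual code: the `ADAPTERS` table and the function names are made up for illustration, and it assumes `ffmpeg` is on the PATH and that the first subtitle stream is text-based (image-based subtitles can't be converted to SRT this way).

```python
import re
import subprocess
from pathlib import Path

# Hypothetical extension -> adapter table; rga's real list is much longer.
ADAPTERS = {".mkv": "ffmpeg", ".mp4": "ffmpeg", ".avi": "ffmpeg"}


def pick_adapter(path):
    """Return the adapter name for a file, or None if no adapter applies."""
    return ADAPTERS.get(Path(path).suffix.lower())


def ffmpeg_subtitle_cmd(path):
    """Build an ffmpeg command that writes the first subtitle stream as SRT to stdout."""
    return ["ffmpeg", "-v", "error", "-i", str(path),
            "-map", "0:s:0", "-f", "srt", "-"]


def search_srt(srt_text, pattern):
    """Return (timestamp, text) pairs for subtitle cues matching the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    # SRT cues are separated by blank lines: index, timestamp, then text lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        text = " ".join(lines[2:])
        if rx.search(text):
            hits.append((lines[1], text))
    return hits


def search_video(path, pattern):
    """Extract subtitles with ffmpeg (if the file type matches) and search them."""
    if pick_adapter(path) != "ffmpeg":
        return []
    result = subprocess.run(ffmpeg_subtitle_cmd(path),
                            capture_output=True, text=True, check=True)
    return search_srt(result.stdout, pattern)
```

So in a setup like this there is an extraction step, it just streams to a pipe instead of leaving an .srt file on disk. That would also explain the couple of seconds per episode.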

It has smart caching features, and you can even enable an OCR algorithm to search for text in images.
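
I haven't looked at how rga implements its cache, but the general idea can be sketched in a few lines of Python: remember the extracted text per file and redo the slow extraction only when the file changes. Everything here (the `ExtractionCache` class, the in-memory dict, keying on path, mtime and size) is an assumption for illustration, not rga's design.

```python
import os


class ExtractionCache:
    """Cache extracted text per file, invalidated when the file changes."""

    def __init__(self, extract):
        self._extract = extract  # slow function: path -> extracted text
        self._store = {}         # in-memory only; a real tool would persist this

    def get(self, path):
        st = os.stat(path)
        # A changed mtime or size means the cached text may be stale.
        key = (os.path.abspath(path), st.st_mtime_ns, st.st_size)
        if key not in self._store:
            self._store[key] = self._extract(path)
        return self._store[key]
```

With something like this, the first search over a series pays the extraction cost and later searches mostly just scan cached text.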

@brandon no, no, the OCR is only for jpg and png files, I don't think running a 25fps video through OCR will finish somewhere in this decade 😆

@Matter Aww :P Though I think a GPU could handle this today. I mean... if we can ray-trace light paths at 60fps, we should be able to parallelize OCR enough to make that work, no?

@brandon sure, but you would need specialized hardware in the GPU for that. And that would bake the OCR method into the silicon, which isn't ideal 🤔

Maybe a big-ass FPGA? Interesting ideas

@Matter I'd think an ASIC would be more apt, maybe something like a software+hardware solution where the ASIC is installed into a PCIe slot

