I've been wanting to try out #maturin for a while now, and with some of the #LLM tinkering I've done at work, I finally had an excellent use case for it.
It's an opinionated #rust implementation of #langchain document splitting, plus some of the #unstructured post-processors. For cleaning and splitting, I've clocked it at 40x to 75x faster than the Python implementation, and on my machine it can clean and split 25,000 documents in a second.
Check it out at https://github.com/cam-barts/rs_document