I've been tinkering with a simple CLI tool for viewing & querying data files (CSV, Parquet, etc.). The most recent feature is the ability to view Parquet row group metadata.
There are more capable tools out there for sure, but this is really coming in handy during my day-to-day work.
Also, writing simple CLI tools is oddly satisfying.
@andygrove Writing CLI tools in Rust is just delightful. The support libraries are so nice.
I need to add support for Parquet and similar formats to #dbcrossbar. But I weirdly don't encounter them in the wild that often. I generally tend to encounter data in CSVs and column stores, but not as much in between.
@andygrove Are you aware of https://www.visidata.org/ ? Perhaps you could consider creating a plugin there.
@andygrove OMG, I did the exact same thing a few weeks ago:
@andygrove I used DuckDB for mine. I didn't know about DataFusion.
@criccomini Yes, Tim and I have been swapping ideas on this kind of tooling. There are quite a few of these CLIs being built around Arrow / DataFusion / Ballista.
@andygrove Nice! We had similar needs for this library: https://github.com/DerwenAI/pynock (though it's focused on representing data for scalable graph compute).