#Python proof-of-concept for importing packagesite.yaml into FreshPorts. The steps are:
1 - From each line of a 32,500-line YAML file, extract 3 fields, creating a CSV file
2 - Load the CSV file into the db
Step 2 takes seconds.
I need help / advice with step 1 which takes 3 minutes.
Code and examples are found here:
@dvl I have a weird idea. Is using jq an option? :-).
My hunch is that you are being CPU bound by the YAML interpreter.
This produces a similar output for me (modulo the leading ABI string).
cat packagesite.yaml | jq -r '.name + "\t" + .origin + "\t" + .version'
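For comparison, here is a minimal Python sketch of the same extraction. It assumes, as the jq command implies, that every line of packagesite.yaml is a standalone JSON object, so `json.loads` can stand in for a full YAML parser. The function name `extract_fields` and the two sample records are made up for illustration.

```python
import csv
import io
import json


def extract_fields(lines, out, fields=("name", "origin", "version")):
    """Write one tab-separated row per JSON line and return the row count."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        pkg = json.loads(line)  # assumption: one JSON object per line
        writer.writerow([pkg.get(field, "") for field in fields])
        count += 1
    return count


# Made-up sample records standing in for the real file.
sample = io.StringIO(
    '{"name":"zsh","origin":"shells/zsh","version":"5.8"}\n'
    '{"name":"jq","origin":"textproc/jq","version":"1.6"}\n'
)
out = io.StringIO()
rows = extract_fields(sample, out)
print(out.getvalue(), end="")
```

For the real file, pass an open file handle instead of `sample`; whether this beats jq on 32,500 lines would need measuring.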
@evilham I combined your suggestion and one from bsd.network:
$ time jq -rc '[1, .origin, .name, .version] |
' < ~/tmp/FreeBSD\:12\:amd64/latest/packagesite.yaml > packagesite.csv
$ time ./import-via-copy-packagesite.py
The data get in there fast enough.
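A rough sketch of what a COPY-based load could look like, going only by the script name. It assumes PostgreSQL, a psycopg2-style cursor (which provides `copy_expert`), and CSV input; the table name `packagesite_staging` and the column names are hypothetical, not taken from the actual script.

```python
def copy_rows(cursor, csv_file,
              table="packagesite_staging",  # hypothetical table name
              columns=("abi_id", "origin", "name", "version")):  # hypothetical columns
    """Stream a CSV file into a table with one COPY statement.

    `cursor` is assumed to behave like a psycopg2 cursor.
    """
    sql = "COPY {} ({}) FROM STDIN WITH (FORMAT csv)".format(
        table, ", ".join(columns)
    )
    # COPY sends the whole file in one round trip instead of row-by-row INSERTs.
    cursor.copy_expert(sql, csv_file)
    return sql
```

With psycopg2 this would be called with a cursor from an open connection and a file handle on packagesite.csv, followed by a commit.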
@dvl that's really cool. Thank *you* for working on that.
@evilham I like working with database tables and functions. I enjoy the design aspect.