
Proof of concept for importing packagesite.yaml into FreshPorts. The steps are:

1 - From each line of the 32,500-line YAML file, extract 3 fields, creating a CSV file
2 - Load the CSV file into the database

Step 2 takes seconds.

I need help/advice with step 1, which takes 3 minutes.

Code and examples are found here:

git.langille.org/FreshPorts/pa

Thank you

@dvl I have a weird idea. Is using jq an option? :-).
My hunch is that you are CPU-bound by the YAML interpreter.

This produces a similar output for me (modulo the leading ABI string).

cat packagesite.yaml | jq -r '.name + "\t" + .origin + "\t" + .version'
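jq can only read packagesite.yaml because each line is a standalone JSON object, so the per-line parse does not need a general YAML parser at all. Below is a minimal Python sketch of the same extraction using the standard json module, assuming one complete JSON object per non-blank line and that every record has `name`, `origin` and `version` keys. The function names are hypothetical, and the sample record is made up for illustration.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def extract_fields(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'name<TAB>origin<TAB>version', mirroring the jq filter above.

    Assumes each non-blank line is one complete JSON object, which is
    what lets jq read packagesite.yaml in the first place.
    """
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines instead of raising
        pkg = json.loads(line)  # a JSON parser, not a general YAML parser
        yield "\t".join((pkg["name"], pkg["origin"], pkg["version"]))


def convert(src: TextIO, dst: TextIO) -> None:
    """Stream src to dst one line at a time; memory use stays roughly constant."""
    for row in extract_fields(src):
        dst.write(row + "\n")


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    sample = io.StringIO(
        '{"name": "example", "origin": "misc/example", "version": "1.0"}\n'
    )
    out = io.StringIO()
    convert(sample, out)
    print(out.getvalue(), end="")  # prints: example<TAB>misc/example<TAB>1.0
```

Streaming line by line keeps the work proportional to the file size and avoids building one large in-memory document.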

@evilham I combined your suggestion and one from bsd.network:

$ time jq -rc '[1, .origin, .name, .version] | @tsv' < ~/tmp/FreeBSD\:12\:amd64/latest/packagesite.yaml > packagesite.csv
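Despite the .csv name, `@tsv` produces tab-separated rows: per the jq manual, fields are joined with a single tab, and backslash, tab, line feed and carriage return inside strings are written as `\\`, `\t`, `\n` and `\r`. Those escapes line up with PostgreSQL COPY's default text format (tab delimiter, backslash escapes), which would matter if the import script loads the file with COPY in that format; the script itself is not shown here. A minimal Python sketch of the same row rendering follows, assuming only string and integer fields (the constant `1` column is passed through unchanged). Function names are hypothetical.

```python
import json
from typing import TextIO

# jq's @tsv escapes these four characters in string fields (per the jq manual).
_TSV_ESCAPES = str.maketrans({"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"})


def tsv_field(value) -> str:
    """Render one array element like @tsv does; only str and int are handled here."""
    if isinstance(value, str):
        return value.translate(_TSV_ESCAPES)
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)  # e.g. the constant 1 in the filter
    raise TypeError(f"not handled in this sketch: {value!r}")


def tsv_row(pkg: dict) -> str:
    """Mirror: [1, .origin, .name, .version] | @tsv"""
    fields = (1, pkg["origin"], pkg["name"], pkg["version"])
    return "\t".join(tsv_field(v) for v in fields)


def convert(src: TextIO, dst: TextIO) -> None:
    """One JSON object in, one escaped tab-separated row out, line by line."""
    for line in src:
        if line.strip():
            dst.write(tsv_row(json.loads(line)) + "\n")
```

Using `str.translate` handles each character exactly once, so an escaped backslash is never escaped a second time.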

real	0m1.351s
user	0m1.295s
sys	0m0.055s

$ time ./import-via-copy-packagesite.py

real	0m1.731s
user	0m0.131s
sys	0m0.008s

The data get in there fast enough.

Next step: go from that raw data into normalized form. That should be easier & faster now that it's in a [] database [on ].
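One common shape for that next step is a staging table holding the raw rows, followed by set-based `INSERT ... SELECT` statements that fill the normalized tables. The sketch below uses Python's built-in sqlite3 purely as a stand-in so it runs anywhere; it is not PostgreSQL and not COPY. Everything in the schema is hypothetical: the table names, the column names, and treating the leading `1` column as an opaque `abi_id` are guesses for illustration, not FreshPorts' actual design. It also splits on tabs without undoing the `@tsv` escapes, which is only safe when no field contains a tab or backslash.

```python
import sqlite3
from typing import Iterable

# Hypothetical schema, for illustration only (not FreshPorts' real tables).
SCHEMA = """
CREATE TABLE packagesite_raw (abi_id INTEGER, origin TEXT, name TEXT, version TEXT);
CREATE TABLE ports (id INTEGER PRIMARY KEY, origin TEXT NOT NULL UNIQUE);
CREATE TABLE packages (
    port_id INTEGER NOT NULL REFERENCES ports (id),
    abi_id  INTEGER NOT NULL,
    name    TEXT NOT NULL,
    version TEXT NOT NULL,
    UNIQUE (port_id, abi_id, name)
);
"""


def load_raw(conn: sqlite3.Connection, tsv_lines: Iterable[str]) -> None:
    """Bulk-insert tab-separated rows into the staging table (no unescaping)."""
    rows = (line.rstrip("\n").split("\t") for line in tsv_lines if line.strip())
    conn.executemany("INSERT INTO packagesite_raw VALUES (?, ?, ?, ?)", rows)


def normalize(conn: sqlite3.Connection) -> None:
    """Create one ports row per distinct origin, then packages that point at it."""
    conn.execute(
        "INSERT OR IGNORE INTO ports (origin) "
        "SELECT DISTINCT origin FROM packagesite_raw"
    )
    conn.execute(
        "INSERT INTO packages (port_id, abi_id, name, version) "
        "SELECT p.id, r.abi_id, r.name, r.version "
        "FROM packagesite_raw AS r JOIN ports AS p ON p.origin = r.origin"
    )


def import_packagesite(conn: sqlite3.Connection, tsv_lines: Iterable[str]) -> None:
    with conn:  # commit both steps together, or roll back on error
        load_raw(conn, tsv_lines)
        normalize(conn)
```

In PostgreSQL the staging load would typically be a COPY, and `INSERT OR IGNORE` (SQLite syntax) would become `INSERT ... ON CONFLICT DO NOTHING`.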

Thank you.

@dvl that's really cool. Thank *you* for working on that.

@evilham I like working with database tables and functions. I enjoy the design aspect.

Fosstodon

Fosstodon is an English speaking Mastodon instance that is open to anyone who is interested in technology; particularly free & open source software.