proof-of-concept for importing packagesite.yaml into FreshPorts. The steps are:

1 - From each line of the 32,500-line YAML file, extract 3 fields to create a CSV file
2 - Load the CSV file into the db

Step 2 takes seconds.

I need help/advice with step 1, which takes 3 minutes.

Code and examples are found here:

Thank you

@dvl I have a weird idea. Is using jq an option? :-).
My hunch is that you are CPU-bound in the YAML interpreter.

This produces a similar output for me (modulo the leading ABI string).

cat packagesite.yaml | jq -r '.name + "\t" + .origin + "\t" + .version'
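
For comparison, here is a minimal Python sketch of the same extraction. It assumes, as the jq command above implies, that each line of packagesite.yaml is a standalone JSON object with name, origin and version keys, so a JSON parser can stand in for a full YAML parser. The function name extract_fields and the two sample records are made up for illustration.

```python
import csv
import io
import json


def extract_fields(lines, out):
    """Write name, origin and version of each JSON line as one tab-separated row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        pkg = json.loads(line)  # assumption: one JSON object per line
        writer.writerow([pkg["name"], pkg["origin"], pkg["version"]])
        count += 1
    return count


# Two made-up records standing in for lines of packagesite.yaml.
sample = io.StringIO(
    '{"name": "zsh", "origin": "shells/zsh", "version": "5.8"}\n'
    '{"name": "jq", "origin": "textproc/jq", "version": "1.6"}\n'
)
result = io.StringIO()
rows_written = extract_fields(sample, result)
print(result.getvalue(), end="")
```

For a real file, the two StringIO objects would be replaced by open file handles; whether this beats jq on 32,500 lines is untested here.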

@evilham I combined your suggestion and one from

$ time jq -rc '[1, .origin, .name, .version] |
' < ~/tmp/FreeBSD\:12\:amd64/latest/packagesite.yaml > packagesite.csv


$ time ./


The data get in there fast enough.
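
As a rough illustration of the load step, here is a sketch that bulk-inserts tab-separated rows in a single transaction. The thread does not show the actual loader or schema, so this uses Python's built-in sqlite3 with an in-memory database purely as a stand-in; the table name packages_raw, its columns, and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def load_rows(conn, tsv_file):
    """Insert every tab-separated row into a raw staging table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packages_raw (name TEXT, origin TEXT, version TEXT)"
    )
    reader = csv.reader(tsv_file, delimiter="\t")
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO packages_raw (name, origin, version) VALUES (?, ?, ?)",
            reader,
        )
    return conn.execute("SELECT COUNT(*) FROM packages_raw").fetchone()[0]


# In-memory stand-in database and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
data = io.StringIO("zsh\tshells/zsh\t5.8\njq\ttextproc/jq\t1.6\n")
loaded = load_rows(conn, data)
print(loaded)
```

Keeping the raw rows in a staging table like this leaves the normalization for a later step done inside the database.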

Next step: go from that raw data into normalized form. That should be easier & faster now that it's in a database.

Thank you.

@dvl that's really cool. Thank *you* for working on that.

@evilham I like working with database tables and functions. I enjoy the design aspect.

