
Anyone know of a public service for fetching and parsing the text of an article, similar to Pocket but... public?

@brandon I'm not aware of a public service like that, but wallabag (github.com/wallabag/wallabag) is an open-source project similar to Pocket. Perhaps its fetch/parse components could be reused.

@ruivieira Heh, I self-host wallabag already, only thing is I would rather not have the NY Times after me for re-hosting their content XD

@brandon ? I think Framasoft has a free instance. There are probably some others if you look.

@brandon
Sounds like another job for bash scripting! Surely there's a way to do that with a script... if not, I could figure out a Python way probably...

@poetgrant eeeh... with at least 200 hours of work needed to pull it off, I'm not even gonna try

@brandon
Hehehe... I feel like it should be easy with the right Python modules... let me look into it.

@poetgrant Let me know how it works out. I'd expect there to be so much variation between different sites that this would be extremely complex, but if there's an elegant solution out there I haven't thought of, I'd be happy to be wrong!

@brandon @mike
I have been thinking about it. I think if I can get my script to load all the text on a page, then I can scrape it too. The problem would be telling the article apart from the ads and extras.
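
A rough sketch of how the "articles vs. ads and extras" problem could be approached, using only the Python standard library. This is not what anyone in the thread actually wrote: the names `ParagraphCollector`, `extract_article`, and `SKIP_TAGS`, and the thresholds `min_words` and `max_link_density`, are all made up for illustration. The idea is just a heuristic: keep paragraphs that are reasonably long and are not mostly link text, and ignore anything inside navigation-style tags.

```python
from html.parser import HTMLParser

# Tags whose contents are rarely article text (an assumption, not a rule).
SKIP_TAGS = {"script", "style", "nav", "aside", "footer", "header", "form"}


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> and the share of it that is link text."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a SKIP_TAGS element
        self.in_p = False
        self.in_link = False
        self.chunks = []      # text pieces of the current paragraph
        self.link_chars = 0   # characters of the current paragraph inside <a>
        self.paragraphs = []  # list of (text, link_density) tuples

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.flush()      # an unclosed previous <p> ends here
            self.in_p = True
        elif tag == "a":
            self.in_link = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p":
            self.flush()
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_p and self.skip_depth == 0:
            self.chunks.append(data)
            if self.in_link:
                self.link_chars += len(data)

    def flush(self):
        """Store the current paragraph (if any) and reset the buffers."""
        raw = "".join(self.chunks)
        text = " ".join(raw.split())  # collapse whitespace
        if text:
            self.paragraphs.append((text, self.link_chars / len(raw)))
        self.chunks, self.link_chars, self.in_p = [], 0, False


def extract_article(html, min_words=8, max_link_density=0.5):
    """Return paragraphs that look like article text, joined by blank lines."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up a trailing paragraph that was never closed
    return "\n\n".join(
        text
        for text, density in parser.paragraphs
        if len(text.split()) >= min_words and density <= max_link_density
    )
```

Fed the HTML of a page as a string, `extract_article` returns the paragraphs that pass both filters. Real sites vary a lot, so the thresholds would likely need tuning per site, and pages that build their content with JavaScript would not work with this at all.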

@brandon @mike
So I started writing it with the Scrapy framework... this is no joke... this is going to take a while... hehehe

Fosstodon

Fosstodon is a Mastodon instance that is open to anyone who is interested in technology; particularly free & open source software.