
Like RSS? Know CSS? I created a tool to generate RSS feeds from arbitrary websites using CSS selectors.
And it's called... Feed me up, Scotty!

feed-me-up-scotty.vincenttunru

I just pushed a new version that:

- supports pagination,
- allows narrowing down the content, and
- has configurable timeouts for feed sources.
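Pagination here means following a "next page" link from one page to the next and collecting entries along the way. Below is a minimal sketch of that loop. It assumes the pages have already been fetched into a dict keyed by URL, and that each page exposes its entries and an optional next-page URL. `Page`, `collect_entries`, and `max_pages` are hypothetical names, not the tool's actual configuration keys.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Page:
    # Hypothetical stand-in for a fetched page: its entry titles and
    # the URL of the next page, if there is one.
    entries: list[str] = field(default_factory=list)
    next_url: Optional[str] = None

def collect_entries(pages: dict[str, Page], start_url: str, max_pages: int = 3) -> list[str]:
    """Follow next-page links from start_url. Stop after max_pages pages,
    when there is no next link, or when a URL repeats (a pagination loop)."""
    entries: list[str] = []
    seen: set[str] = set()
    url: Optional[str] = start_url
    while url is not None and len(seen) < max_pages and url not in seen:
        seen.add(url)
        page = pages[url]
        entries.extend(page.entries)
        url = page.next_url
    return entries
```

Capping the page count keeps a misconfigured "next" selector from crawling an entire site.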

Thanks everyone who gave it a try and provided feedback!

@VincentTunru How cool is that!?🎉
Thank you, Vincent, for making this!❤️

@VincentTunru I'll be damned, it works!! Now trying to generate an #Odysee channel feed, almost there ;) +1 for the "combined feed" feature!! dude, this rocks. I'm going to write an article for #linuxfr about this gem.

@VincentTunru Gaah 🤖 can't seem to catch those li.card elements (odysee.com/@eevblog:7) no matter what I do, no items are generated... Idea(s)?

@yphil That site takes a long time to load the actual content, so at the point where the tool considered the page done loading, the page actually consisted only of

<div id="app"><div class="main--empty"></div></div>

I've pushed a new version that should fix that. Should be released when the following build is done: gitlab.com/vincenttunru/feed-m
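The underlying issue is that "the page finished loading" and "the entries are present" are different events on a JavaScript-heavy site like Odysee. One general way to handle this is to poll for the entries with a timeout instead of trusting the load event. The sketch below shows that pattern under the assumption that you can re-check the rendered page on each poll. It is not necessarily how the fix was implemented, and browser automation libraries typically provide built-in waits for exactly this. `wait_until` is a hypothetical helper. In this context, `ready` would check whether the entry selector (e.g. `li.card`) matches anything yet.

```python
import time
from typing import Callable

def wait_until(ready: Callable[[], bool], timeout_s: float, interval_s: float = 0.1) -> bool:
    """Poll `ready` until it returns True or `timeout_s` seconds elapse.
    Returns True if the condition was met, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A per-source timeout, like the configurable one mentioned at the top of the thread, maps naturally onto `timeout_s`: slow sites get more time, and an empty result is reported instead of hanging forever.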

@VincentTunru Wow, that was quick, thanks! From what I understand, the code runs "remotely" on your repo (mine only containing feeds data/params) but I just ran the pipeline again, w/o luck, so I must be missing something? :(

@VincentTunru Is feed-me-up-scotty hard-coded to target a branch named "main"? That could be what I ran into; still investigating why the jobs all say "main" (gitlab.com/yphil/solexine/-/jo) even though I configured the schedule (gitlab.com/yphil/solexine/-/pi) to target "master"..?

@yphil No, it's set up to automatically run the latest version published to npm. That said, npm's CDN is sometimes just slow to update. Reeeeaaally slow in this case, it seems, since I still see version 1.0.0 (the fix is in 1.1.0) on npmjs.com/package/feed-me-up-s.
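Since the CI job runs whatever version npm reports as latest, one way to see which version a run will pick up is to ask the registry itself rather than the package's web page. `npm view <package> version` does this from the command line. The sketch below does the same from Python. The package name is passed in as a parameter, because the link above is truncated. `has_fix` and `parse_version` are hypothetical helpers that only understand plain MAJOR.MINOR.PATCH versions.

```python
import json
import urllib.request

def parse_version(v: str) -> tuple[int, ...]:
    """Turn 'MAJOR.MINOR.PATCH' into a comparable tuple of ints.
    Pre-release suffixes such as '1.1.0-beta' are not handled."""
    return tuple(int(part) for part in v.split("."))

def has_fix(published: str, fixed_in: str) -> bool:
    """True if the published version is at least the version containing the fix."""
    return parse_version(published) >= parse_version(fixed_in)

def latest_published(package: str) -> str:
    # The public npm registry serves the manifest of the `latest` dist-tag here.
    # Assumes a plain, unscoped package name.
    url = f"https://registry.npmjs.org/{package}/latest"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["version"]
```

Comparing tuples of integers rather than strings matters: as strings, "1.10.0" would sort before "1.9.0".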

@VincentTunru Ah yes, I forgot about the dreaded #npm (I can't publish my own packages myself anymore, e.g. npmjs.com/package/feedrat, go figure), so I'll wait :)

Thx for *your* patience too :)

@VincentTunru I just tested again (and reset the fork to work with the main branch, so I'm pretty sure I'm up to date): no luck, my feed is still empty :( so I guess it must be my selectors... Are you able to generate a feed from an Odysee channel yourself? If so, I'd love to see the selectors you use :p

@yphil Weird! I do get a feed with the following config, although it was empty on the first try for me as well — maybe try rerunning it?

[eevblog]
title = "EEVblog"
url = "https://odysee.com/@eevblog:7"
entrySelector = "li.card"
titleSelector = "h2"
linkSelector = "a:first-child"
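Roughly, `entrySelector` picks out each item on the page, and `titleSelector` and `linkSelector` are then evaluated inside each item. The toy sketch below shows that two-level lookup on static HTML using only Python's standard library. It understands just `tag`, `.class`, and `tag.class` selectors (no `:first-child`, no descendant combinators), and it is not how Feed me up, Scotty! itself is implemented. `extract` and the helper classes are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class Node:
    def __init__(self, tag: str, attrs: dict[str, str]):
        self.tag = tag
        self.attrs = attrs
        self.children: list["Node"] = []
        self.text_parts: list[str] = []  # text of this node and its descendants, in order

class TreeBuilder(HTMLParser):
    """Builds a minimal element tree; unmatched end tags are ignored."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root", {})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, {k: v or "" for k, v in attrs})
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Every currently open element contains this text.
        for node in self.stack:
            node.text_parts.append(data)

def matches(node: Node, selector: str) -> bool:
    """Supports only 'tag', '.class' and 'tag.class'."""
    tag, _, cls = selector.partition(".")
    if tag and node.tag != tag:
        return False
    return not cls or cls in node.attrs.get("class", "").split()

def select(node: Node, selector: str) -> list[Node]:
    """All descendants of `node` matching `selector`, in document order."""
    found = []
    for child in node.children:
        if matches(child, selector):
            found.append(child)
        found.extend(select(child, selector))
    return found

def extract(html: str, entry_sel: str, title_sel: str, link_sel: str) -> list[dict[str, str]]:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    items = []
    for entry in select(builder.root, entry_sel):
        # Title and link selectors are evaluated relative to each entry.
        titles = select(entry, title_sel)
        links = [n for n in select(entry, link_sel) if n.attrs.get("href")]
        if titles and links:  # entries missing either are skipped in this sketch
            items.append({"title": "".join(titles[0].text_parts).strip(),
                          "link": links[0].attrs["href"]})
    return items
```

This also hints at why the Odysee feed came out empty at first: if the HTML handed to the extractor is still the placeholder `<div id="app">`, no `li.card` exists and every lookup returns nothing.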

@VincentTunru YAY it works now 😎 w/ your selectors!! The link is OK; the thumbnail image is the channel's icon, so no image, I guess. Still, Vincent, you are now officially one of my personal HEROES :)

@yphil Great to hear! I just looked at adding an imageSelector, but unfortunately there's a bit more variance: some sites have regular <img>s, some have a background-image somewhere, and there's possibly srcset and the like. I'll think about that some more.

@VincentTunru But... It's the same as the other selectors, only, admittedly, harder to find/scrape and much less critical: if the selector isn't found, then it's not found, no biggie, and if it's not even present in the feeds.toml file, same! Am I missing something?

Dude, this project is fantastic; do you have a crowdfunding URL set up? I mean, as we say in France, "c'est mal vendu" (this is so hard to find that it's not really for sale) 😋

@yphil Well, no, not really, but on the Odysee page you linked, the images aren't a regular <img>, and I was hoping to support at least your use case :) I'll definitely add support for regular images, but I'll try to support background images as well.
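The variance mentioned here comes down to a few places an image URL can hide on an element. The minimal sketch below checks them in order, given an element's attributes as a dict. The priority order (srcset, then src, then an inline background-image) and the "widest srcset candidate wins" rule are assumptions. Background images applied from a stylesheet rather than a `style` attribute are not covered. `image_url` and `widest_from_srcset` are hypothetical names.

```python
import re
from typing import Optional

# Matches url(...) in an inline background / background-image declaration,
# with optional single or double quotes around the URL.
BG_URL = re.compile(r"""background(?:-image)?\s*:[^;]*?url\(\s*['"]?([^'")]+?)['"]?\s*\)""", re.I)

def widest_from_srcset(srcset: str) -> Optional[str]:
    """Pick the candidate with the largest 'w' descriptor. Candidates without
    one (or with an 'x' descriptor) count as width 0. Naively splits on commas,
    so URLs that themselves contain commas are not supported."""
    best_url, best_width = None, -1
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue
        width = 0
        if len(parts) > 1 and parts[1].endswith("w") and parts[1][:-1].isdigit():
            width = int(parts[1][:-1])
        if width > best_width:
            best_url, best_width = parts[0], width
    return best_url

def image_url(attrs: dict[str, str]) -> Optional[str]:
    if attrs.get("srcset"):
        url = widest_from_srcset(attrs["srcset"])
        if url:
            return url
    if attrs.get("src"):
        return attrs["src"]
    match = BG_URL.search(attrs.get("style", ""))
    return match.group(1) if match else None
```

With a single optional imageSelector, a helper like this would run on whatever element the selector matched, so users would not need to know which of the three patterns a site uses.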

And no, there's nothing to fund: I'm privileged enough to be able to work on things like this, and am scratching my own itch and enjoying myself when doing so, so I wouldn't feel comfortable taking people's money.

@yphil OK, so, I got image detection working, but... It turns out Atom feeds don't support images for individual feed entries. 😭

@yphil The individual feed entries there don't have specific references to images either, as far as I can see. I think Petrolette just takes an image referred to in the content and uses that as cover image?
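If a reader does pick a cover image out of the entry content (this is Vincent's guess about Petrolette above, not something confirmed here), then a generator can still get an image shown in an Atom feed by embedding it in the entry's HTML content. The minimal sketch below covers the reader side using only the standard library: it parses an Atom feed and, for each entry, takes the first `<img src>` found in its `content`. The function names and the "first image wins" rule are assumptions.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Optional

ATOM = "{http://www.w3.org/2005/Atom}"

class FirstImage(HTMLParser):
    """Records the src of the first <img> (with a src) in an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")

def cover_images(atom_xml: str) -> dict[str, Optional[str]]:
    """Map each entry title to the first image URL found in its content."""
    root = ET.fromstring(atom_xml)
    covers = {}
    for entry in root.iter(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="")
        # With type="html" the markup is escaped in the XML, so after
        # parsing, the element text is the HTML itself.
        content = entry.findtext(f"{ATOM}content", default="")
        parser = FirstImage()
        parser.feed(content)
        covers[title] = parser.src
    return covers
```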

@VincentTunru An optional imageSelector would really be the beezneez 😋

@VincentTunru Sometimes it's empty, like right now all my #odysee feeds are; I guess they really don't want to be scraped ;(

feature suggestion 

@VincentTunru Thanks for sharing this, this is a very nice idea that makes an otherwise cumbersome task very easy!

One thing I've had to deal with when writing HTML-to-RSS converters is that they often require some ad-hoc modification of the content you grab with selectors, e.g. there will be variation from item to item, dates will be absurd, they'll add a flashing "new" GIF as if it were the 90s, etc. In other cases, the feeds will be much more useful if postprocessed, e.g. files can be turned into enclosures, URLs can be cleaned up, etc.

In that light, a scripting feature could make this tool even more powerful. IDK if it'd be possible to make the DOM available to the user, but even if it were just an arbitrary executable script with the HTML on its stdin, a lot can be achieved with, say, Nokogiri in a Ruby script. E.g. these scripts for my university's announcements (gitlab.com/cadadr/hacettepe2rs) demonstrate how some feeds may require postprocessing.
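As one example of the ad-hoc cleanup described here, "absurd" dates usually just need to be coerced into the RFC 3339 form that Atom's `<updated>` element expects. The minimal sketch below does that. The list of input formats is made up for illustration and would have to match whatever a given site actually emits; dates without a time zone are assumed to be UTC. `to_rfc3339` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical formats a scraped page might use; extend per site.
# %B (month name) follows the current locale, which is "C"/English by default.
FORMATS = ["%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y %H:%M", "%B %d, %Y"]

def to_rfc3339(raw: str) -> Optional[str]:
    """Parse `raw` with the first matching format and return an RFC 3339
    timestamp, treating naive dates as UTC. Returns None if nothing matches."""
    text = " ".join(raw.split())  # collapse stray whitespace and newlines
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc).isoformat()
    return None
```

Order matters when formats are ambiguous: "03/04/2021" means different days under day-first and month-first conventions, so only one of the two should be listed for a given site.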

feature suggestion 

@cadadr Yeah, I can see the use case. I think the best approach in that case is to set up your CI job to run the post-processing script after Feed me up, Scotty! completes, so that it processes the generated feed rather than the HTML. Alternatively, just create a fork; it's a relatively small script.
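A post-processing step of the kind suggested here could be as small as the sketch below. It assumes the generated file is an Atom feed and, as an example cleanup, removes `utm_*` tracking parameters from every link before the XML is written back. The choice of cleanup and the function names are illustrative; the tool itself does not provide this step.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # keep Atom as the default namespace on output

def strip_tracking(url: str) -> str:
    """Drop utm_* query parameters and keep the rest in order.
    Note: re-encoding may normalize how the remaining values are escaped."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if not k.lower().startswith("utm_")]
    return urlunsplit(parts._replace(query=urlencode(query)))

def clean_feed(atom_xml: str) -> str:
    """Return the feed XML with tracking parameters removed from all links."""
    root = ET.fromstring(atom_xml)
    for link in root.iter(f"{{{ATOM_NS}}}link"):
        href = link.get("href")
        if href:
            link.set("href", strip_tracking(href))
    return ET.tostring(root, encoding="unicode")
```

In the CI job, this would run as an extra step after the feed is generated: read the output file, pass its contents through `clean_feed`, and write the result back before publishing.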

@VincentTunru
Sounds great!
Could you link me an example webpage?
I'd like to see how it looks/feels. :)

@paulfree14 The homepage shows an example config and below that links to the generated feed (and contains a screenshot of those feeds rendered).

So the "Did you know?" section on the Wikipedia homepage can be turned into this feed: vincenttunru.gitlab.io/feeds/f (plug that into a feed reader to see what it looks like)

@jpfox Yeah, looks like there are many similar projects. Quite a few were shared in the Hacker News thread: news.ycombinator.com/item?id=2

I think Feed me up, Scotty! is mainly useful if you prefer having it in a place under your control without needing to run your own server, e.g. GitLab CI/CD or GitHub Actions.

@jpfox @VincentTunru The super cool part about rss.bridge is that it is very easy to use. As far as I understand, the tool above requires some skills and you have to self-host it. I'd love to find an easy way to create RSS feeds for any website that doesn't have one.

@tio @jpfox Yes, see my other reply: fosstodon.org/@VincentTunru/10

The linked Hacker News thread contains lots of suggestions. I'm sure one of those will suit your need!

@VincentTunru sounds pretty cool! Thanks!

oh, and btw, if you rotate your logo 135 degrees, the wave will kinda look like a stylized "F", like your tool's name :smart:

@Guerin Ha, I was going for an imitation of a UFO beaming up a person, but a stylised "F" is a fun idea too 😃

@VincentTunru
This is so cool!
Thank you so much for that great tool.

@VincentTunru I have a custom widget thing for #homeassistant, built on some ugly selectors, that tells me which recycling bins should go out. I think this would be much nicer!

Fosstodon

Fosstodon is an English speaking Mastodon instance that is open to anyone who is interested in technology; particularly free & open source software.