Davide Eynard (+mala)

I got back home from the amazing FOSDEM, and while I am drafting a “Build Your Own Timeline Algorithm” blog post to follow up on my talk, I decided to keep the momentum going with a thread about it. You'll find my code at github.com/aittalam/byota and the talk slides here: fosdem.org/2025/schedule/event (a video recording of the talk should be available soon too). Now let’s dig into BYOTA!

What is BYOTA? Let’s start from what it is not: it’s not a new algorithm (you’ll see a few, but all coming from very classical ML), an end-user application, or a service.
To me, BYOTA is a call to arms, starting with an imperative: BUILD. I like to see it as a playground where people can easily play with, create, and share their own algorithms. My main contribution here is just taking a few relatively recent and absolutely amazing tools and putting them together.
Where does this idea come from?

I like to say it all started with a sink, but for me it was about half a year before that. I was already fed up with what was happening around all social media and was looking for non-centralized alternatives. I joined the Fediverse and found some great communities of practice that resonated a lot with me. Each community had different characteristics: language, message frequency, number of participants. This is where I started to think about the idea of a “personal algorithm”.

But let’s start with the timeline algorithms we all know: they are usually not monolithic, but made of smaller, more specific algorithms that together decide whether and how posts appear on your timeline. If you are interested in this topic and want to delve deeper, let me point you to this excellent blog post (part of an excellent series!) by Luke Thorburn: medium.com/understanding-recom

For algorithms specifically running on the fediverse, let me point you to the following, in no particular order:
- searchtodon.social/
- github.com/hodgesmr/mastodon_d
- tootfinder.ch/
- fediview.com/
- github.com/pkreissel/foryoufeed
I am grateful I could stand on the shoulders of these giants, learning from them about both the advantages and the limitations of running algorithms on the Fediverse.

Every algorithm I saw was affected by one or more of the following problems: bias; lack of privacy, transparency, and user control; dependency on ML algorithms which are complex, computationally heavy, and need to run in a centralized way. Note that reverse chronological itself (which many welcome as the absence of an algorithm) is an algorithm too, and it is biased, as it generally favors statuses by people who write more and in the same timezone as me.

In practice, those are actual problems only when the objective is to retain users on your platform. However, we can break this assumption and decide we don’t need to compete with commercial algorithms: serving people’s purposes is more than enough. When you do this, you immediately realize there is a solution to each of the above problems, one that relies on open, small, interpretable models which can be run locally for ad-hoc tasks without requiring much ML knowledge: local embeddings.

BYOTA is built on this assumption: do something useful while preserving user privacy and control, using open models and local-first technologies. While the concept sounds straightforward, developing, in my spare time, something which is easy to use and customize, computationally light, and useful was not trivial. Luckily, in the last year I found some tools which are exactly right for the task, and now BYOTA exists thanks to these three main components: Mastodon.py, llamafile, and marimo.

Mastodon.py is a Python client library for Mastodon, and I use it to download my own posts and my instance’s timeline updates. Since all we are interested in is the status contents, you can swap this library for whatever works with your setup: a generic ActivityPub client, a specific one for a different social network that is smart enough to expose an API, an RSS feed reader… even a mix of these, if you want to play with cross-platform search and recommendations. mastodonpy.readthedocs.io
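As a concrete starting point, here is a minimal sketch of what downloading timelines with Mastodon.py can look like (the instance URL and access token are placeholders, and BYOTA’s actual code may differ):

```python
from mastodon import Mastodon

# Placeholders: point these at your own instance and access token.
mastodon = Mastodon(
    api_base_url="https://fosstodon.org",
    access_token="YOUR_ACCESS_TOKEN",
)

# Each call returns a list of status dicts whose "content" field
# holds the (HTML) text we will embed later.
home = mastodon.timeline_home(limit=40)
local = mastodon.timeline_local(limit=40)
gopher = mastodon.timeline_hashtag("gopher", limit=40)

print(home[0]["content"])
```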


llamafile is a Mozilla tool that packages a language model in a single executable file that will run on most platforms. It is 100% local and has been optimized to run on slower hardware, from Raspberry Pis to my 8-year-old laptop. It is based on llama.cpp, which supports a plethora of models, not just LLMs: I chose all-MiniLM because it’s tiny (50MB) and has open code, research papers, and datasets. github.com/Mozilla-Ocho/llamaf huggingface.co/sentence-transf

In BYOTA, we use llamafile to calculate *sentence embeddings*.
If you don’t know what embeddings are, just think of them as numerical descriptors of your Mastodon statuses, which are closer (as two cities’ coordinates are close on a map) the more semantically similar their respective statuses are. We’ll get back to this later with a more visual description. If you are interested in embeddings and want to delve deeper, see vickiboykis.com/what_are_embed by @vicki.
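As a rough sketch, assuming a llamafile embedding server is already running locally and exposes an OpenAI-style /v1/embeddings endpoint on port 8080 (both the endpoint and the port are assumptions, check your llamafile’s options), getting embeddings could look like this:

```python
import requests

def embed(texts, url="http://localhost:8080/v1/embeddings"):
    """Ask a local llamafile server (OpenAI-compatible endpoint assumed)
    for one embedding vector per input text."""
    response = requests.post(url, json={"input": texts})
    response.raise_for_status()
    return [item["embedding"] for item in response.json()["data"]]

vectors = embed([
    "I am a fan of free software and everything opensource",
    "Cats are great pets",
])
# With all-MiniLM each vector has 384 dimensions.
print(len(vectors), len(vectors[0]))
```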


Marimo is a reactive notebook for Python that is also shareable as an application. It mixes code, markdown, and UI widgets in the same interface, so you can (1) develop as you would with other notebook environments, (2) share it as an application by hiding the code and displaying only the UI components, and (3) let people use it as they want, customizing both code and interface within a grid-based UI. Above all, marimo relies on WASM to run everything inside one’s browser.
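This is not BYOTA’s notebook, just a toy marimo file to show the idea: each cell is a function, UI widgets are plain Python values, and every cell that reads a widget re-runs when its value changes.

```python
import marimo

app = marimo.App()

@app.cell
def _():
    import marimo as mo
    return (mo,)

@app.cell
def _(mo):
    # A UI widget: moving the slider re-runs every cell that reads it.
    top_k = mo.ui.slider(1, 20, value=5, label="How many similar posts?")
    top_k
    return (top_k,)

@app.cell
def _(top_k):
    # Reacts automatically whenever the slider above changes.
    print(f"Showing the {top_k.value} closest statuses")
    return

if __name__ == "__main__":
    app.run()
```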

What does having a WASM-powered notebook mean? Consider BYOTA: you can download it from its repo, pip-install its deps, and run it locally like any Python notebook. But you can also deploy it as HTML+JavaScript files, host it somewhere super cheap (since the server won’t run any of your code), and people will be able to run it in their browser with no need to install anything else! Plus, this will work both with “my” algorithm and with whatever alternative you might develop starting from BYOTA’s code.

So, what can you do with BYOTA? The first thing is embeddings visualization. In these pictures you can see a 2D plot of embeddings calculated on four different timelines: home (blue, only people I follow), local (orange, all posts from my instance, which is fosstodon.org), public (red, federated posts from people followed by users of my instance), and the timeline I got by searching for the #gopher hashtag (light blue).

You can also plot all embeddings together! In this picture I have selected, analyzed, and annotated a few areas of this map. What I find cool is that semantically similar statuses will always be close to each other regardless of which timeline they appeared in, so you can follow content across different timelines. This is simple to grasp and easy to interpret, as it is based on the actual text present in statuses. And this way of encoding semantic similarity allows us to do many other things.
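These plots come from projecting the high-dimensional embeddings down to two dimensions. The thread doesn’t spell out which projection BYOTA uses, so here is just an illustrative sketch with PCA from scikit-learn and matplotlib (any dimensionality-reduction method would do):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_timelines(embeddings_by_timeline):
    """embeddings_by_timeline: dict mapping a timeline name to an
    (n_statuses, n_dims) array of embeddings, e.g. built with embed() above."""
    all_vecs = np.vstack(list(embeddings_by_timeline.values()))
    coords = PCA(n_components=2).fit_transform(all_vecs)

    start = 0
    for name, vecs in embeddings_by_timeline.items():
        chunk = coords[start:start + len(vecs)]
        plt.scatter(chunk[:, 0], chunk[:, 1], s=8, label=name)
        start += len(vecs)
    plt.legend()
    plt.show()
```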

Semantic search is one of them! If you provide a status ID (which you can get by selecting a post in the map), you can look for the statuses (top 5 by default) which are most similar, that is closest, to it. You can also just make up a sentence that describes what you are interested in (e.g. the figure shows the closest statuses to “I am a fan of free software and everything opensource”). Note that the results come from different timelines, in this case public, local, and tag/gopher.
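Under the hood this kind of semantic search boils down to cosine similarity between embedding vectors. A minimal sketch (function and variable names are mine, not BYOTA’s):

```python
import numpy as np

def top_k_similar(query_vec, status_vecs, statuses, k=5):
    """Return the k statuses whose embeddings are closest (cosine) to query_vec."""
    q = np.asarray(query_vec)
    q = q / np.linalg.norm(q)
    m = np.asarray(status_vecs)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    scores = m @ q
    best = np.argsort(-scores)[:k]
    return [(statuses[i], float(scores[i])) for i in best]

# The query can be an existing status embedding or a made-up sentence, e.g.:
# results = top_k_similar(embed(["I am a fan of free software"])[0],
#                         all_vectors, all_statuses)
```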

How could we have a timeline algorithm without post re-ranking? As you can see in the figures, relying on sentence embeddings gives you a content-based re-ranking of a timeline: you basically apply the “style” of one set of statuses to another list, putting on top those statuses which are overall closest to the ones you provided. For privacy reasons I did not show a re-ranked public timeline: I did it with my own messages, re-ranking them according to the “style” of and .
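A hedged sketch of this kind of content-based re-ranking, assuming you already have embeddings for both the timeline to re-rank and the “style” statuses (again, the names are illustrative, not BYOTA’s actual code):

```python
import numpy as np

def rerank(timeline_vecs, timeline_statuses, style_vecs):
    """Re-order a timeline so that statuses closest (on average) to the
    embeddings of a reference "style" set come first."""
    t = np.asarray(timeline_vecs)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    s = np.asarray(style_vecs)
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    scores = (t @ s.T).mean(axis=1)   # mean cosine similarity to the style set
    order = np.argsort(-scores)
    return [timeline_statuses[i] for i in order]
```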

“This looks nice, but how heavy is the calculation of these embeddings?”
I embedded 800 statuses using four different models, from the 22M-parameter all-MiniLM to the 7B e5-mistral. I tested them with two different local servers, llamafile and ollama, and on two different laptops, my 2016 MacBook Pro with an Intel CPU and my 2024 one with an M3 Max. The results are shown in the picture. The summary is: it can work on older hardware, and on recent hardware you will barely notice the calculation overhead.

all-MiniLM took 11 seconds on the M3, 52 seconds on my 8-year-old laptop. And its embeddings are already good! Larger models might provide extra perks (which I have not investigated yet), but at the price of higher compute. Interestingly, despite the fact that both ollama and llamafile are based on llama.cpp, ollama seems to be faster on newer hardware / smaller models, while llamafile becomes a better choice for larger models on older hardware. I used default params for both servers, so there is room for improvement.

“What data do I have to share to run this?”
The answer is: none. The embedding servers run completely offline. And for the marimo notebook, the only remote connections you’ll make are those for the marimo and WASM dependencies at bootstrap time, plus those required to download posts from your Mastodon instance. You are always in control of which and how many messages you download before running the algorithm. From then on (embeddings, plots, search, re-ranking), everything runs on your device.

This was BYOTA, thanks for following this long thread! What you see here are some of the next steps I have planned: in addition to natural ML extensions, I’d like to see it grow as a tool for people to experiment with different algorithms and easily share them, and for less tech-savvy people to use as easily as possible. For this to happen, I will invest some time in understanding how to bring this to fruition at the protocol level, rather than as a single application. Stay tuned!

@Davide Eynard (+mala) I don't miss any timeline algorithm. Why should I want to build one at all?

@jrp if you don’t miss it, you probably shouldn’t. But if one does, I think they should be able to do it in a way that is custom and 100% under their control.

On my side, I follow different communities of practice which have very different characteristics, and just scrolling the reverse-chronological timeline (or even hashtags / lists) feels suboptimal to me at times. Sometimes I’d just like to delve deeper into a topic, or surface just a part of my home timeline.

@Davide Eynard (+mala) (sorry, I posted an answer to a different thread here before) - I understand now, thank you. That's surely a useful idea for some. This sounds like a Mastodon problem for one, and one that could be solved with a dedicated plugin as well. I don't know if there are plugins on Mastodon; I am on Hubzilla, and here we have them.

@jrp Thanks for sharing, I'll check that plugin too! 🙏

@Davide Eynard (+mala) Aren't lists on Mastodon already something like your own timeline algorithm?

@jrp Yes, IMO mixing hashtags and lists is the closest thing you can get to one, as you can filter by content and people respectively (plus you can hide people in a list from the main timeline, which makes it lighter).

One thing I like to do is learn new language/hashtags from specific communities so I can improve my searches. Semantic similarity can help with that, but I also realize it is just one way to use a social network (one more reason why I’m all in for full user control!)

@mala this is super interesting! I’ve been checking the fosdem video website every day waiting for the recording of your talk to be published, so it’s great to get a textual “preview” :D

@piger thank you! The delay was my fault 😇 but now the video is ready to be published. You’ll find the posts here a bit richer in content IMO, while the video definitely has more jokes 😜

@mala great, thank you! Especially because I need a break from trying to make byota do oauth with Mastodon :blobsweats:

@piger oh no is it that bad? Please lmk what does not work, that can only improve the tool 🙏

@mala shouldn’t be too hard, it’s just that I was a bit tired and I thought I could just hack my way through 😅
The issue I was trying to solve is that you can’t use the API with user+pass if your Mastodon account has 2FA enabled.

@piger oh shoot, I had not thought about that 🤦 Also, I should enable 2FA for my mastodon account too 😅 - I’ll try that out and see how I can support it in the tool! 🙏

@mala sorry, didn't mean to auth-shame you :P I'll give it another try too, but first I really need to read how Marimo actually works... it's been too long since I touched any kind of Python notebook!

@piger ahah no worries! I have pushed some code that should fix the 2FA issue here using oauth, if you want to play with it: github.com/aittalam/byota/tree

I tested it both with and without 2FA; it should work fine now, but I might have missed some flow checks in marimo (as the flow is not linear, I have to put stops here and there to make sure all the proper forms are filled, and then ignore them once all the tokens are stored). As soon as I am sure the flow is OK, I'll merge this into main.
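For reference, the standard Mastodon.py OAuth authorization-code flow looks roughly like the sketch below; because the password never goes through the API, it also works for accounts with 2FA enabled. This is only an illustration (the app name and scopes are placeholders), not the code from the branch above.

```python
from mastodon import Mastodon

# One-time app registration on the instance (placeholder name and URL).
client_id, client_secret = Mastodon.create_app(
    "byota-demo", scopes=["read"], api_base_url="https://fosstodon.org"
)

mastodon = Mastodon(
    client_id=client_id,
    client_secret=client_secret,
    api_base_url="https://fosstodon.org",
)

# 1) Open this URL in a browser, authorize the app, and copy the code it shows.
print(mastodon.auth_request_url(scopes=["read"]))

# 2) Exchange the code for an access token; no username/password involved.
code = input("Authorization code: ")
access_token = mastodon.log_in(code=code, scopes=["read"])
```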


@mala I removed the old .txt files before trying again, but now the mastodon lib is throwing an exception as soon as you start the notebook: gist.github.com/piger/3d7cf5c1

(thanks for your patience btw! I tried to understand the oauth parts of mastodon.py and it was REALLY irritating :blobsweats: )


@piger no problem at all! I'll check it out tonight (or this weekend at the latest) starting from a clean slate. I really appreciate your feedback, this helps me make it better for everyone! 🙏

@mala if this can help, I managed to write the least amount of code to set up the mastodon lib to do OAuth and fetch a few posts: gist.github.com/piger/3d7cf5c1

Later I'll try to adapt it to Marimo, but there's the tiny problem that I still haven't studied how it works 😅


@piger hey, sorry for getting back so late on this… I just wanted to tell you I tried it and it works great! In the end it is straightforward enough to just say “generate the token”, and it makes the UI so much simpler! Thank you 🙏

I refactored my code and I am writing new docs for it. I have created a Docker image so people can run EVERYTHING (notebook + engine) in one command. And I am preparing synthetic data for demoing it. It should all be out soon 🤞 (there’s a hopefully good surprise too 😇)

@mala that's great news! And don't worry about the delay; once I got it working I decided to try to understand a bit more how it worked (I even bought a book 😅) and then got distracted by other things. As usual :)

I'll definitely give it a try once you publish the new changes!