this week I'm reading Human Factors in Systems Engineering
there are so many gems I've highlighted already, but I really vibed with how the author clearly and simply expressed the impact of writing docs "early" here
Are you looking for a new remote job? Browse 400+ remote positions from open source companies including @acquia @grafana @mozilla @wikimediafoundation and more on #OSJH
https://opensourcejobhub.com/jobs/?q=remote&utm_source=mosjh
#career #OpenSource #engineer #sales #security #marketing #CloudNative #developer #DevSecOps #SRE #FOSS
Want to grow your open source career? The LiFT Scholarship offers training & certs to help you level up—whether you're starting out or advancing.
Apply by April 30: https://app.smarterselect.com/programs/102338-Linux-Foundation-Education
Deploy Consul as OpenTofu Backend with Azure & Ansible
Off to #Berlin for the first time to meet my new colleagues at coding. powerful. systems. CPS GmbH. (Setting out from what must be the smallest train station around, in the neighboring village.)
#Antrittsbesuch #Hauptstadt #Arbeit #Dienstreise #SRE
Running an Incidents 101 training tomorrow, including two games that both involve some dice rolling; should be fun. I don't feel nervous: I know which process and common ground to cover.
I'm trying my best to keep the material interesting, balancing getting some interaction with showing the necessary slides that lay out the steps and rules.
In one activity we throw gaming dice to build a context, randomizing things like customers affected, size of response, time of day, etc. Then we use the rules for gauging severity. That's the whole game!
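As a rough idea of the kind of randomization involved, a dice-driven scenario generator might look like the sketch below. The tables and the severity rule are invented for illustration; they are not the actual training materials.

```python
"""Toy sketch of a dice-driven incident scenario generator.
All tables and the severity rule are made up for illustration."""
import random

CUSTOMERS_AFFECTED = {1: "one customer", 2: "a handful", 3: "one region",
                      4: "one region", 5: "most customers", 6: "everyone"}
TIME_OF_DAY = {1: "02:00", 2: "06:00", 3: "10:00",
               4: "14:00", 5: "18:00", 6: "22:00"}
RESPONSE_SIZE = {1: "single on-call engineer", 2: "single on-call engineer",
                 3: "one team", 4: "one team",
                 5: "several teams", 6: "all hands"}

def roll() -> int:
    """Roll one six-sided die."""
    return random.randint(1, 6)

def scenario() -> dict:
    """Build a random incident context, then gauge a toy severity."""
    customers, tod, response = roll(), roll(), roll()
    return {
        "customers_affected": CUSTOMERS_AFFECTED[customers],
        "time_of_day": TIME_OF_DAY[tod],
        "response_size": RESPONSE_SIZE[response],
        # Toy rule: the worse of customer impact and response size drives severity.
        "severity": f"SEV{4 - max(customers, response) // 2}",
    }

if __name__ == "__main__":
    print(scenario())
```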
Above all, I hope the activities go well and that the people unfamiliar with the process get an opportunity to learn something. I can't make it everything for everybody, but I hope it helps the right people.
a short lil blog post sharing how re-reading the evergreen Etsy Debriefing Facilitation Guide helped me better investigate a mysterious sound...
Not sure if I asked this before: does anyone use anything in particular to inject #apache logs into #SQL databases? I've been looking around and asking around, and the only solid answer I got was "do not expect an apache module for that; it would introduce too much latency to each request" in #httpd@libera.chat.
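For what it's worth, one way to avoid adding latency to each request is to ingest rotated log files out of band rather than in the request path. A minimal sketch, assuming the combined log format and a local SQLite database (the table and column names are my own invention):

```python
#!/usr/bin/env python3
"""Minimal sketch: batch-load Apache combined-format access logs into SQLite.
Assumptions (not from the original post): combined log format, a local
SQLite file, and offline ingestion of a rotated log file, so nothing
runs inside Apache's request handling."""
import re
import sqlite3
import sys

# Combined Log Format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def main(logfile: str, dbfile: str = "access_log.db") -> None:
    conn = sqlite3.connect(dbfile)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS access_log (
               host TEXT, user TEXT, time TEXT, request TEXT,
               status INTEGER, size INTEGER, referer TEXT, agent TEXT)"""
    )
    rows = []
    with open(logfile, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            m = LINE_RE.match(line)
            if not m:
                continue  # skip lines that don't match the expected format
            d = m.groupdict()
            size = 0 if d["size"] == "-" else int(d["size"])
            rows.append((d["host"], d["user"], d["time"], d["request"],
                         int(d["status"]), size, d["referer"], d["agent"]))
    conn.executemany("INSERT INTO access_log VALUES (?,?,?,?,?,?,?,?)", rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    main(*sys.argv[1:])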
Your logs are lying to you - metrics are meaner and better.
Everyone loves logs… until the incident postmortem reads like bad fan fiction.
Most teams start with expensive log aggregation, full-text searching their way into oblivion. So much noise. So little signal. And still, no clue what actually happened. Why? Because writing meaningful logs is a lost art.
Logs are like candles, nice for mood lighting, useless in a house fire.
If you need traces to understand your system, congratulations: you're already in hell.
Let me introduce my favourite method: real-time, metric-driven user simulation aka "Overwatch".
Here's how you do it:
Set up a service that runs real end-to-end user workflows 24/7. Use Cypress, Playwright, Selenium… your poison of choice.
Every action creates a timed metric tagged with the user workflow and action.
Now you know exactly what a user did before everything went up in flames.
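Here's a rough sketch of what one of those timed metrics could look like, assuming Playwright for Python and the influxdb-client package; the measurement, tag, and field names (and the checkout workflow itself) are made-up placeholders, not anyone's actual setup:

```python
"""Rough sketch of one synthetic user workflow step with a timed,
tagged metric per action. Names like workflow_action, workflow,
action, and duration_ms are illustrative assumptions."""
import time
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS
from playwright.sync_api import sync_playwright

client = InfluxDBClient(url="http://localhost:8086", token="TOKEN", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

def timed_action(workflow: str, action: str, fn) -> None:
    """Run one user action, time it, and emit a tagged metric."""
    start = time.monotonic()
    ok = True
    try:
        fn()
    except Exception:
        ok = False
        raise
    finally:
        point = (
            Point("workflow_action")
            .tag("workflow", workflow)
            .tag("action", action)
            .tag("ok", str(ok))
            .field("duration_ms", (time.monotonic() - start) * 1000.0)
        )
        write_api.write(bucket="overwatch", record=point)

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    timed_action("checkout", "open_shop", lambda: page.goto("https://shop.example.com"))
    timed_action("checkout", "add_to_cart", lambda: page.click("text=Add to cart"))
    browser.close()
```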
Use Grafana + InfluxDB (or other tools you already use) to build dashboards that actually tell stories (a sample query sketch follows the list):
* How fast are user workflows?
* Which steps are breaking, and how often?
* What's slower today than yesterday?
* Who's affected, and where?
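For instance, the "how fast are user workflows?" panel could be fed by something like this, reusing the assumed workflow_action measurement from the sketch above; the bucket, token, and grouping choices are illustrative, not prescriptive:

```python
"""Illustrative query for average workflow step duration per workflow
over the last day, assuming the setup sketched above."""
from influxdb_client import InfluxDBClient

client = InfluxDBClient(url="http://localhost:8086", token="TOKEN", org="my-org")

flux = '''
from(bucket: "overwatch")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "workflow_action" and r._field == "duration_ms")
  |> group(columns: ["workflow"])
  |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
'''

# Print one row per workflow per hour: workflow tag, timestamp, mean duration.
for table in client.query_api().query(flux):
    for record in table.records:
        print(record.values.get("workflow"), record.get_time(), record.get_value())
```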
Alerts now mean something.
Incidents become surgical strikes, not scavenger hunts.
Bonus: run the same system in every test environment and detect regressions before deployment. And if you make it reusable, you can even use the same service for load tests.
No need to buy overpriced tools. Just build a small service like you already do, except this one might save your soul.
And yes, transform logs into metrics where possible. Just hash your PII data and move on.
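A tiny sketch of what that could look like, turning a log event into a metric-friendly record with hashed identifiers. The field names and salt handling are my own illustration, not anyone's actual pipeline:

```python
"""Toy sketch: convert a log event into a metric record with hashed PII.
Field names and the salt are illustrative assumptions."""
import hashlib
import json

SALT = b"rotate-me-regularly"  # assumption: a salt kept out of the metrics store

def log_event_to_metric(event: dict) -> dict:
    """Keep the useful dimensions, hash anything that identifies a person."""
    user_hash = hashlib.sha256(SALT + event["user_email"].encode()).hexdigest()[:16]
    return {
        "measurement": "user_action",
        "tags": {"action": event["action"], "user": user_hash},
        "fields": {"duration_ms": event["duration_ms"]},
    }

if __name__ == "__main__":
    raw = {"user_email": "jane@example.com", "action": "login", "duration_ms": 87}
    print(json.dumps(log_event_to_metric(raw), indent=2))
```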
Stop guessing. Start observing.
Metrics > Logs. Always.
System Administration
Week 10, Backups: Core Concepts
In this video, we begin our discussion of backups by covering some core concepts and terminology, looking at full vs. incremental vs. differential backups, and at the difference between long-term storage for disaster recovery and restoring files after more localized data loss.
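As a toy illustration of the distinction (my own, not taken from the video), here is roughly which files each backup type would pick up based on modification times:

```python
"""Toy illustration of which files full, differential, and incremental
backups copy, based on modification times. Paths and dates are made up."""
from datetime import datetime

files = {  # path -> last modified
    "/etc/passwd":        datetime(2025, 4, 1),
    "/home/jan/notes.md": datetime(2025, 4, 14),
    "/var/log/syslog":    datetime(2025, 4, 15),
}

last_full        = datetime(2025, 4, 10)  # most recent full backup
last_incremental = datetime(2025, 4, 14)  # most recent backup of any kind

full         = list(files)                                            # everything
differential = [f for f, m in files.items() if m > last_full]         # changed since last full
incremental  = [f for f, m in files.items() if m > last_incremental]  # changed since last backup

print("full:        ", full)
print("differential:", differential)
print("incremental: ", incremental)
```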
And here’s the big reveal:
Virtual flash cards for the key terms across all of DevOps Institute's exams. I took the glossaries from all their public study guides, deduplicated them, converted the courses they appear in into tags, and added an exam they missed.
https://github.com/ajn142/DOI-Exam-Glossary
Reposting because I forgot the number one rule of chronological timelines (don’t post when everyone’s asleep lol).
Observability Migration - A new approach
https://www.cloudraft.io/blog/influxdb-to-grafana-mimir-migration
Discussions: https://discu.eu/q/https://www.cloudraft.io/blog/influxdb-to-grafana-mimir-migration
Site Reliability Engineering is often like Cassandra (not the database), where you tell devs the kinds of scaling issues they'll see if they continue following clever shortsighted patterns — you're frequently correct but they never believe you.
Job search journey as a DevOps/SRE/Platform engineer in the Netherlands/Amsterdam (Dec '24 - Apr '25)
Discussions: https://discu.eu/q/http://cargo.one/
System Administration
Week 9, Writing System Tools
This week we're going on a side-quest to discover solid #programming best practices that apply across simple scripting, prototyping, growing your tools, and owning a software product. We don't have videos for this topic, but the slides below include a lot of hopefully useful links ranging from coding style to ticket management and commit messages.
https://stevens.netmeister.org/615/09-writing-system-tools.pdf
If you've tried both Thanos and Mimir, which do you prefer? Feel free to comment why below
So, I've been using Thanos to receive and store my Prometheus metrics long-term in a self-hosted S3 bucket. Thanos also acts as a datasource for my dashboards in Grafana and provides a Ruler, which evaluates alerting rules against my metrics and forwards them to my Alertmanager. It's OK. It certainly has its downsides, which I can go into later, but I've been thinking... what about Mimir?
How do you all feel about Grafana's Mimir (source on GitHub)? It's AGPL and seems to literally be a replacement for Thanos, which is Apache 2.0.
Thanos description from their website:
Open source, highly available Prometheus setup with long term storage capabilities.
Mimir description from their website:
...open source software project that provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus and OpenTelemetry metrics.
Both work with Alloy and Prometheus alike. Both require you to configure initially confusing hashrings and replication parameters. Both have a bunch of large companies adopting them, so... now I feel conflicted. Should I try Mimir? Poll in reply.