#logfile


Cleaning up Matrix rooms in Synapse – simply automated

"80% of all stored data is never looked at again." That statistic applies not only to giant data silos, but also to our Matrix chat rooms running on Matrix Synapse. System messages, CI/CD…

techniverse.net/blog/2025/04/m
#Linux #AdminScript #Automatisierung #bash #cronjob #linux #Logfile #Matrix #purge_history #Selfhosting #Synapse
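
For context, the admin API that the #purge_history tag refers to can be driven straight from a cron-run bash script. A minimal sketch, assuming an admin access token in $TOKEN; the homeserver URL and room ID below are placeholders, and the linked post may well do this differently:

# Ask Synapse to purge room history older than 30 days (purge_up_to_ts is in milliseconds)
$ curl -X POST "https://matrix.example.com/_synapse/admin/v1/purge_history/%21roomid%3Aexample.com" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"delete_local_events\": false, \"purge_up_to_ts\": $(( ( $(date +%s) - 30*24*3600 ) * 1000 ))}"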

Day 14 of #100DaysOfCode:

Created a tutorial on analyzing millions of URLs:

🔵 2.4M URLs from a web server log file
🔵 Splitting into their components creates a 5.7GB (giga) DataFrame
🔵 Using the new output_file parameter saves the same data in a 67MB (mega) file
🔵 Read only the columns you want, while filtering for a subset of rows
🔵 Enjoy!

Notebook and video:

bit.ly/49socSd

www.kaggle.com: "URL-website-analysis" - Explore and run machine learning code with Kaggle Notebooks | Using data from URL DataFrame

GoAccess; lnav; agrind

(Tagline ‘splainer.)

We’ve all got ’em, and most of us dread having to look at them, as it usually means something’s gone awry.

Yep. We’re talking about log files. Dozens. Hundreds. Thousands…of log files. Of every shape, sort, and size.

Today we present three resources (one from “back in my day” that is still cranking through text files today) to help you get a handle on what your logs might be saying to you. Though, if you actually hear them saying something (and you’re not using a screen reader), you have far more issues than what may lie in those files.

There should be something here for developers, system administrators, and data crunchers who regularly work with log files to troubleshoot issues, monitor systems, or analyze application behavior.


GoAccess

GoAccess (GH) is, for me, a blast from the past. It’s a pretty spiffy log file analyzer that offers real-time, terminal-based, and web-based interfaces for monitoring web server statistics. It’s designed to be fast, and with it you can parse virtually any web log format, including — but not limited to — Common Log Format (CLF), Combined Log Format (XLF/ELF), W3C format (IIS), and Amazon CloudFront (Download Distribution). This flexibility means we can analyze logs from a wide variety of sources without the need for extensive configuration or setup.

One neat feature of GoAccess is its ability to generate real-time, interactive reports that can be viewed in a web browser. This is achieved through its own websocket server, which pushes the latest data to the browser, allowing users to see up-to-the-minute information about their web traffic. This real-time analysis is particularly useful for quickly diagnosing issues or understanding traffic patterns as they happen.
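
As a quick illustration (the log and output paths here are just placeholders), generating one of those self-updating HTML reports is essentially a one-liner:

# Parse a combined-format access log and keep the HTML report updating live in the browser
$ goaccess /var/log/nginx/access.log --log-format=COMBINED -o /var/www/html/report.html --real-time-html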

GoAccess also supports incremental log processing. This means that it can process logs in chunks, keep track of what it has already analyzed, and then continue from where it left off. This feature is handy when analyzing large log files or for continuous monitoring over long periods. The tool can also output its data in various formats, including HTML, JSON, and CSV, providing flexibility in how the analyzed data is consumed and shared.
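
A rough sketch of that incremental workflow, assuming a reasonably recent GoAccess (1.4+) with its built-in on-disk storage; the paths are placeholders:

# First pass: analyze the current log and persist the parsed state to disk
$ goaccess access.log --log-format=COMBINED --persist --db-path=/tmp/goaccess-db -o report.json

# Later passes: restore that state, fold in only what's new, and emit CSV this time
$ goaccess access.log --log-format=COMBINED --restore --persist --db-path=/tmp/goaccess-db -o report.csv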

lnav

The Logfile Navigator, lnav (GH), is an enhanced log file viewer that takes advantage of any semantic information that can be gleaned from the files being viewed, such as timestamps and log levels. Using this extra semantic information, lnav can do things like interleave messages from different files, generate histograms of messages over time, and provide hotkeys for navigating through the file. This terminal-based application also lets us merge, tail, search, filter, and query log files with ease. There’s no server to set up and no complicated configuration; just point it at a directory, and it takes care of the rest. The section header image is it slurping up all my access logs.
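
Getting started really is that simple: hand lnav a file, a glob, or a directory (the paths below are placeholders) and it detects formats, colorizes, and collates on its own:

# Point lnav at a directory of logs; it figures out the formats and interleaves them by timestamp
$ lnav /var/log/nginx

# Or merge a live syslog with an application log in a single view
$ lnav /var/log/syslog /var/log/myapp/app.log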

It has direct knowledge of three particular log sources, plus one generic one:

  • access_log: Apache common access log format
  • syslog_log: Syslog format
  • strace_log: Strace log format
  • generic_log: ‘Generic’ log format. This table contains messages from files that have a very simple format with a leading timestamp followed by the message.

The tool also has support for performing SQL queries on log files using the SQLite3 “virtual” table feature. For all supported log file types, lnav will create tables that can be queried using the subset of SQL that is supported by SQLite3. For example, to get the top ten URLs being accessed in any loaded Apache log files, we can execute:

;SELECT cs_uri_stem, count(*) AS total
   FROM access_log
   GROUP BY cs_uri_stem
   ORDER BY total DESC
   LIMIT 10;

Here’s the sad result on mine:

I really dislike staring at Linux journalctl logs, but with journalctl | lnav they become way easier to triage.

Honestly, there’s so much packed into this tool that you really just have to try it out, which you can do without installing anything! Just do ssh playground@demo.lnav.org in a terminal and follow along with the tutorial.

Make sure to keep the extensive documentation link handy.

agrind

NOTE: the proper name of this tool is angle-grinder.

I cannot find better words than the author’s intro to the tool so here that is:

“The [Rust-based] ag utility lets us parse, aggregate, sum, average, min/max, percentile, and sort [our] data. [We] can see it, live-updating, in [our] terminal[s]. [It’s] designed for when, for whatever reason, [we] don’t have [our] data in graphite/ honeycomb/ kibana/ sumologic/ splunk/ etc. but still want to be able to do sophisticated analytics.”

“It can process well above 1M rows per second (simple pipelines as high as 5M), so it’s usable for fairly meaty aggregation. The results will live update in your terminal as data is processed. [What’s more, ag bundles a] bare bones functional programming language coupled with a pretty terminal UI.”

The basic premise is similar to that of jq: you feed it lines of text and filter + perform operations on them in a script you fit between quotes:

$ agrind '<filter1> [... <filterN>] | operator1 | operator2 | operator3 | ...'

Examples speak louder than templates.

I have to admit it was rather fun watching it live-update the counts of HTTP status codes across 53 files (~220MB) of my rud.is main web server access logs:

$ time cat rud.is.access.log* | agrind '* | apache | count by status'
status        _count
----------------------------
200           909037
301           98643
202           41206
304           39099
404           34667
206           8425
302           4615
403           3958
499           3427
405           1333
101           749
204           508
502           370
503           239
400           106
201           6
500           4
409           1
8.71s user 0.82s system 156% cpu 6.073 total

It supports defining fields as named capture groups in regular expressions, which is pretty cool. For example, we can pick out the timestamp and path from all the GET requests in this synthetic Go Gin app log:

2024-02-26T12:00:01Z | INFO | 200 |   90ms | 192.168.1.1 | GET /api/v1/users
2024-02-26T12:00:02Z | INFO | 201 |   45ms | 192.168.1.2 | POST /api/v1/users
2024-02-26T12:00:03Z | INFO | 404 |   10ms | 192.168.1.3 | GET /api/v1/unknown
2024-02-26T12:00:04Z | INFO | 500 |  120ms | 192.168.1.4 | PUT /api/v1/users/123
2024-02-26T12:00:05Z | INFO | 200 |   78ms | 192.168.1.5 | GET /api/v1/posts
2024-02-26T12:00:06Z | INFO | 403 |   12ms | 192.168.1.6 | DELETE /api/v1/users/123
2024-02-26T12:00:07Z | INFO | 200 |   65ms | 192.168.1.7 | GET /api/v1/comments
2024-02-26T12:00:08Z | INFO | 422 |   47ms | 192.168.1.8 | POST /api/v1/posts
2024-02-26T12:00:09Z | INFO | 200 |   89ms | 192.168.1.9 | GET /api/v1/users/123/posts
2024-02-26T12:00:10Z | INFO | 204 |   15ms | 192.168.1.10 | DELETE /api/v1/posts/123

via:

$ cat gin | agrind '"GET " | parse regex "^(?P<ts>[^|]+).*GET (?P<path>.*)"'
[path=/api/v1/users]             [ts=2024-02-26T12:00:01Z]
[path=/api/v1/unknown]           [ts=2024-02-26T12:00:03Z]
[path=/api/v1/posts]             [ts=2024-02-26T12:00:05Z]
[path=/api/v1/comments]          [ts=2024-02-26T12:00:07Z]
[path=/api/v1/users/123/posts]   [ts=2024-02-26T12:00:09Z]

This example is from the README, but it shows how much nicer it is to do the processing in ag vs jq:

curl https://api.github.com/repos/rcoh/angle-grinder/releases | \
   jq '.[] | .assets | .[]' -c | \
   agrind '* | json
         | parse "download/*/" from browser_download_url as version
         | sum(download_count) by version | sort by version desc'
version       _sum
-----------------------
v0.6.2        0
v0.6.1        4
v0.6.0        5
v0.5.1        0
v0.5.0        4
v0.4.0        0
v0.3.3        0
v0.3.2        2
v0.3.1        9
v0.3.0        7
v0.2.1        0
v0.2.0        1

There are plenty more examples in the repo, and the author has a pretty cool blog post on the Rust journey taken to build the tool.


FIN

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev ☮️

https://dailydrop.hrbrmstr.dev/2024/02/26/drop-427-2024-02-25-whats-great-for-a-snack-and-fits-on-your-back-%f0%9f%aa%b5/

While going through the access numbers, there was a surprise in the count of unique visitors: starting on November 22, there were suddenly what looked like thousands.
But: all of the requests come from two IP ranges in China. The actions appear completely random, and the user agent also seems to be picked at random, hence the high count. It’s far too little traffic to be a DoS. The whole spectacle has since stopped. No idea what that was supposed to accomplish…
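
That kind of sanity check is easy to reproduce with nothing but standard tools: counting hits per client IP and per user agent in a combined-format access log quickly shows whether “thousands of visitors” are really just a couple of noisy ranges. A rough sketch (the log path is a placeholder):

# Top client IPs by request count
$ awk '{print $1}' access.log | sort | uniq -c | sort -rn | head

# Number of distinct user agents (in combined log format the UA is the last quoted field)
$ awk -F'"' '{print $6}' access.log | sort -u | wc -l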