The thing I often need the most but don't have is a private test machine with as many CPUs as possible, so I can do meaningful performance testing. For example, right now I want to test some refcount improvements, but I lack a machine with enough CPUs to do that, which is really annoying.

@brauner it looks like we are both looking into scalability of reference counters in the Linux kernel. In my case it's in the scheduler+mm subsystems: lore.kernel.org/lkml/202410020

Are there specific reference counters which you suspect to be bottlenecks?

lore.kernel.org: [RFC PATCH 0/4] sched+mm: Track lazy active mm existence with hazard pointers (Mathieu Desnoyers)

@DesnoyersMa I'll try that on the big box I have, curious! Not about the mm side specifically, just the hazard pointer (hp) case in general for other uses.

@axboe Let me know how it goes. Note that if you run into limitations with my minimalistic implementation, there are various ways it can be improved to cover more use-cases (e.g. more hazard pointer slots per CPU, dynamically adjusting the per-CPU scan depth, scanning for HP ranges, ...). My approach is to enhance it only when use-cases require it.
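For readers unfamiliar with the terms above (slots, scanning), here is a toy Python model of generic hazard pointer bookkeeping. It is only a sketch of the classic scheme and is not taken from the patch series: the class `HazardPointerDomain` and its methods are hypothetical names, there is one slot per simulated CPU, and nothing here is atomic or safe for real concurrent use.

```python
class HazardPointerDomain:
    """Toy, single-threaded model of hazard pointer bookkeeping.

    Hypothetical illustration only: not the kernel implementation,
    and not concurrency-safe (no atomics, no memory barriers).
    """

    def __init__(self, num_slots):
        # One hazard pointer slot per simulated CPU; None means "empty".
        self.slots = [None] * num_slots
        # Objects removed from use but possibly still referenced by a reader.
        self.retired = []
        # Objects that were safe to reclaim (stands in for actually freeing).
        self.freed = []

    def protect(self, slot, obj):
        # A reader publishes the object it is about to use.
        self.slots[slot] = obj

    def clear(self, slot):
        # The reader is done with the object.
        self.slots[slot] = None

    def retire(self, obj):
        # A writer wants to reclaim obj once no reader protects it.
        self.retired.append(obj)

    def scan(self):
        # Reclaim every retired object that no slot currently protects.
        still_protected = []
        for obj in self.retired:
            if any(s is obj for s in self.slots):
                still_protected.append(obj)
            else:
                self.freed.append(obj)
        self.retired = still_protected


domain = HazardPointerDomain(num_slots=2)
mm = object()             # stand-in for a protected structure
domain.protect(0, mm)     # simulated CPU 0 publishes a hazard pointer
domain.retire(mm)
domain.scan()
print(len(domain.freed))  # reader still holds it, so nothing reclaimed yet
domain.clear(0)
domain.scan()
print(len(domain.freed))  # slot is empty now, so the object is reclaimed
```

In this model, "more slots per CPU" would mean a larger `slots` array per reader, and the cost of `scan` grows with the number of slots inspected.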

@DesnoyersMa Sure, will do. It's a 512-thread box, I'll run 24/48/96/192/256/512/1024 threads and dump the numbers here for -git and -git + patched.

Jens Axboe

@DesnoyersMa Here's the quick run, 48..2048 threads. System is a 2x 9754. Not sure this is what you expected, but it's 100% reproducible. Ran the tests twice on both, separate boots, and it's consistent. Test is context_switch1_threads -t<NUM THREADS>.

@axboe That's unexpected. I tested on an AMD EPYC 9654 96-Core Processor (2 sockets, 384 HW threads total) and got very different results. Perhaps we should share our kernel config by email.

@axboe Scratch my previous comment. That's a 4.9x speedup (490%) for 192 threads?

@DesnoyersMa Right, any of the Diff results are how much faster the patched kernel is compared to the stock one. So 192 threads, that's a +390% speedup, or 4.9x as fast.
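The difference between "+390%" and "4.9x" is only a matter of whether the baseline is counted. A minimal Python sketch of the conversion, with hypothetical helper names that are not part of any benchmark tooling:

```python
def percent_faster_to_ratio(percent_faster):
    # "+390% faster" means the baseline (100%) plus an extra 390%.
    return 1.0 + percent_faster / 100.0


def ratio_to_percent_faster(ratio):
    # "4.9x as fast" means 3.9 times the baseline on top of the baseline.
    return (ratio - 1.0) * 100.0


# Rounded for display to avoid floating-point noise in the last digits.
print(round(percent_faster_to_ratio(390.0), 2))
print(round(ratio_to_percent_faster(4.9), 2))
```

So a Diff of +390% and a 4.9x speedup describe the same measurement; reading it as "490%" only works if the figure means "percent of baseline throughput" rather than "percent faster".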