I'm probably never going to write the actual article I'd originally intended these charts for. But if you want to see the difference in latency between #OpenZFS and #btrfs on an eight-drive system that's creating and replicating automated snapshots regularly, here ya go.
We're looking at fio random access, rate-limited to a simultaneous 8MiB/sec read and 23.0MiB/sec write. The system has eight 12TB Ironwolf rust drives, configured as four ZFS mirrors vs one eight-wide btrfs-raid1.
In each case, the system is creating and replicating snapshots regularly.
Most of the latency deltas you're seeing come from the snapshot/replication tasks. Without those, you do still see a ZFS advantage, but nothing this catastrophically severe.
Note that #btrfs is displaying latencies two and sometimes THREE orders of magnitude higher than #ZFS across a disturbing amount of the range of results. This is not just an issue at the absolute fastest or slowest ends of the scale, this is... normality.
You might very reasonably ask "what about btrfs-raid10?", especially considering that it's the closest-to-sane multi-drive #btrfs topology.
Well, it's just the tiniest bit slower than btrfs-raid1 (in latency terms) in my testing, but the shape of the graph is unchanged.
This is the fio control file I used for both ZFS and btrfs. (Hint: if you want the raw text, check the ALT tag; you can copy and paste from there.)
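If the alt text doesn't come through for you, a jobfile matching that description would look roughly like the sketch below. To be clear, this is a reconstruction, not the original control file: only the random-access pattern, the 8MiB/sec and 23.0MiB/sec rate caps, and the roughly 45-minute runtime come from this thread; the ioengine, block size, queue depth, file size, and paths are all my assumptions.

```
# Reconstruction only -- NOT the original control file. Everything not stated
# in the thread (ioengine, bs, iodepth, size, directory) is an assumption.
[global]
ioengine=libaio
bs=4k
iodepth=1
size=256g
directory=/test
time_based
runtime=45m
# log per-I/O completion latencies, which is the raw material for charts like these
write_lat_log=latency

[rate-limited-randrw]
rw=randrw
# cap reads at 8MiB/sec and writes at 23MiB/sec, running simultaneously
rate=8m,23m
```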
Snapshots are taken every five seconds; replication happens every twenty.
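For the curious, a bare-bones sketch of that cadence is below. This is not the tooling actually used for these tests, and the pool/dataset names are made up; it's just to show the shape of the work the filesystems were carrying alongside fio. The ZFS side is shown; the btrfs side used btrfs subvolume snapshot plus btrfs send/receive instead.

```
#!/bin/bash
# Hypothetical cadence sketch only: snapshot every 5 seconds, replicate every
# 20 seconds. Pool/dataset names are invented; this is not the actual test tooling.
SRC=tank/test
DST=backup/test
i=0
prev=""
while true; do
    snap="auto-$i"
    zfs snapshot "$SRC@$snap"
    if (( i % 4 == 0 )); then
        # every fourth snapshot (i.e. every 20 seconds), replicate to the target
        if [[ -n "$prev" ]]; then
            zfs send -i "@$prev" "$SRC@$snap" | zfs receive -F "$DST"
        else
            zfs send "$SRC@$snap" | zfs receive -F "$DST"
        fi
        prev="$snap"
    fi
    i=$(( i + 1 ))
    sleep 5
done
```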
Adding insult to injury: in the roughly 45-minute runtime of each test, ZFS replicated 589 of the 593 snapshots it took... btrfs replicated only 186 of the 374 it took.
These tests were performed in Jan 2021, using the then-current HWE kernel for Ubuntu 20.04 (kernel v5.8).
The tests and charts demonstrate problems I first noticed *in production* seven years earlier, in Jan 2014.
@jimsalter given the way every kernel release seems to include some btrfs work, it would be cool to see that graph move over time.
@keyboardg personal gripe, as somebody who's followed the filesystem for more than a decade: it would be nice to see a LOT of btrfs-related things actually moving over time. But none of the things I really care about ever seem to.
@jimsalter Excuse my ignorance, but what does the x-axis stand for? Percentage of how full the filesystem is?
@ojs no pardon necessary, thank you for asking!
This is a range of fio latency results on a long-running test. What you're looking at is a line of individual data points running from best result (lowest latency) on the left, at x=0%, to worst result (highest latency) on the right, at x=100%.
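Mechanically, a curve like that is nothing fancy: take every individual per-I/O completion latency from the run, sort from fastest to slowest, and plot each one at x = its rank as a percentage of the total. A throwaway sketch of that step is below; the latencies.txt input is hypothetical (one latency per line), so fio's own latency logs would need a little massaging into that shape first.

```
# Hypothetical: latencies.txt holds one completion latency per line, one line per I/O.
total=$(wc -l < latencies.txt)
sort -n latencies.txt | awk -v total="$total" '
    # x = rank as a percentage of all I/Os, y = the latency itself
    { printf "%.2f%%\t%s\n", 100 * NR / total, $1 }
' > latency-curve.tsv
```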
@ojs so, let's say a graph like this showed two overlapping lines, but from 0%<x<5%, one line was higher. That would mean the two systems perform equivalently 95% of the time, but one block out of 20 is faster for the lower line... which probably doesn't much matter, since those are the fastest results anyway.
If you see the same at 95%<x<100%, that means one system is slower than the other on the worst-case 5%. This is more likely to be significant since the difference is where the pain lives.
@ojs what we're seeing here is much worse than either of those cases. These are log-scale graphs, meaning each major line on the Y axis represents an increase of 10x.
Take the read latency chart: for roughly 40% of the ENTIRE range, btrfs is 10x OR MORE slower to return each block than ZFS is. This is not a minor issue; it's a massive degradation that you will experience constantly under a similar workload.
@ojs moving on: because this is a rate-limited workload, we're not seeing how each system performs under the worst possible conditions with an unreasonably heavy load. We're seeing how it operates under a REASONABLE workload that the hardware is more than capable of sustaining.
@jimsalter Fantastic, thanks for the clarification