Jeremy Allison writes:
"The data shows that “frozen” vendor #Linux kernels, created by branching off a release point and then using a team of engineers to select specific patches to back-port to that branch, are buggier than the upstream “stable” Linux #kernel created by Greg Kroah-Hartman."
https://ciq.com/blog/why-a-frozen-linux-kernel-isnt-the-safest-choice-for-security/ #LinuxKernel
@kernellogger as usual, the point is not that these are bug free, but that they are regression free. The kernel upstream releases break userspace on every new release, and kernel maintainers don't care. See https://github.com/torvalds/linux/commit/a1912f712188291f9d7d434fba155461f1ebef66 for example, as Daan just found out, which removed a mount option without caring that it is still being used, so since 6.8 every btrfs device can no longer be mounted by systemd
thx, yeah, I already have been watching that.
1/ FWIW, I think you owe the kernel developers an apology, as you made a lot of noise and claimed "kernel maintainers don't care", when they clearly do once the problem was properly reported -- and quite quickly even. And yes, sure, in the ideal world they would have cared some more and performed a code-search before removing this option to prevent it in the first place. But we are all imperfect and make mistakes. Same for @pid_eins, who…
2/ …wrote "And my main beef here is that they claim they wouldnt do it ever..."[1], as that is not even true. They often try changes or removals to see if it breaks something – and if it does, it's reverted. Even the removal of the support for the original i386 was handled like that by Linus himself.
@kernellogger @bluca sure, but then the rule is not "we never break userspace" but more "move fast and break things, and sometimes revert where people protest too loudly".
I mean, that's fine by me, but maybe they should communicate it like that then.
The thing is that removing a widely documented mount option is very *obviously* a compat breakage. You cannot discount that. It's not just a "mistake" to remove something like that, it's an *obvious* attempt to break compat.
@pid_eins @kernellogger @bluca yeah in graphics we go with a 10 year delay for the obvious compat breakages
so either wait 10 years after the last known user was updated to the new interfaces (where we know of them, which is the usual case since it's all open source)
or 10 years after the replacement shipped for more script interfaces like some of the stuff in sysfs
@pid_eins @kernellogger @bluca 10 years seems to be enough where the only people you would end up breaking are those who don't upgrade kernels anyway, ever
@pid_eins @kernellogger @bluca of course there have been screw-ups and misses. but when those happen we try to put the references to the relevant userspace we broke into the reverts, so that people can start the 10 year clock at the right time
Actually, the exact relevant rule is "WE DO NOT BREAK USERSPACE", all in uppercase.
https://lkml.org/lkml/2012/12/23/75
I find the sound of that mail quite different from your much weaker "let's maybe undo the worst shit if people complain too loudly"... And of course "uh, sometimes we fucked up so hard, we cannot fix it anymore, let's add a new api instead" (which is what happened in the block device capabilities/media change api).
(again, I actually find it OK if API is broken from time to time, just be honest about it, and communicate properly, and do a bit of research first. Don't claim that uppercase extremism and then do not even superficially follow through)
hmmm:
$ grep -ri 'no regressions' Documentation/ | wc -l
13
$ grep -ri 'not break userspace' Documentation/ | wc -l
0
Also:
"WE DO NOT BREAK USERSPACE": 2 hits – https://lore.kernel.org/all/?q=f%3ATorvalds+%22WE+DO+NOT+BREAK+USERSPACE%22
"no regressions": 44 hits – https://lore.kernel.org/all/?q=f%3ATorvalds%20%22no%20regressions%22
Seems like an easy way to handle stuff like this in the future is to either use the kernel announce ml or create a non-patch RFC/deprecation notice ml, and to have a proper procedure for doing so as well.
@kernellogger @pid_eins My impression, having more of an outside perspective and working with higher level languages: should deprecations perhaps always be gated with a config flag, perhaps even a common one similar to BROKEN?
With Java/Scala, it's always quite clear for me where deprecated methods are used. Also I can have builds fail due to that or not, so that I notice new deprecations when building / in CI.
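A rough analog of that "fail the build on deprecations" workflow, sketched in Python rather than Java/Scala. `old_api` and `run_strict` are hypothetical names made up for illustration; only the standard `warnings` module is real.

```python
import warnings


def old_api():
    """Hypothetical deprecated function: still works, but warns on use."""
    warnings.warn(
        "old_api() is deprecated, use new_api()",
        DeprecationWarning,
        stacklevel=2,
    )
    return 42


def run_strict(fn):
    """Call fn with DeprecationWarning promoted to an error, as a CI job might."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return fn()
```

A normal call to `old_api()` keeps working, while `run_strict(old_api)` raises `DeprecationWarning`. Running a whole test suite with `python -W error::DeprecationWarning` has a similar effect, so new deprecations show up in CI before anything is actually removed.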
there are various things that can work and I guess it depends on the situation what is reasonable and effective.
For the kernel I sometimes think "add delays (together with a msg in the logs) that grow longer and longer over time when people use deprecated stuff; at some point people get curious and will investigate" might help. OTOH it's a kind of stupid idea.
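A minimal sketch of that growing-delay idea, as userspace Python rather than kernel C. Everything here is an assumption for illustration: the function names, the option name, the 100 ms starting delay, the doubling per release, and the 5 s cap. None of it is taken from the kernel.

```python
import logging
import time

log = logging.getLogger("deprecation-sketch")


def deprecation_delay_ms(releases_since_deprecation, base_ms=100, cap_ms=5000):
    """Delay that doubles with each release since deprecation, up to a cap."""
    if releases_since_deprecation < 0:
        return 0  # not deprecated yet, no penalty
    return min(base_ms * 2 ** releases_since_deprecation, cap_ms)


def use_deprecated_option(name, releases_since_deprecation, sleep=time.sleep):
    """Log a warning, stall for the computed delay, and return it in ms."""
    delay_ms = deprecation_delay_ms(releases_since_deprecation)
    log.warning("option %r is deprecated; delaying %d ms", name, delay_ms)
    sleep(delay_ms / 1000)
    return delay_ms
```

Calling `use_deprecated_option("old_option", 2)` would log a warning and stall for 400 ms. The cap keeps the deprecated path annoying rather than unusable, which is the point: it nudges people to investigate without breaking them outright.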