To all #Rust programmers on #Linux that want to use the kernel's new pidfd feature to asynchronously await on many child processes from the same thread -- reliably -- I've released the [pidfd] crate, which achieves precisely that.
Note, however, that Linux 5.3 is the minimum requirement to use this crate. Linux 5.4 is required to get the exit status from terminated processes (use the `waitid` feature).
Testing would be highly welcome!
@mmstick how is this OK? https://github.com/pop-os/pidfd/blob/2a4a09472df17cd55c771108688ec1028c6f2eff/src/lib.rs#L77-L79 if you wake the future immediately, it's going to get polled again, and it's going to be spinning forever
@mmstick observe this yourself: strace the provided example (getting a giant stream of poll() calls), and see it eat 100% of CPU.
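A minimal sketch of why waking from inside `poll` makes the future spin, using only the standard library. `SelfWaking`, `CountingWaker`, and `count_polls` are hypothetical names made up for illustration and are not taken from the pidfd crate; the loop stands in for a real executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future that wakes itself on every poll until `remaining` hits zero.
struct SelfWaking {
    remaining: usize, // polls left before we pretend the child has exited
}

impl Future for SelfWaking {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.remaining == 0 {
            return Poll::Ready(());
        }
        self.remaining -= 1;
        // Asking to be polled again right away: this is the spinning pattern.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Waker that only counts how many times it was asked to wake the task.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Drives the future like an executor would and returns the number of polls.
fn count_polls(pending_rounds: usize) -> usize {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(SelfWaking { remaining: pending_rounds });
    let mut polls = 0;
    loop {
        polls += 1;
        if fut.as_mut().poll(&mut cx).is_ready() {
            return polls;
        }
        // A real executor would sleep here until woken. The wake-up has
        // already been requested, so there is nothing to sleep on.
        assert!(counter.0.load(Ordering::SeqCst) >= polls);
    }
}

fn main() {
    println!("polls: {}", count_polls(1000));
}
```

Every `Pending` result costs one extra poll, so the executor never gets a chance to put the thread to sleep while the child is still running.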
The timeout is currently defined as 0ms, but it could be set to 1ms to eliminate that issue entirely.
I'm not aware of any means to notify the future that it needs to be polled again in the future. Without telling the context to wake it up, all of the futures block and never complete because they're waiting for something to externally wake them.
@mmstick that would *not* eliminate the issue, it will still spin a thousand times a second.
You have to wake the task/future from a reactor of some sort. Yes, realistically that means you need to depend on parts of either tokio or async-std. Or you could try doing that from the SIGCHLD handler, then you won't need a reactor.
Ideally I'd want a solution without relying on a reactor, so I will investigate that.
@mmstick to clarify, you're not supposed to poll the future every now and then to see what's up (whether it is Poll::Ready yet). The future is supposed to tell you exactly when it's ready to make progress, and only then you poll it again. (By you I mean the executor here.)
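The contract described here can be sketched with the standard library alone. This is an illustration under assumptions, not the pidfd crate's code: `Shared`, `Exited`, `Signal`, `block_on`, and `run_wake_demo` are hypothetical names, and a sleeping thread stands in for whatever actually observes the child's exit.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::Duration;

/// State shared between the future and whoever signals completion.
#[derive(Default)]
struct Shared {
    done: AtomicBool,
    waker: Mutex<Option<Waker>>,
}

/// Hypothetical future that completes once `done` is set by someone else.
struct Exited {
    shared: Arc<Shared>,
    polls: Arc<AtomicUsize>,
}

impl Future for Exited {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        self.polls.fetch_add(1, Ordering::SeqCst);
        // Store the waker first, then check the flag, so a wake-up cannot be lost.
        *self.shared.waker.lock().unwrap() = Some(cx.waker().clone());
        if self.shared.done.load(Ordering::SeqCst) {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

/// Executor-side waker: a flag plus a condvar so the thread sleeps until woken.
#[derive(Default)]
struct Signal {
    woken: Mutex<bool>,
    cond: Condvar,
}

impl Wake for Signal {
    fn wake(self: Arc<Self>) {
        *self.woken.lock().unwrap() = true;
        self.cond.notify_one();
    }
}

/// Tiny single-future executor: polls only after the future asked for it.
fn block_on<F: Future>(fut: F) -> F::Output {
    let signal = Arc::new(Signal::default());
    let waker = Waker::from(signal.clone());
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        let mut woken = signal.woken.lock().unwrap();
        while !*woken {
            woken = signal.cond.wait(woken).unwrap();
        }
        *woken = false;
    }
}

/// Runs the demo and returns how many times the future was polled.
fn run_wake_demo() -> usize {
    let shared = Arc::new(Shared::default());
    let polls = Arc::new(AtomicUsize::new(0));
    let notifier = shared.clone();
    let handle = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50)); // stand-in for the child exiting
        notifier.done.store(true, Ordering::SeqCst);
        let waker = notifier.waker.lock().unwrap().take();
        if let Some(w) = waker {
            w.wake();
        }
    });
    block_on(Exited { shared, polls: polls.clone() });
    handle.join().unwrap();
    polls.load(Ordering::SeqCst)
}

fn main() {
    println!("polls: {}", run_wake_demo());
}
```

The future is polled at most twice: once to register the waker, and once more after the notifier calls `wake`. In between, the executor thread is asleep on the condvar.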
@bugaevc I will be publishing a new version of pidfd soon which uses its own simple reactor running in a background thread.
When a new FD is sent to the reactor, a byte is written to a pipe to cancel the poll operation, so that the reactor can add the new FD to the list, and resume polling. On process termination, the waker and atomic bool associated with the future are activated.
So this solves the problem of polling too often, and keeps CPU usage down to 0.
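The shape of such a background reactor can be sketched with the standard library, under loud assumptions: deadlines stand in for pidfds, and an `mpsc` channel with `recv_timeout` stands in for the pipe plus `poll()` wait, so a new registration interrupts the wait the same way the pipe byte does. `Entry`, `spawn_reactor`, `Notify`, and `run_reactor_demo` are hypothetical names, not the crate's API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};
use std::thread;
use std::time::{Duration, Instant};

/// What the reactor tracks per watched event (a deadline here, a pidfd in the real crate).
struct Entry {
    deadline: Instant,
    done: Arc<AtomicBool>,
    waker: Waker,
}

/// Starts the single background reactor thread and returns its registration handle.
fn spawn_reactor() -> Sender<Entry> {
    let (tx, rx) = mpsc::channel::<Entry>();
    thread::spawn(move || {
        let mut pending: Vec<Entry> = Vec::new();
        loop {
            // Wait until the earliest deadline, or indefinitely if nothing is watched.
            let timeout = pending
                .iter()
                .map(|e| e.deadline)
                .min()
                .map(|d| d.saturating_duration_since(Instant::now()));
            let msg = match timeout {
                Some(t) => rx.recv_timeout(t),
                None => rx.recv().map_err(|_| RecvTimeoutError::Disconnected),
            };
            match msg {
                // A new registration interrupted the wait: add it and resume.
                Ok(entry) => pending.push(entry),
                Err(RecvTimeoutError::Timeout) => {}
                // All handles dropped: finish outstanding entries, then exit.
                Err(RecvTimeoutError::Disconnected) => match timeout {
                    Some(t) => thread::sleep(t),
                    None => return,
                },
            }
            // Complete every entry whose deadline has passed.
            let now = Instant::now();
            let (ready, waiting): (Vec<Entry>, Vec<Entry>) = std::mem::take(&mut pending)
                .into_iter()
                .partition(|e| e.deadline <= now);
            pending = waiting;
            for e in ready {
                e.done.store(true, Ordering::SeqCst); // set the flag first
                e.waker.wake(); // then wake the task
            }
        }
    });
    tx
}

/// Waker that reports each wake-up over a channel, standing in for an executor.
struct Notify(Mutex<Sender<u32>>, u32);

impl Wake for Notify {
    fn wake(self: Arc<Self>) {
        let _ = self.0.lock().unwrap().send(self.1);
    }
}

/// Registers three entries with one reactor and returns the ids in wake-up order.
fn run_reactor_demo() -> Vec<u32> {
    let reactor = spawn_reactor();
    let (tx, rx) = mpsc::channel();
    let start = Instant::now();
    let mut flags = Vec::new();
    // The longest deadline goes first, so later registrations must interrupt the wait.
    for (id, ms) in [(1u32, 90u64), (2, 30), (3, 60)] {
        let done = Arc::new(AtomicBool::new(false));
        flags.push(done.clone());
        let waker = Waker::from(Arc::new(Notify(Mutex::new(tx.clone()), id)));
        let deadline = start + Duration::from_millis(ms);
        reactor.send(Entry { deadline, done, waker }).unwrap();
    }
    let order: Vec<u32> = rx.iter().take(3).collect();
    assert!(flags.iter().all(|f| f.load(Ordering::SeqCst)));
    order
}

fn main() {
    println!("wake-up order: {:?}", run_reactor_demo());
}
```

One thread serves every registered entry, and it is blocked in `recv` or `recv_timeout` whenever there is nothing to do, which is the property the post describes.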
@mmstick this is much better than polling all the time (your initial approach) or sleeping all the time, blocking the whole thread (your 1ms approach)!
But I don't think it's okay to just spawn a thread from a random library in a language like Rust, either. See https://github.com/sozu-proxy/lapin/issues/201#issuecomment-527132559
An okay solution would be for your crate to have optional Cargo features that would enable integration with tokio and async-std reactors, and use an internal reactor if none of them is enabled.
@mmstick think of it this way: a complex async program might realistically use 500 different crates or more. What if each of those crates decided it wanted to spawn its own thread and implement its own reactor?
This is exactly what tokio and async-std are doing when they spawn their reactors, which, as it stands, gives any crate that relies on them a hard dependency on that runtime.
This is equally compatible with async-std and tokio, and only requires the one thread for all PidFd futures. Conditional compilation options could always be added to specialize.
There's currently no standard way to define reactors. That said, I'm not sure why 500 crates would need their own reactors.
@mmstick yes, but:
a) they let you configure it https://docs.rs/tokio/0.2.2/tokio/runtime/index.html#runtime-configurations
b) you're not tokio or async-std. pidfd is just one of the many utility crates, whereas tokio and async-std are *the* async runtime crates
Hundreds of small utility crates would want their own reactor for the same reasons as you do, because they need *some* reactor but don't want to outright depend on tokio or async-std.
@bugaevc Most crates have no need for a reactor, as they're merely using building blocks from other crates to create futures. Handling process FDs is pretty low level in comparison.
Tokio and async-std do not expose their reactor for crates to use, so pidfd would need to be directly integrated to them -- and thus there is no need for a project to use the pidfd crate directly. You would simply use the async process::Command type for that runtime.
@mmstick there are many, many use cases that require watching some fds, perhaps with a timeout. And as your own example makes evident, authors of such crates prefer rolling their own reactors instead of using building blocks from other crates, 'cause how bad could spawning just one more thread be, right? :D
Here's a working demo of how one can make tokio reactor watch over pidfds https://paste.gg/p/anonymous/5aea495941264217a6af522868151743 Doing the same for async-std is sadly blocked on https://github.com/async-rs/async-std/issues/293 for the time being
@bugaevc It seems to me that reactors shouldn't be bound to either async-std or tokio. Why not have a generic FD reactor that all crates and runtimes can share?
The goal for many of our projects has been to avoid the dependency on tokio so that we can use plain old futures, or async-std, instead. So it would be good to see async-std expose its watcher, although I'd also like to see a generic reactor that isn't dependent on either.
@mmstick tokio's reactor is exactly that! You realize that you can depend only on the reactor part of tokio by using the right Cargo feature (namely, "io-driver"), right?
The fact that there are two popular reactor crates in the ecosystem can be seen as unfortunate (because duh, fragmentation) or positive (because yay healthy competition).
@bugaevc Fragmentation is never an issue in an open source ecosystem -- it's a sign of strength -- but it's also nice to have options which aren't locked into a particular thing, so that it's easy to swap out components over time.
I'd like to see async-std and tokio separate their components into standalone crates, so we don't have to pull in the whole dependency tree. There are features in both that would be pretty useful as their own crates.
@bugaevc Good news: I have separated the reactor from the pidfd crate, as [fd-reactor]. The pidfd crate is now updated to use this, which also fixes a bug from the last release.
The first project of ours to take advantage of this is our new WIP [System76 Support] tool. All seems to be working. Significant reduction in dependencies from [dropping tokio].
[system76 support]: https://github.com/pop-os/support
[dropping tokio]: https://github.com/pop-os/support/commit/cf9edd0c5c9d2eb46efcc75de4153f346dd16222
@mmstick well, this looks simple enough for your use case, so maybe that's fine. Any serious reactor implementation has to use (or reimplement the functionality of) mio (docs.rs/mio), which is what both tokio and async-std do. And I don't see why you would use your own reactor over tokio with its "io-driver" feature; it's also small and light on dependencies as long as you don't enable all the other features.
@mmstick But it sounds like you're still missing the big picture. You're hoping that most other async crates (that need watching over an fd) are going to use the fd-reactor crate — they won't. It's more of an ecosystem question than a technical one really — we don't want utility crates to spawn their own threads, so we have to hope they all standardize around a few reactor implementations.
@mmstick And the only two that have a chance of filling that role are tokio's and async-std's reactors. Maybe it would make more sense if fd-reactor had backends for integrating with tokio and async-std (when they expose the interface); then you could use it knowing that it won't spawn threads unless it has to.