Async without async
To start with a short TLDR: this article is my exploration of implementing an asynchronous networking application, without using async Rust.
Background
Over the past months (if not years at this point) I have been playing around with sane approaches to implementing consensus algorithms, and perhaps more general distributed systems.
As part of this journey I am seeking ways to have more control over the whole “application framework”. This recently led me to ask a question: Can I have a performant, IO heavy application without using async Rust?
Why not async?
In many ways async support in Rust is great. If you are just writing a
web application, the async and await keywords really make it very easy
to write the code as you would with sync Rust. However, everything
comes at a cost - async Rust brings in the complexity of the whole
async runtime, and hides a lot of what is going on from our sight.
One of the reasons for this is that async Rust and the accompanying runtimes are built as a generalized solution, to support a lot of different use cases and be robust in many different ways. Some complexity naturally arises from this, and it is then well hidden from us by async and friends. Not all of this complexity is needed for every use case, and since there is no such thing as a free lunch, at some point the time comes to pay for it.
Another thing we sacrifice by using an async runtime is control. It is programmed in a specific way, with some knobs that we are able to tweak and some we are not. Until we understand the code thoroughly and grasp the possible code paths, there will always be a black box aspect to it.
What I have learned over the years is that sometimes it is better to ditch a one-size-fits-all, batteries-included solution, and build something simpler, use-case-specific, sacrificing some time but sparing yourself a lot of complexity, and retaining full control and better understanding of the system.
Part of this exploration is to answer the question whether it is worth it in this case.
Objective
Not using async Rust is not a goal in itself, but only a means to an end. The main objective remains to build a proof of concept of a simple system that could be used to implement more sophisticated software on top of it.
The goal is exploration, but there are a few constraints I want to satisfy.
- Keep it simple - The foundation needs to be simple: easy to reason about, troubleshoot and understand. Let the complexity arise from the problems that applications on top of it will be solving, not from its fundamental parts. To begin with, I want it to be single threaded, at least for as long as that is not a performance limitation.
- Keep it real - This application aims to be a proof of concept of something that could be turned into a functional system. For me this implies:
  - No busy waiting - I do not want to burn the CPU when nothing is going on.
  - No added latency - When IO is ready, it should be processed, not wait until a few milliseconds of sleep between loop iterations finishes.
- Not just request triggers - I am not building a REST API, so applications built on top need a way of “triggering” some logic not only when a request arrives. To be more specific, I am thinking of time-based triggers, be it intervals or timeouts; there needs to be a way to run some logic based on those, and not just on incoming IO.
Since I am ditching async Rust, and IO is still at the core of the application, the first step is to figure out how to handle it without the magic of Tokio. Let’s take a look at the possibilities.
Handling IO
If I were asked to write the simplest echo server to handle just one connection I would end up with something like this:
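A minimal sketch of what such a single-connection blocking echo server can look like (the port and buffer size are my choice):

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};

// Echo everything read from the connection back to it, blocking on each call.
fn echo(mut connection: TcpStream) -> std::io::Result<()> {
    let mut buf = [0u8; 1024];
    loop {
        let n = connection.read(&mut buf)?; // blocks until data arrives
        if n == 0 {
            return Ok(()); // peer closed the connection
        }
        connection.write_all(&buf[..n])?;
    }
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:4000")?;
    // Accept a single connection and serve it until it closes.
    let (connection, _) = listener.accept()?;
    echo(connection)
}
```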
And that is the first, likely simplest approach to handling IO -
blocking IO. The application will block on the connection.read call
until there is something to read.
Now, if I had to handle multiple connections, there are a few ways to extend it.
I suppose that the most intuitive one is to just handle each connection in a separate thread and keep accepting in the main one:
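A sketch of the thread-per-connection variant (names and the port are my choice):

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// Same blocking echo logic as before, now run per connection.
fn handle(mut connection: TcpStream) -> std::io::Result<()> {
    let mut buf = [0u8; 1024];
    loop {
        let n = connection.read(&mut buf)?;
        if n == 0 {
            return Ok(()); // peer closed
        }
        connection.write_all(&buf[..n])?;
    }
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:4000")?;
    loop {
        // Keep accepting on the main thread...
        let (connection, _) = listener.accept()?;
        // ...and serve each connection on its own thread.
        thread::spawn(move || {
            let _ = handle(connection);
        });
    }
}
```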
Clearly, this approach is more versatile than handling only one connection, but it is also clear that this approach violates one of my objectives – being single threaded.
Note: a variation of this could be process per connection, which is used by some systems. Still they often use async IO anyway.
The other option that we have allows us to keep our single thread, all we need is making sockets non-blocking and adding a bit more code:
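A sketch of the non-blocking, single-threaded variant, with one loop iteration factored into a helper so it is easy to exercise (the function name and structure are mine):

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};

// One iteration of the busy loop: try to accept, then try to read from
// every connection, echoing back whatever arrived.
fn poll_once(listener: &TcpListener, connections: &mut Vec<TcpStream>) -> io::Result<()> {
    match listener.accept() {
        Ok((connection, _)) => {
            connection.set_nonblocking(true)?;
            connections.push(connection);
        }
        Err(e) if e.kind() == io::ErrorKind::WouldBlock => {} // nothing to accept
        Err(e) => return Err(e),
    }
    let mut buf = [0u8; 1024];
    connections.retain_mut(|connection| match connection.read(&mut buf) {
        Ok(0) => false, // peer closed, drop the connection
        // Best-effort echo; a full send buffer drops the connection here.
        Ok(n) => connection.write_all(&buf[..n]).is_ok(),
        Err(e) if e.kind() == io::ErrorKind::WouldBlock => true, // no data yet
        Err(_) => false, // drop on fatal error
    });
    Ok(())
}

fn main() -> io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:4000")?;
    listener.set_nonblocking(true)?;
    let mut connections = Vec::new();
    // The loop spins even when there is nothing to do - this burns CPU.
    loop {
        poll_once(&listener, &mut connections)?;
    }
}
```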
However, a careful observer can immediately see that this violates another constraint, as the loop will just keep spinning, burning CPU cycles. We could avoid busy looping by adding a short sleep between iterations, but that is added latency, which I want to avoid as well.
With all of those ruled out, we come full circle and go back to async - not necessarily async Rust, but async IO nonetheless.
Asynchronous IO (or not really)
As the name suggests, asynchronous IO is not synchronous.

But what it really means is a bit complicated.
Async IO can work in different ways, and I am not sure if there is a real, correct definition of what is async and what is not. In general, when we talk about async, it is understood as something that happens “in the background” and there is some notification when “things are ready”.
Different systems work in different ways. With io_uring, IO happens in kernel space and user space application receives notification when the work is completed, while with epoll the application still does the dirty work of IO syscalls, and just receives the notification when there is progress to be made.
Deeper tangent: I would say that async is in the eye of the beholder. One can argue that epoll is not “real” async since the application only receives the event and all work still happens synchronously (in a non-blocking way, let’s say). However, if you go with this thinking then Rust tokio isn’t really async, since it is also the application that does the IO. “But io_uring is a real async!” You may object. In its case it is not the application that does IO, but the kernel itself. It must be a true async then! However, if you look at it from the perspective of the CPU (or even the kernel), it all happens on the same silicon (perhaps on different cores, but that is not for us to decide), so is it “async” after all?
Different operating systems have different APIs for async IO, to name a few:
- kqueue - macOS, BSD
- epoll and io_uring - Linux
- IOCP - Windows
There are other, older mechanisms on Linux as well, such as poll and select, but these days epoll is likely most prevalent, with io_uring being the newest and slowly getting more adoption.
Since penguins dominate the server world, I focused on Linux and took a deeper look into Epoll.
Epoll
I knew about epoll from the first time I asked myself the question “But how does async actually work?”, which led me down the deep rabbit hole of various kernel mechanisms, all the way to the realm of CPU interrupts (if you have never been there, I highly recommend the journey). However, not being a C programmer, I never used it “directly”.
Most of the “async” web dev libraries in all languages rely on it, but hide it carefully under a few layers of abstraction, mainly because they are meant to work on all OSes and not just on Linux. But, let’s get to the point…
Epoll as a whole is an API in the Linux kernel that allows a process to register interest in IO events for a set of file descriptors.
There are 4 syscalls listed under the epoll man page:
- epoll_create
- epoll_create1
- epoll_ctl
- epoll_wait
Names are somewhat self-explanatory, so I will not copy-paste definitions from the man page, feel free to check it out on your own.
I will not leave you empty-handed, however, and give you a quick intuition of how things work: Epoll is about events. Instead of constantly checking if there is any IO to be done, the user space application receives “notification”, when there is “progress” to be made. Since behind this mechanism is the kernel, while waiting for the events the waiting thread can “go to sleep” and get woken up when the IO event arrives, hence not wasting CPU cycles by spinning around checking all connections, and also not adding latency with an actual sleep.
One might ask: how do we know when to stop reading or writing then? Well, if you ask the socket politely it will tell you. As long as it is a non-blocking socket, as async IO is usually used in conjunction with those.
And by the socket telling you, what I mean is it returning EAGAIN or EWOULDBLOCK.
A look inside
For each epoll instance created in userspace there is an eventpoll allocated on the kernel side. It contains a red-black tree of epitems, keyed by file pointer and file descriptor. When we register interest, a new tree node is inserted, and a callback is added to the file descriptor's wait queue. This callback is where the magic happens: whenever we call epoll_wait, our thread is parked (if no interests are ready), and it is this callback's job to wake it up (if the interest mask matches). Additionally, when this happens, a reference to the epitem is inserted into eventpoll's ready list.
Now, to the more interesting part: how to actually use it. My goal here is to get a real glimpse of epoll in all its glory, not covered by the compatibility layers and easy to use abstractions.
Fine, fine… Using libc is not the lowest one can go, but it is good enough for today…

First things first, I need something that listens on a port and accepts connections. No epoll magic here, no async IO, just good old C:
This does the job as a simple echo server. However, being an example of single-threaded blocking IO (just in a different language), it can only handle one connection at a time.
Since async IO only makes sense with non-blocking sockets, the first step is to make the listening socket as such:
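A sketch of that step, wrapped in a small helper (the function name is mine):

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Switch a file descriptor to non-blocking mode, preserving existing flags.
// From now on accept/read/write return EAGAIN/EWOULDBLOCK instead of blocking.
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

With this in place, the listening socket is switched over right after listen with set_nonblocking(socket_fd).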
Now, this breaks the echo server: accept will no longer block, but immediately return an error - EAGAIN or EWOULDBLOCK - which basically means “if the socket were blocking, this call would block”. Now it is time to create an epoll instance and register read interest (EPOLLIN) for socket_fd on it:
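A sketch of this step, again as a helper (the function name is mine):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Create an epoll instance and register read interest (EPOLLIN) for the
// listening socket. Returns the epoll fd, or -1 on failure.
static int setup_epoll(int socket_fd) {
    int epoll_fd = epoll_create1(0);
    if (epoll_fd < 0) return -1;

    struct epoll_event event;
    memset(&event, 0, sizeof(event));
    event.events = EPOLLIN;     // wake us when there is something to accept
    event.data.fd = socket_fd;  // lets us recognize the listener in events
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, socket_fd, &event) < 0) {
        close(epoll_fd);
        return -1;
    }
    return epoll_fd;
}
```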
Next, instead of just calling accept and handling the connection
directly in the main thread, we call epoll_wait inside the loop. When
the socket is able to accept a connection, epoll_wait returns,
putting an event into the buffer we pass to it. We then iterate
through new events, checking if the associated file descriptor is
the listening socket – in which case we accept all new connections
and add them to epoll – or regular connection socket otherwise.
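A sketch of that loop, with accepting factored out (names, buffer sizes and the minimal connection handling are my choice; accept4 with SOCK_NONBLOCK is a Linux-ism that makes the accepted socket non-blocking in one call):

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 64

// Accept every pending connection and register it with epoll (edge-triggered).
static void accept_all(int epoll_fd, int socket_fd) {
    for (;;) {
        int conn_fd = accept4(socket_fd, NULL, NULL, SOCK_NONBLOCK);
        if (conn_fd < 0) {
            // EAGAIN/EWOULDBLOCK: no more pending connections.
            if (errno != EAGAIN && errno != EWOULDBLOCK) perror("accept4");
            return;
        }
        struct epoll_event event;
        memset(&event, 0, sizeof(event));
        event.events = EPOLLIN | EPOLLET; // edge-triggered read interest
        event.data.fd = conn_fd;
        epoll_ctl(epoll_fd, EPOLL_CTL_ADD, conn_fd, &event);
    }
}

// Minimal echo handling for a connection socket; expanded further below.
static void handle_connection(int conn_fd) {
    char buf[1024];
    ssize_t n;
    while ((n = read(conn_fd, buf, sizeof(buf))) > 0) {
        write(conn_fd, buf, (size_t)n);
    }
    if (n == 0) close(conn_fd); // peer closed
}

static void run(int epoll_fd, int socket_fd) {
    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        // Parks the thread until at least one registered fd has progress.
        int n = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == socket_fd) {
                accept_all(epoll_fd, socket_fd);      // new connection(s)
            } else {
                handle_connection(events[i].data.fd); // data on a connection
            }
        }
    }
}
```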
Epoll interest can be registered as level-triggered or edge-triggered. Level-triggered is the default option, and it will keep notifying while the interest “is fulfilled”. So if I register a TCP connection socket with read interest, epoll_wait will keep waking with an event until I read all available data from that socket. Edge-triggered (the EPOLLET option), on the other hand, will notify (at least once) only when the interest “becomes fulfilled” - in the example above, only when new data arrives on the socket. More details can be found on the already-known-to-you man page.
Here I add TCP connections to epoll as edge-triggered; however, in this case it does not really matter, since I read all available data each time and we are not working around any constraints. I also do not care about writes, as they are done in a best-effort fashion.
Gotchas with writes
Write interest is slightly more complicated. If we were to use level-triggered epoll, we would get wake-up events as long as the socket is writable - which, if we have nothing to write, is all the time - hence the application would never “sleep”. One option here is to register write interest only when there is data to be written, and remove it afterwards. This is not a problem with edge-triggered epoll; however, there we need to be mindful that we only get notified when the socket state changes to writable. Therefore, if the socket was already writable and we have new data to send, we will not be notified. So either, again, we re-arm epoll with write interest only when we have data to write, or, whenever we have new data, we attempt to write to the socket immediately and stop when the write would block.
Handling existing connections will change slightly as well, as now we
also need to handle EAGAIN and EWOULDBLOCK since connection
sockets are now non-blocking as well:
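A sketch of that connection handling (the return convention is my choice):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Read until the socket would block, echoing data back best-effort.
// Returns 0 while the connection stays open, -1 once it was closed.
static int handle_connection(int conn_fd) {
    char buf[1024];
    for (;;) {
        ssize_t n = read(conn_fd, buf, sizeof(buf));
        if (n > 0) {
            // Best-effort echo; a full send buffer is not handled here.
            write(conn_fd, buf, (size_t)n);
        } else if (n == 0) {
            close(conn_fd); // peer closed the connection
            return -1;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return 0;       // drained all available data, wait for next event
        } else {
            close(conn_fd); // fatal error
            return -1;
        }
    }
}
```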
The write part is a bit simplified since I do not want to store an extra
state of what has been successfully written and what was not (I could
register connection fds with write interest as well (EPOLLOUT) and get
notified when there is some progress to be made writing).
Now I compile and run it, and connect from two separate terminal windows with nc. If I start writing to the connections, I can see messages being echoed back, and the server logs show its hard work.
It can handle multiple connections, runs in a single thread, does not add any artificial latency, and is not busy looping.
This checks all the requirements. So, as all software does eventually, it is time to rewrite it in Rust.
The full C code is available on GitHub.
Rust
Since I am not planning to become a C wizard anytime soon, and to build the foundation for something more complex and come back to the idea of “async without async”, there actually needs to be some async that I ditch - so Rust it is.
Do not expect fireworks, though, just a slightly more “flashy” echo server…
Mio
To not libc myself into oblivion or unsafe my way to hell, I decided to take an easier path - a path well trodden by others, the secret async source behind Tokio: Mio.
Mio not only wraps Epoll in a nice, easy-to-use API, but does the same over other OSes' async APIs, making our app cross-platform!
We can now forget about epoll’s naked glory. However, the overall approach is the same as in C:
- Create a Poll instance (which on Linux uses Epoll)
- Register the listener and accepted connections as a Source
- Wait for events, and handle IO in a non-blocking way
And to be fair, there is not much more to it, since Mio is handling all the dirty work behind the scenes.
Poll's API is quite similar to what we saw in the C code, but without getting your hands dirty with direct syscalls.
As with C code, the first thing to do is register a listener socket.
To use it with Poll, it needs to be wrapped with
mio::net::TcpListener,
which provides the aforementioned Source trait implementation
expected by the
Registry::register(...) method (Registry lives inside Poll):
While in C events are associated with a specific file descriptor, to be cross-platform Mio uses Token, which is a wrapper around usize and allows us to map an event back to its Source - for example a specific TCP connection or, as is the purpose of listener_token, the TCP listener.
With Poll initialized, we can wait for events and process them:
And to process them, we again come back to Tokens, as each
mio::Event
is associated with the token used when registering event::Source:
It is analogous to our previous echo server, differentiating
between events to the listening socket and connection sockets, with
the difference that here we compare Tokens instead of file
descriptors.
Unlike in the C implementation, however, here we handle writes properly
by registering both READABLE and WRITABLE interests on the TCP
stream from an established connection. For that to work, writes are
initially appended to an in-memory buffer and then written to the
connection whenever we can make progress:
When reading and writing to the connection, we need to remember to
handle WouldBlock errors as “cannot do more, wait for next epoll
event”:
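A sketch of such handling, written generically over Read + Write so it is easy to exercise (names and the exact shape are mine):

```rust
use std::io::{self, Read, Write};

// Outcome of pumping a connection: still open, or should be dropped.
enum Status {
    Open,
    Closed,
}

// Read all available data, append the echo to the write buffer, then try
// to flush the buffer - stopping wherever the socket would block.
fn pump<S: Read + Write>(stream: &mut S, write_buf: &mut Vec<u8>) -> io::Result<Status> {
    let mut buf = [0u8; 1024];
    loop {
        match stream.read(&mut buf) {
            Ok(0) => return Ok(Status::Closed), // peer closed
            Ok(n) => write_buf.extend_from_slice(&buf[..n]),
            // "Cannot do more, wait for the next event."
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => break,
            Err(e) => return Err(e),
        }
    }
    while !write_buf.is_empty() {
        match stream.write(write_buf) {
            Ok(0) => return Ok(Status::Closed),
            Ok(n) => {
                write_buf.drain(..n); // keep only what is still unwritten
            }
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => break,
            Err(e) => return Err(e),
        }
    }
    Ok(Status::Open)
}
```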
That gives us all the necessary ingredients to be async in a non-async world, and completes the echo server. However, if you paid attention, you know there is one more objective to be taken care of.
Time driven action
A lot of applications have to do more than just handle IO, and do some work periodically - be it sending a heartbeat or requesting some data from another system. Let's consider those time-based work, as opposed to request-based work that is triggered by IO events.
It is then time to add a killer feature to the mighty echo server. Every 5 seconds it is going to send its “status” to all connected clients. Yes, that’s it.
In async Rust one could simply do some tokio::select magic with an interval timer as one of the branches.
I still want the application to stay single threaded, so a separate sleeping thread is not an option either. Fortunately, Epoll (or Mio) has exactly what we need. With epoll_wait we can specify a timeout for how long we want to wait for events before the function returns, presumably to come back in the next iteration of the loop. Mio's Poll exposes the same functionality in its poll method as well.
I already pass the timeout through the wait_for_events function. And that brings me to the last piece of this puzzle: the aforementioned loop that will send the status periodically and wait for IO events when idle:
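The timeout computation at the heart of that loop can be sketched as a small pure helper (names, the 5-second constant as stated earlier, and the commented loop shape are my choice):

```rust
use std::time::{Duration, Instant};

const STATUS_INTERVAL: Duration = Duration::from_secs(5);

// How long we may sleep in epoll_wait before the next broadcast is due.
fn next_timeout(last_broadcast: Instant, now: Instant) -> Duration {
    let elapsed = now.duration_since(last_broadcast);
    STATUS_INTERVAL.saturating_sub(elapsed)
}

// The surrounding loop then looks roughly like:
//
// let mut last_broadcast = Instant::now();
// loop {
//     let timeout = next_timeout(last_broadcast, Instant::now());
//     wait_for_events(&mut poll, &mut events, Some(timeout))?;
//     handle_events(...)?;
//     if last_broadcast.elapsed() >= STATUS_INTERVAL {
//         broadcast_status(...);
//         last_broadcast = Instant::now();
//     }
// }
```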
To set things straight, broadcasting the message is non-blocking as well. All it does is extend the write buffer of every connection and try to progress the write as far as it can until hitting the WouldBlock error. In case of fatal connection errors, we just drop the connections:
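A sketch of such a broadcast, generic over Write (the connection-map shape is simplified compared to what a real server would hold):

```rust
use std::collections::HashMap;
use std::io::{self, Write};

// Append the message to every connection's write buffer and try to flush
// immediately; connections that fail fatally are dropped from the map.
fn broadcast<S: Write>(connections: &mut HashMap<usize, (S, Vec<u8>)>, message: &[u8]) {
    connections.retain(|_, (stream, write_buf)| {
        write_buf.extend_from_slice(message);
        loop {
            if write_buf.is_empty() {
                return true; // everything flushed
            }
            match stream.write(write_buf) {
                Ok(0) => return false, // connection gone
                Ok(n) => {
                    write_buf.drain(..n);
                }
                // Socket full: keep the rest buffered for the next event.
                Err(e) if e.kind() == io::ErrorKind::WouldBlock => return true,
                Err(_) => return false, // fatal error: drop the connection
            }
        }
    });
}
```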
Just extending the buffer would not be enough: we would not get any event if the connection were currently writable, since its state did not change. This is the gotcha I mentioned in the section on writes.
There is not much more magic here than in the C implementation, but for the sake of completeness, let's run it.
You can find the full source code on GitHub.
And if we connect from two terminal windows, we can see that the echo server is cruising.
Summary - the Good, the Bad and the Ugly
This concludes the echo server, and thus the proof of concept. As intended, I have achieved “async IO” without using async Rust.
Well, if we come back to the rant in Asynchronous IO section, maybe it is not so async. Perhaps this article should be named “Non blocking, event driven IO without async”, but hey, that is not very catchy. Anyway, I am digressing…
This, however, begs the question: was it worth it? If I were to advise you on writing echo servers, you would probably be better off just using async, and that is likely true for most simple applications.
With this simple example it is hard to argue the case of async without async, yet I am going to give it a shot.
Bad
Writing an echo server in async Rust could probably be done in a third of the lines of code. For all IO operations, instead of catching WouldBlock errors, we would simply await. Mio's Poll, although still there, would be conveniently hidden inside the belly of the Tokio runtime.
Ugly
Things would get a bit ugly if we were to go deep into some more complex network protocol. Let’s consider that we need to perform a simple handshake. With Tokio we could simply:
What happens under the hood is that the compiler converts this code into a state machine, with each await marking a state transition where it can yield.
This convenience is gone when we drop async Rust. For it to work with our framework, we would need to build this kind of state machine ourselves (unless we chose to block the thread), a greatly simplified example could be:
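A sketch of such a hand-rolled state machine for a hypothetical hello/ack handshake (all names and the protocol are made up; each variant corresponds to a point where async code would await):

```rust
use std::io::{self, Read, Write};

enum Handshake {
    SendingHello { sent: usize },
    AwaitingAck { received: usize, ack: [u8; 3] },
    Done,
}

impl Handshake {
    fn new() -> Self {
        Handshake::SendingHello { sent: 0 }
    }

    // Drive the state machine as far as the socket allows; called again on
    // the next IO event whenever we had to stop on WouldBlock.
    fn advance<S: Read + Write>(&mut self, stream: &mut S) -> io::Result<()> {
        loop {
            match self {
                Handshake::SendingHello { sent } => {
                    while *sent < b"HELLO".len() {
                        match stream.write(&b"HELLO"[*sent..]) {
                            Ok(n) => *sent += n,
                            Err(e) if e.kind() == io::ErrorKind::WouldBlock => return Ok(()),
                            Err(e) => return Err(e),
                        }
                    }
                    *self = Handshake::AwaitingAck { received: 0, ack: [0; 3] };
                }
                Handshake::AwaitingAck { received, ack } => {
                    while *received < ack.len() {
                        match stream.read(&mut ack[*received..]) {
                            Ok(0) => return Err(io::ErrorKind::UnexpectedEof.into()),
                            Ok(n) => *received += n,
                            Err(e) if e.kind() == io::ErrorKind::WouldBlock => return Ok(()),
                            Err(e) => return Err(e),
                        }
                    }
                    *self = Handshake::Done;
                }
                Handshake::Done => return Ok(()),
            }
        }
    }
}
```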
Each “state's” logic could be called multiple times - in case our “hands” are so big that they do not fit into the send queues, or the acks did not arrive all at once - therefore there can be much more logic hidden inside these function calls.
Without async, buffers and queues become your friends.
Good
Enough with the beating, though. I was supposed to argue for, not against, my own creation!
One thing we do not see in this code is Mutexes. Since everything
is single threaded and there are no Tasks ready to jump
around different threads at the next await, no
Send + Sync + 'static, no Pins and Arcs, we do not annoy the
borrow checker, nor ourselves. There is no async runtime, so when
things get hot, it should be easier to troubleshoot and debug.
The whole system is much more deterministic, IO can be easier to
separate from the application logic (think some kind of event loop),
and perhaps a small thing, but there is no function coloring and
async spreading over the whole codebase.
Of course some of those things could be achieved with async as well.
Conclusion
To wrap up these lengthy conclusions, I am yet to experiment with this approach in more sophisticated systems and see whether it has some juice in it.
How it turns out, you may find out in the next article. Until then, I hope you enjoyed this experiment and perhaps even learned something. If you have any questions, comments, or suggestions, feel free to reach out. Thanks for reading!