Waiting Is a Skill

"Nature does not hurry, yet everything is accomplished."

Lao Tzu

A mutex, short for mutual exclusion, is a lock that only one thread can hold at a time. If thread A holds it, thread B waits. It is a simple idea with large consequences in practice.

CPython, the standard Python interpreter, has a Global Interpreter Lock. The GIL is a mutex, a single lock, that a thread must hold in order to execute Python bytecode. Only one thread can hold it at a time. This means that no matter how many threads you create, and no matter how many cores your machine has, only one thread is executing Python bytecode at any given moment. (Python 3.13 added an experimental free-threaded build without the GIL, but the default build still has it.)

CPython's answer to thread safety was one big lock, and the price is that multithreading mostly does not speed up CPU-bound Python code. It was a design decision made in 1992 and it has been causing heated forum debates ever since.

And yet, Python powers some of the highest-traffic services on the internet. FastAPI services handle large numbers of concurrent requests. There are async database drivers, async HTTP clients, async queues. The language that cannot truly run two Python threads simultaneously is somehow doing all of this.

The answer is that for most services, you are not waiting on the CPU. You are waiting on the network.

I/O stands for input/output, a catch-all for any operation where your program talks to something outside itself: reading a file from disk, querying a database, making an HTTP request, receiving data from a socket. The defining characteristic of I/O is that it is slow, not because your code is slow, but because the device or service on the other end is far slower than your CPU, and often physically far away. Disk, network, database: all of them are orders of magnitude slower than RAM, and none of them are under your program's control.

Here is the thing about I/O. When your code makes a database query, your Python thread sends the query, and then it sits there. Waiting. For the database to process it and send back the results. That wait might be 10 milliseconds, might be 200 milliseconds, but during that entire time your thread is doing nothing. It is not computing. It is not running any Python code. It is just waiting. And while it waits, it is holding a thread, holding memory, holding an OS-level resource, doing absolutely nothing useful with any of it.

Multiply this by a thousand concurrent requests and you have a thousand threads, each waiting for its own database query, each sitting completely idle, collectively reserving memory for a thousand thread stacks and holding a thousand OS thread handles, all to wait. This is like hiring a thousand employees to stand by their phones and do nothing else.
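A minimal sketch of the thread-per-request model, with time.sleep standing in for a blocking database call (an assumption; no real database is involved). The waits overlap because CPython releases the GIL while a thread is blocked, so threads do work for waiting. The cost shows up in the peak thread count: one OS thread per waiter. fake_query and thread_per_request are hypothetical names.

```python
import threading
import time

def fake_query(results, i, delay):
    # Hypothetical stand-in for a blocking database call: the thread just waits.
    time.sleep(delay)
    results[i] = f"row {i}"

def thread_per_request(n, delay):
    """Give each of n simulated requests its own OS thread."""
    results = [None] * n
    threads = [
        threading.Thread(target=fake_query, args=(results, i, delay))
        for i in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    peak = threading.active_count()  # main thread plus every waiter still alive
    for t in threads:
        t.join()
    return results, peak, time.perf_counter() - start

results, peak, elapsed = thread_per_request(100, 0.2)
print(f"{peak} threads alive at once, finished in {elapsed:.2f}s")
```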

The insight that asyncio is built on is simple: instead of giving each waiting task its own thread, have one thread that knows how to wait on many things at once and switches between them whenever one is ready.

This is called the event loop.

The event loop is, as the name says, a loop. It runs continuously, asking the operating system: of all the I/O operations I have in flight, which ones are ready? The OS, using mechanisms like epoll on Linux and kqueue on macOS, monitors all those sockets and other file descriptors and tells the event loop which ones have data ready to read or buffer space available for writing. The event loop then wakes up the specific task that was waiting on that I/O and lets it run until it hits its next wait point. Then it parks that task, checks for the next ready I/O, and wakes the next one. The loop also keeps track of timers, which is how asyncio.sleep wakes a task without any I/O involved.
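The loop described above can be sketched with the standard-library selectors module, whose DefaultSelector picks the most efficient mechanism available (epoll on Linux, kqueue on macOS). This toy is not how asyncio is implemented internally: socket.socketpair() stands in for real network connections, and tiny_event_loop and make_handler are hypothetical helpers.

```python
import selectors
import socket

def tiny_event_loop(handlers, timeout=1.0):
    """Toy loop: wait on many sockets at once, run each callback when its socket is ready."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on macOS, select elsewhere
    for sock, callback in handlers.items():
        sel.register(sock, selectors.EVENT_READ, data=callback)
    while sel.get_map():  # keep going while anything is still registered
        ready = sel.select(timeout=timeout)  # "which of these are ready?"
        if not ready:
            break  # nothing arrived in time; a real loop would also run timers here
        for key, _mask in ready:
            payload = key.fileobj.recv(1024)
            sel.unregister(key.fileobj)  # one read per socket in this toy
            key.data(payload)  # resume whatever was waiting on this socket
    sel.close()

received = {}

def make_handler(name):
    def handle(payload):
        received[name] = payload
    return handle

a_read, a_write = socket.socketpair()
b_read, b_write = socket.socketpair()
b_write.sendall(b"reply for B")  # pretend B's server answered first
a_write.sendall(b"reply for A")
tiny_event_loop({a_read: make_handler("A"), b_read: make_handler("B")})
for s in (a_read, a_write, b_read, b_write):
    s.close()
print(received)
```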

One thread. Many tasks. Each task runs until it voluntarily yields control at an await, typically one that is waiting on I/O or a timer. This is called cooperative multitasking, and the cooperative part is the thing to understand. No task is ever preempted. No task gets paused in the middle of a computation because a timer went off. Every task runs until it decides to yield. The event loop depends on every task being a good citizen.
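A small sketch of what cooperative means in practice: two tasks share one thread, and they alternate only because each one awaits asyncio.sleep(0), which yields to the loop without actually waiting. The worker name and step counts are arbitrary choices for illustration.

```python
import asyncio

async def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")
        # sleep(0) hands control back to the event loop without waiting;
        # this await is the only point where the other task can run.
        await asyncio.sleep(0)

async def main():
    log = []
    await asyncio.gather(worker("A", 3, log), worker("B", 3, log))
    return log

log = asyncio.run(main())
print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```

Remove the await and each worker would append all of its entries before the other one starts.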

Run this and feel it:

import asyncio

async def task(name, delay):
    print(f"{name} starting")
    await asyncio.sleep(delay)
    print(f"{name} done")

async def main():
    await asyncio.gather(
        task("A", 2),
        task("B", 1),
        task("C", 3),
    )

asyncio.run(main())

This finishes in about 3 seconds, not 6. A, B, and C all start immediately, yield at their await asyncio.sleep, and the event loop runs the next one. When their respective timers expire, they are woken up and finish. Three seconds of wall time for six seconds of total waiting, because the waiting overlapped. One thread did all of it.
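The overlap comes from gather, not from async def alone. Awaiting the same coroutines one after another runs them back to back. A sketch comparing the two, with the delays scaled down to tenths of a second (an arbitrary choice to keep it quick):

```python
import asyncio
import time

async def task(name, delay):
    await asyncio.sleep(delay)
    return name

async def sequential():
    # Each await finishes before the next coroutine even starts.
    return [await task("A", 0.2), await task("B", 0.1), await task("C", 0.3)]

async def concurrent():
    # gather wraps all three in tasks, so their sleeps overlap.
    return await asyncio.gather(task("A", 0.2), task("B", 0.1), task("C", 0.3))

def timed(coro_fn):
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start

seq_result, seq_time = timed(sequential)  # roughly 0.6 s, the sum of the delays
con_result, con_time = timed(concurrent)  # roughly 0.3 s, the longest delay
print(f"sequential {seq_time:.2f}s, gather {con_time:.2f}s")
```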

The await keyword is how a coroutine says, I am going to pause here. Take the event loop back. Do other things. Come back to me when the thing I am waiting on is ready. It is the mechanism of cooperative yielding. Without it, a coroutine runs to completion without ever releasing control, which means nothing else runs while it does.
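To see what "pause here, come back later" means mechanically, here is a toy round-robin scheduler driving plain generators, where yield plays the role that await plays in asyncio. It illustrates the idea only and is not asyncio's actual implementation; countdown and run_round_robin are hypothetical names.

```python
from collections import deque

def countdown(name, n, log):
    while n > 0:
        log.append(f"{name}:{n}")
        n -= 1
        yield  # pause: hand control back to the scheduler

def run_round_robin(generators):
    ready = deque(generators)
    while ready:
        gen = ready.popleft()
        try:
            next(gen)  # resume until the next yield
        except StopIteration:
            continue  # finished; drop it
        ready.append(gen)  # paused; put it back in line

log = []
run_round_robin([countdown("x", 2, log), countdown("y", 3, log)])
print(log)  # ['x:2', 'y:3', 'x:1', 'y:2', 'y:1']
```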

async def marks a function as a coroutine function. Calling it does not run its body. It returns a coroutine object, a description of the work to be done, which runs only when it is awaited or handed to the event loop, for example through asyncio.run() or asyncio.create_task(). This is different from a regular function, which runs immediately when you call it.
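A quick sketch of that difference: calling fetch() (a hypothetical coroutine function) only creates a coroutine object, and its body runs once the event loop drives it.

```python
import asyncio
import inspect

ran = []

async def fetch():
    ran.append("body executed")
    return 42

coro = fetch()  # creates a coroutine object; the body has not run
is_coro = inspect.iscoroutine(coro)
before = list(ran)  # still empty
result = asyncio.run(coro)  # the event loop drives the body to completion
print(is_coro, before, ran, result)  # True [] ['body executed'] 42
```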

The part that surprises most engineers: async is infectious. Once you write one async def function, everything that needs its result must also be async def, because you can only await from inside a coroutine. This propagates up the call stack. If you have a sync function calling an async one, you are stuck: you either make the sync function async too, or call asyncio.run(), which creates a new event loop and blocks your current thread until the coroutine finishes. That escape hatch only works where no event loop is already running; called from inside a running loop, asyncio.run() raises RuntimeError. The codebase migrates toward async gradually and somewhat relentlessly, like a tide coming in.
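A sketch of both sides of the sync/async boundary, with asyncio.sleep standing in for an async database call (get_user, get_user_sync, and handler are hypothetical names). The sync wrapper works from plain code; the same bridge fails inside a running loop, where the only option is to await.

```python
import asyncio

async def get_user(user_id):
    await asyncio.sleep(0.01)  # hypothetical stand-in for an async database call
    return {"id": user_id}

def get_user_sync(user_id):
    # Sync caller: asyncio.run() starts a fresh event loop and blocks
    # this thread until the coroutine finishes.
    return asyncio.run(get_user(user_id))

async def handler():
    error = None
    coro = get_user(2)
    try:
        asyncio.run(coro)  # the sync bridge, called from inside a running loop
    except RuntimeError as exc:
        coro.close()  # never started; closing it avoids a "never awaited" warning
        error = str(exc)
    user = await get_user(2)  # inside async code, the right move is to await
    return error, user

print(get_user_sync(1))
error, user = asyncio.run(handler())
print(error)
print(user)
```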

The most damaging mistake in asyncio is blocking the event loop.

import time
import asyncio

async def bad():
    time.sleep(5)  # THIS BLOCKS EVERYTHING

async def good():
    await asyncio.sleep(5)  # yields, everything else keeps running

time.sleep(5) in an async function does not yield. It blocks the OS thread that the event loop is running on. For those 5 seconds, nothing else in your entire async application runs. No other requests are processed. No other I/O is handled. The event loop is frozen, waiting for your blocking call to return. One requests.get(), one synchronous file read, or one time.sleep() in an async function, and you have turned your concurrent application into a sequential one for as long as that call runs.
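A sketch that measures the difference, scaled down to 0.2-second calls run three at a time. It also shows one common escape hatch for a blocking call you cannot rewrite: asyncio.to_thread (Python 3.9+) runs it on a worker thread so the loop stays free. That still costs a thread per call, so it is a patch rather than a replacement for async I/O.

```python
import asyncio
import time

async def bad():
    time.sleep(0.2)  # blocks the loop's thread; nothing else runs

async def good():
    await asyncio.sleep(0.2)  # yields; other tasks run meanwhile

async def offloaded():
    # The blocking call runs on a worker thread while the event loop keeps going.
    await asyncio.to_thread(time.sleep, 0.2)

async def run_three(coro_fn):
    start = time.perf_counter()
    await asyncio.gather(coro_fn(), coro_fn(), coro_fn())
    return time.perf_counter() - start

async def main():
    timings = {}
    for fn in (bad, good, offloaded):
        timings[fn.__name__] = await run_three(fn)
    return timings

timings = asyncio.run(main())
print(timings)  # bad is roughly 0.6 s; good and offloaded are roughly 0.2 s
```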

The debugging sign to look for: your async service feels fine one request at a time and then falls apart under concurrency, because under load it is actually handling requests sequentially. That means you have a blocking call somewhere. Every request waits for the previous one to finish. The event loop is not idle. It is stuck.
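One way to confirm the symptom is a heartbeat task that asks to wake up at a fixed interval and records how late it actually woke. On a healthy loop the lateness stays tiny; a blocking call shows up as lag roughly equal to its duration. monitor_lag and handler_with_hidden_block are hypothetical names for this sketch, not library APIs. asyncio's own debug mode (asyncio.run(main(), debug=True)) offers something related: it logs callbacks that take longer than loop.slow_callback_duration.

```python
import asyncio
import time

async def monitor_lag(interval, samples):
    """Record how much later than requested each wake-up happens."""
    loop = asyncio.get_running_loop()
    while True:
        expected = loop.time() + interval
        await asyncio.sleep(interval)
        samples.append(loop.time() - expected)  # near 0 unless something blocked

async def handler_with_hidden_block():
    await asyncio.sleep(0.05)
    time.sleep(0.3)  # the sneaky blocking call
    await asyncio.sleep(0.05)

async def main():
    samples = []
    watchdog = asyncio.create_task(monitor_lag(0.01, samples))
    await handler_with_hidden_block()
    watchdog.cancel()
    try:
        await watchdog
    except asyncio.CancelledError:
        pass
    return max(samples)

worst = asyncio.run(main())
print(f"worst loop lag: {worst:.3f}s")  # roughly the length of the blocking call
```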

A word on what asyncio does not solve. The GIL is still there. If your task is CPU-bound, say you are computing a hash, rendering an image, or running machine learning inference, asyncio gives you nothing. The event loop is one thread, and CPU-bound work never waits, so there are no pauses for the loop to fill. For CPU parallelism, you want multiprocessing, not asyncio. Python's answer to "do multiple CPU-heavy things simultaneously" is separate processes, not threads or coroutines, because in the standard CPython build each process gets its own interpreter and therefore its own GIL.
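A sketch of why an event loop cannot help here: crunch (a hypothetical CPU-bound coroutine with no await inside) holds the loop until it finishes, so the ticker task gets no turn in between. The usual way to get real parallelism from async code is to hand such work to a concurrent.futures.ProcessPoolExecutor via loop.run_in_executor, which is not shown here.

```python
import asyncio

async def ticker(log):
    for i in range(3):
        log.append(f"tick {i}")
        await asyncio.sleep(0)  # yields between ticks

async def crunch(log, n):
    # Pure computation with no await: the loop cannot switch away until it ends.
    total = sum(i * i for i in range(n))
    log.append("crunch done")
    return total

async def main():
    log = []
    await asyncio.gather(ticker(log), crunch(log, 1_000_000))
    return log

log = asyncio.run(main())
print(log)  # ['tick 0', 'crunch done', 'tick 1', 'tick 2']
```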

Asyncio solves I/O concurrency. Multiprocessing solves CPU parallelism. They are different problems. Using asyncio for CPU-bound work is like hiring that phone-waiting employee and then asking them to also write your code while they wait. They are not going to be good at both.

The mental model worth holding is this: the event loop is a single thread that knows how to wait on many things at once. async def defines a task that can pause and resume. await is the pause. The event loop fills the pauses with other tasks. For I/O-heavy workloads the whole system scales better than one thread per task, because it avoids creating, switching between, and destroying an OS thread for every task that spends most of its time waiting anyway.
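A sketch of "one thread, many waits", using asyncio.sleep as a stand-in for network I/O (an assumption; real I/O adds its own costs). A thousand concurrent waits finish in about the time of one, and the process gains no extra OS threads while they run. wait_for_io is a hypothetical name.

```python
import asyncio
import threading
import time

async def wait_for_io(i):
    await asyncio.sleep(0.1)  # stand-in for a network wait
    return i

async def main(n):
    threads_before = threading.active_count()
    start = time.perf_counter()
    results = await asyncio.gather(*(wait_for_io(i) for i in range(n)))
    elapsed = time.perf_counter() - start
    return len(results), elapsed, threading.active_count() - threads_before

count, elapsed, extra_threads = asyncio.run(main(1000))
print(count, f"{elapsed:.2f}s", extra_threads)  # 1000 waits, about 0.1 s, no extra threads
```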

The GIL is not the tragedy it sounds like. Most web services are I/O-bound. The bottleneck is the database, the external API, the message queue, not the CPU. For those services, asyncio gives you the concurrency you need without the cost of threads. For CPU-bound work, you reach for processes instead and stop arguing about the GIL.

Python learned to wait well. That turns out to be enough for most of what we actually build.

I await your questions.