Why Node.js and Python Don't Use Linux's Native Async I/O API
Linux provides a native Asynchronous I/O (AIO) API that should theoretically allow true background file operations. Yet popular frameworks like Node.js and Python's asyncio avoid it entirely, relying instead on thread pools with blocking I/O. Why? The answer reveals important lessons about API design, cross-platform compatibility, and the gap between theoretical elegance and practical engineering. The future may lie in io_uring, a modern interface that addresses many of AIO's shortcomings.
The Puzzle
If you've worked with async programming in Node.js or Python, you know that file I/O operations like fs.readFile() or aiofiles.read() don't truly run in the background at the kernel level. Instead, these frameworks use thread pools to simulate asynchronous behavior—spawning threads that make blocking system calls while the main event loop continues.
This seems wasteful. Why maintain a thread pool when Linux provides a dedicated AIO API specifically designed for asynchronous I/O? The answer isn't obvious, and understanding it requires examining how different I/O models actually work in practice.
Three I/O Models in Linux
1. Blocking I/O
ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
Behavior: The calling thread halts until the operation completes. If data isn't available, the call blocks that thread entirely. Simple to use, but it creates concurrency bottlenecks.
Use case: Single-threaded programs or situations where blocking is acceptable.
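For concreteness, a minimal blocking read might look like this (a self-contained sketch; the path and buffer size are arbitrary):
// Minimal blocking read: the calling thread sleeps inside read()
// until the data arrives.
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char buf[4096];
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd == -1) return 1;
    ssize_t n = read(fd, buf, sizeof(buf)); // Blocks until completion
    if (n >= 0) printf("read %zd bytes\n", n);
    close(fd);
    return 0;
}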
2. Non-Blocking I/O
int fcntl(int fd, int cmd, ... /* arg */); // e.g., F_SETFL with O_NONBLOCK
ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
Behavior: When you set the O_NONBLOCK flag on a file descriptor, read and write return immediately. If the operation can't complete, they return -1 with errno set to EAGAIN or EWOULDBLOCK.
The Critical Caveat: This works perfectly for network sockets, pipes, and device files, but regular file I/O ignores the non-blocking flag. Opening a file with O_NONBLOCK has essentially no effect—file reads and writes block regardless. This is a fundamental limitation of how the Linux VFS (Virtual File System) layer interacts with filesystems.
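A sketch of the usual idiom for enabling the flag on an already-open descriptor, with the caveat spelled out in the comments:
#include <fcntl.h>

// Enable O_NONBLOCK on an existing descriptor (sockets, pipes, ...)
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

// On a socket: read() now returns -1/EAGAIN when no data is ready.
// On a regular file: the flag is accepted but silently ignored;
// read() still blocks until the disk I/O completes.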
Integration Pattern: Non-blocking I/O typically pairs with multiplexing mechanisms:
// Skeleton of an epoll-based event loop (epoll_fd from epoll_create1)
struct epoll_event events[MAX_EVENTS];
for (;;) {
    int ready = epoll_wait(epoll_fd, events, MAX_EVENTS, timeout_ms);
    for (int i = 0; i < ready; i++) {
        if (events[i].events & EPOLLIN) {
            // Socket is readable: call read() until it returns EAGAIN
        }
        if (events[i].events & EPOLLOUT) {
            // Socket is writable: call write() until it returns EAGAIN
        }
    }
}
Why This Matters: Since file I/O blocks regardless of the flag, async frameworks must use thread pools for disk operations while using non-blocking I/O for network operations. This hybrid approach adds complexity but is necessary given the limitations.
3. POSIX Asynchronous I/O (AIO)
#include <aio.h>
int aio_read(struct aiocb *cb);
int aio_write(struct aiocb *cb);
int aio_error(const struct aiocb *cb);
ssize_t aio_return(struct aiocb *cb);
Behavior: These functions initiate truly asynchronous operations. You submit a request via aio_read or aio_write, which returns immediately. Later, you check completion status with aio_error and retrieve results with aio_return.
// Example usage (error handling omitted for brevity)
struct aiocb cb;
memset(&cb, 0, sizeof(cb));
cb.aio_fildes = fd;
cb.aio_buf    = buffer;
cb.aio_nbytes = size;
cb.aio_offset = offset;
aio_read(&cb); // Initiates async read, returns immediately
// Later, check if complete
while (aio_error(&cb) == EINPROGRESS) {
    // Do other work (a real program would use a completion
    // notification instead of polling in a loop)
}
ssize_t bytes_read = aio_return(&cb); // Collect the result exactly once
On paper, this looks ideal. The kernel handles I/O in the background while your application continues executing. So why don't frameworks use it?
Why Frameworks Avoid POSIX AIO
1. Platform Fragmentation and Incompatibility
Multiple Incompatible APIs: Cross-platform runtimes must support:
- Linux native AIO (io_submit, io_getevents)
- POSIX AIO (different implementations across BSDs, macOS, Linux)
- Windows Overlapped I/O (completely different model)
Each requires platform-specific code paths, different error handling, and separate testing matrices. A thread pool provides a single, uniform abstraction that works identically everywhere.
Platform-Specific Limitations:
- macOS POSIX AIO caps concurrent operations at ~16, making it unusable for high-throughput applications
- Some OSes lack async support for critical operations like close(), rename(), stat(), or fsync()
- Behavior differs across filesystem types (ext4 vs XFS vs NFS)
2. Linux AIO's Strict Requirements
Direct I/O Mandate: To get true async behavior with Linux native AIO (io_submit), you must use the O_DIRECT flag, which has onerous requirements:
- 512-byte alignment: Buffer addresses, transfer sizes, and file offsets must all align to 512-byte boundaries (the device's logical block size)
- Bypasses page cache: O_DIRECT skips the kernel's buffer cache, which can hurt performance for small, cached operations
- Complex buffer management: Arbitrary reads/writes require custom allocators and padding
// Linux AIO with O_DIRECT requires this complexity
void *buffer;
// Buffer address AND size must be multiples of the block size:
posix_memalign(&buffer, 512, ALIGNED_SIZE);
off_t offset = 1024; // File offset must also be a multiple of 512
// This would FAIL with EINVAL:
// off_t offset = 1000; // Not 512-aligned
Why This Breaks Node.js/Python: Applications frequently work with arbitrary offsets and sizes (reading JSON configs at byte 127, writing logs of variable length). Handling alignment for every operation adds enormous complexity.
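One common workaround is to round each request out to an aligned bounding window, read the whole window, and copy out the bytes actually requested. A sketch of that idea (assumes a 512-byte block size; error handling kept minimal):
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK 512  // Assumed logical block size

// Read `len` bytes at an arbitrary `offset` from an O_DIRECT fd by
// reading the aligned window that covers the requested range.
ssize_t read_unaligned(int fd, void *dst, size_t len, off_t offset) {
    off_t  start = offset & ~(off_t)(BLOCK - 1);          // Round down
    size_t span  = ((offset - start + len + BLOCK - 1)
                    / BLOCK) * BLOCK;                     // Round up
    void  *tmp;
    if (posix_memalign(&tmp, BLOCK, span) != 0) return -1;
    ssize_t n = pread(fd, tmp, span, start);              // Aligned read
    if (n > offset - start) {
        size_t avail = (size_t)(n - (offset - start));
        size_t copy  = avail < len ? avail : len;
        memcpy(dst, (char *)tmp + (offset - start), copy); // Extract
        free(tmp);
        return (ssize_t)copy;
    }
    free(tmp);
    return n < 0 ? -1 : 0;
}
Every unaligned request pays an extra allocation and copy, which is precisely the overhead a general-purpose runtime wants to avoid.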
3. Problematic Completion Mechanisms
POSIX AIO Notification Issues:
- Signals: Process-global, conflict with application code and other libraries, difficult to multiplex
- aio_suspend: Scales poorly—you must pass an array of all pending operations to check for completion (see the sketch after this list)
- Neither integrates cleanly with epoll/kqueue-based event loops
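To make the scaling problem concrete, here is a minimal sketch of waiting on many in-flight requests with aio_suspend (cbs is assumed to hold previously submitted control blocks):
#include <aio.h>
#include <errno.h>

// Waiting means handing the kernel the FULL list of pending operations
// every time (O(n) per wakeup), then scanning the list again to find
// out which request actually finished.
void wait_for_any(struct aiocb *cbs[], int n) {
    aio_suspend((const struct aiocb *const *)cbs, n, NULL); // Blocks
    for (int i = 0; i < n; i++) {
        if (aio_error(cbs[i]) != EINPROGRESS) {
            ssize_t res = aio_return(cbs[i]); // Reap the completion
            (void)res;                        // ...handle result here
        }
    }
}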
Linux AIO: Uses eventfd or io_getevents, which can integrate with event loops but adds significant complexity compared to "submit job to thread pool, get callback when done."
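For illustration, the eventfd route with the libaio wrapper looks roughly like this (a sketch assuming an initialized ctx and an epoll loop that watches the returned eventfd; error handling minimal):
#include <libaio.h>
#include <sys/eventfd.h>
#include <unistd.h>

// Submit one async read whose completion bumps an eventfd that an
// epoll loop can watch. Returns the eventfd, or -1 on failure.
int submit_with_eventfd(io_context_t ctx, int fd,
                        void *buf, size_t len, long long off,
                        struct iocb *cb) {
    int efd = eventfd(0, EFD_NONBLOCK);
    if (efd == -1) return -1;
    io_prep_pread(cb, fd, buf, len, off);
    io_set_eventfd(cb, efd);            // Completion increments efd
    struct iocb *list[1] = { cb };
    if (io_submit(ctx, 1, list) != 1) { close(efd); return -1; }
    return efd;                         // Register efd with epoll
}

// When epoll reports efd readable, reap results:
//   struct io_event ev[16];
//   int n = io_getevents(ctx, 1, 16, ev, NULL);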
4. Incomplete Operation Coverage
Even if you solve read/write asynchrony, many filesystem operations remain blocking:
- open(): directory traversal can block on network filesystems
- close(): may block flushing buffers (especially on macOS HFS+)
- stat() / fstat(): metadata lookup can be slow
- fsync() / fdatasync(): explicit flushes
- rename(), unlink(), mkdir(): directory operations
Thread Pool Advantage: Treating all filesystem operations uniformly as "potentially blocking" simplifies the programming model. You don't need special cases for which operations are async vs sync.
5. Thread Pools Are Actually Good
Efficiency: Modern thread pools (like libuv's in Node.js) are highly optimized:
- Thread reuse eliminates creation/destruction overhead
- Bounded concurrency prevents system overload
- Work stealing and priority queues optimize scheduling
- Blocking in a worker thread doesn't affect the event loop
Advanced I/O Support:
- readv/writev for scatter-gather I/O (POSIX AIO lacks this)
- Easy integration with compression, hashing, or encryption pipelines
- Straightforward cancellation semantics
Real-World Performance: For many workloads, thread pool overhead is negligible compared to actual I/O time. A 0.1ms thread scheduling cost is insignificant when disk latency is 5-10ms.
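The underlying pattern is straightforward: a worker thread makes the blocking call, then wakes the event loop through a pipe or eventfd it is watching. A stripped-down sketch of the idea (libuv's real implementation is considerably more elaborate):
#include <pthread.h>
#include <unistd.h>

// A file-read job: a worker blocks in pread() so the loop doesn't.
struct job {
    int     fd;
    void   *buf;
    size_t  len;
    off_t   off;
    ssize_t result;
    int     done_fd;   // write end of a pipe watched by the event loop
};

static void *worker(void *arg) {
    struct job *j = (struct job *)arg;
    j->result = pread(j->fd, j->buf, j->len, j->off); // Blocking is OK here
    char byte = 1;
    write(j->done_fd, &byte, 1);  // Wake the event loop: job finished
    return NULL;
}

// The event loop thread would do:
//   pthread_create(&tid, NULL, worker, &job);  // or enqueue to a pool
//   ...keep handling network events...
//   when done_fd becomes readable in epoll, consume job.result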
Comparison Table
| Feature | Non-Blocking I/O + Thread Pool | POSIX AIO |
|---|---|---|
| Network Operations | ✅ True async via epoll/kqueue | ❌ Not designed for sockets |
| File Operations | ⚠️ Blocking calls in thread pool | ✅ True async (in theory) |
| Cross-Platform | ✅ Works everywhere identically | ❌ Platform-specific quirks |
| Buffer Alignment | ✅ No restrictions | ❌ Linux requires O_DIRECT alignment |
| Operation Coverage | ✅ All operations uniform | ❌ Many operations remain blocking |
| Event Loop Integration | ✅ Clean callback model | ⚠️ Complex signal/suspend mechanisms |
| Vector I/O | ✅ readv/writev support | ❌ Not available |
| Learning Curve | ✅ Straightforward | ❌ Steep, error-prone |
Where AIO Actually Succeeds
It's important to note that AIO isn't universally avoided. It works well in specialized contexts:
Database Systems: MySQL's InnoDB storage engine uses Linux native AIO for tablespace I/O, and purpose-built storage engines follow similar patterns. They benefit because:
- Database pages are naturally aligned (typically 8KB or 16KB blocks)
- O_DIRECT is desirable to avoid double buffering (page cache + database buffer pool)
- They control the entire I/O path and can absorb implementation complexity
High-Performance Applications: Custom storage engines, video processing pipelines, and HPC applications use AIO when they need maximum throughput and can handle the alignment requirements.
The Key Difference: These applications control their data format and can architect around AIO's constraints. General-purpose runtimes like Node.js must handle arbitrary user code.
The Future: io_uring
What Is io_uring?
Introduced in Linux 5.1 (2019), io_uring is a modern asynchronous I/O interface that addresses many of AIO's shortcomings:
#include <liburing.h>
int io_uring_queue_init(unsigned entries, struct io_uring *ring, unsigned flags);
void io_uring_prep_read(struct io_uring_sqe *sqe, int fd, void *buf, unsigned nbytes, __u64 offset);
int io_uring_submit(struct io_uring *ring);
int io_uring_wait_cqe(struct io_uring *ring, struct io_uring_cqe **cqe);
Architecture: Uses two ring buffers (submission queue and completion queue) shared between user space and kernel space, minimizing system call overhead.
// Simplified io_uring usage
struct io_uring ring;
io_uring_queue_init(256, &ring, 0);
// Prepare a read operation
struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
io_uring_prep_read(sqe, fd, buffer, size, offset);
// Submit to kernel (batches multiple operations)
io_uring_submit(&ring);
// Later, retrieve completions
struct io_uring_cqe *cqe;
io_uring_wait_cqe(&ring, &cqe);
ssize_t result = cqe->res; // Negative values are -errno
io_uring_cqe_seen(&ring, cqe);
Key Advantages Over AIO
1. No O_DIRECT Requirement: Works with buffered I/O, eliminating alignment constraints. Applications can use the page cache naturally.
2. Lower System Call Overhead:
   - Batch multiple operations in a single io_uring_submit()
   - Polling mode can eliminate syscalls entirely for high-throughput scenarios
   - Reduces context switches by orders of magnitude
3. Unified Interface:
   - Handles file I/O, network I/O, and more through a single API
   - Supports read, write, accept, send, recv, fsync, openat, close, etc.
   - Even supports operations like timeout and poll
4. Fixed Buffers and Files:
   - Pre-register buffers and file descriptors for zero-copy operations
   - Eliminates per-operation memory pinning overhead (see the sketch after this list)
5. Modern Event Loop Integration: Designed from the ground up to work with epoll-style programming models.
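As a sketch of points 2 and 4 combined, liburing can register a buffer once and then batch several fixed-buffer reads into a single submission (assumes ring is initialized and big_buf is allocated; CHUNK is an illustrative slice size):
#include <liburing.h>

#define CHUNK 4096  // Assumed slice size

// Register one large buffer, then batch four fixed-buffer reads
// into a single submission.
int batched_fixed_reads(struct io_uring *ring, int fd, void *big_buf) {
    struct iovec iov = { .iov_base = big_buf, .iov_len = 4 * CHUNK };
    io_uring_register_buffers(ring, &iov, 1);   // One-time registration

    for (int i = 0; i < 4; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        io_uring_prep_read_fixed(sqe, fd, (char *)big_buf + i * CHUNK,
                                 CHUNK, i * CHUNK, 0 /* buf index */);
    }
    return io_uring_submit(ring); // One syscall covers all four reads
}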
Real-World Performance
Benchmarks show significant improvements:
- 50-80% reduction in CPU usage for I/O-heavy workloads
- 2-3x throughput for high-concurrency scenarios
- Particularly impressive for mixed read/write workloads
Current Adoption Status
Node.js: io_uring support landed via libuv 1.45 (2023) and shipped in Node.js 20.x, but it is now disabled by default (opt-in through the UV_USE_IO_URING environment variable) due to:
- Security concerns: io_uring can bypass certain permission checks with symbolic links
- File descriptor exposure: Potential for unauthorized access if misused
- Kernel version requirements: Requires Linux 5.10+ for stable operation
Python: python-liburing bindings exist, but not yet integrated into asyncio core. Community experimentation ongoing.
Rust: The io-uring and tokio-uring crates provide production-ready wrappers, seeing increasing adoption.
C/C++: Libraries like liburing are mature and widely used in performance-critical applications.
Challenges and Limitations
1. Linux-Only: No cross-platform support (yet). Runtimes still need thread pool fallbacks for other OSes.
2. Security Surface: New CVEs are discovered periodically (SQPOLL mode issues, privilege escalation bugs). Some distros disable io_uring via sysctl.
3. Complexity: While better than AIO, it's still more complex than "submit job to thread pool."
4. Ecosystem Maturity: Fewer resources, examples, and debugging tools compared to established patterns.
When to Use What
Use Thread Pools + Blocking I/O When:
- Building cross-platform applications
- Working with arbitrary file formats and offsets
- Simplicity and maintainability are priorities
- I/O isn't the primary bottleneck
This is the right default for most applications
Use io_uring When:
- Building Linux-specific high-performance systems
- I/O throughput is critical (storage systems, proxies, databases)
- You can absorb the complexity and security considerations
- Targeting modern kernels (5.10+)
Use POSIX AIO When:
- You have legacy requirements
- Operating in a controlled environment with aligned I/O patterns
Generally avoid in new projects
Conclusion
The question "why don't frameworks use Linux AIO?" reveals a deeper truth about systems programming: elegant APIs don't always win. POSIX AIO offers theoretical purity—true asynchronous I/O at the kernel level—but fails in practice due to platform fragmentation, onerous constraints, and incomplete coverage.
Thread pools, despite appearing crude, provide a pragmatic solution that works reliably across platforms, handles all operations uniformly, and performs well enough for most use cases. The overhead of thread scheduling is negligible compared to actual I/O latency.
The emergence of io_uring represents a potential shift. By learning from AIO's failures—eliminating alignment requirements, reducing syscall overhead, and providing unified operation coverage—it may finally deliver on the promise of kernel-level async I/O. However, adoption remains cautious due to its Linux-only nature and evolving security posture.
The takeaway: When designing systems APIs, developer experience, cross-platform consistency, and practical constraints often matter more than theoretical elegance. The "best" solution isn't always the most sophisticated one—it's the one developers can actually use reliably.
Further Reading
- liburing documentation - Comprehensive io_uring guide by its creator
- Node.js io_uring discussion - Design decisions and security concerns
- Linux AIO limitations - ScyllaDB's deep dive
- Thread pool design in libuv - Node.js's I/O implementation
