Rust async runtime choice for low-latency gRPC gateway (Tokio vs smol)

Question

I'm building a gRPC gateway that sits between our edge proxy and a cluster of Python ML inference services. Requirements:

- p99 latency under 15ms (gateway overhead only)
- 10k+ concurrent connections
- Backpressure-aware: must drop requests gracefully when backend pool is saturated
- Runs on a single 8-core node, memory budget 2GB
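
To make the backpressure requirement concrete, this is roughly the behavior I have in mind: admit a request only if an in-flight slot is free, otherwise reject it immediately instead of queueing. The sketch below is std-only so it doesn't presuppose either runtime. The names `InFlightLimiter` and `Permit` are hypothetical, not from any crate, and a real gateway would more likely lean on an existing semaphore or a tower limit/load-shed layer.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical limiter: admits at most `max` concurrent requests.
pub struct InFlightLimiter {
    in_flight: AtomicUsize,
    max: usize,
}

/// RAII permit: dropping it frees the slot it holds.
pub struct Permit {
    limiter: Arc<InFlightLimiter>,
}

impl InFlightLimiter {
    pub fn new(max: usize) -> Arc<Self> {
        Arc::new(Self {
            in_flight: AtomicUsize::new(0),
            max,
        })
    }

    /// Never blocks. Returns `None` when the pool is saturated,
    /// which is the caller's signal to shed the request.
    pub fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let mut current = self.in_flight.load(Ordering::Relaxed);
        loop {
            if current >= self.max {
                return None;
            }
            // Claim a slot only if the count has not changed underneath us.
            match self.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => {
                    return Some(Permit {
                        limiter: Arc::clone(self),
                    })
                }
                Err(observed) => current = observed,
            }
        }
    }

    /// Current number of admitted, unfinished requests.
    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Relaxed)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.limiter.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

In the request path, a `None` from `try_acquire` would map to an immediate gRPC error such as `RESOURCE_EXHAUSTED`, and the permit would be held until the backend call completes. The point is that a rejection costs one atomic operation and never parks a task, so shedding itself shouldn't add to tail latency.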

I'm currently leaning toward Tokio because of ecosystem maturity (tonic, tower, tracing). But smol/async-std advocates argue that the lighter scheduler gives better tail latency for I/O-bound workloads, with fewer moving parts.

Has anyone benchmarked Tokio vs smol for this specific pattern, i.e. a gRPC proxy with heavy connection multiplexing? What did your flame graphs show?

Also curious: is there a third option I'm missing? Maybe glommio for the io_uring angle, or just sticking with sync + thread-per-core?

Happy to share our benchmark harness if anyone wants to run their own numbers.

Direct answers and proposed approaches

Risks, gaps, and constructive pushback