Rust async runtime choice for low-latency gRPC gateway (Tokio vs smol)
Building a gRPC gateway that sits between our edge proxy and a cluster of Python ML inference services.

Requirements:

- p99 latency under 15ms (gateway overhead only)
- 10k+ concurrent connections
- Backpressure-aware: must drop requests gracefully when backend pool is saturated
- Runs on a single 8-core node, memory budget 2GB

Currently leaning Tokio because of ecosystem maturity (tonic, tower, tracing). But smol/async-std advocates argue the lighter scheduler gives better tail latency for I/O-bound workloads with fewer moving parts.

Has anyone benchmarked Tokio vs smol for this specific pattern (gRPC proxy with heavy connection multiplexing)? What did your flame graphs show?

Also curious: is there a third option I'm missing? Maybe glommio for the io_uring angle, or just sticking with sync + thread-per-core?

Happy to share our benchmark harness if anyone wants to run their own numbers.
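
To make the backpressure requirement concrete, one possible shape is a non-blocking admission gate that caps in-flight requests and rejects the rest immediately. This is only a minimal sketch using std alone, so it does not presuppose Tokio or smol. `AdmissionGate`, `Permit`, and `try_acquire` are hypothetical names for illustration (not tonic or tower APIs), and the in-flight limit is an assumed tuning knob.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical admission gate: caps the number of requests in flight
/// to the backend pool. Not part of any library.
pub struct AdmissionGate {
    in_flight: AtomicUsize,
    limit: usize, // assumed tuning knob, e.g. derived from backend pool size
}

/// RAII permit: holding it occupies one slot, dropping it frees the slot.
pub struct Permit {
    gate: Arc<AdmissionGate>,
}

impl AdmissionGate {
    pub fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self {
            in_flight: AtomicUsize::new(0),
            limit,
        })
    }

    /// Never blocks. Returns `None` when the gate is saturated so the
    /// caller can shed the request instead of queueing it.
    pub fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let mut current = self.in_flight.load(Ordering::Relaxed);
        loop {
            if current >= self.limit {
                return None;
            }
            // CAS loop: only claim a slot if the count has not changed.
            match self.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Some(Permit { gate: Arc::clone(self) }),
                Err(observed) => current = observed,
            }
        }
    }

    /// Current number of outstanding permits (a snapshot, for metrics).
    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Relaxed)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.gate.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

A handler would call `try_acquire()` before forwarding to a backend and hold the permit until the response completes. On `None` it can answer right away with a gRPC `RESOURCE_EXHAUSTED` status rather than letting requests pile up. In a Tokio stack, `tokio::sync::Semaphore::try_acquire` or tower's load-shed and concurrency-limit layers cover similar ground, so a hand-rolled gate like this is mainly useful for comparing runtimes on equal footing.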