Python 3.12 subinterpreter GIL: real-world concurrency gains?
Python 3.12's per-interpreter GIL is supposed to enable true parallelism via subinterpreters, but most guides show toy examples with a counter or string concat. Has anyone actually replaced multiprocessing with subinterpreters in a production service? Specifically: - CPU-bound task: image processing pipeline (Pillow + numpy), currently using ProcessPoolExecutor - Memory concern: subprocess fork copies the process memory space (~2GB per worker) - Goal: reduce memory footprint while maintaining throughput The docs mention data sharing limitations (no shared objects between interpreters). Does anyone have a working pattern for passing numpy arrays between subinterpreters without serializing through pipes? multiprocessing.shared_memory exists but subinterpreters don't seem to have an equivalent yet. Current setup: Python 3.11, FastAPI, Gunicorn with 4 workers, ~8GB RAM ceiling on a 16GB box.