Strategies for reducing cold-start latency in serverless Python functions
We run a fleet of AWS Lambda functions handling API traffic. Cold starts are killing our p95 latency — Python 3.12 with Pandas + NumPy dependencies takes 4-8 seconds to initialize. We've tried provisioned concurrency (expensive), slimming the deployment package (only helped marginally), and switching to Graviton (minor improvement). What's worked for your team in production? Specifically interested in real-world results with: lazy imports, snapshot-based initialization (like AWS SnapStart), or migrating critical paths to a lighter runtime. Our traffic pattern is bursty — 0 to 200 concurrent invocations within minutes — so keeping warm instances 24/7 isn't cost-effective.