Edge compute orchestration: cold-start latency vs pre-warming trade-offs

Question

We run a fleet of edge functions across 4 regions (EU-West, US-East, APAC, SA-East) with varying cold-start profiles. We're seeing 800ms-2.5s cold starts on V8 isolates, which is acceptable for async workloads but kills UX on synchronous API paths.

Pre-warming strategy options we're evaluating:
- Ping-based keepalive (cheap but wastes compute during quiet hours)
- Traffic-predictive pre-warming using historical patterns (complex, needs a scheduler)
- Hybrid: keep 1 instance warm per region, scale on demand
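
To make the traffic-predictive option concrete without a dedicated prediction service, here is a minimal sketch. It assumes historical traffic is available as (region, hour of day, request count) samples. The function name `build_warm_schedule`, the `min_requests` threshold of 50, and the sample data are illustrative assumptions, not measured values.

```python
from collections import defaultdict


def build_warm_schedule(hourly_counts, min_requests=50):
    """Pick the hours of day worth pre-warming in each region.

    hourly_counts: iterable of (region, hour_of_day, request_count) samples,
    for example one sample per region per hour over the last few weeks.
    Returns {region: sorted list of hours (0-23)} where the average request
    count for that hour is at least min_requests (a hypothetical threshold).
    """
    samples = defaultdict(lambda: defaultdict(list))
    for region, hour, count in hourly_counts:
        samples[region][hour].append(count)

    schedule = {}
    for region, by_hour in samples.items():
        schedule[region] = sorted(
            hour
            for hour, counts in by_hour.items()
            if sum(counts) / len(counts) >= min_requests
        )
    return schedule


# Made-up history: EU-West is busy at 09:00 and quiet at 03:00.
history = [
    ("EU-West", 9, 120), ("EU-West", 9, 80), ("EU-West", 3, 4),
    ("APAC", 2, 200), ("APAC", 14, 10),
]
schedule = build_warm_schedule(history)
```

The resulting per-region hour list is small enough to store as a static config file and regenerate weekly, so nothing has to run continuously to produce it.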

Current metrics:
- Warm invocation: ~45ms p95
- Cold invocation: ~1.2s p95 (EU), ~1.8s p95 (APAC)
- Monthly compute budget: tight; pre-warming 24/7 in all regions would consume ~40% of it
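
As a rough way to size partial pre-warming against the ~40% figure above, the arithmetic can be sketched as follows. This assumes pre-warming cost scales linearly with warm instance-hours and is spread evenly across the 4 regions, which may not match actual billing; `prewarm_budget_share` is a hypothetical helper.

```python
# From the metrics above: 24/7 pre-warming in all regions costs ~40% of budget.
FULL_TIME_BUDGET_SHARE = 0.40
TOTAL_REGIONS = 4


def prewarm_budget_share(warm_hours_per_day, regions_warmed=TOTAL_REGIONS):
    """Estimate the budget share consumed by partial pre-warming.

    Assumption (not a billing fact): cost is proportional to warm
    instance-hours and is identical across regions.
    """
    if not 0 <= warm_hours_per_day <= 24:
        raise ValueError("warm_hours_per_day must be between 0 and 24")
    if not 0 <= regions_warmed <= TOTAL_REGIONS:
        raise ValueError("regions_warmed must be between 0 and 4")
    time_fraction = warm_hours_per_day / 24
    region_fraction = regions_warmed / TOTAL_REGIONS
    return FULL_TIME_BUDGET_SHARE * time_fraction * region_fraction


# Warming all regions for 12 hours a day: roughly 20% of budget.
half_day_all_regions = prewarm_budget_share(12)

# Warming only two regions for 8 hours a day: roughly 6.7% of budget.
peak_only_two_regions = prewarm_budget_share(8, regions_warmed=2)
```

Under these assumptions, restricting warm-up to peak hours in the slowest regions costs a small fraction of the 24/7 figure.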

How are others balancing cold-start SLAs against cost? Specifically interested in approaches that don't require a dedicated prediction service — something that can run as a sidecar or cron job.

Direct answers and proposed approaches
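
One approach that fits the "sidecar or cron job" constraint is a scheduled keepalive that only pings regions during their expected busy hours. The sketch below is a minimal illustration, not a tested production script: the warm-up URLs, the UTC hour windows, and the function names are all hypothetical, and it assumes the platform keeps an isolate warm for longer than the interval between cron runs.

```python
import urllib.request
from datetime import datetime, timezone

# Hypothetical warm-up endpoints; replace with the real per-region URLs.
WARMUP_URLS = {
    "EU-West": "https://eu-west.edge.example/__warmup",
    "US-East": "https://us-east.edge.example/__warmup",
    "APAC": "https://apac.edge.example/__warmup",
    "SA-East": "https://sa-east.edge.example/__warmup",
}

# Hypothetical busy windows in UTC hours; derive real ones from traffic logs.
WARM_HOURS_UTC = {
    "EU-West": range(6, 20),
    "US-East": range(11, 24),
    "APAC": range(0, 12),
    "SA-East": range(10, 23),
}


def http_ping(url, timeout=5.0):
    """Send one GET; return True on a 2xx response, False on a network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError, and socket timeouts all derive from OSError
        return False


def regions_to_warm(hour_utc, schedule=WARM_HOURS_UTC):
    """Return the regions whose busy window contains the given UTC hour."""
    return [region for region, hours in schedule.items() if hour_utc in hours]


def run_once(now=None, ping=http_ping):
    """Ping every region that should be warm right now; return {region: success}."""
    now = now or datetime.now(timezone.utc)
    return {
        region: ping(WARMUP_URLS[region])
        for region in regions_to_warm(now.hour)
    }


def main():
    """Entry point intended to be invoked by cron every few minutes."""
    for region, ok in run_once().items():
        print(f"{region}: {'warm' if ok else 'ping failed'}")
```

Outside the busy windows nothing is pinged, so quiet hours cost nothing and the first request after a quiet period pays the cold start. The `ping` parameter is injectable so the schedule logic can be exercised without network access.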

Risks, gaps, and constructive pushback