What latency actually means in live betting
When we say 'sub-50ms odds latency' we mean a specific number: the p50 wall-clock time between the moment the source feed (Betradar, Genius Sports, BetGenius) publishes an updated price into our aggregator and the moment that price renders in a player's open WebSocket connection. Not server-to-server. Not measured at our edge. Feed-publish to browser-receipt — the latency the player actually experiences.
Sporbet Soft's production number is 47ms p50, 180ms p99, measured across all enabled markets for all partners over a 30-day rolling window. The p99 is the more useful number for risk teams: a 180ms tail means a sharp bettor cannot reliably arbitrage the platform against a separate live data feed, because the arbitrage window is shorter than human reaction time. Platforms running with 500ms+ p99 latency get arbitraged. Platforms running with 50ms p50 do not.
The architectural rationale for prioritising latency lives in our integration anatomy piece — this article zooms in on the four subsystems that actually deliver the number.
Redis pub/sub fan-out and partitioned subscriptions
The first subsystem is the fan-out layer. When an updated price event lands in our aggregator (after multi-feed validation and margin application), it gets republished onto a Redis pub/sub topic. The topic key is structured: odds:{sport_id}:{league_id}:{market_id}. The choice of granularity matters. Too coarse — for example a single odds topic — and every subscriber gets every event, which crushes the network. Too fine — for example a topic per individual price event — and a popular Champions League final spawns 50,000 topics for one match.
Sporbet Soft's partition strategy is market-level for top-100 leagues and league-level for the long tail. A player subscribing to a Premier League match's match-result market subscribes to one topic and receives one stream of price updates for that market only. Server-side, the WebSocket handler maintains a per-connection set of subscribed topics and uses Redis's PSUBSCRIBE with the relevant patterns. The fan-out is O(subscribers per topic), not O(all subscribers).
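The partition rule above fits in a few lines. This is an illustrative Python sketch, not production code: the key layout follows the odds:{sport_id}:{league_id}:{market_id} scheme from the text, while the function name, the numeric IDs and the top-league set are invented for the example.

```python
def topic_for(sport_id: int, league_id: int, market_id: int,
              top_league_ids: set[int]) -> str:
    """Pub/sub topic for a price update: market-level granularity for
    top leagues, one shared league-level topic for the long tail."""
    if league_id in top_league_ids:
        return f"odds:{sport_id}:{league_id}:{market_id}"
    return f"odds:{sport_id}:{league_id}"

# Hypothetical IDs: sport 1 = football, league 17 = a top-100 league.
top100 = {17}
assert topic_for(1, 17, 4521, top100) == "odds:1:17:4521"
# A long-tail league collapses to a single league-level topic:
assert topic_for(1, 903, 4521, top100) == "odds:1:903"
```

A subscriber to a top-league market receives only that market's stream; long-tail subscribers filter client-side, which is acceptable because long-tail event rates are low.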
Redis is the right tool here because the latency budget for the fan-out hop is under 5ms. A managed Postgres or a Kafka cluster cannot hit that number consistently at the throughput sportsbook live odds generate during peak — a single Champions League match-week pushes 800K events/minute through the fan-out. Redis cluster sustains that with sub-millisecond p99 publish latency.
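Redis itself isn't needed to see why the fan-out cost is O(subscribers per topic) rather than O(all subscribers). A minimal in-memory model (a Python sketch under assumed names; the real hop is Redis pub/sub with PSUBSCRIBE, as described above):

```python
from collections import defaultdict

class FanOut:
    """In-memory model of the pub/sub hop: a publish touches only the
    connections subscribed to that topic, never the whole fleet."""
    def __init__(self):
        self.subs = defaultdict(set)   # topic -> set of connection ids

    def subscribe(self, conn_id: str, topic: str) -> None:
        self.subs[topic].add(conn_id)

    def publish(self, topic: str, event: dict) -> dict:
        # Delivery cost is proportional to len(self.subs[topic]).
        return {conn_id: event for conn_id in self.subs[topic]}

hub = FanOut()
hub.subscribe("c1", "odds:1:17:4521")
hub.subscribe("c2", "odds:1:17:4521")
hub.subscribe("c3", "odds:1:903")      # different topic, never touched
delivered = hub.publish("odds:1:17:4521", {"price": 2.10})
assert set(delivered) == {"c1", "c2"}
```

During an 800K events/minute spike, this property is what keeps the fan-out hop inside its 5ms budget: each event's delivery work is bounded by its topic's subscriber count.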
Sticky-session WebSockets and the connection state machine
The second subsystem is the WebSocket fleet. Every player session opens a single long-lived WebSocket connection to one of our edge gateways. The connection is sticky — routed by a Layer-7 load balancer with consistent hashing on a session cookie — because tearing down and re-establishing a WebSocket on every odds event would be catastrophic for latency (TLS handshakes alone run 50-150ms).
Stickiness creates its own problem: an unbalanced fleet where one node holds 10x the connections of its neighbours. We solve this with a coordinated drain pattern. When a node's connection count exceeds a threshold, it stops accepting new connections at the load balancer but continues serving its existing ones; the load balancer routes new connections to under-loaded nodes. When a node needs to be drained for a deploy, it sends a coordinated graceful-close to its connections with a backoff hint, and clients reconnect to a different node within 200-500ms.
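The drain pattern reduces to two rules: a node over its high-water mark stops admitting, and new connections go to the least-loaded admitting node. A hypothetical Python sketch (class and field names invented; thresholds illustrative):

```python
class GatewayNode:
    def __init__(self, name: str, max_conns: int):
        self.name = name
        self.conns = 0
        self.accepting = True
        self.max_conns = max_conns

    def admit(self) -> None:
        self.conns += 1
        # Over the high-water mark: stop taking new connections but
        # keep serving existing ones -- the drain pattern.
        if self.conns >= self.max_conns:
            self.accepting = False

def route(nodes: list) -> "GatewayNode":
    """Load balancer's choice: least-loaded node still accepting."""
    target = min((n for n in nodes if n.accepting), key=lambda n: n.conns)
    target.admit()
    return target

a, b = GatewayNode("a", max_conns=2), GatewayNode("b", max_conns=2)
assert route([a, b]).name == "a"   # tie broken by order
assert route([a, b]).name == "b"
assert route([a, b]).name == "a"   # a hits its threshold here...
assert route([a, b]).name == "b"   # ...so new connections skip it
```

In production the threshold check runs out-of-band and the load balancer learns node state via health endpoints, but the admission logic is this simple.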
The connection state machine itself is more interesting than people expect. Each connection holds the subscribed topics, the player's authenticated session, the back-pressure watermark and the last-acknowledged event ID. On reconnect — common on mobile networks — the client resumes from the last-acknowledged event ID and the server replays missed events from a ring buffer. That ring buffer is sized to cover a 30-second reconnect window, which covers 99.9% of real-world reconnects. For anything longer the client re-fetches the current state via REST. The same idempotency pattern that our settlement engine uses for partial cashout shows up here.
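The resume-from-last-acknowledged logic is worth making concrete. A minimal Python sketch of the ring-buffer replay (names and sizes are illustrative; the real buffer is sized by time window, not entry count):

```python
from collections import deque

class ReplayBuffer:
    """Ring buffer of (event_id, payload) with ascending IDs. On
    reconnect, replay everything after the client's last-acked ID;
    if that ID has already fallen off the buffer, signal a full
    REST re-fetch instead."""
    def __init__(self, capacity: int):
        self.buf = deque(maxlen=capacity)

    def append(self, event_id: int, payload: str) -> None:
        self.buf.append((event_id, payload))

    def resume(self, last_acked_id: int):
        if self.buf and last_acked_id < self.buf[0][0] - 1:
            return None   # gap not covered: client must re-fetch state
        return [(eid, p) for eid, p in self.buf if eid > last_acked_id]

rb = ReplayBuffer(capacity=3)
for eid in (1, 2, 3, 4):               # event 1 falls off (capacity 3)
    rb.append(eid, f"price-{eid}")
assert rb.resume(2) == [(3, "price-3"), (4, "price-4")]
assert rb.resume(0) is None            # reconnect window exceeded
```

The `None` branch is the 0.1% case from the text: the client falls back to a REST snapshot, then re-subscribes from the snapshot's event ID.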
Edge-cached static shell and the time-to-interactive budget
Latency for an existing connection is one thing. Latency for the first paint — the time between a player navigating to the iframe and seeing fresh odds — is another. Sporbet Soft hits sub-1.5s time-to-interactive on mobile by edge-caching the iframe's static shell.
The shell is a small HTML document (under 50KB, gzipped to about 14KB) plus a JavaScript bundle under 200KB. Both are served from a CDN — Cloudflare in our case — with aggressive cache headers (Cache-Control: public, max-age=31536000, immutable) and content-hashed filenames so cache invalidation is a deploy artefact, not a runtime concern. The CDN absorbs the static-shell load entirely; no request for the shell ever reaches our origin under normal traffic.
Inside the shell, the JavaScript runs through a precise sequence: parse, hydrate, open WebSocket, subscribe to default markets, render. Each step is budgeted. WebSocket open completes inside 100ms on a warm DNS cache; first market list arrives within 50ms of subscription; first render frame is queued before that arrives, with skeleton state. The whole sequence is tuned to clear Google's INP threshold of 200ms with margin — see sportsbook pricing for how that performance budget folds into the flat fee.
Back-pressure protection: what happens when an upstream feed surges
Live odds traffic is not smooth. A red card in injury time at El Clásico generates a 50x spike in price events over a 30-second window as every market in the match reprices. A platform without back-pressure protection sees those events queue up in the WebSocket send buffer, the queue grows faster than the client can drain it, the kernel stops signalling the socket writable, pending events pile up in user space, and eventually the gateway exhausts memory and file descriptors and falls over. We've seen this happen at three competitor platforms during 2024 alone.
Sporbet Soft's gateways implement explicit back-pressure with two mechanisms. First, per-connection send-buffer high-water marks: when a connection's pending bytes exceed the watermark, the gateway coalesces queued events for that connection — merging two updates for the same market into one — and drops intermediate updates that have been superseded. The client only sees the latest price, not the historical sequence. Second, fleet-wide circuit breakers: when 5% of connections are in back-pressure mode, the gateway sheds new subscriptions and prioritises existing high-value markets (Premier League, NFL, top tennis) until pressure subsides.
Critically, back-pressure never causes the gateway to accept stale odds. The integrity invariant — every connection sees a strictly increasing event sequence for each market — holds under all conditions. A player in back-pressure mode might see fewer intermediate updates, but never an out-of-order or stale one. This is non-negotiable for a deterministic settlement engine.
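The coalescing step and the invariant it must preserve can be sketched together. An illustrative Python version (event shape and watermark are invented for the example; the real gateway operates on the per-connection send queue):

```python
def coalesce(pending: list, watermark: int) -> list:
    """When the pending queue exceeds the watermark, keep only the
    newest update per market. Sequence IDs stay in order; only
    superseded intermediates are dropped."""
    if len(pending) <= watermark:
        return pending
    latest = {}
    for ev in pending:                 # events arrive in sequence order
        latest[ev["market"]] = ev      # later update supersedes earlier
    return sorted(latest.values(), key=lambda ev: ev["seq"])

spike = [
    {"seq": 1, "market": "1X2", "price": 2.10},
    {"seq": 2, "market": "O/U", "price": 1.90},
    {"seq": 3, "market": "1X2", "price": 2.30},  # supersedes seq 1
]
out = coalesce(spike, watermark=2)
assert [ev["seq"] for ev in out] == [2, 3]   # in order, seq 1 dropped
assert out[1]["price"] == 2.30               # client sees latest price
```

The dropped event (seq 1) is exactly the kind of superseded intermediate the text describes: the client loses history, never freshness or ordering.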
Multi-region active-active and sub-second failover
The last subsystem is the failover topology. Sporbet Soft runs three regions — EU-West, EU-Central, Middle-East — in active-active configuration. Every region has a full copy of the WebSocket fleet, the Redis cluster, the aggregator and the settlement engine. The aggregators in each region subscribe to the same upstream feeds and apply the same margin model; they produce byte-identical event streams because the margin function is deterministic (the same pattern from our settlement engine deep-dive).
DNS-based geo-routing sends each player to their nearest region. When a region degrades — typically detected by a health probe that fails three consecutive 500ms checks — existing connections receive a graceful-close signal that triggers client-side reconnect to a healthy region, while the DNS layer reroutes new connections to the next-nearest region within 30-60 seconds. For a player with an open connection, that graceful-close path keeps user-visible disruption under one second, even on a mobile network.
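The client side of that failover is a proximity-ordered region list and a health map. A hypothetical Python sketch (region names taken from the text; function shape invented — the real client learns health from the graceful-close signal and connect failures):

```python
def pick_region(regions_nearest_first: list, healthy: dict) -> str:
    """On a graceful-close from the current region, reconnect to the
    nearest region still marked healthy."""
    for region in regions_nearest_first:
        if healthy.get(region, False):
            return region
    raise ConnectionError("no healthy region available")

order = ["eu-west", "eu-central", "middle-east"]
assert pick_region(order, {"eu-west": True, "eu-central": True}) == "eu-west"
# eu-west degrades: the very next connect attempt lands next-nearest.
assert pick_region(order, {"eu-west": False, "eu-central": True}) == "eu-central"
```

Because the replayable ring buffer described earlier also applies across regions (the event streams are byte-identical), the reconnecting client resumes from its last-acknowledged event ID exactly as it would on a same-region reconnect.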
The active-active model also enables zero-downtime deploys. We drain one region at a time, deploy, warm it up, and rotate traffic back. Operators on the iframe — see B2B sportsbook iframe — never see a maintenance window. The 99.95% uptime SLA on our partner contracts is structural, not aspirational.
Why API polling doesn't get you there
Operators evaluating API-only integrations sometimes ask whether they can hit similar latency by polling a REST odds endpoint at high frequency. The answer is no, and the math is straightforward. A 100ms poll interval gives a worst-case 100ms freshness gap (plus network round-trip — call it 150-250ms on a mobile network) and burns request budget on the vendor's REST endpoint linearly with concurrent users. At 100k concurrent players that's 1M requests/second to the vendor — most vendors will rate-limit at 1% of that.
Push-based WebSocket fan-out scales the other way: the vendor sends one event per market update regardless of subscriber count, and the fan-out cost is borne by the WebSocket gateway fleet (which is cheap to scale horizontally). The freshness gap is bounded by network latency, not by poll interval — which is how we hit 47ms p50 instead of 250ms p50.
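The polling arithmetic from the text is worth writing down, since it is the whole argument. A trivial Python check (numbers from the text; the helper name is ours):

```python
def polling_load(concurrent_users: int, poll_interval_ms: int):
    """Aggregate request rate and worst-case staleness for REST polling.
    The freshness gap here excludes network RTT, which the text puts
    at another 150-250ms on mobile."""
    requests_per_sec = concurrent_users * (1000 / poll_interval_ms)
    worst_freshness_gap_ms = poll_interval_ms
    return requests_per_sec, worst_freshness_gap_ms

rps, gap = polling_load(concurrent_users=100_000, poll_interval_ms=100)
assert rps == 1_000_000    # 1M req/s against the vendor, as in the text
assert gap == 100          # before adding mobile round-trip time
```

Note the asymmetry: polling cost grows linearly with users, while push cost grows with topics — which is why the WebSocket gateway fleet, not the odds vendor, absorbs scale.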
If you're evaluating an API-only build, the live-betting latency budget alone is usually enough to push the decision toward iframe — see our iframe vs API comparison for the full TCO and ownership trade-offs. Operators who underestimate this end up rebuilding their live-odds path in year two.