Updated 2026-04

LLM inference egress: per-token bandwidth math

For self-hosted LLM APIs, every response token leaves your cloud as egress. A token averages about 4 bytes of UTF-8 text, plus roughly 2 bytes of SSE/JSON framing overhead, so budget ~6 bytes per token. A 1,000-token response is therefore ~6 KB.
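The estimate above can be sketched in a few lines of Python. The function name and constants are illustrative; the 4-byte and 2-byte figures are the averages quoted in this section, not measured values, so substitute your own numbers from real traffic.

```python
# Rough per-response egress estimate using the averages quoted above.
# Both constants are assumptions taken from the text, not measurements.
TEXT_BYTES_PER_TOKEN = 4      # average UTF-8 payload per token
FRAMING_BYTES_PER_TOKEN = 2   # SSE/JSON framing overhead per token


def response_bytes(tokens: int,
                   text_bytes: float = TEXT_BYTES_PER_TOKEN,
                   framing_bytes: float = FRAMING_BYTES_PER_TOKEN) -> float:
    """Return the estimated bytes egressed for a response of `tokens` tokens."""
    return tokens * (text_bytes + framing_bytes)


if __name__ == "__main__":
    # A 1,000-token response, expressed in decimal kilobytes.
    print(response_bytes(1000) / 1000, "KB")
```

Passing a larger `framing_bytes` is a quick way to see how a heavier envelope changes the total.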

Per-million-token egress cost

Volume	Bytes egressed	AWS direct	CloudFront (NA/EU)	Cloudflare R2
1M tokens	~6 MB	$0.00 (well within free tier)	$0.00	$0.00
1B tokens	~6 GB	$0.54 (at $0.09/GB, before free tier)	$0.51 (at the post-free-tier rate)	$0.00
100B tokens	~600 GB	$54 (at $0.09/GB, before free tier)	$0 (under 1 TB free)	$0.00
10T tokens	~60 TB	$5,341.60	~$5,090	$0.00
100T tokens	~600 TB	$45,323	~$35,810	$0.00

Excludes inbound prompt tokens, which arrive as ingress and are not billed by the major clouds. Assumes an OpenAI-style streaming JSON envelope totaling ~6 bytes/token. Larger envelopes (function calling, tool use) can run 12-20 bytes/token.
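To rerun the table with your own rates, a small tiered calculator is enough. The sketch below is not any provider's official schedule: the only rate it hard-codes is the flat $0.09/GB implied by the 6 GB row above, and the tier list, free allowance, and function names are all placeholders to fill in from the current price sheet.

```python
from typing import List, Optional, Tuple

# (tier size in GB, price per GB). A size of None means "everything beyond".
Tier = Tuple[Optional[float], float]

# ASSUMPTION: flat $0.09/GB, the rate implied by the $0.54 / 6 GB row.
FLAT_RATE: List[Tier] = [(None, 0.09)]


def tokens_to_gb(tokens: float, bytes_per_token: float = 6.0) -> float:
    """Convert a token count to decimal gigabytes."""
    return tokens * bytes_per_token / 1e9


def egress_cost(gb: float, tiers: List[Tier], free_gb: float = 0.0) -> float:
    """Price `gb` of egress against a tiered schedule after a free allowance."""
    remaining = max(gb - free_gb, 0.0)
    cost = 0.0
    for size, price in tiers:
        if remaining <= 0:
            break
        # Bill either the whole tier or whatever is left, whichever is smaller.
        billed = remaining if size is None else min(remaining, size)
        cost += billed * price
        remaining -= billed
    return cost


if __name__ == "__main__":
    for tokens in (1e9, 100e9):
        print(tokens, round(egress_cost(tokens_to_gb(tokens), FLAT_RATE), 2))
```

Setting `bytes_per_token` to 12-20 models the larger function-calling envelopes mentioned above.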

Egress vs compute cost

For a single H100 GPU serving Llama-3 70B at ~2,000 output tokens/sec, hourly compute is ~$3 (AWS p5.48xlarge fractional). At full throughput that is 7.2M tokens/hr, or about 5.2B tokens over a 720-hour month. Egress at AWS direct rates is ~$2.78/month for 31 GB of token bytes, against roughly $2,160/month of compute ($3 × 720 hours). Egress is therefore about 0.13 percent of inference cost at this scale.
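The arithmetic in this comparison is easy to reproduce. The sketch below assumes a 720-hour month, 6 bytes per token, and a flat $0.09/GB egress rate; the function name and the returned keys are illustrative, not part of any tool.

```python
HOURS_PER_MONTH = 720  # ASSUMPTION: 30-day month at full utilization


def egress_vs_compute(tokens_per_sec: float,
                      gpu_hourly_usd: float,
                      bytes_per_token: float = 6.0,
                      egress_usd_per_gb: float = 0.09,
                      hours: float = HOURS_PER_MONTH) -> dict:
    """Compare monthly egress spend with monthly GPU spend for one GPU."""
    tokens = tokens_per_sec * 3600 * hours
    egress_gb = tokens * bytes_per_token / 1e9   # decimal GB
    egress_usd = egress_gb * egress_usd_per_gb
    compute_usd = gpu_hourly_usd * hours
    return {
        "tokens": tokens,
        "egress_gb": egress_gb,
        "egress_usd": egress_usd,
        "compute_usd": compute_usd,
        "egress_share": egress_usd / compute_usd,
    }


if __name__ == "__main__":
    result = egress_vs_compute(tokens_per_sec=2000, gpu_hourly_usd=3.0)
    for key, value in result.items():
        print(f"{key}: {value:,.4f}")
```

With the inputs used in this section (2,000 tokens/sec at $3/hr), the monthly token count comes to about 5.2B and the egress share stays far below one percent of compute.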

For high-RPS API serving with multi-modal outputs (image generation, audio synthesis), egress becomes material:

Image generation: 1 MP PNG at ~200 KB. 1M requests/month = 200 GB egress = $18 AWS direct.
Audio synthesis (TTS): 1 minute of 192 kbps MP3 is ~1.5 MB. 1M generations = 1.5 TB = $135 AWS direct.
Video generation: 5-second clip at 720p is ~10 MB. 1M generations = 10 TB = ~$900 AWS direct.
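The three estimates above follow one formula: output size times request count times the per-GB rate. A minimal sketch, assuming a flat $0.09/GB and decimal units; the helper names and the per-output sizes in `WORKLOADS` are illustrative values taken from the list, not measurements.

```python
def mp3_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Size of a constant-bitrate audio clip in bytes (kilobits -> bytes)."""
    return bitrate_kbps * 1000 * seconds / 8


def monthly_egress_usd(bytes_per_output: float,
                       outputs_per_month: float,
                       usd_per_gb: float = 0.09) -> float:
    """Monthly egress bill for a fixed output size, at a flat per-GB rate."""
    gb = bytes_per_output * outputs_per_month / 1e9  # decimal GB
    return gb * usd_per_gb


# ASSUMPTION: per-output sizes as quoted in the list above.
WORKLOADS = {
    "image (1 MP PNG)": 200e3,
    "audio (1 min MP3)": 1.5e6,
    "video (5 s 720p)": 10e6,
}

if __name__ == "__main__":
    for name, size in WORKLOADS.items():
        print(f"{name}: ${monthly_egress_usd(size, 1_000_000):,.2f}/month")
```

`mp3_bytes(192, 60)` gives the raw size behind the rounded 1.5 MB figure for one minute of audio.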

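At those volumes, writing outputs to R2 and handing the client a signed URL removes the per-GB delivery charge, since R2 does not bill egress, and adds only R2 storage cost ($0.015/GB/month). This holds as long as the upload from the inference host to R2 is not itself billed as egress by the cloud running inference.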
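How much storage the handoff adds depends on how long outputs are kept. The sketch below estimates the steady-state R2 storage bill from the $0.015/GB/month figure above. It assumes uploads are spread evenly over the month and ignores per-request operation charges and any cost of getting the data into R2; the function name and the retention parameter are illustrative.

```python
R2_STORAGE_USD_PER_GB_MONTH = 0.015  # figure quoted above


def r2_storage_usd(gb_written_per_month: float,
                   retention_days: float,
                   days_in_month: float = 30.0,
                   usd_per_gb_month: float = R2_STORAGE_USD_PER_GB_MONTH) -> float:
    """Approximate steady-state storage bill when objects expire after
    `retention_days` and uploads arrive evenly through the month."""
    avg_gb_stored = gb_written_per_month * retention_days / days_in_month
    return avg_gb_stored * usd_per_gb_month


if __name__ == "__main__":
    # Image workload from above: 200 GB written per month.
    print(f"30-day retention: ${r2_storage_usd(200, 30):.2f}/month")
    print(f"1-day retention:  ${r2_storage_usd(200, 1):.2f}/month")
```

Short retention keeps the storage line small, which is why generated outputs are usually given an expiry instead of being kept indefinitely.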