Live pricing · verified 2026-04
AI / ML · inference egress · Updated 2026-04

LLM inference egress: per-token bandwidth math

For self-hosted LLM APIs, every response token leaves your cloud as egress. One token averages 4 bytes (UTF-8) plus SSE/JSON framing overhead (~2 bytes), so call it ~6 bytes per token. A 1000-token response is ~6 KB.
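The per-token estimate above can be written down as a one-line calculation. This is a minimal sketch using the averages stated in the text (4 bytes of UTF-8 plus ~2 bytes of framing); the function and constant names are illustrative, and real byte counts depend on your tokenizer and response format.

```python
# Averages taken from the text; treat them as assumptions, not measurements.
TOKEN_BYTES = 4      # average UTF-8 bytes per token
FRAMING_BYTES = 2    # approximate SSE/JSON framing overhead per token


def response_bytes(tokens: int,
                   token_bytes: int = TOKEN_BYTES,
                   framing_bytes: int = FRAMING_BYTES) -> int:
    """Estimated bytes on the wire for a response of `tokens` output tokens."""
    return tokens * (token_bytes + framing_bytes)


print(response_bytes(1_000))              # 6000 bytes, i.e. ~6 KB
print(response_bytes(1_000_000) / 1e6)    # 6.0 (MB per million tokens)
```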

Per-million-token egress cost

| Volume | Bytes egressed | AWS direct | CloudFront NA/EU | Cloudflare R2 |
|---|---|---|---|---|
| 1M tokens | ~6 MB | $0.00 (well within free tier) | $0.00 | $0.00 |
| 1B tokens | ~6 GB | $0.54 | $0.51 (post free tier) | $0.00 |
| 100B tokens | ~600 GB | $54 | $0 (under 1 TB free) | $0.00 |
| 10T tokens | ~60 TB | $5,341.60 | ~$5,090 | $0.00 |
| 100T tokens | ~600 TB | $45,323 | ~$35,810 | $0.00 |

Excludes inbound prompt tokens (free as ingress on all clouds). Assumes a typical OpenAI-style streaming JSON envelope, for a total of ~6 bytes / token. Larger envelopes (function calling, tool use) can push this to 12-20 bytes / token.
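To see how sensitive the totals are to envelope size, the sketch below recomputes volume and cost for 6, 12, and 20 bytes / token. It assumes a flat $0.09/GB rate with no free tier and no volume tiers, a rate inferred from the 1B-token and 100B-token AWS direct figures; the larger volumes in the table use tiered pricing that this sketch does not model. The function names are illustrative.

```python
GB = 1e9
# Assumption: flat rate inferred from the table (6 GB -> $0.54), no free tier, no tiers.
FLAT_RATE_USD_PER_GB = 0.09


def egress_gb(tokens: float, bytes_per_token: float = 6) -> float:
    """Egress volume in decimal GB for a given number of output tokens."""
    return tokens * bytes_per_token / GB


def flat_cost_usd(tokens: float, bytes_per_token: float = 6,
                  rate: float = FLAT_RATE_USD_PER_GB) -> float:
    """Egress cost at a single flat per-GB rate."""
    return egress_gb(tokens, bytes_per_token) * rate


# Compare envelope sizes at 1B output tokens.
for bpt in (6, 12, 20):
    print(f"{bpt:>2} B/token: {egress_gb(1e9, bpt):5.1f} GB, "
          f"${flat_cost_usd(1e9, bpt):.2f}")
```

Under these assumptions, a 20 bytes / token envelope roughly triples the bill compared with 6 bytes / token, which is still small in absolute terms at 1B tokens.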

Egress vs compute cost

For a single H100 GPU serving Llama-3 70B at ~2,000 output tokens/sec, hourly compute is ~$3 (a fractional share of an AWS p5.48xlarge). At full throughput that is 7.2M tokens/hr, or ~5.2B tokens over a 720-hour month, against ~$2,160 of compute. Egress at AWS direct rates: ~$2.80 / month for 31 GB of token bytes (31 GB × $0.09/GB). Egress is roughly 0.13 percent of inference cost at this scale.
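The comparison can be reproduced with a few lines of arithmetic. The sketch below uses the throughput, hourly price, and ~6 bytes / token from the text, and assumes a 720-hour month and a flat $0.09/GB egress rate with no free tier; the constant names are illustrative.

```python
TOKENS_PER_SEC = 2_000        # from the text
GPU_USD_PER_HOUR = 3.0        # from the text
BYTES_PER_TOKEN = 6           # from the text
HOURS_PER_MONTH = 720         # assumption: 30-day month at full utilization
EGRESS_USD_PER_GB = 0.09      # assumption: flat AWS direct rate, no free tier

tokens_per_month = TOKENS_PER_SEC * 3600 * HOURS_PER_MONTH
egress_gb = tokens_per_month * BYTES_PER_TOKEN / 1e9
egress_usd = egress_gb * EGRESS_USD_PER_GB
compute_usd = GPU_USD_PER_HOUR * HOURS_PER_MONTH
egress_share = egress_usd / compute_usd

print(f"tokens/month: {tokens_per_month / 1e9:.2f}B")
print(f"egress:       {egress_gb:.1f} GB, ${egress_usd:.2f}")
print(f"compute:      ${compute_usd:,.0f}")
print(f"egress share: {egress_share * 100:.2f}% of compute")
```

Under these inputs egress works out to a small fraction of one percent of compute spend, so for text-only serving it is effectively a rounding error.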

For high-RPS API serving with multi-modal outputs (image generation, audio synthesis), egress becomes material:

  • Image generation: 1 MP PNG at 200 KB. 1M requests / month = 200 GB egress = $18 AWS direct.
  • Audio synthesis (TTS): 1 minute at 192kbps MP3 ~1.5 MB. 1M generations = 1.5 TB = $135 AWS direct.
  • Video generation: 5-second clip at 720p ~10 MB. 1M generations = 10 TB = $904 AWS direct.
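The media figures above follow from object size times request count. A minimal sketch, assuming a flat $0.09/GB rate and decimal units: it reproduces the image and TTS figures exactly and gives about $900 for video, close to the figure quoted, which presumably reflects tiering or rounding in the source's calculator. The helper name is illustrative.

```python
USD_PER_GB = 0.09  # assumption: flat AWS direct rate, decimal GB, no free tier


def monthly_egress_usd(object_bytes: float, requests: int,
                       rate: float = USD_PER_GB) -> float:
    """Monthly egress cost for `requests` responses of `object_bytes` each."""
    return object_bytes * requests / 1e9 * rate


# One minute of 192 kbps MP3: bits/sec -> bytes/sec -> bytes/minute.
tts_bytes_exact = 192_000 / 8 * 60   # 1,440,000 bytes; the text rounds to ~1.5 MB

workloads = {
    "image (200 KB)": 200e3,
    "tts (1.5 MB)": 1.5e6,
    "video (10 MB)": 10e6,
}
for name, size in workloads.items():
    print(f"{name:<16} ${monthly_egress_usd(size, 1_000_000):,.0f}")
```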

At those volumes, putting outputs on R2 with a signed-URL handoff cuts egress to zero and adds only R2 storage cost ($0.015/GB/month).
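As a rough comparison of the two options, the sketch below sets R2 storage at $0.015/GB/month against direct egress at an assumed flat $0.09/GB. It assumes outputs are retained for the stated number of months, ignores R2 per-request operation charges, and assumes the outputs reach R2 without themselves being billed as egress; the function name and retention parameter are hypothetical.

```python
R2_STORAGE_USD_PER_GB_MONTH = 0.015  # from the text
AWS_EGRESS_USD_PER_GB = 0.09         # assumption: flat AWS direct rate


def compare_usd(output_gb: float, retention_months: float = 1.0):
    """Return (r2_storage_cost, aws_direct_egress_cost) for one month's outputs."""
    r2_storage = output_gb * R2_STORAGE_USD_PER_GB_MONTH * retention_months
    aws_egress = output_gb * AWS_EGRESS_USD_PER_GB
    return r2_storage, aws_egress


# TTS example from the list above: 1.5 TB of generated audio per month.
r2, aws = compare_usd(1_500)
print(f"R2 storage: ${r2:.2f}  vs  AWS direct egress: ${aws:.2f}")
```

Shortening retention (for example, expiring objects after a few days) lowers the storage side further, since the egress side is paid once per download regardless.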

See also: Training data egress, Cloudflare R2 for media outputs.

Updated 2 May 2026