LLM inference egress: per-token bandwidth math
For self-hosted LLM APIs, every response token leaves your cloud as egress. One token averages 4 bytes (UTF-8) plus SSE/JSON framing overhead (~2 bytes), so call it ~6 bytes per token. A 1000-token response is ~6 KB.
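The per-token estimate above can be written as a small helper. This is a minimal sketch; the function name `response_bytes` and its parameters are illustrative, and the defaults are the 4-byte text and 2-byte framing figures from the paragraph above.

```python
def response_bytes(tokens: int, text_bytes: float = 4.0, framing_bytes: float = 2.0) -> float:
    """Estimated bytes on the wire for a response of `tokens` output tokens.

    text_bytes and framing_bytes are per-token averages (assumptions taken
    from the text above, not measured values).
    """
    return tokens * (text_bytes + framing_bytes)


# A 1,000-token response at the default 6 bytes/token.
print(response_bytes(1_000))  # 6000.0 bytes, i.e. ~6 KB

# One billion tokens, expressed in decimal GB.
print(response_bytes(1_000_000_000) / 1e9)  # 6.0
```

Raising `framing_bytes` is the simplest way to model a heavier envelope; the result scales linearly.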
Per-million-token egress cost
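| Volume | Bytes egressed | AWS direct | CloudFront NA/EU | Cloudflare R2 |
|---|---|---|---|---|
| 1M tokens | ~6 MB | $0.00 | $0.00 | $0.00 |
| 1B tokens | ~6 GB | $0.54 | $0.51 | $0.00 |
| 100B tokens | ~600 GB | $54 | $51 | $0.00 |
| 10T tokens | ~60 TB | ~$5,000 | ~$4,650 | $0.00 |
| 100T tokens | ~600 TB | ~$33,800 | ~$27,050 | $0.00 |
AWS and CloudFront figures are list rates before free allowances, using 1 TB = 1,000 GB and tiered per-GB pricing. Rates assumed for AWS direct: $0.09 for the first 10 TB, $0.085 for the next 40 TB, $0.07 for the next 100 TB, and $0.05 beyond 150 TB. Rates assumed for CloudFront NA/EU: $0.085, $0.08, $0.06, $0.04, and $0.03 across the 10 TB, 40 TB, 100 TB, 350 TB, and 524 TB tiers. The monthly free allowances (100 GB on AWS direct, 1 TB on CloudFront) reduce the 1M and 1B rows to $0 on AWS direct, and the 1M through 100B rows to $0 on CloudFront, when no other traffic consumes them.
Excludes inbound prompt tokens (ingress is free on the major clouds). Assumes an OpenAI-style streaming JSON envelope of ~6 bytes / token. Cost scales linearly with bytes per token, so larger envelopes (function calling, tool use) at 12-20 bytes / token multiply each figure by roughly 2x to 3.3x.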
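Tiered egress pricing is easy to get wrong by hand, so a small calculator helps. The sketch below is not an official pricing tool: the tier boundaries and per-GB rates are assumed list prices for AWS internet egress and CloudFront NA/EU, with 1 TB taken as 1,000 GB, and the names `AWS_DIRECT`, `CLOUDFRONT_NA_EU`, `tiered_cost`, and `token_egress_cost` are illustrative. Verify the rates against the providers' current pricing pages.

```python
# Each tier is (tier size in GB, USD per GB); None marks the open-ended last tier.
# Rates are assumed list prices, not fetched from any API.
AWS_DIRECT = [(10_000, 0.09), (40_000, 0.085), (100_000, 0.07), (None, 0.05)]
# Simplification: the final CloudFront tier here is only meant for volumes up to ~1 PB.
CLOUDFRONT_NA_EU = [
    (10_000, 0.085), (40_000, 0.08), (100_000, 0.06), (350_000, 0.04), (None, 0.03),
]


def tiered_cost(gb: float, tiers, free_gb: float = 0.0) -> float:
    """Monthly cost in USD for `gb` of egress under graduated tier pricing."""
    remaining = max(gb - free_gb, 0.0)
    cost = 0.0
    for size, rate in tiers:
        # Bill as much as fits in this tier, then carry the rest forward.
        chunk = remaining if size is None else min(remaining, size)
        cost += chunk * rate
        remaining -= chunk
        if remaining <= 0:
            break
    return cost


def token_egress_cost(tokens: float, tiers, bytes_per_token: float = 6.0,
                      free_gb: float = 0.0) -> float:
    """Convert a monthly output-token volume to decimal GB and price it."""
    gb = tokens * bytes_per_token / 1e9
    return tiered_cost(gb, tiers, free_gb)


for tokens in (1e6, 1e9, 100e9, 10e12, 100e12):
    aws = token_egress_cost(tokens, AWS_DIRECT)
    cf = token_egress_cost(tokens, CLOUDFRONT_NA_EU)
    print(f"{tokens:.0e} tokens: AWS direct ${aws:,.2f}, CloudFront ${cf:,.2f}")
```

Passing `free_gb=100` for AWS direct or `free_gb=1000` for CloudFront models the monthly free allowance, and `bytes_per_token=20` models a heavier tool-use envelope.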
Egress vs compute cost
For an H100 GPU serving Llama-3 70B at ~2,000 output tokens/sec, hourly compute is ~$3 (AWS p5.48xlarge fractional), or ~$2,160 over a 720-hour month. At full throughput that is 7.2M tokens/hr = 5.2B tokens/month, which is ~31 GB of token bytes. At the AWS direct list rate of $0.09/GB, egress comes to ~$2.80 / month, about 0.13 percent of inference cost at this scale.
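The comparison above reduces to a few multiplications, sketched here so the inputs can be swapped for your own. The function name `egress_share` and its defaults are illustrative assumptions: a 720-hour (30-day) month, 6 bytes per token, and a flat $0.09/GB egress rate with no free allowance applied.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_MONTH = 720  # assumes a 30-day month


def egress_share(tokens_per_sec: float, gpu_usd_per_hour: float,
                 bytes_per_token: float = 6.0,
                 egress_usd_per_gb: float = 0.09) -> dict:
    """Monthly egress and compute cost for one GPU running at full throughput."""
    tokens_per_month = tokens_per_sec * SECONDS_PER_HOUR * HOURS_PER_MONTH
    egress_gb = tokens_per_month * bytes_per_token / 1e9  # decimal GB
    egress_usd = egress_gb * egress_usd_per_gb
    compute_usd = gpu_usd_per_hour * HOURS_PER_MONTH
    return {
        "tokens_per_month": tokens_per_month,
        "egress_gb": egress_gb,
        "egress_usd": egress_usd,
        "compute_usd": compute_usd,
        "share": egress_usd / compute_usd,  # egress as a fraction of compute
    }


result = egress_share(tokens_per_sec=2_000, gpu_usd_per_hour=3.0)
print(f"{result['egress_gb']:.1f} GB, ${result['egress_usd']:.2f} egress, "
      f"${result['compute_usd']:,.0f} compute, share {result['share']:.2%}")
```

Because both costs scale with hours, the share depends only on throughput, bytes per token, and the two unit prices, not on how long the GPU runs.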
For high-RPS API serving with multi-modal outputs (image generation, audio synthesis, video generation), egress becomes material:
- Image generation: 1 MP PNG at 200 KB. 1M requests / month = 200 GB egress = $18 AWS direct.
- Audio synthesis (TTS): 1 minute at 192kbps MP3 ~1.5 MB. 1M generations = 1.5 TB = $135 AWS direct.
- Video generation: 5-second clip at 720p ~10 MB. 1M generations = 10 TB = ~$900 AWS direct (10,000 GB at $0.09/GB).
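At those volumes, putting outputs on R2 with a signed-URL handoff cuts download egress to zero, since R2 charges no egress fees, and adds only R2 storage cost ($0.015/GB/month). The initial upload from the inference host to R2 is still billed as outbound traffic if the host runs on a cloud that charges for egress.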
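The media figures above can be reproduced, and compared against R2 storage, with a short calculation. This is a sketch under stated assumptions: a flat $0.09/GB AWS rate, 1 GB = 1,000 MB, and a hypothetical retention of one month for each month's outputs. The function name `media_costs` and the `retention_months` parameter are illustrative, not part of any provider API.

```python
def media_costs(requests_per_month: int, object_mb: float,
                retention_months: float = 1.0,
                aws_usd_per_gb: float = 0.09,
                r2_storage_usd_per_gb_month: float = 0.015) -> dict:
    """Compare direct AWS egress with R2 storage for one month of generated media."""
    gb = requests_per_month * object_mb / 1000  # decimal GB produced per month
    return {
        "gb": gb,
        "aws_egress_usd": gb * aws_usd_per_gb,
        # Storage cost if each object is kept for retention_months (an assumption).
        "r2_storage_usd": gb * r2_storage_usd_per_gb_month * retention_months,
    }


workloads = {
    "image (200 KB)": 0.2,
    "audio (1.5 MB)": 1.5,
    "video (10 MB)": 10.0,
}
for name, size_mb in workloads.items():
    c = media_costs(1_000_000, size_mb)
    print(f"{name}: {c['gb']:,.0f} GB, AWS egress ${c['aws_egress_usd']:,.2f}, "
          f"R2 storage ${c['r2_storage_usd']:,.2f}")
```

Shorter retention lowers the R2 figure proportionally; set `retention_months` to a fraction if outputs are deleted soon after download.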
See also: Training data egress, Cloudflare R2 for media outputs.