The Bandwidth Saturation Problem
Serving static text, HTML, or even optimized JPEG images is a solved problem for modern infrastructure. However, when an application needs to serve large media files—specifically high-definition video—the standard HTTP delivery models break down completely.
If a user clicks "Play" on a 500MB MP4 file, the naive approach is for the edge network to attempt to download the entire 500MB file from the origin storage bucket and stream it to the user.
The Buffering Crisis: If the user watches exactly 10 seconds of the video and closes the tab, the edge network has already downloaded 200MB of unnecessary data from the origin. You are charged for the massive egress bandwidth, and the user experiences severe buffering while the edge node struggles to pull the massive file into its local cache.
HTTP Range Requests
To serve large media files efficiently, the storage architecture must support and optimize for HTTP Range Requests.
When a modern video player (like the HTML5 <video> element) requests a file, it does not ask for the entire file. It asks for a specific byte range.
- The browser requests the first 2MB: Range: bytes=0-2097151
- The server responds with 206 Partial Content and delivers those 2MB.
- As the user continues watching, the browser seamlessly requests the next chunk: Range: bytes=2097152-4194303
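The chunk arithmetic above is mechanical, and it helps to see it spelled out. The sketch below is illustrative only; the helper names (range_header, parse_content_range) are ours, not part of any standard library API. It builds the Range header for the Nth fixed-size chunk and parses the Content-Range header a server returns with a 206 response.

```python
# Illustrative helpers for Range request arithmetic (names are ours).
CHUNK = 2 * 1024 * 1024  # 2MB chunks, as in the example above

def range_header(chunk_index: int, chunk_size: int = CHUNK) -> str:
    """Build the Range header value for the Nth fixed-size chunk.
    HTTP byte ranges are inclusive on both ends."""
    start = chunk_index * chunk_size
    end = start + chunk_size - 1
    return f"bytes={start}-{end}"

def parse_content_range(value: str) -> tuple[int, int, int]:
    """Parse a 'bytes start-end/total' Content-Range response value."""
    unit, _, rest = value.partition(" ")
    span, _, total = rest.partition("/")
    start, _, end = span.partition("-")
    return int(start), int(end), int(total)

print(range_header(0))  # bytes=0-2097151
print(range_header(1))  # bytes=2097152-4194303
```

Note the inclusive end offset: the first 2MB chunk ends at byte 2,097,151, not 2,097,152, which is a common off-by-one mistake when implementing range logic by hand.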
Slicing the Edge Cache
The challenge with Range Requests is how they interact with the edge caching topology (the L1/L2 caches).
If a user requests bytes 0-2MB, should the edge node cache just those 2MB? Or should it preemptively pull the entire 500MB file from the origin to cache it for the next user?
Modern globally distributed object storage layers utilize Cache Slicing.
When the edge node receives a Range Request, it does not fetch the entire file. Instead, the edge node logically divides the file into standardized, manageable chunks (e.g., 5MB slices).
If the user requests the first 2MB, the edge node fetches the first 5MB slice from the origin, caches it locally, and serves the requested 2MB to the user. If the user continues watching, the edge node seamlessly fetches and caches the next 5MB slice.
// Example: Sliced Range Request Execution
Client -> Edge: GET /video.mp4 Range: 0-2MB
Edge -> Origin: GET /video.mp4 Range: 0-5MB (Fetch Slice 1)
Origin -> Edge: 206 Partial Content (5MB)
Edge [Caches Slice 1] -> Client: 206 Partial Content (2MB)
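The trace above can be sketched as a small in-memory model. This is a teaching sketch, not production CDN code; the class and parameter names are ours, and real implementations live inside the proxy layer. On a cache miss it pulls the entire slice covering the requested bytes from the origin, caches it, and trims the result down to exactly the requested range.

```python
SLICE = 5 * 1024 * 1024  # 5MB slices, as in the example above

class SlicedEdgeCache:
    """Minimal sketch of sliced edge caching (illustrative names)."""

    def __init__(self, origin_fetch, slice_size=SLICE):
        self.origin_fetch = origin_fetch  # callable(start, end) -> bytes
        self.slice_size = slice_size
        self.cache = {}                   # slice index -> slice bytes

    def get_range(self, start: int, end: int) -> bytes:
        """Serve inclusive byte range [start, end], fetching whole slices on miss."""
        first = start // self.slice_size
        last = end // self.slice_size
        out = bytearray()
        for i in range(first, last + 1):
            if i not in self.cache:       # cache miss: pull the full slice once
                s = i * self.slice_size
                self.cache[i] = self.origin_fetch(s, s + self.slice_size - 1)
            out += self.cache[i]
        # Trim the concatenated slices to exactly the requested range
        offset = start - first * self.slice_size
        return bytes(out[offset : offset + (end - start + 1)])
```

Because slice boundaries are fixed, two users requesting overlapping but different byte ranges still hit the same cached slices, which is what makes the cache reusable across viewers.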
Eliminating Egress Waste
By implementing Cache Slicing, the infrastructure eliminates the wasted origin egress at the heart of the Bandwidth Saturation Problem.
If 1,000 users watch the first 10 seconds of a marketing video hosted on MyFunnelAPI, the edge nodes only pull and cache the very first 5MB slice from the origin bucket. The remaining 495MB of the video is never downloaded, never cached, and never billed for egress.
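The back-of-the-envelope savings are easy to compute. The figures below come from the scenario above, assuming the naive edge pre-buffers roughly 200MB per viewer before the tab closes, while the sliced edge serves all 1,000 viewers from a single cached 5MB slice.

```python
# Egress comparison for the 1,000-viewer scenario (assumed figures from above).
viewers = 1_000
naive_origin_egress_mb = viewers * 200   # ~200MB pre-buffered per viewer
sliced_origin_egress_mb = 5              # one 5MB slice, fetched once, then cached

print(naive_origin_egress_mb)                               # 200000
print(naive_origin_egress_mb // sliced_origin_egress_mb)    # 40000x reduction
```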
Furthermore, because the slices are standardized, if User B skips to the middle of the video, the player translates that seek position into a byte offset, and the edge node simply fetches and caches the specific 5MB slice containing that offset.
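The seek case is simple slice arithmetic. A quick sketch (the slice_index helper name is ours), assuming the 5MB slice size and 500MB file from the running example:

```python
SLICE = 5 * 1024 * 1024  # 5MB slices

def slice_index(byte_offset: int, slice_size: int = SLICE) -> int:
    """Which fixed-size slice contains the given byte offset?"""
    return byte_offset // slice_size

# A seek to byte 250,000,000 (roughly the middle of a 500MB file)
print(slice_index(250_000_000))  # 47
```

Only that one slice is fetched from the origin; the 47 slices before it and the dozens after it stay untouched unless playback actually reaches them.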
By pairing HTTP Range Requests with intelligent, sliced edge caching, engineering teams can serve massive media payloads globally with near-zero buffering and drastically reduced infrastructure costs.