Advanced · 22 min read

Designing a video streaming platform

From one origin server to a CDN-backed, transcoded, regionally aware playback platform with reliability, content protection, and cost controls across six revisions.

Why this design question bites

A video streaming platform looks like storage plus playback. Then you trace one viewer's session from clicking play to the credits, and it stops looking simple. The system has to swallow large uploads, turn them into playable renditions, serve tiny segments reliably to a phone on a 4G connection, keep search and browsing fast, enforce who's allowed to watch what, and pull quality and cost signals out of all of it — without ever putting analytics or rights checks onto the segment fetch path.

The mistake is to optimize the parts in isolation. Your CDN can make media cheap and close to viewers, while playback still takes four seconds to start because the manifest call crossed a continent. The transcoder fleet can produce beautiful HLS ladders, while failed jobs, oversized renditions, search, and continue-watching all hammer the same database. The rights check can be perfectly correct and still be far too slow or too brittle to run on every segment fetch. A real design pulls those workloads apart on purpose.

Requirements we'll design against

  • Functional: upload videos, transcode into adaptive renditions, publish playable manifests, browse and search a catalog, track watch progress, enforce subscriptions and territory windows, and measure playback quality.
  • Non-functional: start playback in under 2 seconds at p95, keep playback starts reliable through retries and regional read models, serve high-volume media through CDN cache, control egress/transcoding/storage cost, process uploads asynchronously with retries, and protect premium content without adding a database call to every segment fetch.

  • Monthly viewers: 50M+ (global VOD)
  • Peak playback: 1M+ concurrent streams
  • New uploads: 100k/day (bursty ingest)
  • Playback SLO: 99.95% reliable starts

Those headline numbers turn into specific capacity slices the design has to absorb.

  • Segment fetches: at 1M concurrent streams and 4-second HLS segments, you're serving roughly 250k segment GETs/sec (1M / 4), nearly all of them from CDN cache.
  • CDN miss budget: at a 99.5% hit rate, that's about 1.25k segment requests/sec hitting origin shield; size for 5× burst headroom around regional cache fills.
  • Transcode throughput: 100k uploads/day averages around 1.2 jobs/sec, but realistic peaks are 5–10× that, and each job fans out to roughly five renditions with real-time factors between 0.25× and 1× depending on resolution. Plan on 50–200 concurrent encoder slots with elastic burst capacity, and design the queue to drain in under 30 minutes at p99 so the processing → ready window stays tight.
  • Startup latency budget (2s p95): roughly 200 ms manifest, 100 ms entitlement, 200 ms DRM license, 400 ms first segment, plus about 1.1 s of slack for DNS, TCP, TLS, and ABR ramp-up.

Each of those is its own SLO to hit, not one big budget to spend however you like.
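These slices are plain arithmetic, so it pays to keep them in a checkable form. A minimal capacity model, using only the assumptions stated above:

```python
# Back-of-envelope model for the capacity slices above. All inputs are the
# article's stated assumptions, not measurements.

def segment_gets_per_sec(concurrent_streams: int, segment_seconds: int) -> float:
    """Each player fetches roughly one segment per segment duration."""
    return concurrent_streams / segment_seconds

def origin_miss_rps(total_rps: float, cdn_hit_rate: float, burst_factor: float = 1.0) -> float:
    """Requests that fall through the CDN cache to the origin shield."""
    return total_rps * (1 - cdn_hit_rate) * burst_factor

def transcode_jobs_per_sec(uploads_per_day: int, peak_factor: float = 1.0) -> float:
    """Average ingest rate, scaled by a peak factor for bursty uploads."""
    return uploads_per_day / 86_400 * peak_factor

rps = segment_gets_per_sec(1_000_000, 4)                     # 250k GETs/sec
shield = origin_miss_rps(rps, 0.995, burst_factor=5)         # ~6.25k/sec at 5x burst
peak_jobs = transcode_jobs_per_sec(100_000, peak_factor=10)  # ~11.6 jobs/sec at peak
```

Keeping the formulas explicit means the SLO math gets re-run when an assumption changes (segment length, hit rate, upload volume) instead of silently going stale.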

Kata setup

Scored characteristics

Archie reviews every video streaming platform iteration against the same three quality attributes, then turns the lowest-scoring gaps into the next concrete design move.

Reliability

The probability that the system will operate without failure for a specified period under specified conditions. Safety relates to the potential for harm to life or property.

Cost Efficiency

The ability to meet required performance and reliability outcomes while minimizing infrastructure and operational cost.

Content protection

Rights enforcement, entitlement checks, DRM, piracy response, abuse controls, and safe operational feedback loops.

Six versions, six failure modes

This case study runs six iterations instead of the usual five. The reason is that video streaming has two separate late-stage moves and bundling them into one obscures both: first getting playback startup regional so global viewers don't go transcontinental for a manifest, and then adding the rights, reliability, and cost loops a licensed platform actually needs to operate.

v1 - Origin-only upload and playback

Start with the naive version. One application server takes uploads, drops the file on local disk next to its metadata row, and serves byte-range GETs back to viewers. For a prototype this is fine — the control path and the data path are the same thing.

It collapses the moment one video gets popular. Video bytes are orders of magnitude heavier than metadata, so a single hit title can saturate the same network and CPU that login, browsing, upload, and playback-start are all trying to share.

v1 - origin-only

Design a video streaming platform that ingests uploads, transcodes adaptive media, serves global playback, and enforces rights safely.

v2 - Object storage and CDN delivery

Push the media into object storage and put a CDN in front of it. The API keeps metadata and hands out signed playback URLs; viewers pull the video bytes through the CDN. That's the first real split: the app servers stop being the place media fails, and the expensive part of delivery moves onto infrastructure designed for cache hit rate and egress control.

v2 - + CDN origin

A CDN does not make a file a stream

Serving one source file from a CDN is better than serving it from your app server, but you still don't have bitrate adaptation, seek behavior, captions, thumbnails, codec coverage for older devices, or any way to control delivery cost. The next bottleneck isn't raw delivery — it's processing.

v3 - Async transcoding and adaptive playback

Once the original lands durably, emit a transcode job and let the upload request finish. Workers pick the job up, generate the bitrate ladder, HLS/DASH manifests, thumbnails, captions, and sidecar files, then publish the packaged output to the CDN. Now the upload isn't blocked on encoding, a failed job can retry against the raw original, and clients can pick a rendition that doesn't waste their bandwidth.
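The fan-out itself is simple to model. A sketch with an illustrative bitrate ladder and real-time factors — not a production encoding profile — showing one upload becoming one independent, retryable job per rung:

```python
from dataclasses import dataclass

# Illustrative ladder: (name, height, video kbps, real-time factor).
# Real profiles also vary codec, audio tracks, and per-title encoding.
LADDER = [
    ("240p", 240, 400, 0.25),
    ("480p", 480, 1_000, 0.4),
    ("720p", 720, 2_500, 0.6),
    ("1080p", 1080, 5_000, 0.8),
    ("2160p", 2160, 12_000, 1.0),
]

@dataclass
class RenditionJob:
    video_id: str
    name: str
    height: int
    bitrate_kbps: int
    est_encode_seconds: float

def fan_out(video_id: str, duration_seconds: float) -> list[RenditionJob]:
    """One job per rung; each can retry independently against the raw original."""
    return [
        RenditionJob(video_id, name, height, kbps, duration_seconds * rtf)
        for name, height, kbps, rtf in LADDER
    ]
```

Making each rendition its own job is what lets a single failed 4K encode retry without redoing the whole ladder, and lets the queue prioritize the rungs that unblock playback first.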

v3 - + transcode pipeline

Readiness is a product state

Once transcoding is async, an uploaded video isn't immediately playable. Model the states explicitly — uploaded, processing, ready, failed — and let the product show progress, not a broken player while a worker is still encoding.
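A minimal sketch of that state model — the state names come from the text, the transition table is an assumption:

```python
# Legal transitions only; anything else is a bug, not a race to paper over.
TRANSITIONS = {
    "uploaded": {"processing"},
    "processing": {"ready", "failed"},
    "failed": {"processing"},  # retry against the raw original
    "ready": set(),            # terminal in this sketch
}

def advance(current: str, target: str) -> str:
    """Move a video between states, rejecting illegal transitions loudly."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target

def is_playable(state: str) -> bool:
    """Only 'ready' videos get a manifest; the product shows progress otherwise."""
    return state == "ready"
```

The product-facing payoff is the `is_playable` check: the player UI branches on an explicit state instead of discovering mid-playback that a manifest doesn't exist yet.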


v4 - Catalog, search, recommendations, and watch events

Once playback itself is reliable, product traffic starts to take over. Browse pages, search, home rails, continue-watching, recommendations — none of these should be running through the playback API or hammering the primary metadata database. Move them into their own lanes: a catalog API on its own read store, a search index, a recommendation service with its own feature data, and a watch-event stream for engagement signals. One expensive product feature should never be able to take playback down.
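The watch-event lane can be as simple as a fire-and-forget message published to a stream rather than a write to the primary database. A sketch — the field names are illustrative:

```python
import json
import time

def watch_event(user_id: str, video_id: str, position_s: float, session_id: str) -> str:
    """Serialize a progress event for an event stream; playback never waits on it.

    Downstream consumers (continue-watching, recommendations, engagement
    analytics) each read at their own pace from the stream.
    """
    return json.dumps({
        "type": "watch.progress",
        "user_id": user_id,
        "video_id": video_id,
        "position_seconds": round(position_s, 1),
        "session_id": session_id,
        "emitted_at": int(time.time()),
    })
```

Because the event is append-only and asynchronous, a slow recommendation job can lag hours behind without the playback API ever noticing.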

v4 - + discovery lanes

The catalog is a read model, not one table

A title page, a search query, and a personalized home row are all part of the catalog from a viewer's point of view, but they need very different data shapes and cost profiles underneath. Cram them all into one relational query path and your first big product launch will look exactly like a database incident.

v5 - Regional playback control

The CDN serves media globally, but starting playback still requires a few control-plane decisions: is this title available in your region, which manifest do you get, are you entitled to it, what's your session id. Push those reads into regional playback APIs backed by replicated catalog snapshots and local event buffers. Origin media stays authoritative; catalog writes still go to one place.

v5 - + regional control

Regional reads, centralized authority

For VOD, regions don't usually need to accept catalog writes. Replicate snapshots outward, put explicit freshness budgets on entitlement and availability data, and keep the conflict-prone writes — publish, rights windows, takedowns — in one authoritative control plane. You pay for regional redundancy, and you skip the operational tax of running an active-active write topology.
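A freshness budget is just an age check with explicit thresholds. A sketch with illustrative budgets and tier names:

```python
# Per-dataset freshness budgets in seconds (illustrative values).
FRESHNESS_BUDGET_S = {"availability": 300, "entitlement": 60}

def snapshot_status(kind: str, snapshot_age_s: float) -> str:
    """Decide how a regional read model treats a replicated snapshot.

    Within budget: serve it as-is. Moderately stale: keep serving the
    last-known-good copy so playback stays up. Badly stale: replication
    has stalled, so alert while still serving last-known-good.
    """
    budget = FRESHNESS_BUDGET_S[kind]
    if snapshot_age_s <= budget:
        return "fresh"
    if snapshot_age_s <= 10 * budget:
        return "stale-serve-last-known-good"
    return "stale-alert"
```

Writing the budgets down as data rather than tribal knowledge is what makes the "alert when staleness crosses N minutes" runbook testable.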

v6 - Rights, reliability, and cost feedback

The last version adds the controls a licensed streaming platform actually needs before you can run it safely: an entitlement service, a rights ledger, DRM license exchange, forensic watermarking, playback observability, and a cost governor that nudges cache TTLs, origin shielding, and bitrate policy through guarded feedback rather than ad-hoc changes during incidents. Segment fetches stay CDN-fast. Rights, reliability telemetry, and cost analytics all shape what happens around the session, not what happens during a segment GET.

Drill the failure modes the design is supposed to survive

Listing components isn't a design. The design is what happens when one of them fails. So write down — and drill, quarterly — the expected behavior for at least these scenarios:

  • DRM license service down: premium fails closed with a clear UI; free tier keeps playing.
  • CDN POP loss: DNS re-anchor and a warm-cache window, with rebuffer ratio tracked through the cutover.
  • Transcode fleet 50% loss: priority lanes drain first, low-priority backlog gets a published SLA; escalate if drain time crosses 30 minutes.
  • Catalog snapshot replication stalls: regional playback API serves last-known-good, no new title pages; alert when staleness crosses N minutes.
  • Entitlement cache stale and origin unreachable: a per-route policy decides which titles fail open and which fail closed.
  • Leaked watermark variant: revoke the session, force re-license on next start, audit-log lookup by variant id.

Each one is a runbook, an alert, and a quarterly drill, not a sentence in a doc.

v6 - production-grade VOD

Known limitations

v6 is production-grade for licensed video-on-demand. It doesn't cover live streaming, ultra-low-latency sports, offline downloads, creator monetization, ad insertion, comments, or royalty settlement. Each of those shifts the reliability model and the cost profile enough to deserve its own design pass.

It also keeps authoritative catalog and rights writes centralized. That's deliberate for this scope. Regional read models keep playback fast; centralized writes avoid the conflicts you'd otherwise hit around release windows, partner takedowns, and rights audits.

What Archie unlocked, step by step

Archie's reviews walk the design through the workload boundaries that actually matter for streaming. Pull media bytes off the app servers. Push CPU-heavy transcoding off the request path. Split product discovery away from playback. Move startup reads into regions. Then close the loop with rights, reliability telemetry, and cost controls.

Score lift under Archie's reviews

Each step is one Archie review — the gap it flagged became the next version's design move. Rows are ordered from the area Archie still nudges hardest to the one it took furthest, with the overall score on the bottom row. Deltas show the total lift from v1 to v6.

  • Content protection: 22 → 84 (+62)
  • Cost Efficiency: 36 → 86 (+50)
  • Reliability: 24 → 88 (+64)
  • Overall: 27 → 86 (+59)