Design a Video Streaming Platform (YouTube) - High-Level Design

We’re designing a video streaming platform like YouTube or Netflix. This is a favorite in senior-level interviews because it touches on almost everything — massive storage, heavy compute (transcoding), CDNs, adaptive streaming, and recommendation systems.

The core challenge: users upload hundreds of hours of video every minute. We need to process each video into multiple formats and resolutions, store it all, and then deliver it to viewers worldwide with as little buffering as possible. Let’s break it down.

Step 1: Requirements

Functional Requirements

Users can upload videos (up to 1 hour, up to 10 GB)
Videos are transcoded into multiple resolutions (360p, 480p, 720p, 1080p, 4K)
Users can stream videos with adaptive bitrate (quality adjusts to network speed)
Video search by title, tags, and description
Like, comment, and subscribe
Personalized video recommendations on the home page

Non-Functional Requirements

High availability — the platform should be up 99.99% of the time
Low latency playback — video should start playing in < 2 seconds
Smooth streaming — no buffering on a decent connection
Durability — uploaded videos must never be lost
Global reach — fast video delivery worldwide via CDN
Scale — 500 hours of video uploaded per minute, 1B video views per day

Step 2: Estimation

Assumptions:

2B total users, 800M daily active users
500 hours of video uploaded per minute (the figure YouTube has publicly reported)
1B video views per day
Average video length: 5 minutes
Average video size after transcoding: 500 MB across all resolutions

QPS:

Upload rate:     500 hours/min × 60 × 24 = 720,000 hours/day
Videos/day:      720,000 hours × 60 min ÷ 5 min avg ≈ 8.64M videos/day
Upload QPS:      8.64M / 86,400 ≈ 100 uploads/sec

View QPS:        1B / 86,400 ≈ ~12,000 views/sec
Peak view QPS:   ~30,000 views/sec (assuming peak ≈ 2.5× average)

Storage:

Raw upload/day:     8.64M videos × 1 GB avg raw = 8.64 PB/day
Transcoded/day:     8.64M videos × 500 MB (all resolutions) = 4.32 PB/day
Total storage/day:  ~13 PB/day (raw + transcoded)
Per year:           ~4.7 EB

Bandwidth:

Outgoing (streaming): 1B views × 5 min avg × ~19 MB/min (720p at 2.5 Mbps) ≈ 94 PB/day
                      94 PB / 86,400 ≈ ~1.1 TB/sec outgoing

That outgoing bandwidth number is exactly why we need CDNs. No single data center can realistically push ~1.1 TB/sec.
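Back-of-envelope math like this is easy to get wrong on unit conversions (hours vs. videos, Mbps vs. MB/min), so it can help to write it down as code. The sketch below recomputes the estimates from the assumptions listed above; the function name, parameter names, and the 1 GB raw size per video are illustrative assumptions, not measured values.

```python
SECONDS_PER_DAY = 86_400


def estimate(upload_hours_per_min=500, avg_video_min=5,
             views_per_day=1_000_000_000, raw_gb_per_video=1.0,
             transcoded_gb_per_video=0.5, stream_mbps=2.5):
    """Derive rough daily volumes from the stated assumptions."""
    upload_hours_per_day = upload_hours_per_min * 60 * 24
    videos_per_day = upload_hours_per_day * 60 / avg_video_min

    # GB -> PB: divide by 1e6 (decimal units are fine for an estimate).
    storage_pb_per_day = (
        videos_per_day * (raw_gb_per_video + transcoded_gb_per_video) / 1e6
    )

    # Mbps -> MB per minute: 8 bits per byte, 60 seconds per minute.
    mb_per_min = stream_mbps / 8 * 60
    # MB -> PB: divide by 1e9.
    egress_pb_per_day = views_per_day * avg_video_min * mb_per_min / 1e9

    return {
        "videos_per_day": videos_per_day,
        "upload_qps": videos_per_day / SECONDS_PER_DAY,
        "view_qps": views_per_day / SECONDS_PER_DAY,
        "storage_pb_per_day": storage_pb_per_day,
        "egress_pb_per_day": egress_pb_per_day,
        "egress_gb_per_sec": egress_pb_per_day * 1e6 / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.2f}")
```

Changing a single assumption (say, the average streaming bitrate) and rerunning shows quickly which inputs dominate the result.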

Step 3: High-Level Design

Video Streaming — High-Level Architecture

Upload Path

1. Creator uploads video

2. Upload to Object Storage (S3)

3. Message Queue triggers transcoding

4. Transcode to 360p/480p/720p/1080p/4K

5. Store chunks in Object Storage

6. Generate thumbnails

7. Update Metadata DB (ready)

8. Push to CDN edge nodes

Streaming Path

1. Viewer requests video

2. API returns video metadata

3. Player fetches manifest file (HLS)

4. Manifest lists available qualities

5. Player picks quality based on bandwidth

6. Fetch video segments from CDN

7. Adaptive: switch quality mid-stream

Components shown: Transcoding Workers, CDN (Edge Nodes), Object Storage (S3), Metadata DB, Message Queue

Upload: Creator → Object Storage → Queue → Transcoder → Object Storage → CDN

Stream: Viewer → API → CDN Edge Node → Video Segments (HLS/DASH)

The key insight is that upload and streaming are completely separate paths. Uploading is async and compute-heavy (transcoding). Streaming is read-heavy and latency-sensitive (served from CDN). These two paths scale independently.

Component breakdown:

API Servers — handle user requests (upload metadata, search, likes, comments, feed)
Object Storage (S3) — stores raw uploads and transcoded video chunks. Cheap, durable, infinitely scalable.
Transcoding Workers — CPU/GPU-intensive workers that convert raw video to multiple formats and resolutions
Message Queue (Kafka/SQS) — decouples upload from transcoding. The upload finishes fast, transcoding happens async.
Metadata DB — video titles, descriptions, view counts, user data. PostgreSQL or a similar relational DB.
CDN — the star of the show. Distributes video segments to edge servers worldwide. 90%+ of video traffic is served from CDN, not our origin servers.

Step 4: API Design

POST /api/v1/videos/upload
  → Returns a pre-signed URL for direct upload to object storage
  Body: { "title": "My Video", "description": "...", "tags": ["coding"] }
  Response: { "video_id": "vid_123", "upload_url": "https://s3.../presigned" }

GET /api/v1/videos/{video_id}
  → Returns video metadata + streaming URLs
  Response: {
    "video_id": "vid_123",
    "title": "My Video",
    "status": "ready",
    "manifest_url": "https://cdn.example.com/vid_123/master.m3u8",
    "thumbnail_url": "https://cdn.example.com/vid_123/thumb.jpg",
    "views": 1500000,
    "likes": 82000,
    "created_at": "2026-03-30T10:00:00Z"
  }

GET /api/v1/videos/search?q=system+design&limit=20&cursor=abc
GET /api/v1/feed?limit=20&cursor=abc                — personalized home feed

POST /api/v1/videos/{video_id}/like
POST /api/v1/videos/{video_id}/comments
  Body: { "text": "Great video!" }
POST /api/v1/users/{user_id}/subscribe

Why pre-signed URLs for upload? We don’t want the video file to flow through our API servers. That would bottleneck them. Instead, we give the client a pre-signed URL that lets them upload directly to object storage (S3). Our server never touches the raw video bytes — it just manages metadata.
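To make the idea concrete, here is a minimal sketch of how a signed, expiring upload URL can work. It is a simplified stand-in built on a plain HMAC, not S3's actual signing scheme; the secret, the storage.example.com host, and the function names are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import urlencode

SECRET = b"demo-secret"  # hypothetical key shared by the API and storage layer


def sign(video_id: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Sign the method, path, and expiry so none of them can be altered."""
    payload = f"PUT\n/uploads/{video_id}\n{expires_at}".encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def presign(video_id: str, expires_at: int) -> str:
    """Build the URL the API server hands back to the client."""
    query = urlencode({"expires": expires_at,
                       "signature": sign(video_id, expires_at)})
    return f"https://storage.example.com/uploads/{video_id}?{query}"


def verify(video_id: str, expires_at: int, signature: str, now: int) -> bool:
    """What the storage side checks before accepting the bytes."""
    if now > expires_at:
        return False  # the link has expired
    expected = sign(video_id, expires_at)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

The API server only computes a signature; the storage layer can validate the request on its own, which is why the video bytes never need to pass through our servers.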

Step 5: Data Model

-- Users table (PostgreSQL)
CREATE TABLE users (
    user_id         BIGINT PRIMARY KEY,
    username        VARCHAR(50) UNIQUE,
    display_name    VARCHAR(100),
    avatar_url      TEXT,
    subscriber_count INT DEFAULT 0,
    created_at      TIMESTAMP
);

-- Videos table (PostgreSQL)
CREATE TABLE videos (
    video_id        BIGINT PRIMARY KEY,      -- Snowflake ID
    user_id         BIGINT NOT NULL,
    title           VARCHAR(200),
    description     TEXT,
    status          VARCHAR(20),             -- 'uploading', 'transcoding', 'ready', 'failed'
    duration_sec    INT,
    manifest_url    TEXT,                    -- HLS master playlist URL
    thumbnail_url   TEXT,
    view_count      BIGINT DEFAULT 0,
    like_count      BIGINT DEFAULT 0,
    tags            TEXT[],
    created_at      TIMESTAMP
);
-- PostgreSQL has no inline INDEX clause, so the index is created separately
CREATE INDEX idx_user_time ON videos (user_id, created_at DESC);

-- Video chunks (stored in object storage, tracked in metadata)
-- Each resolution has its own set of chunks
-- Path pattern: s3://videos/{video_id}/{resolution}/segment_{segment_number}.ts
-- Manifest:     s3://videos/{video_id}/master.m3u8

-- Comments table (PostgreSQL or Cassandra for heavy-write scenarios)
CREATE TABLE comments (
    comment_id      BIGINT PRIMARY KEY,
    video_id        BIGINT NOT NULL,
    user_id         BIGINT NOT NULL,
    text            TEXT,
    like_count      INT DEFAULT 0,
    created_at      TIMESTAMP
);
-- PostgreSQL has no inline INDEX clause, so the index is created separately
CREATE INDEX idx_video_time ON comments (video_id, created_at DESC);

-- Likes table
CREATE TABLE likes (
    user_id         BIGINT,
    video_id        BIGINT,
    created_at      TIMESTAMP,
    PRIMARY KEY (user_id, video_id)
);

-- Subscriptions table
CREATE TABLE subscriptions (
    subscriber_id   BIGINT,
    creator_id      BIGINT,
    created_at      TIMESTAMP,
    PRIMARY KEY (subscriber_id, creator_id)
);
-- Lets us list a creator's subscribers without scanning the whole table
CREATE INDEX idx_creator ON subscriptions (creator_id);

Step 6: Deep Dives

Deep Dive 1: Video Upload and Transcoding Pipeline

When a creator uploads a video, a LOT happens behind the scenes before viewers can watch it. Let’s trace the full pipeline.

Step 1: Upload

The creator’s app gets a pre-signed URL from our API. The raw video (could be 5 GB) uploads directly to object storage. This bypasses our servers entirely — S3 handles the heavy lifting.

For large files, we use multipart upload. The file gets split into chunks (say 10 MB each), each chunk uploads in parallel, and S3 reassembles them. If a chunk fails, we retry just that chunk — not the whole file.
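The chunk-and-retry logic can be sketched as follows. This is a simplified, sequential version (a real client would upload parts in parallel), and `upload_part` is a hypothetical callback standing in for the actual network call.

```python
import math

MB = 1024 * 1024


def plan_parts(file_size: int, part_size: int = 10 * MB):
    """Return (part_number, offset, length) tuples that cover the file."""
    count = math.ceil(file_size / part_size)
    return [
        (i + 1, i * part_size, min(part_size, file_size - i * part_size))
        for i in range(count)
    ]


def upload_all(parts, upload_part, max_attempts: int = 3):
    """Upload every part, retrying only the part that failed.

    Returns a dict mapping part number to the attempts it needed.
    """
    attempts_used = {}
    for number, offset, length in parts:
        for attempt in range(1, max_attempts + 1):
            try:
                upload_part(number, offset, length)
                attempts_used[number] = attempt
                break
            except OSError:
                if attempt == max_attempts:
                    raise  # give up on the upload after repeated failures
    return attempts_used
```

A failure on part 2 costs one extra 10 MB transfer instead of restarting a multi-gigabyte upload from zero.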

Step 2: Trigger transcoding

Once the upload completes, S3 sends an event notification to our message queue (SQS/Kafka). A transcoding orchestrator picks up the event.

Step 3: Transcoding

This is the compute-heavy part. We need to produce multiple versions of the video:

Input:  raw_video.mp4 (4K, 5 GB)

Output:
  → 360p  @ 400 kbps   (mobile on slow network)
  → 480p  @ 800 kbps   (mobile on decent network)
  → 720p  @ 2.5 Mbps   (laptop)
  → 1080p @ 5 Mbps     (desktop/TV)
  → 4K    @ 20 Mbps    (high-end TV)

Each resolution also gets chunked into small segments (typically 2-10 seconds each). These segments are the fundamental unit of streaming — we’ll see why in the next deep dive.
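Here is a small sketch of how a transcoding job might decide which renditions to produce and how many segments each one needs. The bitrates mirror the ladder above; the pixel heights, the fallback rule, and the function names are assumptions for illustration.

```python
import math

# (name, height in pixels, target bitrate in kbps)
LADDER = [
    ("360p", 360, 400),
    ("480p", 480, 800),
    ("720p", 720, 2500),
    ("1080p", 1080, 5000),
    ("4K", 2160, 20000),
]


def renditions_for(source_height: int):
    """Keep only renditions at or below the source height (no upscaling)."""
    kept = [r for r in LADDER if r[1] <= source_height]
    # A source smaller than the lowest rung still gets one rendition.
    return kept or LADDER[:1]


def segment_count(duration_sec: float, segment_sec: float = 6.0) -> int:
    """Number of segments per rendition; the last one may be shorter."""
    return math.ceil(duration_sec / segment_sec)
```

For a 5-minute 1080p source, this would yield four renditions of 50 segments each, so 200 segment files plus the playlists.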

Transcoding is embarrassingly parallel. We can split the raw video into sections at keyframe boundaries, transcode each section independently on different workers, and stitch the results back together in order. A 30-minute video can be transcoded in minutes by throwing enough workers at it.
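The fan-out can be sketched with a thread pool. `transcode_section` is a hypothetical placeholder for the real encoder call, and the fixed 60-second sections ignore keyframe alignment, which a real splitter would have to respect.

```python
from concurrent.futures import ThreadPoolExecutor


def split_sections(duration_sec: float, section_sec: float = 60.0):
    """Cut the timeline into consecutive (start, end) ranges."""
    sections, start = [], 0.0
    while start < duration_sec:
        end = min(start + section_sec, duration_sec)
        sections.append((start, end))
        start = end
    return sections


def transcode_section(section):
    """Placeholder: a real worker would invoke an encoder on this range."""
    start, end = section
    return f"encoded[{start:.0f}-{end:.0f}]"


def transcode_parallel(duration_sec: float, workers: int = 4):
    sections = split_sections(duration_sec)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the encoded sections
        # can be concatenated without re-sorting.
        return list(pool.map(transcode_section, sections))
```

In a real deployment the "pool" would be a fleet of machines pulling section jobs from the queue, but the shape of the work is the same: split, process independently, reassemble in order.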

Step 4: Generate thumbnails and metadata

While transcoding, we also extract thumbnails (multiple frames for the preview scrubber), video duration, and other metadata.

Step 5: Upload transcoded chunks to object storage

All the segments and manifest files get stored in S3 with a predictable path structure:

s3://videos/vid_123/360p/segment_001.ts
s3://videos/vid_123/360p/segment_002.ts
...
s3://videos/vid_123/1080p/segment_001.ts
s3://videos/vid_123/master.m3u8

Step 6: Update metadata and notify CDN

The video status changes from “transcoding” to “ready” in the database. The CDN gets the new content pushed to its edge nodes (or more commonly, it pulls on first request).

As a rough guide, the whole pipeline takes somewhere between 5 and 30 minutes, depending on video length and how much transcoding capacity is free.

Deep Dive 2: Adaptive Bitrate Streaming (HLS)

This is probably the most interesting technical detail. When we watch a YouTube video, the quality adjusts automatically based on our network speed. That’s adaptive bitrate streaming. We’ll walk through it using HLS (HTTP Live Streaming); MPEG-DASH, the other widely used protocol, works on the same principle.

Here’s how it works:

The manifest file (master.m3u8):

When the video player starts, it fetches a master manifest file. This file lists all available quality levels:

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=854x480
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
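The master manifest is simple enough that the transcoding pipeline can generate it as plain text. A minimal sketch, assuming each variant is described by a (name, bandwidth, width, height) tuple and that playlists live at `{name}/playlist.m3u8` as in the example above:

```python
def master_playlist(variants):
    """Render an HLS master playlist.

    variants: iterable of (name, bandwidth_bps, width, height).
    """
    lines = ["#EXTM3U"]
    for name, bandwidth_bps, width, height in variants:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth_bps},"
            f"RESOLUTION={width}x{height}"
        )
        lines.append(f"{name}/playlist.m3u8")
    return "\n".join(lines) + "\n"


VARIANTS = [
    ("360p", 400_000, 640, 360),
    ("480p", 800_000, 854, 480),
    ("720p", 2_500_000, 1280, 720),
    ("1080p", 5_000_000, 1920, 1080),
]

if __name__ == "__main__":
    print(master_playlist(VARIANTS), end="")
```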

The quality-specific playlist:

Each quality level has its own playlist listing the individual segments:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:6.0,
segment_001.ts
#EXTINF:6.0,
segment_002.ts
#EXTINF:6.0,
segment_003.ts
#EXT-X-ENDLIST
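A per-quality playlist can be generated the same way once we know the video duration. This sketch assumes fixed 6-second segments and the `segment_NNN.ts` naming used above; it also emits `#EXT-X-ENDLIST`, which tells the player that the playlist is complete (on-demand video rather than a live stream).

```python
import math


def media_playlist(duration_sec: float, segment_sec: float = 6.0) -> str:
    """Render an HLS media playlist for an on-demand video."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3 allows fractional EXTINF durations
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_sec)}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    count = math.ceil(duration_sec / segment_sec)
    for i in range(count):
        # Every segment is full length except possibly the last one.
        length = min(segment_sec, duration_sec - i * segment_sec)
        lines.append(f"#EXTINF:{length:.1f},")
        lines.append(f"segment_{i + 1:03d}.ts")
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```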

The adaptive magic:

The player starts by measuring available bandwidth
It picks the highest quality that fits within the bandwidth
It downloads segments one by one
After each segment download, it measures the actual download speed
If the network got slower, it switches to a lower quality for the next segment
If the network got faster, it switches up
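The selection loop above can be sketched in a few lines. Real players use more elaborate heuristics (buffer level, smoothing over several segments); the 0.8 safety factor and the function names here are assumptions for illustration.

```python
def throughput_bps(segment_bytes: int, download_sec: float) -> float:
    """Measured throughput of the last segment download, in bits/sec."""
    return segment_bytes * 8 / download_sec


def pick_variant(bitrates_bps, measured_bps: float, safety: float = 0.8):
    """Pick the highest bitrate that fits within a safety margin.

    Falls back to the lowest bitrate if nothing fits.
    """
    budget = measured_bps * safety
    fitting = [b for b in sorted(bitrates_bps) if b <= budget]
    return fitting[-1] if fitting else min(bitrates_bps)


BITRATES = [400_000, 800_000, 2_500_000, 5_000_000]

if __name__ == "__main__":
    # A 750 KB segment that took 2 seconds to download: 3 Mbps measured.
    speed = throughput_bps(750_000, 2.0)
    print(pick_variant(BITRATES, speed))
```

The safety margin keeps the player from choosing a quality that only barely fits, which would cause a stall as soon as throughput dips.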

In simple language, the video is chopped into tiny pieces (6-second segments), and each piece exists in multiple quality levels. The player picks the best quality for each piece based on current network speed. That’s why we sometimes see quality shift mid-video — we literally switched to a different resolution’s segment.

Why segments over CDN? Each 6-second segment is a regular HTTP file. It gets cached by the CDN just like any other file. No special streaming protocol needed — just plain HTTP requests. That’s the genius of HLS. It piggybacks on all existing HTTP infrastructure.

Deep Dive 3: Video Recommendation Basics

When we open YouTube’s home page, we see a personalized feed of videos. How does the recommendation engine work at a high level?

Two main approaches:

Collaborative filtering — “Users who watched X also watched Y.” We don’t even need to understand the video content. We just look at patterns in watch behavior.

User A watched: [Video 1, Video 2, Video 3]
User B watched: [Video 1, Video 2, Video 4]

User A and B are similar (both watched 1 and 2).
Recommend Video 4 to User A, Video 3 to User B.
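The same example as a toy implementation. This is a deliberately naive sketch (set overlap as the similarity measure, everything in memory); production systems use learned embeddings or matrix factorization, and the `min_overlap` threshold is an assumption.

```python
def recommend(history, user, min_overlap=2):
    """Suggest videos watched by users with overlapping watch history.

    history: dict mapping user name to the set of video ids they watched.
    """
    seen = history[user]
    scores = {}
    for other, watched in history.items():
        if other == user:
            continue
        overlap = len(seen & watched)
        if overlap < min_overlap:
            continue  # not similar enough to trust
        for video in watched - seen:
            # Videos from more similar users get a higher score.
            scores[video] = scores.get(video, 0) + overlap
    return sorted(scores, key=lambda v: (-scores[v], v))


HISTORY = {
    "A": {1, 2, 3},
    "B": {1, 2, 4},
}

if __name__ == "__main__":
    print(recommend(HISTORY, "A"))
    print(recommend(HISTORY, "B"))
```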

Content-based filtering — “This video has similar tags/categories to videos you liked.” We analyze the content attributes (title, tags, category, description) and find similar videos.

In practice, it’s a pipeline:

Candidate generation — narrow down from billions of videos to a few thousand candidates using quick, rough signals (user’s watch history, subscriptions, trending in their region)
Ranking — score each candidate using detailed features (watch time prediction, click-through rate, user engagement signals)
Re-ranking — apply business rules (diversity, freshness, remove duplicates, filter out stuff they’ve already watched)

The ranking model is typically a deep neural network trained on features like:

User’s watch history and preferences
Video metadata (title, category, upload date)
Engagement metrics (average watch time, like ratio)
Context (time of day, device type)

For a system design interview, we don’t need to design the ML model. We just need to show we understand the pipeline: generate candidates → rank → re-rank → serve.
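To show the shape of that pipeline, here is a toy sketch of the three stages. The `predicted_watch_sec` field stands in for the ranking model's output, and the category filter and per-category cap are hypothetical simplifications of candidate generation and diversity rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    video_id: str
    category: str
    predicted_watch_sec: float  # would come from the ranking model


def generate_candidates(catalog, liked_categories):
    """Cheap filter: shrink the catalog to plausibly relevant videos."""
    return [v for v in catalog if v.category in liked_categories]


def rank(candidates):
    """Order candidates by the model's score, best first."""
    return sorted(candidates, key=lambda v: v.predicted_watch_sec,
                  reverse=True)


def re_rank(ranked, watched_ids, max_per_category=2):
    """Business rules: drop watched videos, cap each category."""
    per_category, result = {}, []
    for video in ranked:
        if video.video_id in watched_ids:
            continue
        if per_category.get(video.category, 0) >= max_per_category:
            continue
        per_category[video.category] = per_category.get(video.category, 0) + 1
        result.append(video)
    return result


def home_feed(catalog, liked_categories, watched_ids, limit=20):
    candidates = generate_candidates(catalog, liked_categories)
    return re_rank(rank(candidates), watched_ids)[:limit]
```

Each stage only sees the output of the previous one, so the expensive scoring runs on thousands of candidates rather than the full catalog.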

Step 7: Scaling

Object storage:

S3 handles the storage scaling for us — it’s practically infinite
Use S3 lifecycle policies: move videos with < 10 views after 90 days to cheaper storage tiers (S3 Glacier)
Hot/cold separation: popular videos on fast storage, old/unpopular on archive storage
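The tiering rule is easy to express as a function that a nightly job could apply to each video. The thresholds come from the bullet above; the tier names and function name are assumptions.

```python
def storage_tier(view_count: int, age_days: int,
                 min_views: int = 10, cold_after_days: int = 90) -> str:
    """Decide where a video's files should live."""
    if age_days >= cold_after_days and view_count < min_views:
        return "archive"  # rarely watched and old: cheapest tier
    return "standard"
```

New videos stay on standard storage regardless of views, since they have not had time to find an audience yet.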

CDN is everything:

90%+ of video traffic should be served from CDN edge nodes
Use multiple CDN providers for redundancy. The largest platforms run their own: YouTube is served from Google’s edge network, and Netflix built Open Connect.
Popular videos get cached at every edge location. Long-tail videos get cached on demand.
Pre-warm popular content: push trending videos to edge nodes proactively

Transcoding at scale:

Use a managed service (such as AWS Elemental MediaConvert, which AWS recommends over the older Elastic Transcoder) or a fleet of GPU instances
Auto-scale based on queue depth — if there are 10,000 videos waiting, spin up more workers
Priority queue: premium creators or popular channels get transcoded first
Cost optimization: don’t transcode to 4K if the source is only 720p
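Queue-depth scaling and prioritization can be sketched together. The jobs-per-worker ratio, the worker bounds, and the convention that a lower number means higher priority are all assumptions chosen for illustration.

```python
import heapq
import itertools
import math


def desired_workers(queue_depth: int, jobs_per_worker: int = 50,
                    min_workers: int = 2, max_workers: int = 500) -> int:
    """Scale the fleet with the backlog, within fixed bounds."""
    needed = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


class TranscodeQueue:
    """Priority queue: lower number is served first, FIFO within a level."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps arrival order

    def push(self, video_id: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), video_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

With these defaults, a backlog of 10,000 videos asks for 200 workers, and the upper bound keeps a sudden spike from launching an unbounded fleet.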

Database scaling:

Video metadata: shard by video_id across PostgreSQL instances
View counts: don’t update the DB on every view. Use Redis counters, flush to DB in batches
Comments: for viral videos with millions of comments, shard by video_id. Use Cassandra if write volume is extreme.
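The view-count batching idea looks like this in miniature. An in-memory `Counter` stands in for Redis and a plain dict stands in for the database; both substitutions are for illustration only.

```python
from collections import Counter


class ViewCounter:
    """Buffer view increments and write them to the DB in batches."""

    def __init__(self):
        self._pending = Counter()  # stand-in for Redis counters

    def record_view(self, video_id: str) -> None:
        # Cheap in-memory increment on the hot path; no DB write here.
        self._pending[video_id] += 1

    def flush(self, db: dict) -> int:
        """Apply buffered deltas to the DB; returns rows touched."""
        batch, self._pending = self._pending, Counter()
        for video_id, delta in batch.items():
            db[video_id] = db.get(video_id, 0) + delta
        return len(batch)
```

A million views of one viral video between flushes turn into a single row update. The trade-off is that displayed counts lag slightly and buffered views can be lost if the counter store fails before a flush.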

Search:

Elasticsearch cluster for video search (title, description, tags)
Index gets updated whenever a video status changes to “ready”
Autocomplete using prefix queries on Elasticsearch

Global deployment:

Upload to the nearest region’s object storage, then replicate
Transcoding can happen in any region — send the job to wherever we have spare capacity
Metadata DB: primary in one region, read replicas globally
The CDN handles global delivery — that’s its whole job

In simple language, a video streaming platform is really two separate systems glued together. The upload side is an async processing pipeline — take a raw video, transcode it into segments at multiple qualities, and push to storage. The streaming side is just serving static files (video segments) through a CDN using the HLS protocol. The player fetches a manifest file, picks a quality, and downloads segments one by one. The CDN does the heavy lifting of global delivery. Everything else — search, recommendations, comments — is supporting infrastructure around these two core paths.