We’re designing a video streaming platform like YouTube or Netflix. This is a favorite in senior-level interviews because it touches on almost everything — massive storage, heavy compute (transcoding), CDNs, adaptive streaming, and recommendation systems.
The core challenge: users upload hundreds of hours of video every minute. We need to process each video into multiple formats and resolutions, store it all, and then deliver it to a billion viewers worldwide with zero buffering. Let’s break it down.
Step 1: Requirements
Functional Requirements
- Users can upload videos (up to 1 hour, up to 10 GB)
- Videos are transcoded into multiple resolutions (360p, 480p, 720p, 1080p, 4K)
- Users can stream videos with adaptive bitrate (quality adjusts to network speed)
- Video search by title, tags, and description
- Like, comment, and subscribe
- Personalized video recommendations on the home page
Non-Functional Requirements
- High availability — the platform should be up 99.99% of the time
- Low latency playback — video should start playing in < 2 seconds
- Smooth streaming — no buffering on a decent connection
- Durability — uploaded videos must never be lost
- Global reach — fast video delivery worldwide via CDN
- Scale — 500 hours of video uploaded per minute, 1B video views per day
Step 2: Estimation
Assumptions:
- 2B total users, 800M daily active users
- 500 hours of video uploaded per minute (YouTube’s real number)
- 1B video views per day
- Average video length: 5 minutes
- Average video size after transcoding: 500 MB across all resolutions
QPS:
Upload rate: 500 hours/min × 1,440 min/day = 720,000 hours/day
Videos/day: 720,000 hours × 12 videos/hour (5 min avg) ≈ 8.6M videos/day
Upload QPS: 8.64M / 86,400 ≈ 100 uploads/sec
View QPS: 1B / 86,400 ≈ ~12,000 views/sec
Peak view QPS: ~30,000 views/sec (assuming peak ≈ 2.5× average)
Storage:
Raw upload/day: 8.64M videos × 1 GB avg raw ≈ 8.6 PB/day
Transcoded/day: 8.64M videos × 500 MB (all resolutions) ≈ 4.3 PB/day
Total storage/day: ~13 PB/day (raw + transcoded)
Per year: ~4.7 EB
Bandwidth:
Outgoing (streaming): 720p at 2.5 Mbps ≈ 18.75 MB/min, so 1B views × 5 min avg × 18.75 MB/min ≈ ~94 PB/day
94 PB / 86,400 ≈ ~1.1 TB/sec outgoing (~8.7 Tbps)
That outgoing bandwidth number is exactly why we need CDNs. No single data center can realistically push ~1 TB/sec to viewers spread across the globe.
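A tiny Python script makes the back-of-envelope math reproducible. The constants come from the assumptions listed above, plus the 1 GB average raw upload from the storage line and the 2.5 Mbps 720p bitrate from the transcoding ladder; using decimal units (1 PB = 10^15 bytes) is an assumption.

```python
# Back-of-envelope estimates, in decimal units (1 PB = 1e15 bytes).
SECONDS_PER_DAY = 86_400

UPLOAD_HOURS_PER_MIN = 500
AVG_VIDEO_MIN = 5
RAW_BYTES_PER_VIDEO = 1e9           # 1 GB average raw upload
TRANSCODED_BYTES_PER_VIDEO = 500e6  # 500 MB across all renditions
VIEWS_PER_DAY = 1e9
AVG_STREAM_BPS = 2.5e6              # 720p at 2.5 Mbps


def estimate() -> dict:
    upload_hours_per_day = UPLOAD_HOURS_PER_MIN * 60 * 24
    videos_per_day = upload_hours_per_day * 60 / AVG_VIDEO_MIN
    storage_per_day = videos_per_day * (RAW_BYTES_PER_VIDEO + TRANSCODED_BYTES_PER_VIDEO)
    # bits per second -> bytes, over the average watch duration
    egress_bytes_per_day = VIEWS_PER_DAY * AVG_VIDEO_MIN * 60 * AVG_STREAM_BPS / 8
    return {
        "videos_per_day": videos_per_day,
        "upload_qps": videos_per_day / SECONDS_PER_DAY,
        "view_qps": VIEWS_PER_DAY / SECONDS_PER_DAY,
        "storage_pb_per_day": storage_per_day / 1e15,
        "egress_pb_per_day": egress_bytes_per_day / 1e15,
        "egress_tbps": egress_bytes_per_day * 8 / SECONDS_PER_DAY / 1e12,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name:>20}: {value:,.1f}")
```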
Step 3: High-Level Design
The key insight is that upload and streaming are completely separate paths. Uploading is async and compute-heavy (transcoding). Streaming is read-heavy and latency-sensitive (served from CDN). These two paths scale independently.
Component breakdown:
- API Servers — handle user requests (upload metadata, search, likes, comments, feed)
- Object Storage (S3) — stores raw uploads and transcoded video chunks. Cheap, highly durable, and effectively unlimited in capacity.
- Transcoding Workers — CPU/GPU-intensive workers that convert raw video to multiple formats and resolutions
- Message Queue (Kafka/SQS) — decouples upload from transcoding. The upload finishes fast, transcoding happens async.
- Metadata DB — video titles, descriptions, view counts, user data. PostgreSQL or a similar relational DB.
- CDN — the star of the show. Distributes video segments to edge servers worldwide. 90%+ of video traffic is served from CDN, not our origin servers.
Step 4: API Design
POST /api/v1/videos/upload
→ Returns a pre-signed URL for direct upload to object storage
Body: { "title": "My Video", "description": "...", "tags": ["coding"] }
Response: { "video_id": "vid_123", "upload_url": "https://s3.../presigned" }
GET /api/v1/videos/{video_id}
→ Returns video metadata + streaming URLs
Response: {
"video_id": "vid_123",
"title": "My Video",
"status": "ready",
"manifest_url": "https://cdn.example.com/vid_123/master.m3u8",
"thumbnail_url": "https://cdn.example.com/vid_123/thumb.jpg",
"views": 1500000,
"likes": 82000,
"created_at": "2026-03-30T10:00:00Z"
}
GET /api/v1/videos/search?q=system+design&limit=20&cursor=abc
GET /api/v1/feed?limit=20&cursor=abc — personalized home feed
POST /api/v1/videos/{video_id}/like
POST /api/v1/videos/{video_id}/comments
Body: { "text": "Great video!" }
POST /api/v1/users/{user_id}/subscribe
Why pre-signed URLs for upload? We don’t want the video file to flow through our API servers. That would bottleneck them. Instead, we give the client a pre-signed URL that lets them upload directly to object storage (S3). Our server never touches the raw video bytes — it just manages metadata.
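Real S3 pre-signed URLs use AWS Signature Version 4 (generated, for example, by boto3's `generate_presigned_url`). The Python sketch below is a deliberately simplified, hypothetical HMAC scheme that only shows the idea: the API server signs (method, key, expiry) with a secret, and the storage side recomputes the signature before accepting any bytes. The secret, host, and query parameter names are made up.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"server-side-secret"  # hypothetical key shared by API server and storage


def _sign(method: str, key: str, expires: int) -> str:
    payload = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def presign(host: str, key: str, expires_in: int, now: Optional[float] = None) -> str:
    """API server: return a URL that authorizes one PUT of `key` until it expires."""
    expires = int((now if now is not None else time.time()) + expires_in)
    query = urlencode({"expires": expires, "sig": _sign("PUT", key, expires)})
    return f"https://{host}/{key}?{query}"


def verify(url: str, method: str, now: Optional[float] = None) -> bool:
    """Storage side: accept the request only if the signature and expiry check out."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    key = parsed.path.lstrip("/")
    expires = int(params["expires"][0])
    if (now if now is not None else time.time()) > expires:
        return False
    # constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(_sign(method, key, expires), params["sig"][0])
```

The client never sees `SECRET`; it just PUTs the file to the URL it was given, and any change to the key, method, or expiry invalidates the signature.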
Step 5: Data Model
-- Users table (PostgreSQL)
CREATE TABLE users (
user_id BIGINT PRIMARY KEY,
username VARCHAR(50) UNIQUE,
display_name VARCHAR(100),
avatar_url TEXT,
subscriber_count INT DEFAULT 0,
created_at TIMESTAMP
);
-- Videos table (PostgreSQL)
CREATE TABLE videos (
video_id BIGINT PRIMARY KEY, -- Snowflake ID
user_id BIGINT NOT NULL,
title VARCHAR(200),
description TEXT,
status VARCHAR(20), -- 'uploading', 'transcoding', 'ready', 'failed'
duration_sec INT,
manifest_url TEXT, -- HLS master playlist URL
thumbnail_url TEXT,
view_count BIGINT DEFAULT 0,
like_count BIGINT DEFAULT 0,
tags TEXT[],
created_at TIMESTAMP
);
CREATE INDEX idx_user_time ON videos (user_id, created_at DESC);
-- Video chunks (stored in object storage, tracked in metadata)
-- Each resolution has its own set of chunks
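-- Segment path pattern: s3://videos/{video_id}/{resolution}/segment_{segment_number}.ts
-- Per-resolution playlist: s3://videos/{video_id}/{resolution}/playlist.m3u8
-- Master manifest: s3://videos/{video_id}/master.m3u8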
-- Comments table (PostgreSQL or Cassandra for heavy-write scenarios)
CREATE TABLE comments (
comment_id BIGINT PRIMARY KEY,
video_id BIGINT NOT NULL,
user_id BIGINT NOT NULL,
text TEXT,
like_count INT DEFAULT 0,
created_at TIMESTAMP
);
CREATE INDEX idx_video_time ON comments (video_id, created_at DESC);
-- Likes table
CREATE TABLE likes (
user_id BIGINT,
video_id BIGINT,
created_at TIMESTAMP,
PRIMARY KEY (user_id, video_id)
);
-- Subscriptions table
CREATE TABLE subscriptions (
subscriber_id BIGINT,
creator_id BIGINT,
created_at TIMESTAMP,
PRIMARY KEY (subscriber_id, creator_id)
);
CREATE INDEX idx_creator ON subscriptions (creator_id);
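The `cursor` query parameter in the API pairs naturally with the `idx_video_time` comment index: keyset pagination remembers the last row returned instead of using OFFSET, so deep pages stay as cheap as the first one. A sketch using Python's built-in sqlite3 as a stand-in for PostgreSQL; representing the cursor as a `(created_at, comment_id)` tuple, with `comment_id` breaking ties, is an assumption (the API would encode it as an opaque string such as `abc`).

```python
import sqlite3
from typing import List, Optional, Tuple

PageCursor = Tuple[str, int]  # (created_at, comment_id) of the last row on the previous page


def page_comments(conn: sqlite3.Connection, video_id: int, limit: int,
                  cursor: Optional[PageCursor] = None) -> Tuple[List[tuple], Optional[PageCursor]]:
    """Return one newest-first page of comments plus the cursor for the next page."""
    sql = "SELECT comment_id, created_at, text FROM comments WHERE video_id = ?"
    params: list = [video_id]
    if cursor is not None:
        # Keyset condition: strictly older than the last row seen; comment_id breaks ties
        sql += " AND (created_at < ? OR (created_at = ? AND comment_id < ?))"
        params += [cursor[0], cursor[0], cursor[1]]
    sql += " ORDER BY created_at DESC, comment_id DESC LIMIT ?"
    params.append(limit)
    rows = conn.execute(sql, params).fetchall()
    # A short page means there is nothing left to fetch
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with five comments on video 1, one second apart."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, video_id INTEGER NOT NULL, "
                 "user_id INTEGER NOT NULL, text TEXT, created_at TEXT)")
    conn.execute("CREATE INDEX idx_video_time ON comments (video_id, created_at DESC)")
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?, ?)",
                     [(i, 1, 42, f"comment {i}", f"2026-03-30T10:00:0{i}Z") for i in range(1, 6)])
    return conn
```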
Step 6: Deep Dives
Deep Dive 1: Video Upload and Transcoding Pipeline
When a creator uploads a video, a LOT happens behind the scenes before viewers can watch it. Let’s trace the full pipeline.
Step 1: Upload
The creator’s app gets a pre-signed URL from our API. The raw video (could be 5 GB) uploads directly to object storage. This bypasses our servers entirely — S3 handles the heavy lifting.
For large files, we use multipart upload. The file gets split into chunks (say 10 MB each), each chunk uploads in parallel, and S3 reassembles them. If a chunk fails, we retry just that chunk — not the whole file.
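A Python sketch of the client-side logic, assuming a hypothetical `upload_part(part_number, offset, length)` callback that PUTs one byte range (for example to a per-part pre-signed URL) and returns its ETag. Threads stand in for the parallel uploads. Real S3 adds constraints this sketch doesn't enforce: every part except the last must be at least 5 MB, an upload can have at most 10,000 parts, and the client finishes with CompleteMultipartUpload, passing each part number and its ETag.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

PART_SIZE = 10 * 1024 * 1024  # 10 MB, the part size used in the text

Part = Tuple[int, int, int]  # (part_number, byte offset, length)


def plan_parts(file_size: int, part_size: int = PART_SIZE) -> List[Part]:
    """Split a file into byte ranges; S3 part numbers start at 1."""
    count = max(1, math.ceil(file_size / part_size))
    return [(n + 1, n * part_size, min(part_size, file_size - n * part_size)) for n in range(count)]


def _upload_with_retry(part: Part, upload_part: Callable[[int, int, int], str],
                       max_attempts: int) -> Tuple[int, str]:
    part_number, offset, length = part
    for attempt in range(1, max_attempts + 1):
        try:
            return part_number, upload_part(part_number, offset, length)
        except OSError:  # network errors such as ConnectionError are OSError subclasses
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


def upload_all(parts: List[Part], upload_part: Callable[[int, int, int], str],
               max_attempts: int = 3, workers: int = 8) -> Dict[int, str]:
    """Upload parts in parallel, retrying only the ones that fail. Returns part_number -> ETag."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: _upload_with_retry(p, upload_part, max_attempts), parts)
        return dict(results)
```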
Step 2: Trigger transcoding
Once the upload completes, S3 emits an event notification. S3 can deliver it natively to SQS (or to SNS, Lambda, or EventBridge); getting it into Kafka takes a small bridge service. A transcoding orchestrator picks up the event.
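A sketch of the orchestrator's first step, assuming the notification arrives through SQS, whose message body is S3's event JSON (`Records[].s3.bucket.name` and `Records[].s3.object.key`). The `raw/{video_id}/...` key layout and the job fields are hypothetical. Object keys arrive URL-encoded, hence `unquote_plus`, and because SQS delivers at least once, whatever consumes these jobs should tolerate duplicates.

```python
import json
from typing import List
from urllib.parse import unquote_plus


def jobs_from_s3_event(message_body: str) -> List[dict]:
    """Turn an S3 ObjectCreated notification into transcoding jobs."""
    event = json.loads(message_body)
    jobs = []
    # S3's setup "s3:TestEvent" message has no Records, so it yields no jobs
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        parts = key.split("/")
        if len(parts) < 3 or parts[0] != "raw":  # hypothetical layout: raw/{video_id}/{file}
            continue
        jobs.append({
            "video_id": parts[1],
            "source": f"s3://{bucket}/{key}",
            "size_bytes": record["s3"]["object"].get("size"),
        })
    return jobs
```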
Step 3: Transcoding
This is the compute-heavy part. We need to produce multiple versions of the video:
Input: raw_video.mp4 (1080p, 5 GB)
Output:
→ 360p @ 400 kbps (mobile on slow network)
→ 480p @ 800 kbps (mobile on decent network)
→ 720p @ 2.5 Mbps (laptop)
→ 1080p @ 5 Mbps (desktop/TV)
→ 4K @ 20 Mbps (high-end TV; only produced when the source itself is 4K)
Each resolution also gets chunked into small segments (typically 2-10 seconds each). These segments are the fundamental unit of streaming — we’ll see why in the next deep dive.
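One common way to produce a rendition and its segments is ffmpeg's HLS muxer. The Python sketch below only builds the argument list for one rung of the ladder (run it with `subprocess.run(cmd, check=True)` after creating the output directory). The codec choices (libx264, AAC at 128 kbps) and the forced keyframe every 6 seconds, which keeps segment boundaries aligned across renditions so the player can switch cleanly, are assumptions, and the segment numbering ffmpeg produces may not match the `segment_001.ts` names used in this article.

```python
from typing import List, Tuple

# (directory name, output height, video bitrate), taken from the ladder above
LADDER: List[Tuple[str, int, str]] = [
    ("360p", 360, "400k"), ("480p", 480, "800k"), ("720p", 720, "2500k"),
    ("1080p", 1080, "5000k"), ("2160p", 2160, "20000k"),  # 2160p = 4K
]


def hls_command(src: str, name: str, height: int, bitrate: str, segment_sec: int = 6) -> List[str]:
    """ffmpeg argv that encodes one rendition and cuts it into HLS segments."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",          # keep aspect ratio, width rounded to even
        "-c:v", "libx264", "-b:v", bitrate,
        # keyframe every segment_sec seconds so all renditions cut at the same timestamps
        "-force_key_frames", f"expr:gte(t,n_forced*{segment_sec})",
        "-c:a", "aac", "-b:a", "128k",
        "-f", "hls", "-hls_time", str(segment_sec),
        "-hls_playlist_type", "vod",          # writes a complete playlist ending in #EXT-X-ENDLIST
        "-hls_segment_filename", f"{name}/segment_%03d.ts",
        f"{name}/playlist.m3u8",
    ]
```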
Transcoding is embarrassingly parallel. We can split the raw video into sections and transcode each section independently on different workers. A 30-minute video can be transcoded in minutes by throwing enough workers at it.
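A minimal sketch of the split step, assuming 6-second segments and a hypothetical `transcode_section(start, end)` job that encodes one time range. Snapping section boundaries to segment boundaries means each worker's output is a whole number of segments, so the per-section results can simply be concatenated in order. Threads stand in for separate worker machines.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def plan_sections(duration_sec: float, workers: int, segment_sec: int = 6) -> List[Tuple[float, float]]:
    """Split [0, duration) into at most `workers` ranges that start on segment boundaries."""
    total_segments = math.ceil(duration_sec / segment_sec)
    if total_segments == 0:
        return []
    per_worker = math.ceil(total_segments / workers)
    sections = []
    for first in range(0, total_segments, per_worker):
        start = first * segment_sec
        end = min((first + per_worker) * segment_sec, duration_sec)
        sections.append((start, end))
    return sections


def transcode_parallel(duration_sec: float, workers: int,
                       transcode_section: Callable[[float, float], str]) -> List[str]:
    sections = plan_sections(duration_sec, workers)
    with ThreadPoolExecutor(max_workers=max(1, workers)) as pool:
        # map() yields results in submission order, so outputs stitch back together directly
        return list(pool.map(lambda s: transcode_section(*s), sections))
```

For the 30-minute example, `plan_sections(1800, 10)` gives ten 3-minute sections of 30 segments each.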
Step 4: Generate thumbnails and metadata
While transcoding, we also extract thumbnails (multiple frames for the preview scrubber), video duration, and other metadata.
Step 5: Upload transcoded chunks to object storage
All the segments and manifest files get stored in S3 with a predictable path structure:
s3://videos/vid_123/360p/segment_001.ts
s3://videos/vid_123/360p/segment_002.ts
...
s3://videos/vid_123/1080p/segment_001.ts
s3://videos/vid_123/master.m3u8
Step 6: Update metadata and notify CDN
The video status changes from “transcoding” to “ready” in the database. The CDN gets the new content pushed to its edge nodes (or more commonly, it pulls on first request).
The whole pipeline takes 5-30 minutes depending on video length and our transcoding capacity.
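Because queue messages can be redelivered and workers can crash mid-job, the status change is safest as a guarded compare-and-set: the UPDATE only succeeds if the row is still in the expected state. A sketch with sqlite3 standing in for PostgreSQL; the `ALLOWED` transition table is an assumption built from the four statuses in the schema.

```python
import sqlite3

# Legal moves between the statuses in the videos table (assumed)
ALLOWED = {
    ("uploading", "transcoding"),
    ("transcoding", "ready"),
    ("transcoding", "failed"),
}


def transition(conn: sqlite3.Connection, video_id: int, old: str, new: str) -> bool:
    """Move a video from `old` to `new` exactly once; False if someone else already did."""
    if (old, new) not in ALLOWED:
        raise ValueError(f"illegal transition {old} -> {new}")
    cur = conn.execute(
        "UPDATE videos SET status = ? WHERE video_id = ? AND status = ?",
        (new, video_id, old),
    )
    conn.commit()
    return cur.rowcount == 1
```

A duplicate "transcoding finished" message then becomes a harmless no-op instead of a second CDN notification or search-index update.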
Deep Dive 2: Adaptive Bitrate Streaming (HLS)
This is probably the most interesting technical detail. When we watch a YouTube video, the quality adjusts automatically based on our network speed. That’s adaptive bitrate streaming. It runs over plain HTTP using a protocol such as HLS (HTTP Live Streaming) or its main alternative, MPEG-DASH; we’ll use HLS as our example.
Here’s how it works:
The manifest file (master.m3u8):
When the video player starts, it fetches a master manifest file. This file lists all available quality levels:
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=854x480
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
The quality-specific playlist:
Each quality level has its own playlist listing the individual segments:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:6.0,
segment_001.ts
#EXTINF:6.0,
segment_002.ts
#EXTINF:6.0,
segment_003.ts
#EXT-X-ENDLIST
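In practice the pipeline writes these files rather than a human. A Python sketch that renders both playlist types from the ladder and the segment durations; the helper names are hypothetical. The HLS spec defines `BANDWIDTH` as the peak segment bitrate, so production values usually sit somewhat above the nominal encoding bitrates used here.

```python
import math
from typing import List, Tuple


def master_playlist(variants: List[Tuple[str, int, int, int]]) -> str:
    """variants: (name, bandwidth_bps, width, height) -> master.m3u8 text."""
    lines = ["#EXTM3U"]
    for name, bandwidth, width, height in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}")
        lines.append(f"{name}/playlist.m3u8")
    return "\n".join(lines) + "\n"


def media_playlist(segment_durations: List[float]) -> str:
    """One rendition's VOD playlist; #EXT-X-ENDLIST tells the player no segments will be added."""
    # Every segment duration, rounded, must fit within the target duration
    target = math.ceil(max(segment_durations))
    lines = ["#EXTM3U", "#EXT-X-VERSION:3", f"#EXT-X-TARGETDURATION:{target}",
             "#EXT-X-PLAYLIST-TYPE:VOD"]
    for i, duration in enumerate(segment_durations, start=1):
        lines.append(f"#EXTINF:{duration:.1f},")
        lines.append(f"segment_{i:03d}.ts")
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```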
The adaptive magic:
- The player starts by measuring available bandwidth
- It picks the highest quality that fits within the bandwidth
- It downloads segments one by one
- After each segment download, it measures the actual download speed
- If the network got slower, it switches to a lower quality for the next segment
- If the network got faster, it switches up
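The loop above can be sketched as a throughput-based picker. The smoothing factor (0.3), the 80% safety margin, and starting at the lowest rung before any measurement exists are assumptions; real players also weigh how much video is already buffered.

```python
from typing import List, Optional


class ThroughputABR:
    """Throughput-based rendition picker with an exponentially smoothed bandwidth estimate."""

    def __init__(self, bitrates_bps: List[int], alpha: float = 0.3, safety: float = 0.8):
        self.bitrates = sorted(bitrates_bps)
        self.alpha = alpha      # weight given to the newest measurement
        self.safety = safety    # only spend 80% of the estimate, leaving headroom
        self.estimate_bps: Optional[float] = None

    def record_download(self, size_bytes: int, seconds: float) -> None:
        """Call after each segment download with its size and how long it took."""
        sample = size_bytes * 8 / seconds
        if self.estimate_bps is None:
            self.estimate_bps = sample
        else:
            self.estimate_bps = self.alpha * sample + (1 - self.alpha) * self.estimate_bps

    def next_bitrate(self) -> int:
        """Highest rendition that fits the budget; the lowest one if nothing fits yet."""
        if self.estimate_bps is None:
            return self.bitrates[0]
        budget = self.estimate_bps * self.safety
        fitting = [b for b in self.bitrates if b <= budget]
        return fitting[-1] if fitting else self.bitrates[0]
```

The smoothing keeps one unusually fast or slow segment from causing a quality flip; the cost is that a sustained drop takes a few segments to show up.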
In simple language, the video is chopped into tiny pieces (6-second segments), and each piece exists in multiple quality levels. The player picks the best quality for each piece based on current network speed. That’s why we sometimes see quality shift mid-video — we literally switched to a different resolution’s segment.
Why do segments work so well with CDNs? Each 6-second segment is a regular HTTP file. It gets cached by the CDN just like any other file. No special streaming protocol needed — just plain HTTP requests. That’s the genius of HLS. It piggybacks on all existing HTTP infrastructure.
Deep Dive 3: Video Recommendation Basics
When we open YouTube’s home page, we see a personalized feed of videos. How does the recommendation engine work at a high level?
Two main approaches:
Collaborative filtering — “Users who watched X also watched Y.” We don’t even need to understand the video content. We just look at patterns in watch behavior.
User A watched: [Video 1, Video 2, Video 3]
User B watched: [Video 1, Video 2, Video 4]
User A and B are similar (both watched 1 and 2).
Recommend Video 4 to User A, Video 3 to User B.
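The example above is user-based collaborative filtering. A toy Python version, using the raw count of shared videos as the similarity score and requiring at least two in common (both assumptions; real systems compute similarity offline from far richer signals):

```python
from collections import Counter
from typing import Dict, List, Set


def recommend(history: Dict[str, Set[int]], user: str, min_overlap: int = 2, k: int = 10) -> List[int]:
    """Suggest videos that similar users watched and `user` has not."""
    seen = history[user]
    scores: Counter = Counter()
    for other, watched in history.items():
        if other == user:
            continue
        overlap = len(seen & watched)
        if overlap >= min_overlap:
            for video in watched - seen:
                scores[video] += overlap  # more similar users count for more
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video for video, _ in ranked[:k]]
```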
Content-based filtering — “This video has similar tags/categories to videos you liked.” We analyze the content attributes (title, tags, category, description) and find similar videos.
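A matching toy sketch of content-based filtering scores each unseen video by the Jaccard similarity between its tags and the tags of the videos the user liked. Using only tags, and Jaccard as the measure, are simplifying assumptions.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_videos(tags: Dict[int, Set[str]], liked: List[int], k: int = 5) -> List[int]:
    """Rank unseen videos by their best tag overlap with any liked video."""
    if not liked:
        return []
    scores = {
        vid: max(jaccard(vtags, tags[liked_id]) for liked_id in liked)
        for vid, vtags in tags.items()
        if vid not in liked
    }
    ranked = sorted((v for v, s in scores.items() if s > 0), key=lambda v: (-scores[v], v))
    return ranked[:k]
```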
In practice, it’s a pipeline:
- Candidate generation — narrow down from billions of videos to a few thousand candidates using quick, rough signals (user’s watch history, subscriptions, trending in their region)
- Ranking — score each candidate using detailed features (watch time prediction, click-through rate, user engagement signals)
- Re-ranking — apply business rules (diversity, freshness, remove duplicates, filter out stuff they’ve already watched)
The ranking model is typically a deep neural network trained on features like:
- User’s watch history and preferences
- Video metadata (title, category, upload date)
- Engagement metrics (average watch time, like ratio)
- Context (time of day, device type)
For a system design interview, we don’t need to design the ML model. We just need to show we understand the pipeline: generate candidates → rank → re-rank → serve.
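The three stages can be sketched as one plain function. Everything passed in is hypothetical: `sources` stand in for the candidate retrievers (watch history, subscriptions, regional trending), `score` stands in for the ranking model, and a per-channel cap is just one example of a diversity rule.

```python
from typing import Callable, Dict, List, Set


def recommend_feed(user_id: str,
                   sources: List[Callable[[str], List[int]]],
                   score: Callable[[str, int], float],
                   watched: Set[int],
                   channel_of: Dict[int, str],
                   size: int = 20,
                   per_channel: int = 2) -> List[int]:
    # 1. Candidate generation: union of cheap retrievers
    candidates: Set[int] = set()
    for source in sources:
        candidates.update(source(user_id))
    candidates -= watched  # re-ranking rule applied early, since it is cheap

    # 2. Ranking: score every remaining candidate with the (expensive) model
    ranked = sorted(candidates, key=lambda v: (-score(user_id, v), v))

    # 3. Re-ranking: cap how many videos one channel can contribute
    feed: List[int] = []
    taken: Dict[str, int] = {}
    for video in ranked:
        channel = channel_of[video]
        if taken.get(channel, 0) < per_channel:
            feed.append(video)
            taken[channel] = taken.get(channel, 0) + 1
        if len(feed) == size:
            break
    return feed
```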
Step 7: Scaling
Object storage:
- S3 handles the storage scaling for us — it’s practically infinite
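- Use S3 lifecycle policies for tiering. Lifecycle rules select objects by prefix, tags, or size and act on object age; they know nothing about view counts. So a periodic job tags videos with < 10 views after 90 days, and a rule moves the tagged objects to a cheaper tier (e.g. S3 Glacier Instant Retrieval, which still serves reads in milliseconds)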
- Hot/cold separation: popular videos on fast storage, old/unpopular on archive storage
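A sketch of the two halves, assuming a tag named `tier=cold` (hypothetical): a periodic job selects low-view videos older than 90 days (the thresholds from the text), and a lifecycle rule, written in the dict shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`, transitions tagged objects to S3 Glacier Instant Retrieval. Lifecycle age counts from object creation, so `Days: 90` plus the tag filter matches the policy above. The job would have to tag every segment object of the chosen videos, not just one file.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple

COLD_TAG = {"Key": "tier", "Value": "cold"}  # hypothetical tag

LIFECYCLE = {
    "Rules": [{
        "ID": "cold-videos-to-glacier-ir",
        "Filter": {"Tag": COLD_TAG},
        "Status": "Enabled",
        "Transitions": [{"Days": 90, "StorageClass": "GLACIER_IR"}],
    }]
}


def videos_to_tag_cold(videos: Iterable[Tuple[int, int, datetime]], now: datetime,
                       min_views: int = 10, min_age: timedelta = timedelta(days=90)) -> List[int]:
    """From (video_id, view_count, created_at) rows, pick the ones that should get COLD_TAG."""
    return [vid for vid, views, created in videos
            if views < min_views and now - created >= min_age]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    rows = [(1, 3, now - timedelta(days=120)), (2, 5000, now - timedelta(days=120))]
    print(videos_to_tag_cold(rows, now))
```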
CDN is everything:
- 90%+ of video traffic should be served from CDN edge nodes
- Use multiple CDN providers for redundancy; at very large scale, companies build their own (YouTube runs on Google’s own edge network, Netflix built Open Connect)
- Popular videos get cached at every edge location. Long-tail videos get cached on demand.
- Pre-warm popular content: push trending videos to edge nodes proactively
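A toy model of one edge node: a pull-through cache that fetches from origin on a miss (long-tail content) and can be pre-warmed with trending segments. LRU eviction and the `fetch_from_origin` callback are simplifying assumptions; real CDNs use their own admission and eviction policies.

```python
from collections import OrderedDict
from typing import Callable, Iterable


class EdgeCache:
    """Pull-through LRU cache for one edge node; misses go to the origin."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch = fetch_from_origin
        self.items: "OrderedDict[str, bytes]" = OrderedDict()
        self.origin_fetches = 0

    def get(self, key: str) -> bytes:
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        data = self.fetch(key)           # miss: long-tail content is cached on demand
        self.origin_fetches += 1
        self._put(key, data)
        return data

    def prewarm(self, keys: Iterable[str]) -> None:
        """Load trending segments before any viewer asks for them."""
        for key in keys:
            if key not in self.items:
                self._put(key, self.fetch(key))
                self.origin_fetches += 1

    def _put(self, key: str, data: bytes) -> None:
        self.items[key] = data
        self.items.move_to_end(key)
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry
```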
Transcoding at scale:
- Use a managed service (e.g. AWS Elemental MediaConvert, which supersedes the older Elastic Transcoder) or a fleet of GPU instances
- Auto-scale based on queue depth — if there are 10,000 videos waiting, spin up more workers
- Priority queue: premium creators or popular channels get transcoded first
- Cost optimization: never upscale. If the source is only 720p, skip the 1080p and 4K renditions
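Two of those policies, sketched in Python: pick only the renditions at or below the source resolution, and size the worker fleet from queue depth. The jobs-per-worker target and the fleet bounds are placeholder numbers, and comparing heights assumes landscape sources (a portrait video would compare its shorter side).

```python
import math
from typing import List, Tuple

# (name, height) for the ladder from the transcoding step
LADDER: List[Tuple[str, int]] = [("360p", 360), ("480p", 480), ("720p", 720), ("1080p", 1080), ("4K", 2160)]


def renditions_for(source_height: int) -> List[str]:
    """Keep only renditions at or below the source resolution."""
    keep = [name for name, height in LADDER if height <= source_height]
    # Sources smaller than 360p still get one 360p rendition (the only upscale allowed here)
    return keep or [LADDER[0][0]]


def desired_workers(queue_depth: int, jobs_per_worker: int = 5,
                    min_workers: int = 10, max_workers: int = 2000) -> int:
    """Scale the fleet so each worker has roughly `jobs_per_worker` queued jobs."""
    wanted = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, wanted))
```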
Database scaling:
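- Videos metadata: shard by video_id across PostgreSQL instances. Per-creator queries (the user_id lookups behind a channel page) then span shards, so keep a separate creator-to-videos mapping or accept a scatter-gather read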
- View counts: don’t update the DB on every view. Use Redis counters, flush to DB in batches
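- Comments: shard by video_id so each video’s comments live together. A single viral video with millions of comments can still overload its shard, so if write volume is extreme, use Cassandra and partition by (video_id, time bucket) to spread one video’s comments across nodes.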
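A sketch of the view-count batching idea, with an in-memory, lock-protected dict standing in for Redis (in Redis, each view would be an `INCR` and the flush would read and reset each key atomically, e.g. with `GETDEL`). `flush_to_db` is a hypothetical callback that applies all deltas in one batched write. As written, a failed DB write drops that batch; a production version would re-queue it.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict


class BatchedViewCounter:
    """Count views in memory, then write the accumulated deltas to the DB in one batch."""

    def __init__(self, flush_to_db: Callable[[Dict[int, int]], None]):
        self._flush_to_db = flush_to_db
        self._pending: Dict[int, int] = defaultdict(int)
        self._lock = threading.Lock()

    def record_view(self, video_id: int) -> None:
        with self._lock:  # Redis equivalent: INCR views:{video_id}
            self._pending[video_id] += 1

    def flush(self) -> int:
        """Swap out pending deltas, then apply them; returns how many videos were updated."""
        with self._lock:
            batch, self._pending = dict(self._pending), defaultdict(int)
        if batch:
            # e.g. UPDATE videos SET view_count = view_count + <delta> for each video_id
            self._flush_to_db(batch)
        return len(batch)
```

Swapping the dict under the lock keeps `record_view` fast while the slower DB write happens outside it.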
Search:
- Elasticsearch cluster for video search (title, description, tags)
- Index gets updated whenever a video status changes to “ready”
- Autocomplete using prefix queries on Elasticsearch
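Hypothetical request bodies for the search path, in Elasticsearch's query DSL: a `multi_match` over title, tags, and description, `search_after` to back the API's cursor, and a `match_phrase_prefix` query for type-ahead. Field names mirror the videos table; the boosts are arbitrary assumptions. Each dict would be sent as the JSON body of a `_search` request against the video index.

```python
from typing import Any, Dict, Optional


def search_body(q: str, size: int = 20, search_after: Optional[list] = None) -> Dict[str, Any]:
    """Full-text search, weighting title matches above tags and description."""
    body: Dict[str, Any] = {
        "size": size,
        "query": {"multi_match": {"query": q, "fields": ["title^3", "tags^2", "description"]}},
        # video_id as a tie-breaker gives a stable order, which search_after requires
        "sort": [{"_score": "desc"}, {"video_id": "asc"}],
    }
    if search_after is not None:
        body["search_after"] = search_after  # sort values of the last hit on the previous page
    return body


def autocomplete_body(prefix: str, size: int = 8) -> Dict[str, Any]:
    """Type-ahead: match the words typed so far, treating the last word as a prefix."""
    return {
        "size": size,
        "_source": ["title"],
        "query": {"match_phrase_prefix": {"title": {"query": prefix}}},
    }
```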
Global deployment:
- Upload to the nearest region’s object storage, then replicate
- Transcoding can happen in any region — send the job to wherever we have spare capacity
- Metadata DB: primary in one region, read replicas globally
- The CDN handles global delivery — that’s its whole job
In simple language, a video streaming platform is really two separate systems glued together. The upload side is an async processing pipeline — take a raw video, transcode it into segments at multiple qualities, and push to storage. The streaming side is just serving static files (video segments) through a CDN using the HLS protocol. The player fetches a manifest file, picks a quality, and downloads segments one by one. The CDN does the heavy lifting of global delivery. Everything else — search, recommendations, comments — is supporting infrastructure around these two core paths.