Blob Storage and Object Storage - High-Level Design

Every app eventually needs to store files: profile pictures, uploaded documents, video content, log exports. We can’t just throw these into our database, because large binary blobs bloat tables, slow down backups and replication, and typically cost more per gigabyte than dedicated storage. That’s where object storage comes in. It’s purpose-built for storing massive amounts of unstructured data cheaply and reliably.

What Is Object Storage?

Object storage treats every file as a standalone object with three parts:

Data — the actual file bytes (an image, a PDF, a video)
Metadata — info about the file (content type, upload date, custom tags)
Unique key — a flat identifier like users/123/avatar.jpg

There’s no folder hierarchy like a regular filesystem. The “folders” we see in S3 are just key prefixes — it’s all a flat namespace under the hood.
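To make the flat namespace concrete, here is a minimal sketch in plain Python (no S3 calls) of how a “folder” view can be derived from keys alone. The `list_keys` helper and the sample keys are made up for illustration; real services offer a similar prefix-and-delimiter listing through their APIs.

```python
# A bucket modeled as a flat dict: key -> (data, metadata). No real folders exist.
bucket = {
    "users/123/avatar.jpg": (b"...", {"content_type": "image/jpeg"}),
    "users/123/resume.pdf": (b"...", {"content_type": "application/pdf"}),
    "users/456/avatar.jpg": (b"...", {"content_type": "image/jpeg"}),
    "logo.png": (b"...", {"content_type": "image/png"}),
}


def list_keys(bucket, prefix="", delimiter="/"):
    """Return (objects, folders) directly under `prefix`.

    "Folders" are just the distinct key prefixes up to the next delimiter.
    """
    objects, folders = [], set()
    for key in sorted(bucket):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Deeper key: report only its next-level prefix, like a folder.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(folders)


print(list_keys(bucket))                # (['logo.png'], ['users/'])
print(list_keys(bucket, "users/"))      # ([], ['users/123/', 'users/456/'])
print(list_keys(bucket, "users/123/"))  # (['users/123/avatar.jpg', 'users/123/resume.pdf'], [])
```

Nothing in the dict is nested. The “folders” only appear when we choose to group keys by a delimiter at listing time.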

Three Types of Storage Compared

Block Storage: raw disk volumes attached to one server, with low latency. Like a hard drive. Examples: AWS EBS, Azure Disk.
File Storage: a shared, hierarchical filesystem (folders) that multiple servers can access. Like a network drive. Examples: AWS EFS, NFS, GCP Filestore.
Object Storage: flat key-value blobs accessed via an HTTP API, with virtually unlimited capacity. Like a giant locker. Examples: AWS S3, GCS, Azure Blob.

The key insight: block storage is for OS and databases (needs to be fast, attached to one machine). File storage is for shared access between servers. Object storage is for everything else — and that “everything else” is usually the bulk of our data.

When to Use Object Storage

Pretty much anytime we deal with files:

User uploads — profile pictures, documents, attachments
Media — images, videos, audio files
Static assets — CSS, JS, fonts for a website
Backups — database dumps, log archives
Data lakes — raw data for analytics pipelines
ML artifacts — model files, training datasets

The rule of thumb: if it’s a file and we don’t need to query inside it, object storage is the answer.

Popular Object Storage Services

Amazon S3: The OG. “S3” has basically become a generic term for object storage. Designed for 11 nines of durability (99.999999999%). Pretty wild.
Google Cloud Storage (GCS): Google’s version. Similar concepts, plus an S3-interoperable XML API. Tight integration with BigQuery and other GCP services.
Azure Blob Storage: Microsoft’s offering. Access tiers (Hot, Cool, Cold, and Archive) are priced by access frequency.
MinIO: Open-source, S3-compatible. Great for self-hosting or local development.
Cloudflare R2: S3-compatible with no egress fees, which is a big deal because S3 egress costs add up fast.

Pre-Signed URLs — The Smart Pattern

Here’s a common mistake: uploading files through our backend server. The file goes Client → Our Server → S3. Our server becomes a bottleneck and wastes bandwidth.

The better approach is pre-signed URLs. Our server generates a temporary, signed URL that lets the client upload directly to S3.

1. Client asks our server: "I want to upload a file"
2. Server generates a pre-signed URL (valid for 15 minutes)
3. Client uploads directly to S3 using that URL
4. Client tells our server: "Upload done, here's the key"
5. Server saves the file reference in the database
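The five steps above can be sketched end to end with in-memory stand-ins. Everything here is hypothetical: `object_store` and `database` are plain dicts, the URL is a placeholder rather than a real signed URL, and the existence and ownership checks in `confirm_upload` are an extra safeguard that the steps do not spell out.

```python
import uuid

# Hypothetical in-memory stand-ins for the object store and the database.
object_store = {}   # key -> bytes (what S3 would hold)
database = {}       # user_id -> list of stored keys


def request_upload(user_id, filename):
    """Steps 1-2: the server picks the key and returns a (fake) upload URL."""
    key = f"uploads/{user_id}/{uuid.uuid4().hex}-{filename}"
    url = f"https://storage.example.com/{key}?signature=placeholder"
    return key, url


def client_upload(key, data):
    """Step 3: the client sends the bytes straight to storage."""
    object_store[key] = data


def confirm_upload(user_id, key):
    """Steps 4-5: verify the object really exists, then record the reference."""
    if not key.startswith(f"uploads/{user_id}/"):
        raise PermissionError("key does not belong to this user")
    if key not in object_store:
        raise FileNotFoundError("no object found for this key")
    database.setdefault(user_id, []).append(key)
    return key


key, url = request_upload("123", "photo.jpg")
client_upload(key, b"fake image bytes")
confirm_upload("123", key)
print(database["123"] == [key])  # True
```

Letting the server choose the key (rather than trusting a client-supplied one) is a design choice in this sketch: it keeps users from overwriting each other’s objects.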

Same pattern works for downloads. Instead of streaming the file through our server, we generate a pre-signed download URL and redirect the client to it.

# Generating a pre-signed upload URL (Python + boto3)
import boto3

s3 = boto3.client('s3')
url = s3.generate_presigned_url(
    'put_object',
    Params={'Bucket': 'my-bucket', 'Key': 'uploads/photo.jpg'},
    ExpiresIn=900  # 15 minutes
)
# Give this URL to the client — they can PUT directly to S3

Benefits:

Our server doesn’t touch the file data — no bandwidth or CPU wasted
Scales better — S3 handles the heavy lifting
Secure — URL expires, and we control who can generate them
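Why can an expiring URL be trusted? A simplified sketch of the idea, using only the standard library, is below. This is not AWS Signature Version 4 (the scheme S3 actually uses); the `SECRET`, the `storage.example.com` host, and the `sign_url` / `verify_url` helpers are assumptions for illustration. The point is that the expiry time is covered by a signature only the server can produce, so a client cannot extend or redirect the URL.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"server-side-secret"  # hypothetical; never shared with clients


def sign_url(path, expires_in, now=None):
    """Return a URL that carries an expiry time and an HMAC over path + expiry."""
    expires = int(now if now is not None else time.time()) + expires_in
    message = f"{path}:{expires}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "signature": signature})
    return f"https://storage.example.com{path}?{query}"


def verify_url(url, now=None):
    """What the storage side would check before accepting the request."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    message = f"{parsed.path}:{expires}".encode()
    expected = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False  # path or expiry was tampered with
    current = now if now is not None else time.time()
    return current <= expires


url = sign_url("/uploads/photo.jpg", expires_in=900, now=1_000)
print(verify_url(url, now=1_500))                            # True: still within 15 minutes
print(verify_url(url, now=2_000))                            # False: expired at t=1900
print(verify_url(url.replace("photo", "other"), now=1_500))  # False: signature mismatch
```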

CDN Integration

Object storage and CDNs are best friends. The pattern:

Store files in S3 (or any object storage)
Put a CDN (CloudFront, Cloudflare) in front of it
Users download from the CDN edge server closest to them instead of hitting S3 directly

User in Tokyo → CloudFront Edge (Tokyo) → S3 (us-east-1)
                ↑ cached here after first request

This gives us:

Lower latency — files served from the nearest edge location
Lower cost — CDN egress is often cheaper than S3 egress, and caching reduces S3 requests
Less load on S3 — the CDN absorbs most of the traffic
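The caching effect can be illustrated with a toy model. The `Origin` and `EdgeCache` classes below are hypothetical stand-ins, not a CDN API, and the sketch ignores TTLs and cache invalidation, which real CDNs handle via cache headers and their own settings.

```python
class Origin:
    """Stands in for the object store; counts how often it is hit."""

    def __init__(self, objects):
        self.objects = objects
        self.requests = 0

    def get(self, key):
        self.requests += 1
        return self.objects[key]


class EdgeCache:
    """A single CDN edge location with a simple in-memory cache."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, key):
        if key in self.cache:
            return self.cache[key], "HIT"
        body = self.origin.get(key)  # cache miss: fetch from origin once
        self.cache[key] = body
        return body, "MISS"


origin = Origin({"static/app.css": b"body { margin: 0 }"})
tokyo_edge = EdgeCache(origin)

statuses = [tokyo_edge.get("static/app.css")[1] for _ in range(1000)]
print(statuses[0], statuses[1], origin.requests)  # MISS HIT 1
```

A thousand user requests reach the edge, but the origin serves the file only once.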

For static websites, this combo (S3 + CloudFront) is the go-to architecture. Fast, cheap, and almost infinitely scalable.

Storage Classes and Costs

Most providers offer tiered storage for different access patterns:

Standard/Hot: Frequently accessed data. Highest storage cost, lowest retrieval cost.
Infrequent Access: Data accessed roughly once a month or less. Cheaper storage, but a per-GB retrieval fee and often a minimum storage duration.
Archive/Cold: Data rarely accessed (compliance, old backups). Very cheap storage, but retrieval is expensive and can take minutes to hours depending on the provider and tier.

Setting up lifecycle policies to automatically move old data to cheaper tiers is one of the easiest cost optimizations we can make.
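As a rough sketch, a lifecycle rule might look like the dict below, which follows the shape of an S3 lifecycle rule. The rule ID, the `logs/` prefix, and the day thresholds are illustrative assumptions. The `storage_class_for_age` helper is not part of any SDK; it only approximates which tier an object of a given age would be in under this rule.

```python
# Rule shape follows the S3 lifecycle configuration format; the ID, prefix,
# and day thresholds are illustrative assumptions.
lifecycle_rule = {
    "ID": "archive-old-logs",
    "Filter": {"Prefix": "logs/"},
    "Status": "Enabled",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
    ],
    "Expiration": {"Days": 365},
}


def storage_class_for_age(rule, age_days):
    """Return the class an object of this age would be in, or None if expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


for age in (5, 45, 200, 400):
    print(age, storage_class_for_age(lifecycle_rule, age))
# 5 STANDARD
# 45 STANDARD_IA
# 200 GLACIER
# 400 None
```

With boto3, a rule like this would be applied through `put_bucket_lifecycle_configuration`, passing `LifecycleConfiguration={"Rules": [lifecycle_rule]}` along with the bucket name.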

Key Takeaway

In short, object storage is where we put files that our application needs but our database shouldn’t hold. Use pre-signed URLs so files go directly between the client and storage without touching our server. Pair it with a CDN for fast delivery worldwide. It’s one of the simplest and most impactful architectural decisions we’ll make.