Blob Storage and Object Storage

Every app eventually needs to store files: profile pictures, uploaded documents, video content, log exports. Stuffing these into our database is a bad idea, because large binary blobs bloat tables and slow down backups and replication. That’s where object storage comes in. It’s purpose-built for storing massive amounts of unstructured data cheaply and reliably.

What Is Object Storage?

Object storage treats every file as a standalone object with three parts:

  1. Data — the actual file bytes (an image, a PDF, a video)
  2. Metadata — info about the file (content type, upload date, custom tags)
  3. Unique key — a flat identifier like users/123/avatar.jpg

There’s no folder hierarchy like a regular filesystem. The “folders” we see in S3 are just key prefixes — it’s all a flat namespace under the hood.
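
To make the "folders are just prefixes" idea concrete, here is a minimal in-memory sketch. `FlatBucket` and `StoredObject` are hypothetical names, not a real library. Each object is a key plus bytes plus metadata in one flat dict, and the "folder" view is computed at list time by grouping keys on a delimiter, roughly the way S3's ListObjectsV2 returns `CommonPrefixes` when called with `Prefix` and `Delimiter` (the real API also paginates and returns more fields).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StoredObject:
    """One object: its key, its bytes, and its metadata."""
    key: str
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class FlatBucket:
    """Hypothetical in-memory bucket: one flat dict from key to object, no real folders."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        self._objects[key] = StoredObject(key, data, dict(metadata))

    def list_objects(self, prefix: str = "", delimiter: str = "/") -> Tuple[List[str], List[str]]:
        """Return (keys, common_prefixes), grouping keys roughly the way
        S3's ListObjectsV2 does when called with Prefix and Delimiter."""
        keys: List[str] = []
        common_prefixes = set()
        for key in sorted(self._objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            cut = rest.find(delimiter) if delimiter else -1
            if cut == -1:
                keys.append(key)  # no delimiter left: an object "at this level"
            else:
                # everything up to the next delimiter is rolled up into a "folder"
                common_prefixes.add(prefix + rest[: cut + len(delimiter)])
        return keys, sorted(common_prefixes)


bucket = FlatBucket()
bucket.put("users/123/avatar.jpg", b"...", content_type="image/jpeg")
bucket.put("users/123/resume.pdf", b"...", content_type="application/pdf")
bucket.put("users/456/avatar.jpg", b"...", content_type="image/jpeg")
print(bucket.list_objects("users/"))      # ([], ['users/123/', 'users/456/'])
print(bucket.list_objects("users/123/"))  # (['users/123/avatar.jpg', 'users/123/resume.pdf'], [])
```

Nothing called `users/123/` is ever stored; it only appears because three keys happen to share that prefix.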

Three Types of Storage Compared

  • Block Storage: raw disk volumes, usually attached to one server. Low latency. Like a hard drive. Examples: AWS EBS, Azure Disk.
  • File Storage: a shared, hierarchical (folder-based) filesystem that multiple servers can access. Like a network drive. Examples: AWS EFS, NFS, GCP Filestore.
  • Object Storage: flat key-value blobs accessed via an HTTP API, with virtually unlimited capacity. Like a giant locker. Examples: AWS S3, GCS, Azure Blob.

The key insight: block storage is for OS and databases (needs to be fast, attached to one machine). File storage is for shared access between servers. Object storage is for everything else — and that “everything else” is usually the bulk of our data.

When to Use Object Storage

Pretty much anytime we deal with files:

  • User uploads — profile pictures, documents, attachments
  • Media — images, videos, audio files
  • Static assets — CSS, JS, fonts for a website
  • Backups — database dumps, log archives
  • Data lakes — raw data for analytics pipelines
  • ML artifacts — model files, training datasets

The rule of thumb: if it’s a file and we don’t need to query inside it, object storage is the answer.

Popular Object Storage Providers

  • Amazon S3 — The OG. “S3” has basically become a generic term for object storage. It’s designed for 11 nines of durability (99.999999999%). Pretty wild.
  • Google Cloud Storage (GCS) — Google’s version. Very similar API. Tight integration with BigQuery and other GCP services.
  • Azure Blob Storage — Microsoft’s offering. Access tiers (Hot, Cool, Cold, and Archive) let us trade storage cost against how often the data is read.
  • MinIO — Open-source, S3-compatible. Great for self-hosting or local development.
  • Cloudflare R2 — No egress fees (a big deal — S3 egress costs add up fast).

Pre-Signed URLs — The Smart Pattern

Here’s a common mistake: uploading files through our backend server. The file travels Client → Our Server → S3, so every byte crosses our server twice (once in, once out). Our server becomes a bottleneck and wastes bandwidth.

The better approach is pre-signed URLs. Our server generates a temporary, signed URL that lets the client upload directly to S3.

  1. Client asks our server: "I want to upload a file"
  2. Server generates a pre-signed URL (valid for 15 minutes)
  3. Client uploads directly to S3 using that URL
  4. Client tells our server: "Upload done, here's the key"
  5. Server saves the file reference in the database
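
The server's side of these five steps is mostly bookkeeping. Below is a minimal sketch under stated assumptions: `UploadCoordinator` is a hypothetical class, a plain dict stands in for the bucket and another for the database table, and the actual URL signing is left out. Two design choices are ours, not part of the steps above: the server picks the key (with a random component) so clients can't overwrite each other's files, and step 5 only records a key the server issued to that user and only after the object really exists (with S3, that existence check would be a HeadObject call).

```python
import uuid
from typing import Dict


class UploadCoordinator:
    """Hypothetical server-side bookkeeping for the five-step direct-upload flow.

    `storage` stands in for the bucket (key -> bytes) and `db` for our files
    table; in production these would be S3 and a real database.
    """

    def __init__(self, storage: Dict[str, bytes], db: Dict[str, dict]) -> None:
        self.storage = storage
        self.db = db
        self.pending: Dict[str, str] = {}  # keys we handed out -> user who asked

    def request_upload(self, user_id: str, filename: str) -> str:
        # Steps 1-2: the server picks the key, so one client cannot overwrite
        # another's object. A real server would also return a pre-signed PUT URL.
        key = f"uploads/{user_id}/{uuid.uuid4().hex}/{filename}"
        self.pending[key] = user_id
        return key

    def confirm_upload(self, user_id: str, key: str) -> bool:
        # Steps 4-5: accept only keys we issued to this user, and only once
        # the object is actually present in storage.
        if self.pending.get(key) != user_id or key not in self.storage:
            return False
        del self.pending[key]
        self.db[key] = {"owner": user_id, "size": len(self.storage[key])}
        return True


storage: Dict[str, bytes] = {}
db: Dict[str, dict] = {}
service = UploadCoordinator(storage, db)
key = service.request_upload("123", "photo.jpg")  # steps 1-2
storage[key] = b"fake jpeg bytes"                 # step 3: client PUTs straight to storage
print(service.confirm_upload("123", key))         # True (steps 4-5)
```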

Same pattern works for downloads. Instead of streaming the file through our server, we generate a pre-signed download URL and redirect the client to it.

# Generating a pre-signed upload URL (Python + boto3)
import boto3

s3 = boto3.client('s3')
url = s3.generate_presigned_url(
    'put_object',
    Params={'Bucket': 'my-bucket', 'Key': 'uploads/photo.jpg'},
    ExpiresIn=900  # 15 minutes
)
# Give this URL to the client — they can PUT directly to S3

Benefits:

  • Our server doesn’t touch the file data — no bandwidth or CPU wasted
  • Scales better — S3 handles the heavy lifting
  • Secure — URL expires, and we control who can generate them
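
Why is a pre-signed URL safe to hand out? It carries an expiry time and a signature computed with a secret the client never sees. The sketch below is a deliberately simplified HMAC scheme for illustration, not AWS Signature Version 4 (which also signs a credential scope, date, and headers, but rests on the same HMAC-SHA256 idea). `SECRET`, `BASE_URL`, `presign`, and `verify` are hypothetical names. Because the signature covers the method, key, and expiry, changing any of them invalidates the URL.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit

SECRET = b"server-side-secret"  # hypothetical: known only to our server and the storage service
BASE_URL = "https://storage.example.com"  # hypothetical storage endpoint


def _sign(method: str, key: str, expires: int) -> str:
    # Sign the HTTP method, the object key, and the expiry time together.
    message = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, key: str, expires_in: int, now: Optional[float] = None) -> str:
    """Server side: authorize one method on one key for `expires_in` seconds."""
    now = time.time() if now is None else now
    expires = int(now) + expires_in
    query = urlencode({"expires": expires, "sig": _sign(method, key, expires)})
    return f"{BASE_URL}/{quote(key)}?{query}"


def verify(method: str, url: str, now: Optional[float] = None) -> bool:
    """Storage side: honor the request only if it is unexpired and the signature matches."""
    now = time.time() if now is None else now
    parts = urlsplit(url)
    key = unquote(parts.path[1:])
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False  # the link has expired
    # compare_digest runs in constant time, so timing doesn't leak how much matched
    return hmac.compare_digest(sig, _sign(method, key, expires))


url = presign("PUT", "uploads/photo.jpg", expires_in=900, now=1_000)
print(verify("PUT", url, now=1_500))  # True: still inside the 15-minute window
print(verify("GET", url, now=1_500))  # False: signed for PUT only
print(verify("PUT", url, now=2_000))  # False: expired at 1_900
```

A client that edits the key or pushes the expiry forward ends up with a signature that no longer matches, which is why only the holder of the secret can mint valid URLs.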

CDN Integration

Object storage and CDNs are best friends. The pattern:

  1. Store files in S3 (or any object storage)
  2. Put a CDN (CloudFront, Cloudflare) in front of it
  3. Users download from the CDN edge server closest to them instead of hitting S3 directly

User in Tokyo → CloudFront Edge (Tokyo) → S3 (us-east-1)
                ↑ cached here after first request
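
The "cached here after first request" step can be modeled as a cache with a time-to-live sitting in front of the origin. This is a toy sketch: `EdgeCache` is a hypothetical class, and we assume one fixed TTL for every file, whereas real CDNs usually derive freshness from `Cache-Control` headers and also evict entries under memory pressure.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class EdgeCache:
    """Toy model of one CDN edge location: serve from cache while fresh,
    otherwise fetch from the origin (our bucket) and keep a copy."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_seconds: float) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.ttl = ttl_seconds
        self._cache: Dict[str, Tuple[bytes, float]] = {}  # key -> (body, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key: str, now: Optional[float] = None) -> bytes:
        now = time.monotonic() if now is None else now
        entry = self._cache.get(key)
        if entry is not None and now < entry[1]:
            self.hits += 1
            return entry[0]
        # Miss or stale copy: go back to the origin (e.g. S3 in us-east-1).
        self.misses += 1
        body = self.fetch_from_origin(key)
        self._cache[key] = (body, now + self.ttl)
        return body


origin_calls = []


def fetch_from_s3(key: str) -> bytes:
    # Stand-in for a real origin fetch from the bucket
    origin_calls.append(key)
    return b"<logo bytes>"


tokyo_edge = EdgeCache(fetch_from_s3, ttl_seconds=3600)
for t in (0, 10, 20):  # three users in Tokyo request the same file
    tokyo_edge.get("img/logo.png", now=t)
print(tokyo_edge.hits, tokyo_edge.misses, len(origin_calls))  # 2 1 1
```

Only the first request travels to us-east-1; the next two are answered from Tokyo until the TTL runs out.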

This gives us:

  • Lower latency — files served from the nearest edge location
  • Lower cost — CDN egress is often cheaper than S3 egress, and caching reduces S3 requests
  • Less load on S3 — the CDN absorbs most of the traffic

For static websites, this combo (S3 + CloudFront) is the go-to architecture. Fast, cheap, and almost infinitely scalable.

Storage Classes and Costs

Most providers offer tiered storage for different access patterns:

  • Standard/Hot — Frequently accessed data. Highest storage cost, lowest retrieval cost.
  • Infrequent Access — Data accessed less than once a month. Cheaper storage, small retrieval fee.
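  • Archive/Cold — Data rarely accessed (compliance, old backups). Very cheap storage, but retrieval costs extra and, in the deepest tiers, is slow (S3 Glacier Deep Archive, for example, can take up to 12 hours for a standard retrieval).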

Setting up lifecycle policies to automatically move old data to cheaper tiers is one of the easiest cost optimizations we can make.
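
Here is a sketch of what such a policy can look like. The `LIFECYCLE` dict follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration` (`Rules` with `ID`, `Filter`, `Status`, `Transitions`, `Expiration`); the `logs/` prefix and the 30/90/365-day thresholds are illustrative choices, not recommendations. `storage_class_for` is a hypothetical local helper that roughly predicts where an object ends up. It only understands prefix filters, and it ignores that S3 applies transitions asynchronously and enforces minimums (for example, objects must stay 30 days in Standard before moving to Standard-IA).

```python
from typing import Optional

# Shaped like boto3's LifecycleConfiguration argument; values are illustrative.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "age-out-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def storage_class_for(key: str, age_days: int, config: dict = LIFECYCLE) -> Optional[str]:
    """Rough local model of where the rules would have moved an object by now.
    Returns None once the object has expired (been deleted)."""
    current = "STANDARD"
    for rule in config["Rules"]:
        prefix = rule.get("Filter", {}).get("Prefix", "")
        if rule["Status"] != "Enabled" or not key.startswith(prefix):
            continue
        expiration_days = rule.get("Expiration", {}).get("Days")
        if expiration_days is not None and age_days >= expiration_days:
            return None
        # Apply transitions in age order; the last one reached wins.
        for transition in sorted(rule.get("Transitions", []), key=lambda tr: tr["Days"]):
            if age_days >= transition["Days"]:
                current = transition["StorageClass"]
    return current


for age in (10, 45, 120, 400):
    print(age, storage_class_for("logs/app.log", age))
# 10 STANDARD
# 45 STANDARD_IA
# 120 GLACIER
# 400 None
```

Objects outside `logs/` (say `images/cat.png`) stay in STANDARD no matter how old they get, because no rule matches their prefix.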

Key Takeaway

Object storage is where we put files that our application needs but our database shouldn’t hold. Use pre-signed URLs so files travel directly between the client and storage without passing through our server. Pair it with a CDN for fast delivery worldwide. It’s one of the simplest and most impactful architectural decisions we’ll make.