Database per Service, Outbox Pattern, and CDC

advanced microservices cdc debezium outbox-pattern saga distributed-systems

One of the first things we learn about microservices is that each service should own its data. No shared databases. Sounds clean in theory. In practice, it creates a whole set of new problems: how do services share data? How do we maintain consistency across services? How do we avoid dual writes?

Let’s break down the patterns that solve these problems.

Database per Service

In a monolith, all the code talks to one big database. Every module can JOIN any table, every query can reach any data. It works, but it means everything is tightly coupled at the data level.

In microservices, we flip this around. Each service gets its own database — and no other service is allowed to touch it directly.

Order Service   → orders_db (PostgreSQL)
User Service    → users_db (PostgreSQL)
Product Service → products_db (MongoDB)
Search Service  → search_index (Elasticsearch)

Why Separate Databases?

  • Loose coupling — we can change the Order Service’s schema without breaking the User Service
  • Independent deployment — each service can be deployed, scaled, and migrated independently
  • Right tool for the job — the Product Service can use MongoDB while the Order Service uses PostgreSQL
  • Blast radius — if one database goes down, only that service is affected

The Hard Part: Querying Across Services

With a shared database, getting a user’s order history is a simple JOIN. With separate databases, the Order Service doesn’t have access to the Users table. So how do we answer “show me all orders with customer names”?

Option 1: API calls — the Order Service calls the User Service API to get customer names. Simple but adds latency and creates runtime dependencies.

Option 2: Data replication — the Order Service keeps a local copy of the customer names it needs. More complex but faster queries and no runtime dependency.

Option 3: Events — when a user’s name changes, the User Service publishes an event. The Order Service listens and updates its local copy. Decoupled but eventually consistent.

The Dual Write Problem

Here’s a common mistake. Say we need to save an order AND send a message to Kafka:

1. Save order to database   ← succeeds
2. Publish event to Kafka   ← FAILS (network error)

Now we have an order in the database but no event was published. Other services never find out about the order. This is called the dual write problem — writing to two systems without a transaction across them.

We can’t just wrap both in a database transaction because Kafka isn’t part of the database’s transaction. That’s where the Outbox Pattern comes in.

The Outbox Pattern

The outbox pattern guarantees that if the business data is saved, the event will be published — eventually.

How It Works

  1. In the same database transaction, we write the business data AND an event to a local outbox table
  2. A separate process (the “relay”) reads the outbox table and publishes events to the message broker
  3. Once published, the relay marks the outbox entry as processed

Since both writes happen in the same transaction, they either both succeed or both fail. No inconsistency possible.

-- Step 1: In the same transaction
BEGIN;

-- Save the business data
INSERT INTO orders (id, customer_id, total, status)
VALUES (1001, 42, 99.99, 'placed');

-- Write the event to the outbox table
INSERT INTO outbox (
    aggregate_type, aggregate_id, event_type, payload
) VALUES (
    'Order', '1001', 'OrderPlaced',
    '{"orderId": 1001, "customerId": 42, "total": 99.99}'
);

COMMIT;

-- Step 2: A background relay process picks up outbox entries
-- and publishes them to Kafka/RabbitMQ
-- Step 3: After successful publish, marks them as processed

The outbox table looks something like this:

CREATE TABLE outbox (
    id BIGSERIAL PRIMARY KEY,
    aggregate_type VARCHAR(100) NOT NULL,
    aggregate_id VARCHAR(100) NOT NULL,
    event_type VARCHAR(100) NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    processed_at TIMESTAMPTZ          -- NULL until published
);

Guarantees

The outbox pattern gives us at-least-once delivery. The relay might crash after publishing but before marking as processed, causing a re-publish. So consumers need to be idempotent — handling the same event twice should be harmless.

Change Data Capture (CDC)

CDC is another way to solve the “how do we get events from database changes” problem. But instead of writing to an outbox table, we tap directly into the database’s internal change log.

Every database already keeps a log of changes:

  • PostgreSQL has the WAL (Write-Ahead Log)
  • MySQL has the binlog
  • MongoDB has the oplog

CDC tools read these logs and turn them into event streams.

Debezium

Debezium is the most popular open-source CDC tool. It reads the database’s change log and publishes events to Kafka.

PostgreSQL
WAL (change log)
Debezium
reads WAL, emits events
Kafka
event stream
Search Service
Analytics
Cache Sync
Debezium captures every INSERT, UPDATE, DELETE and streams it to Kafka

CDC Use Cases

  • Syncing data across services — when the User Service updates a profile, CDC captures the change and other services receive it
  • Populating search indexes — every database change automatically updates Elasticsearch
  • Analytics pipelines — stream all changes to a data warehouse in real-time
  • Cache invalidation — when a row changes, invalidate the corresponding cache entry

CDC vs Outbox

Both solve similar problems but differently:

  • Outbox: application explicitly writes events. Full control over event format and content. Extra write per transaction.
  • CDC: captures changes automatically from the DB log. Zero application changes needed. But event format is tied to table structure.

A nice combo: use the outbox pattern with CDC. The application writes to an outbox table, and Debezium reads the outbox table’s changes from the WAL. This way we get explicit event control AND automatic publishing without a custom relay process. This is actually how Debezium’s recommended “outbox” connector works.

The Saga Pattern (Brief)

When an operation spans multiple services — like “place an order” which involves the Order Service, Payment Service, and Inventory Service — we can’t use a regular database transaction (because each service has its own database).

The Saga pattern breaks a distributed transaction into a sequence of local transactions, each in its own service. If one step fails, we run compensating transactions to undo the previous steps.

Two flavors:

Choreography — each service publishes events and listens for events. No central coordinator. Simple but hard to track and debug as the chain grows.

Order Service: OrderPlaced →
  Payment Service: PaymentProcessed →
    Inventory Service: StockReserved →
      Order Service: OrderConfirmed

If Payment fails:
  Payment Service: PaymentFailed →
    Order Service: OrderCancelled (compensating)

Orchestration — a central saga orchestrator tells each service what to do and handles failures. Easier to understand and debug, but the orchestrator is a single point of logic.

Orchestrator:
  1. Tell Order Service: create order → OK
  2. Tell Payment Service: charge payment → OK
  3. Tell Inventory Service: reserve stock → FAILED
  4. Tell Payment Service: refund payment (compensate)
  5. Tell Order Service: cancel order (compensate)

Important Caveat

Sagas give us eventual consistency, not strong consistency. Between steps, the system is in a partially-completed state. Our application needs to handle this — showing “order processing” instead of “order confirmed” until all steps complete.

Interview Tip

When discussing microservice data patterns, show that we understand the trade-offs. Database per service gives us independence but makes cross-service queries hard. The outbox pattern solves dual writes but adds a table and a relay process. CDC is elegant but ties events to database schema. Sagas handle distributed transactions but only provide eventual consistency. There’s no free lunch — every pattern solves one problem and introduces another.