Computer Networks

All 42 notes on one page

Networking Fundamentals

OSI Model (7 Layers)

beginner osi layers networking protocols fundamentals

The OSI (Open Systems Interconnection) Model is a 7-layer conceptual model that describes how two computers talk to each other over a network. It’s a teaching tool, not a real implementation.

In simple language: it slices the messy work of “send data from my laptop to a server” into 7 neat layers, where each layer has one job and only talks to the layer above and below it.

Why We Need It

Networking is hard. Cables, signals, IPs, ports, encryption, formats — all of that has to work together. The OSI model splits the chaos into chunks so we can reason about each piece independently.

When something breaks, we can ask: “is this a Layer 2 issue (switch) or a Layer 7 issue (the API)?”

The 7 Layers (Top to Bottom)

7. Application

What the user sees. HTTP, HTTPS, FTP, SMTP, DNS, SSH

6. Presentation

Encoding, encryption, compression. TLS/SSL, JPEG, ASCII

5. Session

Open/close conversations. NetBIOS, RPC, sockets

4. Transport

End-to-end delivery, ports. TCP, UDP

3. Network

Routing across networks. IP, ICMP, OSPF, BGP

2. Data Link

Frames on a single link. Ethernet, Wi-Fi (802.11), ARP, MAC

1. Physical

Bits on the wire. Cables, fiber, radio waves, voltages

Layer-by-Layer

Layer 1 — Physical. The actual hardware. Copper cables, fiber strands, radio antennas. It deals with raw 0s and 1s as voltages or light pulses.

Layer 2 — Data Link. Moves frames between two devices on the same network. Each frame has a source and destination MAC address. A switch operates here.

Layer 3 — Network. Routes packets between different networks. IP addresses live here. Routers operate here. This is where “the internet” starts feeling like the internet.

Layer 4 — Transport. Delivers data to the right process (port) on the destination machine. TCP gives us reliability. UDP gives us speed.

Layer 5 — Session. Manages the conversation — opening, keeping alive, and closing the dialogue between two apps.

Layer 6 — Presentation. Translates the data format. Encrypts (TLS), compresses (gzip), or converts (UTF-8 ↔ ASCII).

Layer 7 — Application. What we actually code against. HTTP, DNS lookups, SMTP for email, SSH for shell access.

The Mnemonic

Top to bottom: All People Seem To Need Data Processing.

Bottom to top: Please Do Not Throw Sausage Pizza Away (the classic interview one).

7  Application    - Away
6  Presentation   - Pizza
5  Session        - Sausage
4  Transport      - Throw
3  Network        - Not
2  Data Link      - Do
1  Physical       - Please

Read the right-hand column bottom-up (Layer 1 to Layer 7) to get the mnemonic; read the left-hand column top-down to get the layer numbers.

A Quick Real-World Trace

When we open https://example.com:

L7 — browser builds an HTTP GET request
L6 — TLS encrypts it
L5 — a session is opened (in practice the TCP connection and the TLS session play this role)
L4 — TCP segments the data, adds port 443
L3 — IP wraps each segment in a packet with src/dst IP
L2 — Ethernet wraps each packet in a frame with src/dst MAC
L1 — bits go out as electrical signals on the cable

The receiver does the reverse — strip headers from the bottom up.

Interview Tip

Don’t memorize layers in isolation. Pair each layer with a protocol example and a device example (router for L3, switch for L2). Interviewers often ask “which layer does TLS sit at?” — the answer is L6 conceptually, though in practice it bridges L5/L6/L7.

TCP/IP Model

beginner tcp-ip layers networking internet fundamentals

The TCP/IP model is the 4-layer model that the real internet runs on. While OSI is great for textbooks, TCP/IP is what your laptop actually uses when it loads Reddit.

In simple language: take the OSI model, squish a few layers together, and we get TCP/IP — leaner, more practical, and it actually maps to real protocols.

The 4 Layers

Link (Network Access) — physical hardware + frames. (OSI L1 + L2)
Internet — routing packets across networks. (OSI L3)
Transport — TCP / UDP, ports, reliability. (OSI L4)
Application — HTTP, DNS, SSH, everything user-facing. (OSI L5 + L6 + L7)

OSI vs TCP/IP Side-by-Side

OSI (7 layers)

7. Application

6. Presentation

5. Session

4. Transport

3. Network

2. Data Link

1. Physical

TCP/IP (4 layers)

4. Application
(HTTP, DNS, SSH)

3. Transport (TCP/UDP)

2. Internet (IP)

1. Link (Ethernet, Wi-Fi)

Layer Breakdown

Link Layer. Cables, Wi-Fi, Ethernet frames. The hardware that physically gets bits from one device to the next. Includes ARP and the MAC layer.

Internet Layer. IP — that’s it. Routes packets across the whole world based on IP addresses. ICMP (used by ping) lives here too.

Transport Layer. TCP for reliable streams (HTTP, SSH). UDP for fast fire-and-forget (DNS, video calls). Ports identify which app gets the data.

Application Layer. Everything we touch as developers. HTTP, HTTPS, DNS, SMTP, IMAP, FTP, SSH, MQTT, gRPC.

Real Protocols at Each Layer

Application  | HTTP, HTTPS, DNS, SMTP, FTP, SSH, WebSocket, gRPC
Transport    | TCP, UDP, QUIC
Internet     | IPv4, IPv6, ICMP, IPsec
Link         | Ethernet (802.3), Wi-Fi (802.11), PPP, ARP

Why TCP/IP Won

It came before OSI and had working code (ARPANET, then the internet).
It’s pragmatic — no useless layers.
The session/presentation concerns (TLS, encoding) just get folded into the application layer in practice.

OSI is still taught because the layered thinking is useful. TCP/IP is what we actually deploy.

Encapsulation Quick Look

[ HTTP request                                   ]   <- App
[ TCP header | HTTP request                      ]   <- Transport
[ IP header  | TCP header | HTTP request         ]   <- Internet
[ Eth header | IP | TCP | HTTP | Eth trailer     ]   <- Link

Each layer slaps its own header on, peels it off on the other side. We’ll cover this in detail in the encapsulation note.

Interview Tip

If asked “which model does the internet use?” — the honest answer is TCP/IP. OSI is a reference; TCP/IP is reality. Mention that TLS doesn’t fit cleanly into TCP/IP’s 4 layers, which is one of the model’s known weak spots.

How Data Travels (Encapsulation & Frames)

beginner encapsulation frames packets segments mtu

When we send data over a network, it doesn’t go as one big chunk. Each layer wraps it in its own header as it goes down the stack. The receiver unwraps them in reverse on the way up.

This wrapping is called encapsulation. The unwrapping is decapsulation.

The Russian Doll Analogy

Imagine sending a gift in nested boxes:

The gift = our data (e.g. an HTTP request)
Each layer adds a box around it with addressing info on the outside
The receiver opens box by box until they reach the gift

Each “box” is a header (and sometimes a trailer) added by a layer.

What Each Layer Adds

Application — Data

Raw payload (e.g. "GET /index.html HTTP/1.1...")

Transport — Segment

+ TCP/UDP header (src port, dst port, seq #, flags)

Internet — Packet (Datagram)

+ IP header (src IP, dst IP, TTL, protocol)

Link — Frame

+ Ethernet header (src MAC, dst MAC) + trailer (CRC)

Physical — Bits

Voltages, light pulses, radio waves on the medium

The PDU Names (Memorize These)

PDU = Protocol Data Unit. The chunk of data at each layer has a different name:

Application  -> Data / Message
Transport    -> Segment (TCP) or Datagram (UDP)
Internet     -> Packet
Link         -> Frame
Physical     -> Bits

Common interview shortcut: Data → Segments → Packets → Frames → Bits.

A Concrete Example

We type https://example.com and hit enter. Here’s what happens to the GET request as it goes down:

1. App layer:        "GET / HTTP/1.1\r\nHost: example.com..."
2. Transport (TCP):  [ TCP hdr (src:51234, dst:443, seq, ack, flags) | GET... ]
3. Internet (IP):    [ IP hdr (src:192.168.1.5, dst:93.184.216.34, TTL=64) | TCP segment ]
4. Link (Ethernet):  [ Eth hdr (src MAC, dst MAC) | IP packet | CRC trailer ]
5. Physical:         010110100110... (electrical/optical signal)

On the receiving server, the layers unwrap in reverse — Ethernet header is stripped, then IP, then TCP, until the HTTP request reaches the web server process.
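The wrap-and-unwrap steps above can be sketched in a few lines of Python. This is a toy model only: the bracketed text "headers" and the function names `encapsulate` and `decapsulate` are made up for illustration, whereas real headers are binary structures built by the OS and the NIC.

```python
# Toy encapsulation: each layer prepends a fake, text-based header.
# The header contents are illustrative, not real wire formats.

def encapsulate(payload: bytes) -> bytes:
    segment = b"[TCP src=51234 dst=443]" + payload                 # Transport
    packet = b"[IP src=192.168.1.5 dst=93.184.216.34]" + segment   # Internet
    frame = b"[ETH src=aa:aa dst=bb:bb]" + packet + b"[CRC]"       # Link
    return frame


def decapsulate(frame: bytes) -> bytes:
    # The receiver strips from the outside in: Ethernet, then IP, then TCP.
    if not frame.endswith(b"[CRC]"):
        raise ValueError("missing trailer")
    data = frame[: -len(b"[CRC]")]
    for _layer in ("ETH", "IP", "TCP"):
        if not data.startswith(b"["):
            raise ValueError("missing header")
        data = data[data.index(b"]") + 1:]
    return data


request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
frame = encapsulate(request)
print(frame)
print(decapsulate(frame) == request)
```

The point of the sketch is the ordering: the last header added on the way down is the first one removed on the way up.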

Headers Carry the “How”

Each header answers a different question:

Ethernet header: which device on this LAN should pick this up? (MAC address)
IP header: which machine on the internet is this going to? (IP address)
TCP/UDP header: which app on that machine? (port number)
HTTP body: what does the app actually want? (the request)

MTU (Maximum Transmission Unit)

A frame can only be so big. Most Ethernet networks use an MTU of 1500 bytes. If our packet is bigger, it gets fragmented into smaller pieces.

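# Check MTU (interface names en0 / eth0 are examples)
ifconfig en0 | grep mtu          # macOS
ip link show eth0 | grep mtu     # Linux
# en0: ... mtu 1500

# Or probe the path MTU to a host with the don't-fragment flag set
ping -D -s 1472 example.com      # macOS: -D sets don't-fragment
ping -M do -s 1472 example.com   # Linux: -M do sets don't-fragment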

Why 1472 above? 1500 (MTU) - 20 (IP hdr) - 8 (ICMP hdr) = 1472 bytes of payload before fragmentation.

If a packet is too big and the don’t-fragment bit is set, the router drops it and sends back an ICMP “fragmentation needed” message. This is path MTU discovery.
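To make the fragmentation arithmetic concrete, here is a rough sketch of how an oversized IPv4 payload would be split when fragmentation is allowed. The function name `fragment_sizes` is hypothetical, and the 20-byte IP header assumes no IP options. The multiple-of-8 rule comes from the IPv4 fragment offset field, which counts 8-byte units.

```python
def fragment_sizes(payload_len: int, mtu: int = 1500, ip_header: int = 20) -> list[int]:
    """Payload bytes carried by each IPv4 fragment of one packet."""
    max_payload = mtu - ip_header
    if payload_len <= max_payload:
        return [payload_len]  # fits in one packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = max_payload // 8 * 8
    sizes = []
    remaining = payload_len
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


print(fragment_sizes(1480))   # fits exactly in a 1500-byte MTU
print(fragment_sizes(4000))   # needs three fragments
```

Each fragment also carries its own 20-byte IP header, which is part of why fragmentation costs extra bandwidth.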

Decapsulation on the Receiver

bits arrive on cable
  -> NIC assembles a frame
  -> Strip Ethernet header / trailer (verify MAC, CRC)
    -> Strip IP header (verify dst IP and header checksum; TTL is left alone because the packet has arrived)
      -> Strip TCP header (check seq/ack, route to port)
        -> Hand HTTP request to the web server

Common Gotcha

People mix up fragmentation (IP layer, splitting a big packet) with segmentation (TCP layer, breaking a stream into segments). Different layers, different problems. TCP segmentation is normal and clean; IP fragmentation is generally avoided because it hurts performance and reliability.

IP Addressing (IPv4 & IPv6)

beginner ip ipv4 ipv6 addressing networking

An IP address is a unique identifier for a device on a network. Like a postal address, but for packets. Two flavors exist today: IPv4 (the old one) and IPv6 (the bigger one).

In simple language: every machine that wants to send or receive on the internet needs an IP address — that’s how routers know where to deliver our packets.

IPv4 — Dotted Decimal

IPv4 addresses are 32 bits, written as four numbers (0–255) separated by dots:

192.168.1.10
  |   |   |   |
  8b  8b  8b  8b   = 32 bits total

That gives us about 4.3 billion unique addresses (2^32). Sounds like a lot — until we realize there are 8 billion humans plus phones, IoT devices, servers, etc. We ran out years ago. NAT and IPv6 are the rescue plans.
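A quick way to see that an IPv4 address really is one 32-bit number: pack the four octets by hand and compare with Python's standard `ipaddress` module. The helper name `dotted_to_int` is ours, purely for illustration.

```python
import ipaddress


def dotted_to_int(addr: str) -> int:
    """Pack four 8-bit octets into a single 32-bit integer."""
    value = 0
    for octet in addr.split("."):
        value = (value << 8) | int(octet)
    return value


n = dotted_to_int("192.168.1.10")
print(n, f"{n:032b}")
# Cross-check against the standard library.
print(n == int(ipaddress.IPv4Address("192.168.1.10")))
print(2 ** 32)  # size of the whole IPv4 address space
```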

IPv4 Classes (Historical)

Originally, IPv4 was split into classes based on the first octet:

Class A:  1.0.0.0    - 126.255.255.255    (huge networks, 8-bit prefix)
Class B:  128.0.0.0  - 191.255.255.255    (medium, 16-bit prefix)
Class C:  192.0.0.0  - 223.255.255.255    (small, 24-bit prefix)
Class D:  224.0.0.0  - 239.255.255.255    (multicast)
Class E:  240.0.0.0  - 255.255.255.255    (reserved)

Classes are deprecated — modern networks use CIDR (covered in the next note). But interviewers still ask, so know the ranges.

Private IP Ranges (RFC 1918)

These ranges are reserved for private networks — they don’t get routed on the public internet:

10.0.0.0    /8    -> 10.0.0.0    – 10.255.255.255    (16M addresses)
172.16.0.0  /12   -> 172.16.0.0  – 172.31.255.255    (1M addresses)
192.168.0.0 /16   -> 192.168.0.0 – 192.168.255.255   (65K addresses)

That’s why our home router gives us something like 192.168.1.42. Outside of our LAN, we’re known by our public IP (assigned by the ISP) thanks to NAT.
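As a small sketch, we can check whether an address falls in one of the three RFC 1918 ranges with the standard `ipaddress` module. The helper name `is_rfc1918` is an assumption of this example. It deliberately tests only these three blocks, unlike the library's broader `is_private` property, which also covers other reserved ranges.

```python
import ipaddress

# The three private blocks listed above.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


for candidate in ("192.168.1.42", "172.31.255.255", "172.32.0.1", "8.8.8.8"):
    print(candidate, is_rfc1918(candidate))

# Block sizes, for comparison with the table above.
print([net.num_addresses for net in RFC1918])
```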

Special IPs Worth Knowing

127.0.0.1            -> loopback (this machine itself, "localhost")
0.0.0.0              -> "any IP" (used to bind to all interfaces)
255.255.255.255      -> broadcast (everyone on the LAN)
169.254.x.x          -> link-local (auto-assigned when DHCP fails)

IPv6 — The Bigger Address Space

IPv6 addresses are 128 bits — written as eight groups of 4 hex digits separated by colons:

2001:0db8:85a3:0000:0000:8a2e:0370:7334

That’s 2^128 addresses, or roughly 340 undecillion. Enough to give every grain of sand on Earth its own IP and still have plenty left over.

IPv6 Shorthand

Long IPv6 is painful to type. We can shorten it:

Drop leading zeros in each group: 2001:db8:85a3:0:0:8a2e:370:7334
Replace one run of all-zero groups with "::", giving 2001:db8:85a3::8a2e:370:7334

The :: shortcut can only appear once in an address (otherwise we couldn’t tell how many zero groups it represents).
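Python's standard `ipaddress` module applies both shorthand rules for us, which makes it a handy way to check our own work. The second address below is a deliberately invalid example with two "::" runs.

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")
print(addr.compressed)  # shortest form
print(addr.exploded)    # full eight-group form

# Two "::" runs are ambiguous, so the parser rejects them.
try:
    ipaddress.IPv6Address("2001:db8::85a3::7334")
except ipaddress.AddressValueError as err:
    print("rejected:", err)
```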

Common IPv6 Special Addresses

::1            -> loopback (IPv6 version of 127.0.0.1)
::             -> unspecified ("any")
fe80::/10      -> link-local (similar to 169.254.x.x in v4)
fc00::/7       -> unique local (private, like RFC 1918)
2000::/3       -> global unicast (the public internet)

Why IPv6 If We Have NAT?

NAT works, but it’s a hack:

Breaks end-to-end connectivity (peer-to-peer apps suffer).
Makes server hosting from a home network awkward.
Adds latency and state at every gateway.

IPv6 gives every device a real, routable address. No NAT needed.

Checking Your Own IPs

# Linux / macOS — see all interfaces
ip addr           # Linux
ifconfig          # macOS

# Just your public IP
curl ifconfig.me
curl -4 ifconfig.me   # force IPv4
curl -6 ifconfig.me   # force IPv6

# On Windows
ipconfig /all

Adoption Reality

IPv6 adoption has crept up — Google measures roughly 40%+ of users reaching them over IPv6 in 2025. Big networks (mobile carriers, datacenters) are mostly there. Lots of legacy infra still runs IPv4 + NAT, so we’ll be juggling both for a long time.

Interview Tip

Two facts that come up often: 127.0.0.1 = localhost = ::1, and the three private IPv4 ranges. If asked “why do we need IPv6?” — say “IPv4 exhaustion” and bonus points for mentioning NAT trade-offs.

Subnetting & CIDR

beginner subnetting cidr subnet-mask ipv4 networking

Subnetting is the act of splitting a big IP block into smaller networks so we can organize them, secure them, and route between them efficiently. CIDR (Classless Inter-Domain Routing) is the modern notation we use for it.

In simple language: take an IP range and slice it up. CIDR tells us “how much of the address is the network, and how much is the host.”

The /N Notation

192.168.1.0/24 means:

The first 24 bits identify the network.
The remaining 32 - 24 = 8 bits identify the host within that network.
That gives us 2^8 = 256 addresses (254 usable — minus the network and broadcast).

192.168.1.0/24
|________| |__|
network    host bits (8)

Subnet Mask

The subnet mask is the binary version of the prefix length:

/24  ->  255.255.255.0     ->  11111111.11111111.11111111.00000000
/16  ->  255.255.0.0       ->  11111111.11111111.00000000.00000000
/8   ->  255.0.0.0         ->  11111111.00000000.00000000.00000000
/30  ->  255.255.255.252   ->  11111111.11111111.11111111.11111100

Where the mask is 1, that’s the network. Where it’s 0, that’s the host.
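The prefix-to-mask conversion is just a bit shift. A minimal sketch, with a helper name (`prefix_to_mask`) chosen for this example:

```python
def prefix_to_mask(prefix: int) -> str:
    """Turn a prefix length such as 24 into a dotted-decimal mask."""
    if not 0 <= prefix <= 32:
        raise ValueError("prefix must be between 0 and 32")
    # Start with 32 one-bits, then push (32 - prefix) zero-bits in from the right.
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))


for p in (8, 16, 24, 30):
    print(f"/{p} -> {prefix_to_mask(p)}")
```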

Network Address vs Host Address

For 192.168.1.50/24:

Network address: 192.168.1.0 (host bits all zero)
Broadcast address: 192.168.1.255 (host bits all one)
Usable hosts: 192.168.1.1 – 192.168.1.254 (254 total)

The two reserved addresses (network + broadcast) cost us 2 IPs per subnet.
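The same breakdown can be reproduced with the standard `ipaddress` module. The `describe` helper is a hypothetical name for this sketch, and it assumes an ordinary subnet with a prefix of /30 or shorter, where the network and broadcast addresses are reserved.

```python
import ipaddress


def describe(cidr: str) -> dict:
    """Network, broadcast, and usable host range for a host address in CIDR form."""
    net = ipaddress.ip_interface(cidr).network
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        "first_host": str(net.network_address + 1),
        "last_host": str(net.broadcast_address - 1),
        "usable": net.num_addresses - 2,
    }


print(describe("192.168.1.50/24"))
```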

A Simple Subnetting Example

Say we get the block 10.0.0.0/24 (256 addresses) and want 4 equal subnets.

To get 4 subnets, we borrow log2(4) = 2 bits from the host portion. New prefix: /24 + 2 = /26.

Each /26 subnet has 2^(32-26) = 64 addresses (62 usable).

10.0.0.0/26       ->  10.0.0.0   – 10.0.0.63    (broadcast .63)
10.0.0.64/26      ->  10.0.0.64  – 10.0.0.127   (broadcast .127)
10.0.0.128/26     ->  10.0.0.128 – 10.0.0.191   (broadcast .191)
10.0.0.192/26     ->  10.0.0.192 – 10.0.0.255   (broadcast .255)

Done. Four neat subnets, 62 hosts each.
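We can double-check the split with the standard `ipaddress` module. `prefixlen_diff=2` is the "borrow 2 bits" step from the example above.

```python
import ipaddress

block = ipaddress.ip_network("10.0.0.0/24")
subnets = list(block.subnets(prefixlen_diff=2))  # /24 + 2 borrowed bits = /26

for subnet in subnets:
    usable = subnet.num_addresses - 2
    print(subnet, "broadcast", subnet.broadcast_address, "usable", usable)
```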

Quick Math Cheat Sheet

Prefix   Hosts (usable)   Common use
/30      2                Point-to-point links
/29      6                Tiny subnets
/28      14               Small office segment
/27      30               Floor of a building
/24      254              Typical LAN
/22      1022             Larger office
/16      65,534           Big private network
/8       16,777,214       Massive (10.0.0.0/8)

Formula: usable hosts = 2^(32 - prefix) - 2.

Why Subnet at All?

Security — keep finance servers off the same broadcast domain as the guest Wi-Fi.
Performance — smaller broadcast domains = less ARP/DHCP noise.
Routing efficiency — routers can summarize many subnets into a single route (route aggregation).
IP conservation — give a /30 to a router-to-router link instead of a wasteful /24.

CIDR Aggregates Routes Too

CIDR isn’t just about splitting — it lets us combine adjacent networks:

192.168.0.0/24
192.168.1.0/24    ->  can be summarized as 192.168.0.0/23

Routing tables stay smaller. The internet’s BGP relies on this heavily.
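The standard library can do this summarization for us via `ipaddress.collapse_addresses`. The second pair below is an extra illustration (not from the text above) of two adjacent /24s that cannot be merged, because they do not share a /23 boundary.

```python
import ipaddress

aligned = [
    ipaddress.ip_network("192.168.0.0/24"),
    ipaddress.ip_network("192.168.1.0/24"),
]
print(list(ipaddress.collapse_addresses(aligned)))

# Adjacent but not aligned on a /23 boundary: these stay separate.
unaligned = [
    ipaddress.ip_network("192.168.1.0/24"),
    ipaddress.ip_network("192.168.2.0/24"),
]
print(list(ipaddress.collapse_addresses(unaligned)))
```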

Practical Tools

# Linux: ipcalc (install via apt/brew)
ipcalc 192.168.1.0/26
# Network:   192.168.1.0/26
# HostMin:   192.168.1.1
# HostMax:   192.168.1.62
# Broadcast: 192.168.1.63
# Hosts/Net: 62

# Or use Python
python3 -c "import ipaddress; n = ipaddress.ip_network('192.168.1.0/26'); print(list(n.hosts())[:5])"

Common Gotcha

The mask /31 is special — RFC 3021 allows /31 on point-to-point links with 2 usable hosts (no network/broadcast). Without that exception, /31 would have 0 usable hosts. /30 is still the safer textbook answer.
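Putting the cheat-sheet formula and the /31 exception together gives a small sketch like the one below. The function name `usable_hosts` is ours, and treating /32 as a single host address is an assumption of this example rather than something covered above.

```python
def usable_hosts(prefix: int) -> int:
    """Usable IPv4 host addresses for a given prefix length."""
    if not 0 <= prefix <= 32:
        raise ValueError("prefix must be between 0 and 32")
    if prefix == 32:
        return 1  # assumption: a /32 names exactly one host
    if prefix == 31:
        return 2  # RFC 3021 point-to-point link: no network/broadcast reserved
    return 2 ** (32 - prefix) - 2  # subtract network and broadcast addresses


for p in (30, 29, 28, 27, 24, 22, 16, 8, 31):
    print(f"/{p}: {usable_hosts(p)}")
```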

Interview Tip

Practice writing out the binary mask quickly. /24 = 255.255.255.0 should be reflex. For trickier prefixes like /27, remember the host bits: 5 bits of host = 32 addresses (30 usable) = mask 255.255.255.224 (256 - 32 = 224).

MAC Addresses & ARP

beginner mac arp ethernet lan data-link

A MAC (Media Access Control) address is a unique hardware identifier baked into every network interface card. ARP is the protocol that translates IP addresses into MAC addresses so devices on the same LAN can actually talk.

In simple language: IP gets us across the internet, but on the final hop within a LAN, devices speak in MAC. ARP is the phone book that maps “this IP” to “this MAC.”

MAC Address Format

A MAC address is 48 bits, written as 6 hex octets:

00:1A:2B:3C:4D:5E
|_______||_______|
   OUI    Device ID
 (3 bytes) (3 bytes)

OUI (Organizationally Unique Identifier) — first 3 bytes, identifies the vendor (Apple, Cisco, Intel, etc.).
Device ID — last 3 bytes, unique within that vendor.

So 00:1A:2B:xx:xx:xx would mean “made by some specific vendor that owns the OUI 00:1A:2B.”

Special MAC Addresses

FF:FF:FF:FF:FF:FF    -> broadcast (everyone on the LAN)
01:00:5E:xx:xx:xx    -> IPv4 multicast
33:33:xx:xx:xx:xx    -> IPv6 multicast
00:00:00:00:00:00    -> unspecified
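As a rough sketch, here is how the format and the special addresses above could be checked in code. `parse_mac` is a hypothetical helper. The multicast test relies on a detail not spelled out above: the least significant bit of the first octet marks a group (multicast or broadcast) address.

```python
def parse_mac(mac: str) -> dict:
    """Split a colon-separated MAC into OUI and device ID, and flag special cases."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 0xFF for o in octets):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return {
        "oui": ":".join(f"{o:02X}" for o in octets[:3]),
        "device_id": ":".join(f"{o:02X}" for o in octets[3:]),
        "multicast": bool(octets[0] & 0x01),  # group bit of the first octet
        "broadcast": all(o == 0xFF for o in octets),
    }


print(parse_mac("00:1A:2B:3C:4D:5E"))
print(parse_mac("01:00:5E:00:00:01"))
print(parse_mac("FF:FF:FF:FF:FF:FF"))
```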

Why MAC and IP Both?

Two layers, two purposes:

IP (Layer 3) — global, hierarchical, routable. Tells routers where to send packets.
MAC (Layer 2) — local, flat, hardware-tied. Tells switches which physical port to forward a frame to.

A router’s job is to receive a frame, peel off the Ethernet header, look at the IP, find the next hop, and rewrite the MAC for the next leg. IP stays the same end-to-end (mostly). MAC changes every hop.

How ARP Works

We have IP 192.168.1.5 and want to send a packet to 192.168.1.10. The OS needs the MAC of .10 to build the Ethernet frame. ARP to the rescue.

ARP Request / Reply

1. Host A broadcasts: "Who has 192.168.1.10? Tell 192.168.1.5"

(sent to MAC FF:FF:FF:FF:FF:FF — everyone on the LAN)

2. Host B replies (unicast): "192.168.1.10 is at 00:1A:2B:3C:4D:5E"

3. Host A caches the mapping and sends the actual frame.
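The three steps above can be mimicked with a toy simulation. Nothing here touches the network: the `Host` class and `resolve` function are invented for this sketch, and the "broadcast" is simply a loop over every host on a pretend LAN.

```python
class Host:
    def __init__(self, ip: str, mac: str):
        self.ip = ip
        self.mac = mac
        self.arp_cache = {}  # maps IP -> MAC

    def handle_arp_request(self, target_ip: str):
        # Only the owner of the requested IP replies; everyone else stays silent.
        return self.mac if target_ip == self.ip else None


def resolve(sender: Host, target_ip: str, lan: list):
    """Return the MAC for target_ip, asking the whole LAN only on a cache miss."""
    if target_ip in sender.arp_cache:
        return sender.arp_cache[target_ip]
    for host in lan:  # step 1: broadcast "who has target_ip?"
        if host is sender:
            continue
        mac = host.handle_arp_request(target_ip)  # step 2: unicast reply
        if mac is not None:
            sender.arp_cache[target_ip] = mac     # step 3: cache the mapping
            return mac
    return None  # nobody answered


host_a = Host("192.168.1.5", "aa:aa:aa:aa:aa:aa")
host_b = Host("192.168.1.10", "00:1a:2b:3c:4d:5e")
lan = [host_a, host_b]
print(resolve(host_a, "192.168.1.10", lan))
print(host_a.arp_cache)
```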

The ARP Cache

Every OS keeps an ARP cache so we don’t broadcast for every packet.

# Linux / macOS
arp -a
# ? (192.168.1.1) at 8c:4d:ea:11:22:33 on en0 ifscope [ethernet]
# ? (192.168.1.10) at 00:1a:2b:3c:4d:5e on en0 ifscope [ethernet]

# Windows
arp -a

# Manually delete an entry
sudo arp -d 192.168.1.10

Entries typically expire after a few minutes (varies by OS) so stale info doesn’t linger.

Gratuitous ARP

A device can send an ARP for its own IP without anyone asking. Why?

Announce itself — “Hey, 192.168.1.20 is now me, MAC AA:BB:CC…”
Update everyone’s caches after a NIC change or failover.
Detect IP conflicts — if someone else replies, we have a duplicate IP.

This is what high-availability setups (VRRP, keepalived) use to migrate a virtual IP between nodes.

ARP Spoofing (Security Note)

ARP has no authentication. An attacker on the same LAN can send forged ARP replies saying “I’m the gateway” and intercept everyone’s traffic — that’s an ARP spoofing / MITM attack. Mitigations: dynamic ARP inspection on managed switches, static ARP entries for critical hosts, or just use TLS so even if someone reads the bytes, they can’t decrypt.

Changing a MAC Address

MAC is “burned in” but the OS can override what’s sent on the wire:

# macOS
sudo ifconfig en0 ether aa:bb:cc:dd:ee:ff

# Linux
sudo ip link set dev eth0 address aa:bb:cc:dd:ee:ff

Useful for privacy on public Wi-Fi (modern phones randomize MACs by default).

ARP Doesn’t Exist in IPv6

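IPv6 replaces ARP with NDP (Neighbor Discovery Protocol), which runs over ICMPv6. The idea is the same (find the link-layer address for a given IPv6 address), but NDP uses multicast instead of broadcast and can be secured with extensions such as SEND (Secure Neighbor Discovery).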

Interview Tip

A common gotcha: “what’s the destination MAC when host A talks to a host on a different subnet?” Answer: it’s the gateway’s MAC, not the final destination’s. The MAC changes at every router hop; the IP stays the same.
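The decision behind that answer fits in a few lines. This is a simplified sketch, assuming a host with a single interface and a single default gateway; the function name `arp_target` is made up for illustration.

```python
import ipaddress


def arp_target(src_cidr: str, dst_ip: str, gateway_ip: str) -> str:
    """IP whose MAC address goes into the frame's destination field."""
    local_net = ipaddress.ip_interface(src_cidr).network
    if ipaddress.ip_address(dst_ip) in local_net:
        return dst_ip      # same subnet: ARP for the destination itself
    return gateway_ip      # different subnet: ARP for the gateway


print(arp_target("192.168.1.5/24", "192.168.1.10", "192.168.1.1"))
print(arp_target("192.168.1.5/24", "93.184.216.34", "192.168.1.1"))
```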

Ports & Sockets

beginner ports sockets tcp udp 5-tuple

A port is a 16-bit number (0–65535) that identifies a specific process or service on a machine. A socket is the combination of an IP address and a port — it’s the actual endpoint a connection talks to.

In simple language: an IP gets us to the right machine. A port gets us to the right app on that machine. Together they form a socket.

Why We Need Ports

Our laptop runs many things at once — browser, IDE, Spotify, Slack. They all share the same IP address. When a packet arrives, how does the OS know which app it’s for?

Ports. The OS routes incoming packets to the process that owns the matching port.

Port Ranges

0     – 1023      Well-known ports (need root/admin to bind)
1024  – 49151     Registered ports
49152 – 65535     Ephemeral / dynamic ports (used for outbound connections)
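For quick reference, the three ranges translate directly into a lookup. `port_range` is just an illustrative helper name.

```python
def port_range(port: int) -> str:
    """Classify a port number using the three ranges listed above."""
    if not 0 <= port <= 65535:
        raise ValueError("ports are 16-bit: 0 to 65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "ephemeral"


for p in (22, 443, 3306, 8080, 51234):
    print(p, port_range(p))
```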

Well-Known Ports to Memorize

20, 21    FTP (data, control)
22        SSH
23        Telnet (avoid)
25        SMTP (email send)
53        DNS
80        HTTP
110       POP3
123       NTP (time sync)
143       IMAP
443       HTTPS
465/587   SMTPS / submission
993       IMAPS
995       POP3S
3306      MySQL
5432      PostgreSQL
6379      Redis
8080      Common alt-HTTP
27017     MongoDB

If asked “what port does X use?” — these are the safe bets.

Socket = IP + Port

Client socket:  192.168.1.5 : 51234
Server socket:  93.184.216.34 : 443

When our browser connects to example.com:443, the OS picks an ephemeral port (e.g. 51234) for our side. Now both ends have a socket.

The 5-Tuple — Connection Identity

A TCP connection is uniquely identified by five fields:

5-Tuple field       Example
Protocol            TCP / UDP
Source IP           192.168.1.5
Source Port         51234
Destination IP      93.184.216.34
Destination Port    443

Two connections with any one of those values different = two distinct connections. That’s how a server can have thousands of clients all hitting :443 — they each have a unique source IP/port combo.

Multiple Browser Tabs to the Same Site

Open example.com in 3 tabs. We get 3 connections, all to 93.184.216.34:443, but with different source ports on our laptop:

1. 192.168.1.5:51234 -> 93.184.216.34:443
2. 192.168.1.5:51235 -> 93.184.216.34:443
3. 192.168.1.5:51236 -> 93.184.216.34:443

Server distinguishes them via the 5-tuple. Easy.
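A minimal sketch of the idea: if we use the 5-tuple as a dictionary key, the three tabs naturally land in three separate entries. The `FiveTuple` type and the `connections` table are illustrative names, not how any particular OS stores its connection table.

```python
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "protocol src_ip src_port dst_ip dst_port")

connections = {}
for src_port in (51234, 51235, 51236):
    key = FiveTuple("TCP", "192.168.1.5", src_port, "93.184.216.34", 443)
    connections[key] = f"tab using source port {src_port}"

# Same destination, three distinct connections.
print(len(connections))
print(connections[FiveTuple("TCP", "192.168.1.5", 51235, "93.184.216.34", 443)])
```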

Listening vs Connected Sockets

On a server we see sockets in two states (the listening socket and each accepted connection are separate sockets):

LISTENING — bound to a port, waiting for connections (e.g. 0.0.0.0:80).
ESTABLISHED — an active connection with a specific client.

# See listening sockets
sudo lsof -iTCP -sTCP:LISTEN -P -n      # macOS / Linux
ss -tlnp                                 # Linux modern
netstat -an | grep LISTEN                # cross-platform
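The distinction is easier to see in code. In this loopback-only sketch, `accept()` hands back a new connected socket for the client while the original socket keeps listening. The function name `listening_vs_connected` is ours, and port 0 simply asks the OS for any free port.

```python
import socket


def listening_vs_connected() -> dict:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()                # this socket only waits for connections

    client = socket.create_connection(listener.getsockname())
    conn, _peer = listener.accept()  # a NEW socket dedicated to this client

    info = {
        "listener": listener.getsockname(),
        "server_side": (conn.getsockname(), conn.getpeername()),
        "client_side": (client.getsockname(), client.getpeername()),
    }
    for sock in (conn, client, listener):
        sock.close()
    return info


print(listening_vs_connected())
```

Note that the connected socket shares the listener's local IP and port; what tells connections apart is the remote half of the 5-tuple.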

A Tiny Socket in Code

Sockets are the API every language exposes for network I/O.

// Node.js — TCP server
const net = require('net');

const server = net.createServer((socket) => {
  // socket is a connected socket with a 5-tuple
  console.log('connected:', socket.remoteAddress, socket.remotePort);
  socket.write('hello\n');
  socket.end();
});

server.listen(3000, () => console.log('listening on :3000'));

# Python — TCP client
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # TCP
s.connect(('example.com', 80))
s.sendall(b'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n')
print(s.recv(4096).decode())
s.close()

Common Gotcha

On Linux and most Unix-like systems, ports below 1024 require root privileges (or an equivalent capability such as CAP_NET_BIND_SERVICE) to bind by default. That’s why dev servers default to 3000, 5173, or 8080: no sudo needed. In production, the app usually binds to a high port and a reverse proxy (nginx/Caddy) listens on 80/443 and forwards.

Interview Tip

The 5-tuple is the cleanest way to answer “how does a server handle thousands of simultaneous clients on the same port?” — each connection is uniquely identified, so the OS keeps them straight.

Transport Layer

TCP vs UDP

intermediate tcp udp transport protocols networking

TCP and UDP are the two main transport-layer protocols. They both deliver data between processes, but with different trade-offs.

In simple language: TCP is a reliable phone call — connection, ordering, retries. UDP is a postcard — fire it and hope it arrives.

The Core Difference

TCP (Transmission Control Protocol) — connection-oriented, reliable, ordered, slower.
UDP (User Datagram Protocol) — connectionless, unreliable, unordered, fast and simple.

Neither is “better” — they’re tools for different jobs.

Side-by-Side Comparison

              TCP                                      UDP
Connection    Connection-oriented (3-way handshake)    Connectionless: just send
Reliability   Reliable: retries lost packets           Unreliable: no retransmits
Ordering      Ordered: packets arrive in sequence      Unordered: packets may arrive jumbled
Control       Flow + congestion control                No flow / congestion control
Header        Heavier header (~20 bytes)               Tiny header (8 bytes)
Speed         Slower, higher latency                   Fast, low latency

Header Sizes

TCP header (20 bytes minimum, more with options):

| Src Port | Dst Port | Seq # | Ack # | Flags | Window | Checksum | Urgent |

UDP header (just 8 bytes):

| Src Port | Dst Port | Length | Checksum |

That tiny UDP header is part of why it’s fast — less overhead per packet.
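To see just how small the UDP header is, we can pack one by hand with Python's `struct` module. This is only a sketch: `build_udp_header` is a made-up helper, and the checksum is left at 0 (which in UDP over IPv4 means "no checksum computed") rather than calculated.

```python
import struct


def build_udp_header(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Pack the four 16-bit UDP header fields in network byte order."""
    length = 8 + len(payload)  # the length field covers header + payload
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


header = build_udp_header(51234, 53, b"hello")
print(len(header), header.hex())
print(struct.unpack("!HHHH", header))
```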

When TCP Wins

We need the data to arrive completely and in order:

HTTP / HTTPS — every byte of an HTML page must be correct.
SSH — command-line characters can’t get scrambled.
Email (SMTP, IMAP, POP3) — losing parts of an email is unacceptable.
File transfer (FTP, SFTP) — corrupted files are useless.
Database connections — every query and result must land intact.

When UDP Wins

We care about speed more than perfection, or we’re handling our own reliability:

DNS — single small query/response. Retry by sending again. (DNS over TCP exists for big responses but UDP is the default.)
Video calls / VoIP — a missed millisecond of audio is fine. Waiting for retries would freeze the call.
Online gaming — same idea. Stale state is worse than lost state.
Video streaming (some) — many streamers use UDP-based protocols (QUIC, RTP).
NTP — time sync wants minimal latency.
DHCP — basic broadcast bootstrapping.

QUIC — UDP With Reliability On Top

QUIC (the protocol behind HTTP/3) runs over UDP but rebuilds reliability, ordering, and congestion control in user space. Why? Because TCP is implemented in the kernel and slow to evolve. UDP gives QUIC a clean slate.

So yes — modern HTTPS often runs on UDP. The world is weirder than the textbooks suggest.

A Simple UDP Example

# Server
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP
s.bind(('0.0.0.0', 9999))
data, addr = s.recvfrom(1024)
print(f'got {data!r} from {addr}')
s.sendto(b'pong', addr)

# Client
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.sendto(b'ping', ('127.0.0.1', 9999))
print(s.recvfrom(1024))

No connection, no handshake. Just send.

Common Gotcha

UDP doesn’t guarantee delivery — but on a healthy LAN, packet loss is near zero. Devs sometimes ship UDP apps that “work in dev” and fall apart on a real network. If we use UDP, we either don’t care about loss or we build our own retry logic.
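Here is a minimal sketch of what "build our own retry logic" can look like: send, wait for a reply with a timeout, and resend on silence. The names `request_with_retry` and `demo`, the retry count, and the timeout are all arbitrary choices for illustration. The demo runs over loopback with a server that deliberately ignores the first datagram to simulate loss.

```python
import socket
import threading


def request_with_retry(sock, payload, addr, retries=3, timeout=0.3):
    """Send payload and wait for a reply, resending if nothing comes back."""
    sock.settimeout(timeout)
    for attempt in range(1, retries + 1):
        sock.sendto(payload, addr)
        try:
            data, _ = sock.recvfrom(1024)
            return data, attempt
        except socket.timeout:
            continue  # assume the datagram (or its reply) was lost; try again
    raise TimeoutError(f"no reply after {retries} attempts")


def demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    server.settimeout(5)

    def flaky_server():
        server.recvfrom(1024)              # pretend the first datagram was lost
        _data, addr = server.recvfrom(1024)
        server.sendto(b"pong", addr)

    thread = threading.Thread(target=flaky_server, daemon=True)
    thread.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        return request_with_retry(client, b"ping", server.getsockname())
    finally:
        thread.join(timeout=5)
        client.close()
        server.close()


print(demo())
```

A real protocol would also need request IDs so that a late reply to an earlier attempt is not mistaken for the answer to a newer one.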

Interview Tip

When asked “TCP or UDP for X?” — pick based on the question’s hint. “Real-time” or “low latency” or “it’s okay to lose some” → UDP. “File”, “reliable”, “in order” → TCP. Bonus points for mentioning QUIC.

TCP 3-Way Handshake

intermediate tcp handshake syn ack transport

Before TCP can ship any actual data, the client and server shake hands with three messages: SYN, SYN-ACK, ACK. This sets up sequence numbers and confirms both sides are alive.

In simple language: it’s like calling someone — “Hello?” “Hello!” “Great, let’s talk.” Three turns and we’re ready.

The Three Messages

SYN — Client says “I want to connect. My starting sequence number is X.”
SYN-ACK — Server says “Got it. My starting sequence number is Y, and I acknowledge yours (ack = X+1).”
ACK — Client says “Confirmed. ack = Y+1.” Done — connection is ESTABLISHED.
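The seq/ack arithmetic in those three messages can be written out as a small sketch. The function name `three_way_handshake` and the dictionary layout are invented for this example; the modulo reflects the fact that sequence numbers are 32-bit and wrap around.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake messages with their seq/ack numbers."""
    syn = {"flags": "SYN", "seq": client_isn, "ack": None}
    syn_ack = {
        "flags": "SYN-ACK",
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_SPACE,   # acknowledges the client's SYN
    }
    ack = {
        "flags": "ACK",
        "seq": (client_isn + 1) % SEQ_SPACE,
        "ack": (server_isn + 1) % SEQ_SPACE,   # acknowledges the server's SYN
    }
    return [syn, syn_ack, ack]


for message in three_way_handshake(100, 500):
    print(message)
```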

The Handshake Timeline

CLIENT                                          SERVER

SYN seq=100               ────────────▶         (LISTEN → SYN_RCVD)
(SYN_SENT)                ◀────────────         SYN-ACK seq=500, ack=101
ACK seq=101, ack=501      ────────────▶         (ESTABLISHED)

            CONNECTION ESTABLISHED

Why Three Messages, Not Two?

Both sides need to synchronize sequence numbers AND prove they can hear each other.

Message 1: client → server. Server learns client is alive + client’s seq#.
Message 2: server → client. Client learns server is alive + server’s seq#.
Message 3: client → server. Server learns its message reached the client.

Without the third message, the server wouldn’t know if its SYN-ACK got through.

Sequence Numbers — Why Random?

The initial sequence number (ISN) is chosen pseudo-randomly. Why not start at 0?

Security — predictable seq numbers let attackers spoof packets in a session.
Avoid collisions — old packets from a previous connection on the same 5-tuple shouldn’t get confused with the new one.

RFC 6528 defines how modern OSes pick ISNs.

Connection States

Each side moves through a state machine. The key states for the handshake:

Client:  CLOSED -> SYN_SENT -> ESTABLISHED
Server:  CLOSED -> LISTEN -> SYN_RCVD -> ESTABLISHED

We can see them live with:

ss -tan         # Linux
netstat -an | grep -E 'SYN|ESTAB'   # macOS / cross-platform

Watching It With tcpdump

sudo tcpdump -i any -n 'tcp[tcpflags] & (tcp-syn|tcp-ack) != 0' and host example.com
# 14:00:01 IP 192.168.1.5.51234 > 93.184.216.34.443: Flags [S], seq 100
# 14:00:01 IP 93.184.216.34.443 > 192.168.1.5.51234: Flags [S.], seq 500, ack 101
# 14:00:01 IP 192.168.1.5.51234 > 93.184.216.34.443: Flags [.], ack 501

[S] = SYN, [S.] = SYN+ACK, [.] = ACK.

SYN Flood Attack

Send tons of SYN packets but never reply with the final ACK. The server keeps half-open connections in SYN_RCVD state, exhausting memory. Mitigation: SYN cookies — the server stores no state until the ACK arrives.
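The core trick behind SYN cookies can be sketched as follows: instead of remembering the half-open connection, the server derives its initial sequence number from the connection details plus a secret, then recomputes it when the final ACK arrives. This is a heavily simplified toy, not the real algorithm: actual implementations also fold in a time counter and the MSS, and `SECRET`, `make_cookie`, and `final_ack_is_valid` are hypothetical names.

```python
import hashlib
import hmac

SECRET = b"demo-only-secret"  # hypothetical; a real server would use a random, rotated key


def make_cookie(src_ip: str, src_port: int, dst_ip: str, dst_port: int, client_isn: int) -> int:
    """Derive a 32-bit server ISN from the connection details (no state stored)."""
    message = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{client_isn}".encode()
    digest = hmac.new(SECRET, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def final_ack_is_valid(src_ip, src_port, dst_ip, dst_port, client_isn, ack_number) -> bool:
    # The client must acknowledge cookie + 1; the server recomputes the cookie
    # from the packet itself instead of looking it up in a table.
    expected = (make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn) + 1) % 2 ** 32
    return ack_number == expected


cookie = make_cookie("192.168.1.5", 51234, "93.184.216.34", 443, 100)
print(cookie)
print(final_ack_is_valid("192.168.1.5", 51234, "93.184.216.34", 443, 100, (cookie + 1) % 2 ** 32))
```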

Latency Cost

The handshake takes 1 RTT (round-trip time) before any application data flows. Between India and the US, that can be ~150ms or more spent doing nothing useful. This is one reason QUIC and TLS 1.3 0-RTT exist: to fold handshakes into the first useful message.

TCP Fast Open (TFO)

A modern extension: a client can include data in the initial SYN if it has a valid cookie from a previous connection. Saves an RTT on repeat visits.

Common Gotcha

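People say “the handshake exchanges data”. It doesn’t. The handshake just sets up the connection. In a standard handshake, the first byte of HTTP goes out no earlier than the third message (the final ACK), and usually right after it. TCP Fast Open is the exception.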

Interview Tip

If asked to draw the handshake on a whiteboard, draw two vertical lines (client/server) and three diagonal arrows. Label each with the flags (SYN, SYN-ACK, ACK) and a sample seq/ack. Done — most interviewers want to see exactly that sketch.

TCP Connection Termination (4-Way)

intermediate tcp fin close time-wait transport

Closing a TCP connection takes four messages, not three. Each side has to independently say “I’m done sending” and get an acknowledgement. Then there’s a weird state called TIME_WAIT that sysadmins love to complain about.

In simple language: TCP is full-duplex — both sides can send. So both sides have to close their end separately. That’s why we get four messages instead of three.

The Four Messages

FIN (A → B) — A says “I have no more data to send.”
ACK (B → A) — B acknowledges the FIN. (B can still send data to A.)
FIN (B → A) — B says “I’m also done sending.”
ACK (A → B) — A acknowledges. Connection fully closed.

The Termination Timeline

CLIENT (A)                                             SERVER (B)

                      FIN seq=1000  ────────────▶      (CLOSE_WAIT)
(FIN_WAIT_2)          ◀────────────  ACK ack=1001
(still FIN_WAIT_2)    ◀────────────  FIN seq=2000
                      ACK ack=2001  ────────────▶      (CLOSED)

A enters TIME_WAIT (~2× MSL)

State Walkthrough

The side that calls close() first is the active closer. The other side is the passive closer.

Active closer:   ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT -> CLOSED
Passive closer:  ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED
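
The two paths above can be written down as a small transition table. This is only a sketch of the close sequence: the event labels are made up for illustration, and simultaneous close and RST are left out.

```python
# (state, event) -> next state, covering the normal close sequence only.
TRANSITIONS = {
    ("ESTABLISHED", "close()"): "FIN_WAIT_1",     # active closer sends FIN
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",      # active closer sends final ACK
    ("TIME_WAIT", "2*MSL timeout"): "CLOSED",
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",    # passive closer ACKs the FIN
    ("CLOSE_WAIT", "close()"): "LAST_ACK",        # passive closer sends its FIN
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def walk(state, events):
    """Apply events in order and return every state visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


active = walk("ESTABLISHED", ["close()", "recv ACK", "recv FIN", "2*MSL timeout"])
passive = walk("ESTABLISHED", ["recv FIN", "close()", "recv ACK"])
```

Note that CLOSE_WAIT only ends when the application calls close(). A pile of CLOSE_WAIT sockets usually points at an app that forgot to close, not at the network.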

The active closer is stuck in TIME_WAIT for a while. The passive closer goes straight to CLOSED after the final ACK.

What is TIME_WAIT?

After sending the final ACK, the active closer waits for 2 × MSL (Maximum Segment Lifetime). RFC 793 suggests an MSL of 2 minutes, but real stacks use shorter values, so TIME_WAIT lasts ~30–120s depending on the OS (Linux uses a fixed 60s).

Why Wait?

Two reasons:

The final ACK might be lost. If the other side never sees it, they’ll resend their FIN. We need to be around to ACK it again.
Old duplicate packets must die. A delayed segment from the just-closed connection shouldn’t show up in a brand-new one with the same 5-tuple.

Why TIME_WAIT Causes Headaches

A busy server (think a load balancer making lots of short-lived outbound connections) can pile up tens of thousands of sockets in TIME_WAIT. Each consumes a 5-tuple. We can run out of ephemeral ports.

# Count TIME_WAIT sockets on Linux
ss -tan state time-wait | wc -l

# Useful kernel knobs (Linux)
sysctl net.ipv4.tcp_fin_timeout      # how long FIN_WAIT_2 lingers
sysctl net.ipv4.tcp_tw_reuse         # 1 = reuse TIME_WAIT sockets for outgoing connections

Setting tcp_tw_reuse=1 is a common fix for proxy servers. It only affects outgoing connections and relies on TCP timestamps being enabled, so don’t enable it blindly. Read the docs first.

Simultaneous Close (Rare)

If both sides send FIN at the same time, we end up in a state called CLOSING and the dance becomes:

A: FIN_WAIT_1 -> CLOSING -> TIME_WAIT
B: FIN_WAIT_1 -> CLOSING -> TIME_WAIT

Cool but rarely seen in practice.

Half-Close

TCP supports a half-close: A sends FIN but keeps reading. This is what shutdown(sock, SHUT_WR) does. Useful for request/response exchanges where the client sends its whole request, half-closes its write side to signal end of input, and then reads the response.
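
A minimal loopback sketch of that pattern, using Python's standard socket module. The helper names (`upper_server`, `half_close_roundtrip`) and the uppercase "protocol" are invented for the demo; the point is only that the server sees end-of-input while the client can still read.

```python
import socket
import threading


def upper_server(listener):
    """Read until the peer's FIN, then reply with the uppercased input."""
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:              # empty read: the peer sent FIN
                break
            chunks.append(data)
        conn.sendall(b"".join(chunks).upper())


def half_close_roundtrip(payload):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    worker = threading.Thread(target=upper_server, args=(listener,))
    worker.start()

    client = socket.create_connection(listener.getsockname())
    client.sendall(payload)
    client.shutdown(socket.SHUT_WR)   # send FIN, keep the read side open
    reply = b""
    while True:
        data = client.recv(4096)
        if not data:                  # server closed its side too
            break
        reply += data
    client.close()
    worker.join()
    listener.close()
    return reply
```

Without the shutdown call, the server's read loop would never see end-of-input and both sides would wait on each other.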

Reset (RST) — The Hard Hangup

If something goes wrong (connection refused, port not listening, app crashes), the OS sends a RST instead of going through the polite 4-way termination. Both sides immediately tear down. No TIME_WAIT.

Common Gotcha

A common interview answer is “TCP closes with 3 messages, like the open.” Wrong — it’s 4 because each direction is closed independently. The trick is that two of those four can sometimes piggyback into a single segment if both sides are ready, but logically there are still four.

Interview Tip

If asked “why does TIME_WAIT exist?” — both reasons matter (lost ACK + ghost packets). Don’t just say “to be safe.” Mentioning that the active closer pays the cost is bonus.

Reliable Delivery & Sequence Numbers

intermediate tcp reliability sequence-numbers ack retransmission

TCP gives us reliable, in-order, exactly-once delivery on top of an unreliable internet. The magic ingredients are sequence numbers, acknowledgements, and retransmissions.

In simple language: every byte gets numbered. The receiver tells the sender “I got up to byte X.” If something goes missing, the sender resends.

The Core Mechanism

Sender numbers each byte with a sequence number.
Receiver sends back acknowledgement (ACK) for the next byte expected.
Sender keeps a timer for unacked data. If the timer fires, retransmit.
Receiver buffers out-of-order data and reorders before delivering to the app.

That’s it. Every other reliability feature is built on these four ideas.

Sequence Numbers Are Per-Byte

Each byte in a TCP stream has its own sequence number. The header carries the seq# of the first byte in the segment.

Segment 1:  seq=1000, length=500   -> covers bytes 1000..1499
Segment 2:  seq=1500, length=500   -> covers bytes 1500..1999
Segment 3:  seq=2000, length=200   -> covers bytes 2000..2199

The receiver replies with ack=2200 once everything up to byte 2199 has arrived. ACKs are cumulative — ack=2200 implicitly acknowledges everything before it.

A Lost Segment Example

Sender                       Receiver
  seq=1000, len=500   ──▶
  seq=1500, len=500   ──▶ (lost!)
  seq=2000, len=500   ──▶
  seq=2500, len=500   ──▶
  seq=3000, len=500   ──▶
                          ◀──  ack=1500   (got 1st segment)
                          ◀──  ack=1500   (dup 1: got 3rd, but 2nd missing!)
                          ◀──  ack=1500   (dup 2: got 4th, still waiting)
                          ◀──  ack=1500   (dup 3: got 5th, still waiting)
  retransmit seq=1500 ──▶
                          ◀──  ack=3500   (now caught up)

Three duplicate ACKs is a classic signal — modern TCP triggers fast retransmit (don’t wait for the timer; resend immediately).

Retransmission Timeout (RTO)

The sender doesn’t pick the timeout randomly. It estimates the round-trip time (RTT) continuously and sets RTO based on it.

RTO ≈ smoothed_RTT + 4 * RTT_variance

If a network is fast and stable, RTO is tight. If RTT spikes, RTO grows. If we still don’t get an ACK, RTO doubles on each retry (exponential backoff).
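
Here is a sketch of that estimator, following the update rules in RFC 6298 (gains of 1/8 and 1/4, variance updated before the smoothed RTT). The class name and the `min_rto` / `max_rto` defaults are assumptions of this sketch; real stacks pick their own floor and ceiling.

```python
class RtoEstimator:
    ALPHA = 1 / 8   # weight of a new sample in the smoothed RTT
    BETA = 1 / 4    # weight of a new sample in the RTT variance

    def __init__(self, min_rto=1.0, max_rto=60.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.rto = max(min_rto, 1.0)   # before any sample: start at 1 second

    def _clamp(self, value):
        return max(self.min_rto, min(self.max_rto, value))

    def on_rtt_sample(self, rtt):
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Variance first (using the old SRTT), then the smoothed RTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - rtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = self._clamp(self.srtt + 4 * self.rttvar)
        return self.rto

    def on_timeout(self):
        """Exponential backoff: double the RTO after a retransmission timeout."""
        self.rto = self._clamp(self.rto * 2)
        return self.rto
```

With a steady 100 ms RTT the variance term shrinks on every sample, so the raw RTO tightens toward the RTT itself; one timeout doubles it again.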

SACK — Selective Acknowledgement

Cumulative ACKs are simple but wasteful — if segment 5 of 10 is lost, the sender might retransmit 5 through 10 even though only 5 was lost.

SACK (Selective ACK) lets the receiver say: “I got up to byte X, AND I separately have bytes Y..Z.” Sender only resends what’s actually missing. RFC 2018.

ack=1500, SACK=2000-2500    means "next expected = 1500, but I also have 2000..2500"

Most modern TCPs negotiate SACK during the handshake.

Detecting Duplicates

Because sequence numbers identify exact bytes, the receiver can tell if a segment is a duplicate (e.g. sender retransmitted but the original eventually arrived). Duplicates are silently dropped at the receiver.

Out-of-Order Delivery

The IP layer doesn’t guarantee order. A segment with seq=2000 might arrive before seq=1500. The receiver:

Buffers the out-of-order segment.
Sends a duplicate ACK for the byte it’s still waiting for.
Once the gap fills, delivers everything to the app in order.

The app never sees out-of-order bytes. That’s the whole point.
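
The receiver's side of this can be sketched in a few lines. This is a toy model, not a real TCP stack: the `Receiver` class is made up for illustration, it assumes buffered segments line up exactly with the gap, and it ignores sequence-number wraparound and buffer limits.

```python
class Receiver:
    def __init__(self, initial_seq):
        self.rcv_nxt = initial_seq     # next byte expected from the peer
        self.buffered = {}             # seq -> payload for out-of-order segments
        self.delivered = bytearray()   # bytes handed to the app, in order

    def _deliver(self, payload):
        self.delivered += payload
        self.rcv_nxt += len(payload)

    def on_segment(self, seq, payload):
        """Process one segment and return the cumulative ACK to send back."""
        if seq + len(payload) <= self.rcv_nxt:
            return self.rcv_nxt              # duplicate: drop it, re-ACK
        if seq > self.rcv_nxt:
            self.buffered[seq] = payload     # gap: buffer it, send a dup ACK
            return self.rcv_nxt
        # In-order (or partially overlapping) data: deliver the new part.
        self._deliver(payload[self.rcv_nxt - seq:])
        # Drain any buffered segments that are now contiguous.
        while self.rcv_nxt in self.buffered:
            self._deliver(self.buffered.pop(self.rcv_nxt))
        return self.rcv_nxt


r = Receiver(1000)
acks = [
    r.on_segment(1000, b"a" * 500),   # in order
    r.on_segment(2000, b"c" * 500),   # arrives early: duplicate ACK
    r.on_segment(1500, b"b" * 500),   # fills the gap
]
```

The ACK jumps from 1500 straight to 2500 once the gap fills, which is the cumulative-ACK behavior described earlier.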

Checksums — Detecting Corruption

Every TCP segment has a 16-bit checksum covering the header + payload. If a bit flips on the wire, the checksum fails and the receiver drops the segment without ACKing. The sender will retransmit.

The checksum is weak by modern standards (Ethernet’s CRC and TLS’s MAC are stronger), but it catches casual corruption.
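
For the curious, the arithmetic is the Internet checksum from RFC 1071: a one's-complement sum of 16-bit words. The sketch below runs it over an arbitrary byte string; a real TCP checksum also includes a pseudo-header with the IP addresses, which is left out here. The function name is ours.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                        # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit word
    while total >> 16:                         # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
# A receiver recomputes over data + checksum; an intact segment yields 0.
verified = internet_checksum(payload + checksum.to_bytes(2, "big")) == 0
```

Flipping a single bit in the payload changes the result, so the receiver's recomputation no longer comes out to zero and the segment is dropped.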

Connection State Tracks All This

Each TCP connection keeps track of:

type TCPState = {
  sndUna: number   // smallest unacked byte
  sndNxt: number   // next byte to send
  rcvNxt: number   // next byte expected from peer
  rttSample: number
  rto: number
  sackBlocks: Array<[number, number]>
}

The OS kernel maintains all this for each socket — and that’s why TCP is “expensive” compared to UDP.

Common Gotcha

TCP is reliable, but not instant. A retransmission can add hundreds of milliseconds. Apps that need real-time delivery (voice, gaming) often pick UDP precisely to avoid TCP’s reliability mechanisms — fresh data is more important than complete data.

Interview Tip

A favorite question: “If TCP is reliable, why do real-time apps avoid it?” — head-of-line blocking. One lost packet stalls everything behind it because TCP must deliver in order. UDP/QUIC sidestep this by not enforcing ordering across all data.

Flow Control & Sliding Window

intermediate tcp flow-control sliding-window rwnd transport

Flow control is how TCP makes sure a fast sender doesn’t drown a slow receiver. The mechanism is the sliding window — the receiver tells the sender how much room it has, and the sender sends no more than that.

In simple language: the receiver says “I can handle 32 KB right now.” The sender sends up to 32 KB without waiting, then pauses for an ACK that updates the window.

The Problem

Without flow control, a sender on a fast link could fire off megabytes per second. A receiver running on an old phone might only be able to consume a few KB/sec. The phone’s buffer overflows, packets get dropped, everyone suffers.

Flow control = “send only what I can hold.”

The Receive Window (rwnd)

Every TCP segment carries a Window field (16 bits, optionally scaled). It says: “I can buffer this many more bytes from you right now.”

The sender tracks this and never sends more unacked data than the window allows.

sender sees: rwnd = 32768 bytes, unacked = 20000
            -> can send 12768 more bytes before stopping

The Sliding Window Visualized

Sender's Buffer (sliding window)

[ ACKED ][ SENT, NOT ACKED ][ CAN SEND NOW ][ CAN'T SEND YET ]
  past        in flight          window           future

• ACKs from receiver slide the window right, freeing space.

• Sender keeps the bytes in flight (sent − acked) ≤ rwnd.

How ACKs Slide the Window

Sender state: sndUna = 1000, sndNxt = 5000, rwnd = 8000.

That means: bytes 1000–4999 are in flight. The window covers bytes 1000–8999, so byte 9000 (1000 + 8000) is the first one we can’t send yet.

When an ACK comes in for ack=3000 (with rwnd still advertised as 8000), we slide:

Before: [acked < 1000][in flight 1000..4999][can send 5000..8999]
After:  [acked < 3000][in flight 3000..4999][can send 5000..10999]   <- window slid right

We unlocked 2000 more bytes worth of sending room.
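
The arithmetic behind both examples is one subtraction. A tiny sketch, with a hypothetical helper name and no handling of sequence-number wraparound:

```python
def usable_window(snd_una, snd_nxt, rwnd):
    """Bytes the sender may still send right now under flow control."""
    in_flight = snd_nxt - snd_una       # sent but not yet acknowledged
    return max(0, rwnd - in_flight)     # never negative if rwnd shrinks


before = usable_window(snd_una=1000, snd_nxt=5000, rwnd=8000)
after = usable_window(snd_una=3000, snd_nxt=5000, rwnd=8000)
```

Here `before` is the 5000..8999 range (4000 bytes) and `after` is 5000..10999 (6000 bytes): the ACK for 2000 bytes freed exactly 2000 bytes of sending room.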

Window Size 0 — “Stop”

If the receiver’s app is too slow, its buffer fills up. The receiver sends an ACK with window = 0. The sender must stop sending until a window update arrives.

But what if that window update is lost? The sender would deadlock forever.

Zero-Window Probes

The sender periodically sends a 1-byte probe (driven by its persist timer) to force the receiver to report its window again. The receiver replies with the current window. If the window is still 0, the probe byte is not accepted and the sender keeps waiting. If it’s now 100, the sender resumes.

Window Scaling

The 16-bit Window field maxes out at 65,535 bytes. On modern fast links, that’s tiny. Solution: the window scaling option (RFC 7323). Each side announces a shift factor N at handshake time, and every window it advertises afterwards is multiplied by 2^N.

With window scaling (N can be at most 14), windows can reach about 1 GB.

Bandwidth-Delay Product (BDP)

The “right” window size is the bandwidth-delay product:

BDP (bytes) = bandwidth (bytes/sec) × RTT (sec)

A 1 Gbps link with 100ms RTT needs ~12.5 MB of in-flight data to keep the pipe full. If the window is smaller, we’re wasting bandwidth.
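
A quick way to check numbers like these, and to see how much window scaling a path needs. Both helper names are ours; the only protocol facts used are the 16-bit window field and the maximum shift of 14 from RFC 7323.

```python
def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product in bytes, from bits/sec and milliseconds."""
    return bandwidth_bps * rtt_ms // 8000   # /8 for bytes, /1000 for seconds


def min_window_scale(target_bytes, max_shift=14):
    """Smallest shift N such that a 16-bit window scaled by 2^N covers target."""
    for shift in range(max_shift + 1):
        if 65535 << shift >= target_bytes:
            return shift
    return None   # not reachable even with the largest shift


bdp = bdp_bytes(1_000_000_000, 100)   # 1 Gbps, 100 ms RTT
shift = min_window_scale(bdp)
```

For the 1 Gbps / 100 ms example, an unscaled 64 KB window would cap throughput far below the link rate; a shift of 8 is the first one that covers the 12.5 MB pipe.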

Flow Control vs Congestion Control

Easy mix-up — they’re different things:

Flow control (rwnd) — protects the receiver from overload.
Congestion control (cwnd) — protects the network from overload.

The sender uses min(rwnd, cwnd) as the actual amount it can send. We’ll cover congestion control in the next note.

Inspecting It

# On Linux, see the current send/recv windows for a connection
ss -tin
# State  Recv-Q  Send-Q  ...  cwnd:10  rcv_space:65483

# tcpdump shows the window field
sudo tcpdump -nn -i any 'tcp port 443'
# ... Flags [.], ack 12345, win 502, ...   <- raw 16-bit field; multiply by 2^N from the handshake's window-scale option

Common Gotcha

A 0-window stall looks like the connection is dead to a casual observer. ss shows ESTABLISHED, but no data is moving. Check Recv-Q on the slow side — if it’s full, the app isn’t reading fast enough.

Interview Tip

If asked “why does TCP have a window?” — flow control is the answer. If asked “why does it slide?” — to allow continuous sending without waiting for each ACK individually (that would be a stop-and-wait protocol, very slow).

Congestion Control (Slow Start, AIMD)

intermediate tcp congestion-control slow-start aimd cubic bbr

Congestion control is how TCP avoids overloading the network itself — the routers, links, and queues between sender and receiver. Even if the receiver has plenty of buffer, the path in between might not.

In simple language: TCP slowly probes how fast it can go, backs off when there’s loss, and tries again. It does this without anyone telling it the network capacity.

Two Different Window Limits

The sender’s actual sending rate is bounded by the smaller of:

rwnd (receive window) — flow control, protects the receiver.
cwnd (congestion window) — congestion control, protects the network.

in_flight ≤ min(rwnd, cwnd)

The Phases

Classic TCP (Reno-style) has four phases:

Slow Start — exponential ramp-up.
Congestion Avoidance — linear growth (AIMD).
Fast Retransmit — resend on 3 duplicate ACKs.
Fast Recovery — recover quickly after fast retransmit.

Phase 1 — Slow Start

Despite the name, slow start is fast. We start with cwnd = 1 MSS (or 10, in modern Linux thanks to RFC 6928). On every ACK, cwnd grows by 1 MSS.

RTT 1:  cwnd = 1
RTT 2:  cwnd = 2     (doubled — got 1 ACK -> +1)
RTT 3:  cwnd = 4     (doubled again)
RTT 4:  cwnd = 8
RTT 5:  cwnd = 16
...

Effectively, cwnd doubles every RTT. We blast forward until we hit a threshold or see a loss.

Phase 2 — Congestion Avoidance (AIMD)

When cwnd reaches ssthresh (slow-start threshold), we shift to AIMD — Additive Increase, Multiplicative Decrease:

Additive Increase: cwnd grows by 1 MSS per RTT (much slower).
Multiplicative Decrease: on loss, cwnd is halved.

no loss   ->  cwnd += 1 per RTT     (linear)
loss      ->  cwnd /= 2             (cliff)

This sawtooth pattern is the signature look of TCP’s bandwidth graph.

Phase 3 — Fast Retransmit

We don’t always wait for the timer. Three duplicate ACKs are a strong signal that one segment was lost (but later ones got through). Sender immediately resends the missing segment instead of waiting for RTO.

Phase 4 — Fast Recovery

After fast retransmit, instead of going back to slow start (cwnd=1), we set:

ssthresh = cwnd / 2
cwnd     = ssthresh

…and resume in congestion avoidance. We lost a packet, not the whole network. No need to start from zero.

The Sawtooth

cwnd over time (Reno)

cwnd
  │              ╱│           ╱│        ╱│
  │            ╱  │         ╱  │      ╱  │
  │          ╱    │       ╱    │    ╱    │
  │        ╱      │     ╱      │  ╱      │
  │      ╱        ▼   ╱        ▼ ╱       ▼
  │    ╱                                       (slow start)
  │  ╱
  │╱
  └─────────────────────────────────▶  time
       slow │  congestion avoidance (AIMD)
       start│  loss → halve → grow linearly

The slow-start phase is the steep ramp at the start. After the first loss, we drop to half and keep doing the linear-up + halve-on-loss dance forever.
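
The sawtooth is easy to reproduce with a per-RTT toy model. This sketch counts cwnd in whole MSS units, treats a loss as "detected by duplicate ACKs" (so it halves instead of restarting slow start), and ignores timeouts entirely; the function name and parameters are made up for illustration.

```python
def simulate_cwnd(rtts, ssthresh, loss_rtts=(), initial_cwnd=1):
    """Return cwnd (in MSS) at the start of each RTT, Reno-style."""
    cwnd = initial_cwnd
    history = []
    for rtt in range(1, rtts + 1):
        history.append(cwnd)
        if rtt in loss_rtts:
            ssthresh = max(cwnd // 2, 2)    # multiplicative decrease
            cwnd = ssthresh                 # fast recovery: no slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per RTT
        else:
            cwnd += 1                       # congestion avoidance: +1 per RTT
    return history


no_loss = simulate_cwnd(8, ssthresh=16)
one_loss = simulate_cwnd(10, ssthresh=16, loss_rtts={7})
```

In `one_loss`, cwnd climbs to 18, drops to 9 after the loss in RTT 7, and then resumes the slow linear climb: one tooth of the saw.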

cwnd vs rwnd

cwnd is internal to the sender. The receiver doesn’t know it.
rwnd is advertised by the receiver in every ACK.
The sender takes the minimum.

On a high-bandwidth, low-loss network, cwnd is often the bottleneck early in the connection, and rwnd later (once cwnd grows past the advertised rwnd).

Algorithm Variants

Reno / NewReno — the classic AIMD algorithm. RFC 5681.
CUBIC — Linux default since ~2006. Cwnd grows as a cubic function of time since the last loss — fast recovery to the prior peak, then careful exploration above it. Better for high-bandwidth long-RTT links.
BBR (Bottleneck Bandwidth and RTT) — Google, ~2016. Doesn’t use loss as a signal at all. Models the path’s bandwidth and RTT and paces sending. Often beats CUBIC on lossy or buffered networks.
Vegas — uses RTT increase (queue buildup) as a signal before loss happens. Niche.

# Linux — see and change the algo
sysctl net.ipv4.tcp_congestion_control
# net.ipv4.tcp_congestion_control = cubic

sudo sysctl -w net.ipv4.tcp_congestion_control=bbr

ECN — Don’t Wait For Loss

Explicit Congestion Notification: routers can mark packets as “I’m congested” instead of dropping them. The receiver echoes the mark back, and the sender reacts as if there had been a loss, even though no data was actually lost. Requires ECN-aware routers and endpoints.

Common Gotcha

People conflate “TCP is reliable” with “TCP is fast.” In high-loss networks, TCP can crawl because every loss halves cwnd. UDP-based protocols like QUIC + BBR are often faster in those conditions, even though they have to redo reliability themselves.

Interview Tip

The big four to remember: slow start, congestion avoidance, fast retransmit, fast recovery. Plus AIMD = additive increase, multiplicative decrease. If you can sketch the sawtooth on a whiteboard, you’ve answered 90% of TCP-congestion interview questions.

Network Layer & Routing

IP Routing & Routers

intermediate routing router ip default-gateway longest-prefix-match

A router is a device that forwards IP packets between networks. Each packet’s destination IP gets looked up in a routing table to decide which interface (and next hop) to send it out of.

In simple language: routers are the postal-sorting offices of the internet. They look at the destination, check their map, and forward toward the next office.

What’s in a Routing Table?

A routing table is a list of rules: “if destination matches this prefix, send the packet to this next hop via this interface.”

# Linux
ip route
# default via 192.168.1.1 dev wlan0
# 169.254.0.0/16 dev wlan0 scope link
# 192.168.1.0/24 dev wlan0 proto kernel scope link src 192.168.1.42

# macOS
netstat -rn -f inet

Each row has at minimum:

Destination prefix (e.g. 192.168.1.0/24)
Next-hop gateway (or “directly connected”)
Interface to send out
Optional metric/weight

Default Gateway

When the destination doesn’t match any specific route, the router falls back to 0.0.0.0/0 — the default route. The next hop for that route is the default gateway.

For our home laptop, the default gateway is usually the home router. Anything not in the LAN goes to it, and it figures out the rest.

Longest Prefix Match

When multiple routes match the destination, the router picks the most specific one — the longest prefix.

Routing table:
  10.0.0.0/8     -> next-hop A
  10.1.0.0/16    -> next-hop B
  10.1.2.0/24    -> next-hop C
  0.0.0.0/0      -> next-hop default

Destination: 10.1.2.50
Match candidates: /8, /16, /24, /0
Winner: /24 (longest)  ->  send to next-hop C

This is THE rule of IP routing. Memorize it.
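
The same lookup as a small sketch using Python's standard ipaddress module. A linear scan is fine for illustration; real routers use tries or dedicated hardware. The `ROUTES` list mirrors the example table above, and `lookup` is a name we made up.

```python
import ipaddress

ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "A"),
    (ipaddress.ip_network("10.1.0.0/16"), "B"),
    (ipaddress.ip_network("10.1.2.0/24"), "C"),
    (ipaddress.ip_network("0.0.0.0/0"), "default"),
]


def lookup(dest, routes=ROUTES):
    """Return the next hop of the most specific route that contains dest."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None                      # no route at all (no default either)
    # Longest prefix wins: /24 beats /16 beats /8 beats /0.
    return max(matches, key=lambda match: match[0].prefixlen)[1]
```

Because 0.0.0.0/0 matches every address with the shortest possible prefix, it only ever wins when nothing more specific matches, which is exactly what makes it the default route.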

A Hop’s Job

For every packet arriving at a router:

Decapsulate the Ethernet frame (verify CRC, accept if dst MAC = mine).
Decrement the IP TTL. If it reaches 0, drop the packet and send back an ICMP “TTL exceeded.”
Look up the destination IP in the routing table (longest prefix match).
Update the IP header checksum (the TTL changed).
ARP for the next-hop’s MAC (if not already cached).
Re-encapsulate with a new Ethernet header (new dst MAC, our src MAC).
Forward out the chosen interface.

The IP source/destination addresses stay the same end-to-end. The MACs change at every hop.

Example: My Laptop to a Server

[laptop 192.168.1.42] -> [home router 192.168.1.1] -> [ISP edge] -> [backbone] -> [server]

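At each step, the Ethernet header is rewritten: the source MAC becomes the forwarding router’s and the destination MAC becomes the next hop’s. IP src and dst stay constant (NAT at the home router aside).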

We can visualize hops:

traceroute google.com
# 1  192.168.1.1   (home router)         1.2 ms
# 2  10.0.0.1      (ISP first hop)       8.4 ms
# 3  72.x.x.x      (regional)            12.1 ms
# ...

Static vs Dynamic Routing

Static routes — manually configured. Fine for small/stable networks.
Dynamic routing — routers exchange routes automatically using protocols (RIP, OSPF, BGP). Self-healing. Required at scale.

We’ll cover algorithms in the next note. For now: static = manual, dynamic = automatic.

Default Gateway From the Host’s Side

A regular host (laptop, server) doesn’t usually run routing protocols. It has just one or two routes:

default via <gateway>     # everything not local goes here
<my subnet> dev <iface>   # local traffic stays on the LAN

That’s why on a misconfigured machine, “I can ping local IPs but not the internet” usually means gateway is wrong.

Forwarding vs Routing

A small distinction:

Routing = the process of building the routing table (running protocols, learning paths).
Forwarding = the actual per-packet decision based on the table.

Routers do both. The forwarding layer is hot-path and often hardware-accelerated.

Inspecting a Linux Box as a Router

# Enable IP forwarding (turn a Linux box into a router)
sudo sysctl -w net.ipv4.ip_forward=1

# Add a static route
sudo ip route add 10.10.10.0/24 via 192.168.1.50

# Show ARP cache
ip neigh

# See live forwarding decisions
ip route get 8.8.8.8
# 8.8.8.8 via 192.168.1.1 dev wlan0 src 192.168.1.42

Common Gotcha

A common source of confusion: two routes to the same destination with the same prefix length. If their metrics differ, the lower metric wins. If the metrics are equal, many routers and OSes load-balance across both. That’s ECMP (Equal-Cost Multi-Path): useful, but it can confuse traceroutes because successive probes may take different paths.

Interview Tip

The two answers interviewers love: longest prefix match and default gateway. If the question is “how does my packet to Google get there?” walk through: laptop → default gateway → ISP → BGP-routed path → Google’s edge. Shows you understand the layers fit together.

Routing Algorithms (Distance Vector, Link State)

intermediate routing ospf rip bgp distance-vector link-state

Routers don’t manually know every network on the internet — they learn routes from each other using routing protocols. There are two main families: distance vector and link state, plus BGP for the global internet.

In simple language: routers gossip about which networks they can reach and how far away those networks are. From that gossip, each router computes its forwarding table.

The Two Families

Distance Vector — “I tell my neighbors what I can reach and how far it is.” Bellman-Ford under the hood. Example: RIP.
Link State: “Everyone floods a description of their own links. Each router assembles the full map and computes shortest paths locally with Dijkstra.” Example: OSPF.

Plus a third for the internet:

Path Vector (BGP) — like distance vector but with the full AS-path attached, used between Autonomous Systems.

Interior vs Exterior

Interior Gateway Protocols (IGP) — within a single organization / AS. RIP, OSPF, IS-IS, EIGRP.
Exterior Gateway Protocols (EGP) — between ASes. BGP is the only one that matters today.

Distance Vector: RIP

RIP (Routing Information Protocol) is the simplest IGP.

Each router knows its directly-connected networks.
Periodically (every 30s), it tells its neighbors: “Here’s my list of networks and the hop count to each.”
Neighbors update their own tables: “I can reach X via you, hop count = your_count + 1.”

This is the Bellman-Ford algorithm in distributed form.

RIP messages from router A:
  192.168.1.0/24  hop=0  (directly connected)
  10.0.0.0/8      hop=1  (learned from neighbor B)
  172.16.0.0/12   hop=2

Router B receives -> if "via A" gives a shorter path than current, update.
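
That update rule, sketched as code. The table layout (prefix mapped to a `(hops, next_hop)` pair) and the function name are assumptions of this sketch; the +1 per hop and the unreachable metric of 16 follow RIP. Timers, split horizon, and triggered updates are left out.

```python
INFINITY = 16   # RIP treats a metric of 16 as unreachable


def merge_advertisement(table, neighbor, advertised):
    """Fold one neighbor's advertisement into our table.

    table: prefix -> (hops, next_hop); advertised: prefix -> hops.
    Returns True if any route changed.
    """
    changed = False
    for prefix, hops in advertised.items():
        candidate = min(hops + 1, INFINITY)   # one more hop to go via neighbor
        current = table.get(prefix)
        if current is None:
            accept = candidate < INFINITY     # new, reachable destination
        else:
            # Take a strictly shorter path, or any news from the neighbor
            # we already route through (its path may have gotten worse).
            accept = candidate < current[0] or current[1] == neighbor
        if accept and current != (candidate, neighbor):
            table[prefix] = (candidate, neighbor)
            changed = True
    return changed


table_b = {"10.0.0.0/8": (0, None)}          # B is directly connected
changed = merge_advertisement(
    table_b, "A",
    {"192.168.1.0/24": 0, "10.0.0.0/8": 1, "172.16.0.0/12": 2},
)
```

B keeps its own direct route to 10.0.0.0/8 (A's offer would be 2 hops) and learns the other two prefixes via A.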

RIP’s Problems

Slow convergence. It can take minutes for a topology change to propagate.
Count to infinity. A loop can cause hop counts to climb step by step. RIP limits the damage by treating 16 as “infinity” (unreachable), so the longest usable path is 15 hops.
Doesn’t scale beyond small networks.

Modern networks rarely use RIP. It’s a teaching example.

Link State: OSPF

OSPF (Open Shortest Path First) is the modern IGP. It uses Dijkstra’s algorithm.

Each router discovers its directly-connected neighbors (Hello packets).
Each router floods a Link-State Advertisement (LSA) describing its links and costs.
Every router builds an identical map of the entire area (the Link-State Database).
Each router runs Dijkstra locally to compute shortest paths to every destination.

Result: fast convergence, loop-free, scales to thousands of routers (with areas).
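
Step 4 in miniature: Dijkstra over a link-state database, tracking the first hop so the result can feed a forwarding table. The `lsdb` layout (router name mapped to its neighbors and link costs) is a made-up stand-in for real LSAs.

```python
import heapq


def shortest_paths(lsdb, source):
    """lsdb: router -> {neighbor: cost}. Returns router -> (cost, first_hop)."""
    best = {source: (0, None)}
    heap = [(0, source, None)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if cost > best[node][0]:
            continue                          # stale heap entry, skip it
        for neighbor, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            # Leaving the source, the neighbor itself is the first hop.
            hop = neighbor if node == source else first_hop
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, neighbor, hop))
    return best


lsdb = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1, "D": 2},
    "D": {"C": 2},
}
routes_from_a = shortest_paths(lsdb, "A")
```

Router A reaches C through B (cost 2) instead of over its direct cost-5 link, and D inherits the same first hop.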

OSPF Concepts

Cost — typically based on link bandwidth (cost = ref_bw / link_bw).
Areas — break a big OSPF domain into sub-domains. Area 0 is the backbone.
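Hello timers: neighbors exchange Hellos periodically (every 10s by default on broadcast links). If none arrive within the dead interval (4× the Hello interval by default), the neighbor is declared down.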

IS-IS

IS-IS is OSPF’s older cousin — link-state, similar idea, common in large ISPs because it’s protocol-agnostic (carries IPv4 and IPv6 routes equally well).

Comparing Distance Vector vs Link State

                  Distance Vector (RIP)        Link State (OSPF)
Algorithm         Bellman-Ford                 Dijkstra
Knowledge         Only what neighbors say      Full topology of area
Convergence       Slow                         Fast
CPU/memory        Low                          Higher
Loops             Possible (count to infinity) Avoided by design
Scale             Small networks               Large enterprises

BGP — Routing the Internet

BGP (Border Gateway Protocol) connects the Autonomous Systems (ASes) that make up the internet. An AS is a network operated by a single organization (your ISP, AWS, Google, Cloudflare).

BGP is path vector — it advertises full AS paths, not just costs.
Decisions are policy-based, not just shortest-path: “prefer routes through cheaper/contracted peers.”
It’s TCP-based (port 179), unlike OSPF which has its own protocol number.

AS 64500 advertises 198.51.100.0/24 as path: [64500]
AS 64501 receives, prepends itself: [64501, 64500]
AS 64502 receives, prepends: [64502, 64501, 64500]

The AS_PATH lets BGP detect loops: if a router sees its own AS in the path, drop it.
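
Prepending and loop detection fit in a few lines. This is a deliberately simplified sketch: real BGP prepends when advertising to an external peer and carries many more attributes, and `process_update` is a hypothetical name.

```python
def process_update(local_as, as_path):
    """Return the AS path to pass on, or None if the route must be dropped."""
    if local_as in as_path:
        return None                   # our own AS already appears: a loop
    return [local_as] + as_path       # prepend ourselves before re-advertising


path = [64500]                        # origin AS advertises 198.51.100.0/24
path = process_update(64501, path)    # first transit AS
path = process_update(64502, path)    # second transit AS
looped = process_update(64500, path)  # route comes back to the origin
```

The same list also doubles as a crude distance metric: all else being equal, BGP prefers the shorter AS path.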

Why BGP Matters

When BGP misconfigurations happen, large parts of the internet go dark. Famous outages:

Pakistan / YouTube (2008) — accidental hijack via misadvertised prefix.
Cloudflare / Verizon (2019) — leaked routes from a small ISP redirected major traffic.
Facebook (2021) — withdrew its own routes, took itself off the internet.

Quick Algorithm Refresher

Bellman-Ford: relaxes edges, handles negative weights, runs in O(V·E). Distance vector basis.
Dijkstra: greedy, non-negative weights, O(E log V) with a heap. Link state basis.

Common Gotcha

People conflate routing protocols with routed protocols. IP is the routed protocol — what gets carried. OSPF/BGP/RIP are routing protocols — they decide where IP packets go. Different layer, different job.

Interview Tip

Three crisp answers for routing protocols:

RIP = distance vector = Bellman-Ford = slow, small networks
OSPF = link state = Dijkstra = enterprise IGP
BGP = path vector = the internet’s glue

NAT (Network Address Translation)

intermediate nat ipv4 private-ip port-forwarding networking

NAT (Network Address Translation) lets many devices on a private network share a single public IP address. Our home router does this for every device behind it — laptop, phone, smart fridge, all sharing one public IP.

In simple language: NAT rewrites the source IP and port on outgoing packets and remembers the mapping so it can rewrite replies on the way back.

Why NAT Exists

IPv4 has only ~4.3 billion addresses, and we long since blew past that. Without NAT, every connected device would need its own public IP. NAT lets ISPs hand out one public IP per home and serve everyone behind it.

IPv4 address conservation (the main reason).
A side effect: hosts inside aren’t directly reachable from outside without explicit forwarding — accidental firewalling.

How It Rewrites

Say my laptop (192.168.1.42:51234) connects to 93.184.216.34:443.

Outgoing from laptop:
  src=192.168.1.42:51234   dst=93.184.216.34:443

Router rewrites and stores in NAT table:
  src=203.0.113.5:60001    dst=93.184.216.34:443
  (203.0.113.5 = router's public IP, 60001 = router's chosen port)

Reply from server:
  src=93.184.216.34:443    dst=203.0.113.5:60001

Router looks up port 60001 in NAT table -> rewrites:
  src=93.184.216.34:443    dst=192.168.1.42:51234

Hands packet to laptop. Laptop has no idea translation happened.

Each NAT table entry maps a public port to the private endpoint and the remote it talks to (public_port → private_ip:private_port + remote). It expires after a timeout if there’s no traffic.
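
The rewrite-and-remember loop can be sketched as a toy translator. Assumptions of this sketch: the `Napt` class and its method names are invented, public ports are handed out sequentially, the remote endpoint is not tracked, and entries never expire.

```python
class Napt:
    def __init__(self, public_ip, first_port=60001):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}     # (private_ip, private_port) -> public_port
        self.back = {}    # public_port -> (private_ip, private_port)

    def outbound(self, src, dst):
        """Rewrite the source of an outgoing packet; create a mapping if new."""
        if src not in self.out:
            self.out[src] = self.next_port
            self.back[self.next_port] = src
            self.next_port += 1
        return (self.public_ip, self.out[src]), dst

    def inbound(self, src, dst):
        """Rewrite the destination of a reply, or return None to drop it."""
        ip, port = dst
        if ip != self.public_ip or port not in self.back:
            return None               # no mapping: nowhere to send it
        return src, self.back[port]


nat = Napt("203.0.113.5")
sent = nat.outbound(("192.168.1.42", 51234), ("93.184.216.34", 443))
reply = nat.inbound(("93.184.216.34", 443), ("203.0.113.5", 60001))
stray = nat.inbound(("198.51.100.7", 443), ("203.0.113.5", 60999))
```

The `stray` packet shows the accidental firewalling: with no table entry for port 60999, there is simply no internal host to hand it to.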

Types of NAT

Static NAT — one private IP maps to one fixed public IP. Like a permanent reservation.
Dynamic NAT — pool of public IPs, mapping changes per session.
PAT / NAPT (Port Address Translation) — many private IPs share one public IP using different ports to disambiguate. This is what your home router does. Also called NAT overload.

When people say “NAT” today, they almost always mean PAT.

NAT Behavior Types (for P2P)

For peer-to-peer apps (WebRTC, gaming), the type of NAT matters:

Full-cone (one-to-one) — once an internal host has a public mapping, anyone can reach it via that mapping. Most permissive.
Restricted-cone — same mapping, but only the external host the internal one already contacted can use it.
Port-restricted-cone — even stricter: must match both IP and port.
Symmetric — a new external destination gets a new public port. Hardest to traverse — STUN often fails, requiring TURN relays.

Symmetric NAT is the bane of P2P apps.

Port Forwarding

A device behind NAT can’t normally accept incoming connections — the router has no mapping yet. Port forwarding is a manual rule: “incoming packets to my public IP on port 22000 → forward to 192.168.1.50:22 (my home server’s SSH).”

Public:  203.0.113.5:22000
            ↓ (port forward rule)
Private: 192.168.1.50:22

Common for self-hosted services, game servers, etc.

Hairpinning (NAT Loopback)

Imagine our home server is at private IP 192.168.1.50 and a port-forward rule exposes it as 203.0.113.5:8080. From inside the LAN, can we reach 203.0.113.5:8080?

If the router supports hairpinning — yes, it’ll loop the request back to the internal host. If not, we have to use the private IP from inside. Older routers often fail at this.

What Breaks With NAT

End-to-end principle — outside hosts can’t initiate to inside hosts.
Some protocols: FTP active mode and SIP/VoIP embed IP addresses in their payloads, and IPsec can struggle because its integrity checks and port-less ESP packets don’t survive address rewriting without NAT traversal.
Logging / forensics — many users behind one IP makes attribution hard.
P2P — needs hole-punching tricks (STUN, ICE, TURN).

Carrier-Grade NAT (CGNAT)

ISPs can run NAT on their side too, sharing one public IP across many customers. We end up double-NATted: home router NAT → ISP NAT → internet. The telltale sign is a router WAN address in the 100.64.0.0/10 shared address space (RFC 6598), which is not publicly routable.

This makes self-hosting from home impractical without VPN tunnels or solutions like Cloudflare Tunnel / Tailscale Funnel.

Inspecting NAT

# Linux: see the conntrack table (tracks NAT mappings)
sudo conntrack -L
# tcp 6 ESTABLISHED src=192.168.1.42 dst=93.184.216.34 sport=51234 dport=443 \
#                  src=93.184.216.34 dst=203.0.113.5 sport=443 dport=60001

# Check public IP from inside
curl ifconfig.me

Common Gotcha

NAT is not a firewall, but it acts firewall-ish by accident. Inbound connections to unmapped ports just have nowhere to go. Don’t rely on NAT for security — use a proper firewall.

Interview Tip

Two reasons NAT was invented: (1) IPv4 exhaustion (the real one), (2) simple isolation for private LANs. The trade-off: it breaks end-to-end connectivity, which is why IPv6 enthusiasts want NAT to die.

ICMP, ping & traceroute

intermediate icmp ping traceroute ttl diagnostics

ICMP (Internet Control Message Protocol) is the network’s signalling layer — it’s how routers and hosts send each other diagnostic and error messages. ping and traceroute are two diagnostic tools built on top of it.

In simple language: when something goes wrong with a packet (or we’re just curious about the path), ICMP is how the network tells us about it.

What ICMP Carries

ICMP is a Layer 3 protocol that sits beside IP. Common message types:

Type 0   Echo Reply              (ping reply)
Type 3   Destination Unreachable (sub-codes: net, host, port unreachable)
Type 5   Redirect                (use a different gateway)
Type 8   Echo Request            (ping)
Type 11  Time Exceeded           (TTL expired — used by traceroute)
Type 12  Parameter Problem

ICMP messages don’t have ports — they’re addressed to whole hosts.

ping — Echo Request / Reply

ping sends an ICMP Echo Request (type 8) and waits for an Echo Reply (type 0).

ping -c 4 example.com
# PING example.com (93.184.216.34): 56 data bytes
# 64 bytes from 93.184.216.34: icmp_seq=0 ttl=53 time=82.4 ms
# 64 bytes from 93.184.216.34: icmp_seq=1 ttl=53 time=81.9 ms
# 64 bytes from 93.184.216.34: icmp_seq=2 ttl=53 time=82.1 ms
# 64 bytes from 93.184.216.34: icmp_seq=3 ttl=53 time=83.0 ms

Three useful numbers:

time — round-trip time (RTT) in ms.
ttl — TTL value left on the reply. Tells us roughly how many hops away we are: the sender’s starting TTL (commonly 64, 128, or 255) minus this value.
packet loss % — at the end of ping’s summary.
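
The hop estimate from the ttl field can be sketched as below. It assumes the remote host started from one of the common defaults (64, 128, or 255), which is a guess, so the result is only a rough figure.

```python
# Typical initial TTL values; an assumption, not something the reply tells us.
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int) -> tuple[int, int]:
    """Guess (initial_ttl, hops) from the TTL seen on a reply."""
    if observed_ttl < 0:
        raise ValueError("TTL cannot be negative")
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial, initial - observed_ttl
    raise ValueError("TTL cannot exceed 255")
```

For the ping output above, estimate_hops(53) guesses a starting TTL of 64 and about 11 hops on the return path.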

What ping Does and Doesn’t Tell Us

It tells us:

Is the host reachable?
Roughly how fast is the path?
Is there packet loss?

It does NOT tell us:

Is the actual app (web server, DB) working — only the kernel responds to ICMP.
The full path to the host.
Whether ICMP is being blocked (a “no reply” might mean the host is up but firewalled).

Many production hosts deliberately drop ICMP. “Ping doesn’t work” ≠ “host is down.”

TTL — How Packets Expire

Every IP packet has a Time To Live field. Each router decrements it by 1. When TTL hits 0, the packet is dropped and an ICMP Time Exceeded is sent back to the source.

laptop sends packet with TTL=64
hop 1: TTL=63
hop 2: TTL=62
...
hop 64: TTL=0  -> drop, send ICMP Time Exceeded back

TTL exists to prevent packets from looping forever in case of routing bugs.

traceroute — Abusing TTL on Purpose

traceroute (or tracert on Windows) uses TTL as a clever trick:

Send a probe with TTL=1. The first router decrements to 0, drops it, replies with ICMP Time Exceeded. We learn hop 1’s IP.
Send a probe with TTL=2. Reaches hop 2 before expiring. We learn hop 2’s IP.
Repeat with TTL=3, 4, 5… until we reach the destination (which replies normally instead of with Time Exceeded).

traceroute example.com
#  1  192.168.1.1   (home router)        1.1 ms
#  2  10.0.0.1      (ISP first hop)      8.4 ms
#  3  72.x.x.x                           12.0 ms
#  4  *  *  *                            (some hop blocking ICMP)
#  5  93.184.216.34 (example.com)        82.0 ms
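
The TTL trick can be sketched as a small simulation. Nothing here touches the network: the path is a plain list (routers first, destination last) and send_probe is a stand-in for sending a real packet.

```python
def send_probe(path: list[str], ttl: int) -> tuple[str, str]:
    """Simulate one probe along the path.

    Each router decrements the TTL. The router that brings it to 0 answers
    with Time Exceeded. If the probe survives every router, the destination
    (last entry in the path) replies normally.
    """
    for router in path[:-1]:
        ttl -= 1
        if ttl == 0:
            return ("time-exceeded", router)
    return ("reply", path[-1])


def traceroute(path: list[str], max_hops: int = 30) -> list[str]:
    """Raise the TTL from 1 upward and record who answers each probe."""
    hops = []
    for ttl in range(1, max_hops + 1):
        kind, responder = send_probe(path, ttl)
        hops.append(responder)
        if kind == "reply":
            break  # reached the destination
    return hops
```

Calling traceroute(["192.168.1.1", "10.0.0.1", "93.184.216.34"]) returns the three addresses in order, one per TTL value.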

traceroute’s Probe Type

What does traceroute send? Depends on the OS:

Linux / BSD traceroute — sends UDP probes to incrementing high ports. Replies are ICMP Time Exceeded (mid-hops) or ICMP Port Unreachable (final).
Windows tracert — sends ICMP Echo Requests directly.
Modern traceroute -T — uses TCP SYN probes (more likely to get through firewalls).

mtr — traceroute + ping

mtr is a fantastic tool that runs traceroute continuously and shows packet loss per hop. Great for diagnosing flaky paths.

mtr --report --report-cycles 10 example.com

Things That Block ICMP

Cloud firewalls (AWS security groups, GCP firewall rules) often block inbound ICMP by default.
Some ISPs rate-limit ICMP from routers, causing weird “lost” middle hops in traceroute.
DDoS-protected hosts (Cloudflare, etc.) may drop ICMP entirely.

So when traceroute shows * * * for a hop, it’s usually fine — that router is just silent on ICMP.

ICMPv6

IPv6 has its own ICMP (ICMPv6) which is more important than ICMPv4 — Neighbor Discovery (the IPv6 ARP replacement) and Path MTU Discovery rely on it. Blocking ICMPv6 entirely breaks IPv6 networks.

Common Gotcha

Some people use ping to “test the internet.” If the target host blocks ICMP, ping fails — but TCP/UDP services on that host might be perfectly fine. A more reliable test is curl -I https://1.1.1.1 or nc -vz 8.8.8.8 53.
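
The same idea as nc -vz, sketched with Python's socket module: try a TCP handshake and report whether it completes. The function name is illustrative.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, tcp_reachable("8.8.8.8", 53) checks the DNS port over TCP without depending on ICMP at all.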

Interview Tip

Two crisp explanations:

ping = ICMP echo request/reply, RTT measurement.
traceroute = clever abuse of TTL, sending packets with TTL=1, 2, 3… and reading the ICMP Time Exceeded responses.

VPN Basics

intermediate vpn tunneling ipsec wireguard openvpn

A VPN (Virtual Private Network) is an encrypted tunnel that carries our traffic between two points — usually our device and a remote server. To everyone else, the traffic looks like opaque encrypted bytes.

In simple language: a VPN wraps our packets in another packet, encrypts the inside, and sends it through the internet as if we were on a private network.

What a VPN Actually Does

Two main goals:

Privacy / security — encrypt traffic so the local network (coffee shop Wi-Fi, ISP) can’t read it.
Access — make our device appear as if it’s on a remote network (corporate LAN, home network, another country).

A side effect of #2: services see the VPN endpoint’s IP, not ours. That’s why VPNs are used for geo-bypass.

Tunneling — The Core Idea

The original packet becomes the payload of an outer packet:

Original (inner):
  [ IP hdr (10.0.0.5 -> 10.0.0.99) | TCP | HTTP request ]

Wrapped + encrypted (outer):
  [ IP hdr (203.0.113.5 -> 203.0.113.50) | UDP | encrypted{ inner } ]

The inside is invisible until it reaches the other end of the tunnel, which decrypts and forwards.

Two Common Topologies

Client-to-site (Remote Access VPN) — single device connects to a corporate/home network. Most consumer VPNs. (e.g. WireGuard on a laptop talking to a home server.)
Site-to-site VPN — two networks (offices) join into one virtual network via a tunnel between their routers. Common with IPsec.

VPN Protocol Comparison

IPsec

L3 standard, kernel-level

Often used site-to-site, NAT-tricky, complex config (IKE)

OpenVPN

User-space, TLS-based

Mature, flexible, slower than kernel options

WireGuard

Modern, kernel-level, UDP

Tiny codebase, fast, opinionated crypto

IPsec (in slightly more detail)

Two modes: transport (encrypts payload only) and tunnel (encrypts whole packet, adds new IP header).
Uses IKE (Internet Key Exchange) for negotiating keys.
Standardized, supported everywhere — but config is famously fiddly.

OpenVPN

Runs in user space over TCP or UDP (UDP preferred for performance).
Uses TLS for the control channel and a separate channel for data.
Battle-tested but heavier than WireGuard.

WireGuard

~4000 lines of code in the Linux kernel (vs OpenVPN’s hundreds of thousands).
Pure UDP, fixed modern crypto suite (Curve25519, ChaCha20-Poly1305).
Stateless from the network’s point of view — peers exchange a tiny handshake every couple of minutes.
Easy config: a public key, an endpoint, an allowed-IPs list per peer.

# wg0.conf — minimal WireGuard config
[Interface]
PrivateKey = <ours>
Address    = 10.10.0.2/24

[Peer]
PublicKey  = <theirs>
Endpoint   = vpn.example.com:51820
AllowedIPs = 10.10.0.0/24, 0.0.0.0/0   # 0.0.0.0/0 = route everything through tunnel
PersistentKeepalive = 25

Split Tunneling

Sometimes we don’t want all traffic going through the VPN — only specific subnets.

Full tunnel — AllowedIPs = 0.0.0.0/0: every packet goes through the VPN.
Split tunnel — only specified routes go through the VPN; the rest goes via the regular internet.

Split tunnels save bandwidth and keep latency low for non-corporate traffic, but that traffic leaves with our real IP and without the VPN’s protection. Security-strict orgs disable split tunneling.
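
The full versus split decision comes down to a prefix match against AllowedIPs. Here is a minimal sketch of that check; the function name and the example lists are assumptions, and real WireGuard does this inside its own routing logic.

```python
import ipaddress


def uses_tunnel(destination: str, allowed_ips: list[str]) -> bool:
    """Return True if the destination matches any AllowedIPs prefix."""
    dest = ipaddress.ip_address(destination)
    return any(dest in ipaddress.ip_network(cidr) for cidr in allowed_ips)


FULL_TUNNEL = ["0.0.0.0/0"]      # every IPv4 destination matches
SPLIT_TUNNEL = ["10.10.0.0/24"]  # only the VPN subnet matches
```

With SPLIT_TUNNEL, a request to 10.10.0.7 goes through the VPN while a request to 93.184.216.34 leaves via the regular default route.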

What a VPN Doesn’t Hide

The VPN endpoint sees everything we send through it. Trust matters — choose providers carefully.
DNS leaks: if our DNS resolver isn’t routed through the tunnel, our queries reveal what we’re browsing.
Browser fingerprinting, cookies, login state — VPN doesn’t help.

Modern Alternatives

Tailscale / ZeroTier — mesh-style VPNs. Each device gets a stable IP in a virtual network; the control plane handles NAT traversal automatically.
Cloudflare WARP — basically a free consumer VPN that ships with Cloudflare’s 1.1.1.1 app.
Zero Trust Network Access (ZTNA) — replaces VPNs with per-app identity-based gates. The current corporate trend.

Interview Tip

If asked “how does a VPN work?” — three sentences cover it: tunneling (wrap one packet inside another), encryption (so anyone in the middle can’t read it), and routing (the VPN gateway forwards the inner packet to the real destination). Bonus: mention WireGuard as the modern default and split-tunneling as a common configuration.

Common Gotcha

A VPN does NOT make us anonymous — it just shifts trust from the ISP to the VPN provider. The provider can log just as much as the ISP could. Privacy comes from the provider’s policies (and audits), not from the technology.

Application Layer Protocols

HTTP Basics (Methods, Status Codes, Headers)

beginner http methods status-codes headers web

HTTP (HyperText Transfer Protocol) is the language browsers and servers use to talk. Every webpage we open, every API we call from our app — it’s HTTP under the hood.

In simple language: a client says “give me this thing” or “here’s some data, do something with it,” and the server replies with a status and a body.

Methods (Verbs)

Methods describe what we want to do with a resource.

GET — fetch a resource. Read-only. Safe and idempotent.
POST — create something new, or trigger an action. Not idempotent.
PUT — replace a resource entirely. Idempotent.
PATCH — partially update a resource.
DELETE — remove a resource. Idempotent.
HEAD — same as GET but no response body. Useful for checking if something exists.
OPTIONS — ask the server what methods/headers it allows (used by CORS preflight).

# Fetch a user
curl -X GET https://api.example.com/users/42

# Create a user
curl -X POST https://api.example.com/users \
  -H "Content-Type: application/json" \
  -d '{"name":"Manish"}'

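# Replace a user (without the header, curl -d sends form-encoded data)
curl -X PUT https://api.example.com/users/42 \
  -H "Content-Type: application/json" \
  -d '{"name":"Manish","email":"m@example.com"}'

# Partial update
curl -X PATCH https://api.example.com/users/42 \
  -H "Content-Type: application/json" \
  -d '{"email":"new@example.com"}'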

# Delete
curl -X DELETE https://api.example.com/users/42

Status Codes

The server replies with a 3-digit code. The first digit tells us the category.

1xx Informational — 100 Continue, 101 Switching Protocols (used for WebSocket upgrade).
2xx Success — 200 OK, 201 Created, 204 No Content.
3xx Redirection — 301 Moved Permanently, 302 Found, 304 Not Modified (cache hit).
4xx Client Error — we messed up. 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 429 Too Many Requests.
5xx Server Error — server messed up. 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout.

Quick mnemonic: 1 = hold on, 2 = here you go, 3 = look elsewhere, 4 = your fault, 5 = my fault.
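
The first-digit rule is easy to express in code. This sketch uses the standard http.HTTPStatus enum for the reason phrases; the category labels and function names are our own.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_category(code: int) -> str:
    """Map a status code to its category using only the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return CATEGORIES[code // 100]


def describe(code: int) -> str:
    """Combine the standard reason phrase with the category.

    HTTPStatus(code) raises ValueError for codes it does not know.
    """
    return f"{code} {HTTPStatus(code).phrase} ({status_category(code)})"
```

A client that meets an unfamiliar code such as 418 can still fall back on status_category to decide how to react.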

Common Headers

Headers carry metadata. Sent on both requests and responses.

Request headers:

Host — which domain we’re hitting (mandatory in HTTP/1.1).
User-Agent — what client is making the request (browser, curl, app).
Accept — what content types we can handle (application/json).
Authorization — credentials, usually a bearer token.
Cookie — session info from a previous response.
Content-Type — type of the body we’re sending (application/json).

Response headers:

Content-Type — type of body the server is returning.
Content-Length — body size in bytes.
Set-Cookie — server asks the client to store a cookie.
Cache-Control — caching rules (no-cache, max-age=3600).
Location — where to redirect to (for 3xx responses).

A Full Request and Response

Here’s what actually goes over the wire when we hit a URL:

GET /users/42 HTTP/1.1
Host: api.example.com
User-Agent: curl/8.1.2
Accept: application/json
Authorization: Bearer eyJhbGciOiJIUzI1NiJ9...

And the server replies:

HTTP/1.1 200 OK
Date: Sun, 03 May 2026 10:00:00 GMT
Content-Type: application/json
Content-Length: 54
Cache-Control: max-age=60

{"id":42,"name":"Manish","email":"manish@example.com"}

Notice the blank line — that separates headers from body. Always there.
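
On the wire, each header line ends with CRLF and the blank line is a bare CRLF. A minimal sketch of building and splitting such a message is below; it handles only this simple case (no chunked encoding, no repeated headers), and the function names are illustrative.

```python
def build_response(body: bytes) -> bytes:
    """Assemble a tiny HTTP/1.1 response with a matching Content-Length."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # the blank line that ends the headers
    )
    return head.encode("ascii") + body


def parse_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw response into status line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return status_line, headers, body
```

Content-Length counts the bytes of the body only, so a parser can check that int(headers["content-length"]) equals len(body).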

Interview Tip

Don’t memorize every status code. Remember the categories and the famous ones (200, 301, 304, 400, 401, 403, 404, 500, 502, 503). For methods, be ready to explain idempotency — calling PUT/DELETE multiple times has the same effect as once, POST does not.

HTTP/1.0 vs 1.1 vs 2 vs 3 (QUIC)

intermediate http http2 http3 quic performance tcp

HTTP has gone through four major versions. Each one fixed a bottleneck of the previous one.

In simple language: HTTP/1.0 opened a new connection per request. 1.1 reused connections. HTTP/2 sent multiple things at once on one connection. HTTP/3 ditched TCP entirely.

HTTP/1.0 (1996)

One TCP connection per request. Open, send request, get response, close.

The cost: TCP handshake every single time. Loading a page with 30 images? 30 handshakes. Painfully slow.

HTTP/1.1 (1997)

Introduced persistent connections (keep-alive) by default. The TCP socket stays open and we can reuse it for multiple requests.

It also added pipelining — sending multiple requests without waiting for responses. But responses still had to come back in order, so a slow response blocked all the ones behind it (head-of-line blocking). Most browsers never enabled pipelining because of this.

To speed things up, browsers opened 6 parallel connections per domain. Better than nothing, but wasteful.

HTTP/2 (2015)

Built on Google’s SPDY. The big idea: multiplexing. One TCP connection, many concurrent streams.

Key features:

Binary protocol instead of text — faster to parse.
Stream multiplexing — multiple requests/responses interleaved on a single connection.
Header compression (HPACK) — repeated headers like User-Agent get compressed.
Server push — server can send resources the client hasn’t asked for yet (later deprecated, rarely used well).

Problem: HTTP/2 still runs over TCP. If a single TCP packet is lost, all streams pause until it’s retransmitted. TCP-level head-of-line blocking.

HTTP/3 (2022)

Same semantics as HTTP/2 but built on QUIC instead of TCP. QUIC runs on UDP.

Why this matters:

Each HTTP stream is an independent QUIC stream. A lost packet only stalls its own stream, not all of them.
TLS 1.3 is baked in. The handshake is faster — often 0-RTT for repeat visits.
Connection migration: switching from Wi-Fi to mobile data doesn’t drop the connection (uses a connection ID, not IP+port).

In simple language: HTTP/3 is HTTP/2’s idea done right, on a transport that doesn’t get in the way.
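
The difference in head-of-line blocking can be sketched with a toy model. Packets are (stream, sequence) pairs, one of them is lost, and each function returns what the application can read while the retransmission is pending. This models ordering only, not real TCP or QUIC behavior.

```python
Packet = tuple[str, int]  # (stream id, sequence number within that stream)


def delivered_over_tcp(sent: list[Packet], lost: Packet) -> list[Packet]:
    """TCP is one ordered byte stream: everything behind the gap waits."""
    gap = sent.index(lost)
    return sent[:gap]


def delivered_over_quic(sent: list[Packet], lost: Packet) -> list[Packet]:
    """QUIC orders each stream separately: only the lost packet's stream waits."""
    gap = sent.index(lost)
    lost_stream = lost[0]
    return [
        packet
        for position, packet in enumerate(sent)
        if not (packet[0] == lost_stream and position >= gap)
    ]
```

With three interleaved streams A, B, C and the first B packet lost, the TCP model delivers only the packet before the gap, while the QUIC model keeps delivering A and C.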

Side-by-Side Comparison

HTTP/1.0

Connection: new per request

HOL block: N/A (1 at a time)

Transport: TCP

Format: text

HTTP/1.1

Connection: keep-alive

HOL block: at app layer

Transport: TCP

Format: text

HTTP/2

Connection: 1 multiplexed

HOL block: at TCP layer

Transport: TCP + TLS

Format: binary + HPACK

HTTP/3

Connection: QUIC streams

HOL block: none

Transport: UDP + QUIC

Format: binary + QPACK

Checking Which Version We Use

# curl shows the negotiated version
curl -I --http2 https://www.cloudflare.com
curl -I --http3 https://www.cloudflare.com

# In Chrome DevTools → Network → Protocol column shows h2, h3, http/1.1

Interview Tip

The key story to tell: head-of-line blocking moved down the stack and finally got solved. 1.1 had it at the request level, 2 had it at the TCP level, 3 fixed it with QUIC. If you can explain that, you’ve nailed the evolution.

DNS Deep Dive (Recursive vs Iterative, Records)

intermediate dns resolver records caching networking

DNS (Domain Name System) is the phonebook of the internet. We type gyaan.pman47.cc, DNS turns it into an IP like 144.24.126.230, and the browser knows where to connect.

In simple language: humans like names, computers like numbers. DNS bridges the two.

The Players

Stub resolver — the tiny client on our OS that asks questions.
Recursive resolver — does the legwork on our behalf (e.g., 8.8.8.8, 1.1.1.1, our ISP’s resolver).
Root servers — 13 logical servers (a through m, each backed by many anycast instances) that know where to find TLD servers.
TLD servers — handle a top-level domain like .com, .cc, .in.
Authoritative servers — the actual owners of a domain’s records.

Recursive vs Iterative

Recursive query: the client asks one server “give me the answer,” and that server does whatever it takes to find it. This is what we do when we ask 1.1.1.1.

Iterative query: the server replies “I don’t have it, but ask this other server.” The asker keeps following the trail. This is what the recursive resolver does on the backend.

How `gyaan.pman47.cc` Resolves

Browser

→ "What's the IP of gyaan.pman47.cc?"

Resolver

recursive (1.1.1.1) — checks cache first

Root

→ "Don't know, ask the .cc TLD server"

TLD .cc

→ "Ask pman47.cc's authoritative NS"

Authoritative

→ "gyaan.pman47.cc = 144.24.126.230"

Resolver

caches answer, returns to browser

Browser

opens TCP connection to 144.24.126.230

The browser → resolver hop is recursive. The resolver → root → TLD → authoritative chain is iterative.
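
The iterative chain can be sketched with a toy in-memory model. The server labels ("root", "tld-cc", "auth-pman47") are invented for illustration; only the hostname and IP come from the walkthrough above, and no real DNS queries are made.

```python
# Toy zone data standing in for real servers (labels are hypothetical).
SERVERS = {
    "root": {"referrals": {"cc.": "tld-cc"}},
    "tld-cc": {"referrals": {"pman47.cc.": "auth-pman47"}},
    "auth-pman47": {"answers": {"gyaan.pman47.cc.": "144.24.126.230"}},
}


def query(server: str, name: str) -> tuple[str, str]:
    """One iterative query: returns ("answer", ip) or ("referral", next_server)."""
    zone = SERVERS[server]
    if name in zone.get("answers", {}):
        return ("answer", zone["answers"][name])
    for suffix, next_server in zone.get("referrals", {}).items():
        if name == suffix or name.endswith("." + suffix):
            return ("referral", next_server)
    raise LookupError(f"{server} cannot help with {name}")


def resolve(name: str) -> tuple[str, list[str]]:
    """What a recursive resolver does: follow referrals starting at the root."""
    server, trail = "root", []
    while True:
        trail.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, trail
        server = value  # follow the referral
```

The client only ever calls resolve (the recursive part); the loop inside it is the iterative part.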

Common Record Types

Record	What it points to
A	IPv4 address
AAAA	IPv6 address
CNAME	Alias to another domain (`www` → `example.com`)
MX	Mail exchange server (priority + hostname)
TXT	Arbitrary text (SPF, DKIM, domain verification)
NS	Authoritative nameservers for the domain
PTR	Reverse DNS — IP back to a hostname
SRV	Service location (host + port for things like SIP)

TTL & Caching

Every record has a TTL (time-to-live, in seconds). It tells resolvers how long to cache the answer.

Low TTL (60s) — fast propagation, more queries, more load.
High TTL (86400s = 1 day) — fewer queries, slow updates if we move servers.

Caching happens at multiple layers: browser, OS stub resolver, recursive resolver. That’s why a DNS change might take hours to “propagate” — old caches are still alive.
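
A TTL-respecting cache is only a few lines. The sketch below takes an injectable clock so the expiry can be tested without waiting; the class name and API are made up for illustration.

```python
import time


class DnsCache:
    """Tiny TTL cache: answers disappear once their TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (ip, expires_at)

    def put(self, name: str, ip: str, ttl: int) -> None:
        self._entries[name] = (ip, self._clock() + ttl)

    def get(self, name: str):
        entry = self._entries.get(name)
        if entry is None:
            return None  # never cached
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale, force a fresh lookup
            return None
        return ip
```

Every layer that caches (browser, OS, recursive resolver) runs its own version of this clock, which is why a record change can take up to the old TTL to be seen everywhere.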

Tools to Inspect DNS

# Quick lookup
dig gyaan.pman47.cc

# Specific record type
dig pman47.cc MX

# Trace the full iterative resolution
dig +trace gyaan.pman47.cc

# Use a specific resolver
dig @1.1.1.1 gyaan.pman47.cc

# nslookup — older but available everywhere
nslookup gyaan.pman47.cc

Interview Tip

When asked “what happens when we type a URL,” DNS is the first major step. Be ready to draw the resolver/root/TLD/authoritative chain and mention caching at every layer. Bonus points for talking about DNS over HTTPS (DoH) and DNSSEC for security.

DHCP

beginner dhcp ip-address dora networking lan

DHCP (Dynamic Host Configuration Protocol) is what assigns our laptop an IP address the moment we connect to Wi-Fi. Without it, we’d manually configure IP, subnet mask, gateway, and DNS on every device.

In simple language: DHCP is the network’s receptionist — “welcome, here’s your IP, here’s the gateway, here’s the DNS server, please use them for the next few hours.”

Why DHCP Exists

Before DHCP, network admins maintained spreadsheets of IPs and assigned them by hand. Imagine doing that for a coffee shop with 50 phones connecting and disconnecting all day.

DHCP solves this by:

Automatically picking an unused IP from a pool.
Handing out network settings (gateway, DNS, subnet mask).
Reclaiming IPs when devices leave.

The DORA Handshake

DHCP works in four steps. We remember them as DORA.

Client (no IP yet)                    DHCP Server
       │                                    │
       │── 1. DISCOVER (broadcast) ────────>│   "Anyone got an IP for me?"
       │                                    │
       │<── 2. OFFER (bcast or unicast) ────│   "Sure, take 192.168.1.42"
       │                                    │
       │── 3. REQUEST (broadcast) ─────────>│   "I'll take 192.168.1.42"
       │                                    │
       │<── 4. ACK (bcast or unicast) ──────│   "Confirmed, here's the lease"
       │                                    │

Discover — the client has no IP, so it broadcasts to 255.255.255.255: “Hey any DHCP server, I need an address.” Source IP is 0.0.0.0.

Offer — one or more DHCP servers respond with a candidate IP and lease details. The reply is broadcast or unicast to the client’s MAC address, depending on the broadcast flag the client set in its Discover.

Request — the client picks one offer (usually the first) and broadcasts its choice. The broadcast tells the other servers their offer was rejected so they can put the IP back in the pool.

Acknowledge — the chosen server confirms and the client commits the IP, gateway, DNS, and lease time.
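
The server side of DORA can be sketched as a small state machine. This is an in-memory toy: no packets are sent, and the class and method names are assumptions made for illustration.

```python
import ipaddress


class ToyDhcpServer:
    """In-memory stand-in for a DHCP server."""

    def __init__(self, pool_cidr: str, lease_seconds: int = 86400):
        network = ipaddress.ip_network(pool_cidr)
        self.free = [str(ip) for ip in network.hosts()]
        self.offered = {}  # mac -> ip held while the client decides
        self.leases = {}   # mac -> ip committed after the ACK
        self.lease_seconds = lease_seconds

    def discover(self, mac: str) -> str:
        """DISCOVER -> OFFER: reserve an unused address for this client."""
        ip = self.free.pop(0)
        self.offered[mac] = ip
        return ip

    def request(self, mac: str, ip: str) -> dict:
        """REQUEST -> ACK: commit the lease if it matches our offer."""
        if self.offered.get(mac) != ip:
            raise ValueError("NAK: address was not offered to this client")
        del self.offered[mac]
        self.leases[mac] = ip
        return {"ip": ip, "lease_seconds": self.lease_seconds}
```

The client drives the exchange: call discover to get an offer, then request with the offered address to receive the lease.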

What Else DHCP Hands Out

Besides the IP, a DHCP response usually includes:

Subnet mask — 255.255.255.0
Default gateway — the router’s IP, e.g. 192.168.1.1
DNS servers — e.g. 1.1.1.1, 8.8.8.8
Lease time — how long the IP is ours
NTP server (optional) — for time sync

Lease & Renewal

The IP is ours for a lease period (often 24 hours on home routers, shorter on busy networks).

At 50% of lease time — the client tries to renew with the same server (unicast request).
At 87.5% — if no answer yet, it broadcasts to any server (REBINDING).
If the lease fully expires — the client gives up the IP and starts DORA again.
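
The renewal points are simple fractions of the lease. A minimal sketch, with illustrative names, that turns a lease length into the three deadlines (in seconds after the lease starts):

```python
def lease_timers(lease_seconds: int) -> dict[str, float]:
    """Return when the client renews, rebinds, and gives up the address."""
    return {
        "renew_at": lease_seconds * 0.5,     # unicast to the original server
        "rebind_at": lease_seconds * 0.875,  # broadcast to any server
        "expires_at": float(lease_seconds),  # start DORA again
    }
```

For a 24-hour lease (86400 s) the client tries to renew after 12 hours and starts rebinding after 21 hours.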

This is why we sometimes see brief Wi-Fi hiccups when a lease expires and the device hasn’t renewed in time.

Inspect & Trigger DHCP

# Linux: see current lease (dhclient only; the path varies by distro)
cat /var/lib/dhcp/dhclient.leases

# Linux: release & renew
sudo dhclient -r       # release
sudo dhclient          # request a new lease

# macOS: renew lease
sudo ipconfig set en0 BOOTP
sudo ipconfig set en0 DHCP

# Windows
ipconfig /release
ipconfig /renew

Interview Tip

Just remember DORA and that it’s all over UDP ports 67 (server) and 68 (client). Bonus: mention that DHCP uses broadcasts because the client has no IP yet, so it can’t direct traffic at a specific server.

SMTP, IMAP & POP3

beginner smtp imap pop3 email protocols

Email is older than the modern web, and it shows. We have three different protocols for what feels like one task.

In simple language: SMTP sends, IMAP and POP3 receive. Sender uses SMTP. The recipient’s mail client uses IMAP or POP3 to pull messages from their mailbox.

SMTP — Simple Mail Transfer Protocol

SMTP is the postman. It carries mail from our client to our outgoing mail server, and between mail servers across the internet.

Port 25 — server-to-server SMTP. Often blocked by ISPs to fight spam.
Port 587 — submission port for clients to send via their own server. Uses STARTTLS for encryption.
Port 465 — implicit TLS. Deprecated for a while, then re-standardized for mail submission (RFC 8314).

# What an SMTP conversation looks like (simplified)
> HELO client.example.com
< 250 Hello
> MAIL FROM:<manish@example.com>
< 250 OK
> RCPT TO:<friend@gmail.com>
< 250 OK
> DATA
< 354 Send message
> Subject: Hi
>
> Hello friend.
> .
< 250 Message accepted
> QUIT

SMTP is push-based. The sending server contacts the receiving server (looked up via the recipient’s MX record) and pushes the message.
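
From code, the same conversation is usually driven through a library. Below is a minimal sketch with Python's standard smtplib and email modules. The host, username, and password passed to submit are placeholders we would supply ourselves, and submit is only defined here, never called.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Create the message that would follow the DATA command."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def submit(msg: EmailMessage, host: str, username: str, password: str) -> None:
    """Submit via port 587, upgrading the connection with STARTTLS."""
    with smtplib.SMTP(host, 587, timeout=10) as smtp:
        smtp.starttls()                 # switch to TLS before sending credentials
        smtp.login(username, password)
        smtp.send_message(msg)          # issues MAIL FROM, RCPT TO, DATA for us
```

The library hides the command exchange, but the steps map one to one onto the transcript above.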

IMAP — Internet Message Access Protocol

IMAP keeps mail on the server. Our client (Thunderbird, Apple Mail, Outlook) just shows what’s there.

Port 143 — plain.
Port 993 — IMAP over TLS.

Why IMAP wins for most people:

Same inbox on phone, laptop, web — read on one, marked as read everywhere.
Folders, flags, drafts all live on the server.
Server keeps the mail safe; client is just a viewer.

POP3 — Post Office Protocol v3

POP3 downloads mail to one device and (by default) deletes it from the server.

Port 110 — plain.
Port 995 — POP3 over TLS.

POP3 made sense when:

Storage on the server was expensive.
We had one PC and a slow dial-up line.
We didn’t need to read mail on multiple devices.

Today it’s mostly a fallback option. Most people should use IMAP.

IMAP vs POP3 — Quick Pick

Use IMAP if we read mail on more than one device.
Use IMAP if we want server-side folders and search.
Use POP3 only if we want to keep all mail locally on a single machine and the server has tiny storage.

How These Fit Together

                  ┌──────────────┐
   Manish writes  │   SMTP 587   │
   email ────────>│  (his mail   │
                  │   server)    │
                  └──────┬───────┘
                         │ SMTP 25
                         ▼
                  ┌──────────────┐
                  │  Recipient's │
                  │ mail server  │
                  └──────┬───────┘
                         │
            ┌────────────┴────────────┐
            ▼                         ▼
    IMAP 993 (sync)           POP3 995 (download)
            │                         │
            ▼                         ▼
       Friend's phone           Friend's old laptop

Modern Reality

Most people don’t touch these protocols directly anymore — Gmail, Outlook web, and similar use HTTPS-based APIs internally. But IMAP and SMTP are still what desktop mail clients (Thunderbird, Apple Mail) speak under the hood.

Interview Tip

Keep it short: SMTP = send, IMAP = sync from server, POP3 = download and delete. Remember the secure ports (587, 993, 995) because interviewers love port numbers.

FTP & SFTP

beginner ftp sftp ftps ssh file-transfer

FTP (File Transfer Protocol) is one of the oldest internet protocols — older than HTTP. SFTP is the modern, secure replacement that runs over SSH.

In simple language: FTP moves files but sends everything (including passwords) in plaintext. SFTP does the same job inside an encrypted SSH tunnel.

FTP Basics

Port 21 — control connection (commands).
Port 20 — data connection (file contents) in active mode.

FTP is unusual because it uses two TCP connections: one for commands, one for actual file data.

Active vs Passive Mode

This is the part FTP gets famous for being confusing.

Active mode:

Client connects to server’s port 21 (control).
Client tells server “send data to my port X.”
Server initiates a new connection from its port 20 to the client’s port X.

Problem: most clients are behind NAT/firewalls that block incoming connections. The server can’t reach the client. Active mode breaks.

Passive mode (PASV):

Client connects to server’s port 21.
Client says “PASV — you tell me where to connect.”
Server opens a random data port and replies with it.
Client makes the data connection outbound.

Passive mode works through NAT because the client initiates both connections. This is the default in modern FTP clients.

ACTIVE                              PASSIVE
Client ──cmd──> Server:21           Client ──cmd──> Server:21
Client <──data── Server:20          Client ──data──> Server:randomPort
(server initiates → blocked by NAT) (client initiates → NAT-friendly)
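
In passive mode the server's reply encodes the data endpoint as six numbers: four for the IP and two for the port (port = p1 × 256 + p2). Here is a small sketch of decoding it; the function name is illustrative and the sample address in the usage note is a documentation address.

```python
import re

# Matches the "(h1,h2,h3,h4,p1,p2)" part of a 227 reply.
PASV_REPLY = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv(reply: str) -> tuple[str, int]:
    """Extract (host, port) from a '227 Entering Passive Mode' reply."""
    match = PASV_REPLY.search(reply)
    if match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2
```

For the reply "227 Entering Passive Mode (192,0,2,10,195,80)" the client connects to 192.0.2.10 on port 195 × 256 + 80 = 50000.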

Why FTP Is Insecure

Everything goes in plaintext on the wire:

Username and password.
File contents.
Directory listings.

Anyone sniffing the network sees it all. There’s no encryption, no integrity check. Don’t use plain FTP in 2026.

FTPS vs SFTP — Don’t Confuse Them

Both are “secure FTP,” but they’re completely different.

FTPS — old FTP wrapped in TLS. Same dual-connection mess, just encrypted. Two flavors: implicit (port 990) and explicit (port 21, upgraded with the AUTH TLS command).
SFTP — a totally different protocol that runs as a subsystem of SSH on port 22. Single connection, encrypted by default, no active/passive nonsense.

If we have a choice, pick SFTP. Simpler, more reliable, and the SSH ecosystem (keys, agents, port forwarding) just works.

Using SFTP

If we already have SSH access to a server, we already have SFTP.

# Interactive session
sftp manish@server.example.com

# Inside the session:
sftp> ls
sftp> cd /var/www
sftp> put localfile.txt
sftp> get remotefile.txt
sftp> bye

# One-shot copies over SSH (scp and rsync are separate tools, not the sftp client)
scp local.txt manish@server.example.com:/tmp/
rsync -avz local-dir/ manish@server.example.com:/var/www/

# GUI clients: FileZilla, Cyberduck, Transmit — all speak SFTP

SFTP supports key-based auth, which means we can sync files without typing passwords.

When We’d Still See FTP

Legacy systems and old hosting providers.
Anonymous FTP for public software mirrors (e.g., older Linux distro mirrors).
Industrial equipment that hasn’t been updated since 2003.

For everything else, SFTP, HTTPS uploads, or object storage (S3) have replaced it.

Interview Tip

The trap question is “what’s the difference between FTPS and SFTP?” Be ready: FTPS = FTP + TLS, SFTP = file transfer over SSH. Different protocols, different ports (990/21 vs 22), different design.

SSH

intermediate ssh encryption keys port-forwarding remote

SSH (Secure Shell) is the encrypted remote login we use every day to manage servers. It runs on port 22 and replaces older insecure tools like Telnet, rlogin, and rsh.

In simple language: SSH gives us a secure terminal into a remote machine — and a Swiss army knife for tunneling traffic.

What SSH Gives Us

Encrypted remote shell.
Strong authentication (passwords or, much better, key pairs).
Secure file transfer (scp, sftp).
Port forwarding to tunnel arbitrary TCP traffic.

Key-Based Authentication

Instead of typing a password every time, we generate a key pair:

Private key — stays on our laptop (~/.ssh/id_ed25519). Never share it.
Public key — copied to the server’s ~/.ssh/authorized_keys.

When we connect, the server challenges us to prove we hold the private key. Math happens. We’re in.

# Generate a modern key pair
ssh-keygen -t ed25519 -C "manish@laptop"

# Copy our public key to the server
ssh-copy-id manish@server.example.com

# Now connect — no password prompt
ssh manish@server.example.com

Why this is better than passwords:

Brute-forcing a key is computationally infeasible.
We can revoke a key by removing one line from authorized_keys.
Works great with ssh-agent so we type the passphrase once per session.

known_hosts — Trust on First Use

The first time we connect to a new server, SSH shows the server’s host key fingerprint and asks if we trust it. If we say yes, it’s saved to ~/.ssh/known_hosts.

Next time we connect, SSH compares the offered key with what’s on file. If it doesn’t match, we get a scary warning:

WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!

This protects us from man-in-the-middle attacks. If we genuinely changed servers, we remove the old entry:

ssh-keygen -R server.example.com

Port Forwarding (Tunneling)

This is SSH’s superpower. We can tunnel arbitrary TCP traffic through the SSH connection.

Local forwarding (`-L`)

Forwards a port on our machine to something reachable from the server.

# Access a database on the server that's only listening on localhost
ssh -L 5432:localhost:5432 manish@server.example.com
# Now we connect to localhost:5432 on our laptop and hit Postgres on the server

Remote forwarding (`-R`)

Exposes a port on the server that points back to our machine.

# Make our laptop's web app on :3000 accessible on the server's :8080
ssh -R 8080:localhost:3000 manish@server.example.com

Dynamic forwarding (`-D`)

Turns SSH into a SOCKS proxy.

# Browse the web through the server
ssh -D 1080 manish@server.example.com
# Then point our browser at SOCKS5 proxy localhost:1080

Common Commands

# Basic login
ssh manish@server.example.com

# Use a non-default port
ssh -p 2222 manish@server.example.com

# Specify a key
ssh -i ~/.ssh/work_key manish@server.example.com

# Run a single command and exit
ssh manish@server.example.com "df -h"

# Copy a file to the server
scp local.txt manish@server.example.com:/tmp/

# Copy a directory recursively
scp -r ./build manish@server.example.com:/var/www/

# Use the ~/.ssh/config file for shortcuts
cat ~/.ssh/config
# Host prod
#   HostName server.example.com
#   User manish
#   Port 2222
#   IdentityFile ~/.ssh/prod_key

# Now we just type:
ssh prod

Interview Tip

If asked “how does SSH stay secure,” explain three layers:

Transport — encrypted channel via Diffie-Hellman key exchange.
Authentication — host key (server proves itself), then user key/password (we prove ourselves).
Channels — multiple sessions multiplexed over the same encrypted connection (shell + sftp + forwarding all together).

Web & Real-Time Communication

REST API Networking

intermediate rest api idempotency caching etag http

REST is more a style than a protocol — but the way it uses HTTP creates real network-level behaviors we should know: statelessness, idempotency, and caching.

In simple language: each REST request stands alone (no server-side session needed), some methods are safe to retry, and clever headers let us skip downloading data we already have.

Statelessness

Every REST request must contain everything the server needs to handle it. No “remember what I asked last time” — the server has no per-client memory.

Why we care:

Any server can handle any request → trivial horizontal scaling.
Load balancers don’t need sticky sessions.
Easier to cache, easier to retry.

If we need state (logged-in user), we send it on each request — usually as a token in the Authorization header or a cookie.

Idempotency

A method is idempotent if calling it once or 100 times has the same effect on the server. Crucial for retries on flaky networks.

Method	Safe?	Idempotent?	Notes
GET	yes	yes	Read-only
HEAD	yes	yes	Same as GET, no body
OPTIONS	yes	yes	Used for CORS preflight
PUT	no	yes	Replaces resource — same input → same end state
DELETE	no	yes	Already gone after the first call
POST	no	no	Creating a new resource each time
PATCH	no	not always	Depends on the patch (`{x: 5}` yes, `{x: x+1}` no)

Why this matters: if our HTTP client retries on timeout, it should only retry idempotent methods automatically. Retrying a POST might create two orders.
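The retry rule above can be sketched as a small guard. This is only an illustration: `canAutoRetry` is a hypothetical helper name, not part of any HTTP client library, and PATCH is treated as unsafe to retry because its idempotency depends on the patch.

```javascript
// Methods the table above marks as idempotent (PATCH is left out on purpose).
const IDEMPOTENT_METHODS = new Set(["GET", "HEAD", "OPTIONS", "PUT", "DELETE"]);

// Hypothetical helper: may a timed-out request be retried automatically?
function canAutoRetry(method) {
  return IDEMPOTENT_METHODS.has(method.toUpperCase());
}

console.log(canAutoRetry("GET"));  // true
console.log(canAutoRetry("post")); // false
```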

To make POST safer, we use an Idempotency-Key header (Stripe-style):

POST /payments HTTP/1.1
Idempotency-Key: 7c8a4b9e-1234-...
Content-Type: application/json

{"amount": 1000}

Server caches the result keyed by that ID and returns the same response if we retry.
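A minimal sketch of the server side, assuming an in-memory `Map` as the cache. The names `handlePayment` and `responsesByKey` are made up for this example, and a real service would more likely use a shared store with an expiry.

```javascript
const { randomUUID } = require("node:crypto");

// Idempotency key -> response we already sent (in-memory only: an assumption of this sketch).
const responsesByKey = new Map();
let chargesMade = 0;

// Hypothetical handler for POST /payments.
function handlePayment(idempotencyKey, body) {
  if (responsesByKey.has(idempotencyKey)) {
    return responsesByKey.get(idempotencyKey); // retry: replay the first result
  }
  chargesMade += 1; // the side effect happens once per key
  const response = { status: 201, body: { paymentId: randomUUID(), amount: body.amount } };
  responsesByKey.set(idempotencyKey, response);
  return response;
}

const key = "7c8a4b9e-demo-key"; // illustrative key
const first = handlePayment(key, { amount: 1000 });
const retry = handlePayment(key, { amount: 1000 });
console.log(first.body.paymentId === retry.body.paymentId, chargesMade); // true 1
```

The retry gets the same payment ID back, and the charge counter stays at 1.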

Caching Headers

HTTP caching can save us from re-fetching unchanged data.

`Cache-Control`

Sets the cacheability rules.

Cache-Control: public, max-age=3600     # cache for 1 hour, anyone can cache
Cache-Control: private, max-age=60      # only the browser, not CDNs
Cache-Control: no-store                 # don't cache at all (sensitive data)
Cache-Control: no-cache                 # cache, but always revalidate
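To see how a cache might read these directives, here is a rough parser sketch. `parseCacheControl` and `sharedCacheMayStore` are hypothetical helpers written for this note; real caches apply more rules than this.

```javascript
// Parse a Cache-Control header value into { directive: value | true }.
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(",")) {
    const [name, value] = part.trim().split("=");
    if (!name) continue;
    directives[name.toLowerCase()] = value === undefined ? true : value.replace(/"/g, "");
  }
  return directives;
}

// Hypothetical policy check: may a shared cache (CDN) store this response?
function sharedCacheMayStore(header) {
  const d = parseCacheControl(header);
  return !d["no-store"] && !d["private"];
}

console.log(parseCacheControl("public, max-age=3600")); // { public: true, 'max-age': '3600' }
console.log(sharedCacheMayStore("private, max-age=60")); // false
```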

`ETag` and `If-None-Match`

A short fingerprint of the response. The browser sends it back to ask “still the same?”

`Last-Modified` and `If-Modified-Since`

Same idea but using a timestamp.
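The timestamp comparison could look like the sketch below. `statusForConditionalGet` is a hypothetical name and the dates are illustrative. HTTP dates only have one-second resolution, so the resource timestamp is truncated before comparing.

```javascript
// Decide between 200 and 304 from the If-Modified-Since request header.
function statusForConditionalGet(ifModifiedSince, lastModified) {
  if (!ifModifiedSince) return 200;
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) return 200; // unparseable header: ignore it
  const modified = Math.floor(lastModified.getTime() / 1000) * 1000; // drop milliseconds
  return modified <= since ? 304 : 200;
}

const lastModified = new Date("2024-05-01T10:00:00Z"); // illustrative timestamp
console.log(lastModified.toUTCString()); // the value a server would put in Last-Modified
console.log(statusForConditionalGet("Wed, 01 May 2024 10:00:00 GMT", lastModified)); // 304
console.log(statusForConditionalGet("Tue, 30 Apr 2024 10:00:00 GMT", lastModified)); // 200
```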

The 304 Not Modified Flow

1st request:
  GET /users/42
  → 200 OK
    ETag: "abc123"
    Cache-Control: max-age=60
    {body...}

2nd request (after max-age expires):
  GET /users/42
  If-None-Match: "abc123"
  → 304 Not Modified
    (no body — browser reuses cached copy)

The 304 response is tiny — just headers, no body. Saves bandwidth and time when data hasn’t changed.

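// Express example serving an ETag
import express from "express";
import { createHash } from "node:crypto";

const app = express();

// Short fingerprint of the JSON body; ETag values are quoted strings
const hash = (obj) =>
  `"${createHash("sha1").update(JSON.stringify(obj)).digest("hex").slice(0, 16)}"`;

app.get("/users/:id", (req, res) => {
  const user = getUser(req.params.id); // getUser: our own lookup function
  const etag = hash(user);
  res.set("ETag", etag);

  // If client already has this version, send 304
  if (req.headers["if-none-match"] === etag) {
    return res.status(304).end();
  }
  res.json(user);
});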

REST vs RPC On the Wire

Quick contrast:

REST — resource-centric URLs (/users/42), HTTP verbs as actions, leverages HTTP caching and status codes natively.
RPC (gRPC, JSON-RPC) — function-centric (/UserService/GetUser), usually a single verb (POST), bypasses HTTP semantics.

REST plays nicer with browsers, CDNs, and tooling because it speaks HTTP fluently. RPC tends to be more efficient and strongly typed but gives up cache-friendliness.

Interview Tip

When asked “is POST idempotent,” the right answer is “no by default, but we make it idempotent at the application level using an idempotency key.” Understanding that nuance separates juniors from mids.

WebSockets

intermediate websockets real-time full-duplex upgrade ws

WebSockets give us a persistent, full-duplex channel between browser and server. Once the connection is open, either side can send data anytime — no request needed.

In simple language: HTTP is like sending letters, WebSockets is like opening a phone line.

Why Not Just Use HTTP?

HTTP is request/response. The client asks, the server replies. The server can’t push data on its own. Workarounds (polling, long polling) are wasteful or laggy.

For chat apps, live dashboards, multiplayer games, collaborative editors — we need real two-way communication. That’s WebSockets.

The HTTP Upgrade Handshake

A WebSocket starts life as a regular HTTP request that asks to “upgrade” the connection.

Client request:

GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

Server response:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

The 101 Switching Protocols status is the moment HTTP steps aside. After this, the same TCP socket carries WebSocket frames instead of HTTP.
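The server does not pick the Sec-WebSocket-Accept value freely. Under RFC 6455 it is the base64-encoded SHA-1 of the client key concatenated with a fixed GUID. A short sketch (the function name `computeAccept` is ours):

```javascript
const { createHash } = require("node:crypto");

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

function computeAccept(secWebSocketKey) {
  return createHash("sha1").update(secWebSocketKey + WS_GUID).digest("base64");
}

console.log(computeAccept("dGhlIHNhbXBsZSBub25jZQ==")); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The output matches the Sec-WebSocket-Accept header in the server response above, which is how the client knows the server really understood the upgrade.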

URLs use ws:// (plain) or wss:// (over TLS). Use wss:// in production — it gives the same security as HTTPS.

Frames, Not Requests

After upgrade, data flows as small frames. Each frame has:

An opcode (text, binary, close, ping, pong).
A payload length.
A masking key (client → server frames are masked).
The payload itself.

This is much lighter than HTTP — no headers per message, no method line, no status code.
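To make the frame layout concrete, here is a sketch of a decoder for small frames. It assumes a payload under 126 bytes (longer frames use extended length fields that are not handled here), and `decodeSmallFrame` is a name invented for this example.

```javascript
// Decode one WebSocket frame whose payload is shorter than 126 bytes.
function decodeSmallFrame(buf) {
  const fin = (buf[0] & 0x80) !== 0;
  const opcode = buf[0] & 0x0f; // 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
  const masked = (buf[1] & 0x80) !== 0;
  const length = buf[1] & 0x7f;
  if (length > 125) throw new Error("extended payload lengths are not handled in this sketch");

  let offset = 2;
  let payload;
  if (masked) {
    const maskKey = buf.subarray(offset, offset + 4);
    offset += 4;
    payload = Buffer.alloc(length);
    for (let i = 0; i < length; i++) {
      payload[i] = buf[offset + i] ^ maskKey[i % 4]; // unmask byte by byte
    }
  } else {
    payload = buf.subarray(offset, offset + length);
  }
  return { fin, opcode, masked, payload };
}

// Masked text frame carrying "Hello" (example bytes from RFC 6455, section 5.7).
const frame = Buffer.from([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]);
const decoded = decodeSmallFrame(frame);
console.log(decoded.opcode, decoded.payload.toString("utf8")); // 1 Hello
```

The whole message costs 6 bytes of overhead (2 header bytes plus the 4-byte mask) on top of the 5-byte payload.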

Ping / Pong (Keepalive)

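WebSockets have a built-in heartbeat. Either side can send a ping frame; the receiver must reply with a pong. This:

Detects dead connections (router rebooted, NAT entry expired).
Keeps middleboxes from closing idle connections.

Browsers answer pings automatically but expose no API for sending them, so heartbeats usually start on the server. Many server libraries can send pings on a timer; every 30-60 seconds is a common choice.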

Client Example

// Open a WebSocket
const ws = new WebSocket("wss://chat.example.com/room/42");

ws.addEventListener("open", () => {
  console.log("connected");
  ws.send(JSON.stringify({ type: "join", user: "manish" }));
});

ws.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data);
  console.log("from server:", msg);
});

ws.addEventListener("close", (event) => {
  console.log("closed", event.code, event.reason);
  // Reconnect logic goes here — WebSockets don't auto-reconnect
});

ws.addEventListener("error", (err) => {
  console.error("ws error", err);
});

// Send any time
function sendChat(text) {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: "chat", text }));
  }
}

Server Example (Node)

import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  socket.send(JSON.stringify({ type: "welcome" }));

  // In ws v8+, raw is a Buffer; isBinary tells us the original frame type
  socket.on("message", (raw, isBinary) => {
    // Broadcast to everyone, keeping text frames as text so browsers get a string
    for (const client of wss.clients) {
      if (client.readyState === WebSocket.OPEN) {
        client.send(raw, { binary: isBinary });
      }
    }
  });
});

When to Use What

WebSockets — bidirectional and frequent (chat, games, live editing).
SSE (Server-Sent Events) — server → client only, simpler, auto-reconnect.
Long polling — fallback for environments where WebSockets are blocked.
Short polling — quick & dirty, low traffic, no infra changes.

If only the server pushes updates, prefer SSE. If both sides talk, go WebSockets.

Common Gotchas

No auto-reconnect — write reconnection logic with exponential backoff.
Behind proxies — older proxies/load balancers may not handle the Upgrade. Modern ones (nginx, Caddy, Cloudflare) do, but check.
Authentication — we can’t easily add custom headers in browsers. Common pattern: pass a short-lived token in the URL query string and validate on connect.
Scaling — connections are sticky. Scaling to N servers requires a pub/sub layer (Redis, NATS) so messages reach clients connected to other nodes.
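For the reconnect gotcha, one possible shape is exponential backoff with jitter. This is a sketch under assumptions: the base delay, the cap, and the names `backoffDelay` and `connectWithRetry` are all chosen for illustration.

```javascript
// Delay before reconnect attempt N (0-based): base * 2^N, capped, with full jitter.
// `random` is injectable so the function can be checked deterministically.
function backoffDelay(attempt, { baseMs = 500, maxMs = 30_000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Hypothetical wrapper around the browser WebSocket API; the caller supplies the URL.
function connectWithRetry(url, onMessage, attempt = 0) {
  const ws = new WebSocket(url);
  ws.addEventListener("open", () => { attempt = 0; }); // reset after a successful connect
  ws.addEventListener("message", (event) => onMessage(event.data));
  ws.addEventListener("close", () => {
    setTimeout(() => connectWithRetry(url, onMessage, attempt + 1), backoffDelay(attempt));
  });
  return ws;
}

console.log(backoffDelay(3, { random: () => 0.5 })); // 2000
```

The jitter matters when a server restarts: without it, every client would reconnect at the same instant.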

Interview Tip

The key concept is the Upgrade handshake: WebSockets start as an HTTP request and switch protocols mid-flight. Mention 101 Switching Protocols and you’ve shown you actually understand the wire-level handoff.

Server-Sent Events (SSE)

intermediate sse real-time eventsource streaming http

Server-Sent Events (SSE) is a one-way streaming channel from server to client over plain HTTP. The server sends events whenever it wants; the client just listens.

In simple language: SSE is a pipe the server keeps writing to. The browser keeps reading. No frames, no upgrade dance — just HTTP that doesn’t end.

How It Works

The client makes a normal GET request. The server replies with Content-Type: text/event-stream and never closes the connection. It sends data in a simple text format:

data: hello

data: how are you?

event: chat
data: {"user":"manish","text":"hi"}

Each event ends with a blank line. That’s the entire wire format.
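A small serializer makes the format rules explicit. `formatSseEvent` is a hypothetical helper; the one rule that is easy to miss is that multi-line data needs a separate `data:` line for each line of text.

```javascript
// Serialize one event in the text/event-stream format.
function formatSseEvent({ event, data }) {
  let out = "";
  if (event) out += `event: ${event}\n`;
  for (const line of String(data).split("\n")) {
    out += `data: ${line}\n`; // one "data:" line per line of payload
  }
  return out + "\n"; // blank line terminates the event
}

console.log(JSON.stringify(formatSseEvent({ data: "hello" })));
// "data: hello\n\n"
console.log(JSON.stringify(formatSseEvent({ event: "chat", data: "line one\nline two" })));
// "event: chat\ndata: line one\ndata: line two\n\n"
```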

When SSE Fits

Server pushes updates to clients (notifications, stock tickers, log tails).
We don’t need the client to send messages back through the same channel.
We want auto-reconnect for free.
We want to get through corporate proxies that block WebSockets.

If we need bidirectional communication, use WebSockets instead.

Server Example (Node)

import express from "express";

const app = express();

app.get("/events", (req, res) => {
  // Required SSE headers
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");
  res.flushHeaders(); // send headers immediately

  // Send an event every 2 seconds
  const interval = setInterval(() => {
    const payload = { time: new Date().toISOString() };
    res.write(`event: tick\n`);
    res.write(`data: ${JSON.stringify(payload)}\n\n`);
  }, 2000);

  // Clean up when client disconnects
  req.on("close", () => clearInterval(interval));
});

app.listen(3000);

Notice each message ends with \n\n — that’s the event boundary.

Client Example (EventSource)

const events = new EventSource("/events");

// Default "message" channel
events.addEventListener("message", (e) => {
  console.log("msg:", e.data);
});

// Custom event names (we used "tick" on the server)
events.addEventListener("tick", (e) => {
  const data = JSON.parse(e.data);
  console.log("tick at", data.time);
});

// Browser auto-reconnects on disconnect
events.addEventListener("error", (err) => {
  console.warn("connection lost, browser will retry...");
});

// Manually close when we don't need it anymore
// events.close();

The browser’s EventSource API handles reconnection automatically. No backoff logic to write ourselves.

Reconnection & Last-Event-ID

SSE has a built-in resume mechanism. The server can send an id: field with each event:

id: 42
data: {"text":"hello"}

If the connection drops, the browser reconnects and sends a Last-Event-ID: 42 header. The server can then resume from where we left off.
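Resuming only works if the server remembers recent events. One way to sketch that is a bounded in-memory buffer; the buffer size, the numeric IDs, and the helper names below are assumptions of this example.

```javascript
// Keep a bounded buffer of recent events so reconnecting clients can catch up.
const recentEvents = [];
const MAX_BUFFERED = 1000;
let nextId = 1;

function recordEvent(data) {
  const event = { id: nextId++, data };
  recentEvents.push(event);
  if (recentEvents.length > MAX_BUFFERED) recentEvents.shift(); // drop the oldest
  return event;
}

// lastEventId is the raw Last-Event-ID header value (undefined on a first connect).
function eventsToReplay(lastEventId) {
  const last = Number.parseInt(lastEventId ?? "", 10);
  if (Number.isNaN(last)) return []; // first connect: nothing to replay
  return recentEvents.filter((event) => event.id > last);
}

recordEvent("a"); recordEvent("b"); recordEvent("c");
console.log(eventsToReplay("1").map((e) => e.data)); // [ 'b', 'c' ]
console.log(eventsToReplay(undefined));              // []
```

In an Express handler, the header would be read as `req.headers["last-event-id"]` and the replayed events written before any new ones.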

SSE vs WebSockets vs Long Polling

Feature	SSE	WebSockets	Long Polling
Direction	Server → client	Both ways	Both ways
Protocol	HTTP	ws/wss	HTTP
Auto-reconnect	Yes (built-in)	No	Per-request
Binary	No (text only)	Yes	Yes
Proxy-friendly	Very	Mostly	Yes
Setup complexity	Trivial	Moderate	Moderate

Common Gotchas

Connection limits — browsers cap concurrent connections per origin (often 6 over HTTP/1.1). Over HTTP/2, streams are multiplexed on one connection and the limit is negotiated (100 by default), so this is rarely a problem.
No binary — SSE is text-only. For binary, base64-encode it or use WebSockets.
Buffering — make sure the response isn’t being buffered by a reverse proxy. With nginx, set X-Accel-Buffering: no or proxy_buffering off.
Timeout — some proxies drop idle connections after 60s. Send a keepalive comment (: ping\n\n) periodically.

Interview Tip

Best one-liner: “SSE is a long-lived HTTP response with Content-Type: text/event-stream.” If we say that and mention auto-reconnect via Last-Event-ID, we’ve covered the essentials.

Long Polling vs Short Polling

intermediate polling long-polling real-time http latency

Before WebSockets and SSE, the only way to “push” data from server to client over HTTP was polling. We still use polling today for fallbacks, simple cases, and quick prototypes.

In simple language: short polling = “anything new yet?” every X seconds. Long polling = “tell me when something’s new, I’ll wait.”

Short Polling

The client sends a request on a fixed interval. The server replies immediately with whatever it has (often an empty list).

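// Client
let lastId = 0; // cursor: id of the newest message we have seen
async function pollMessages() {
  try {
    const res = await fetch("/messages?since=" + lastId);
    const data = await res.json();
    if (data.length) {
      renderMessages(data);
      lastId = data[data.length - 1].id; // advance the cursor
    }
  } catch (e) {
    console.warn("poll failed, will retry", e);
  }
  setTimeout(pollMessages, 5000); // ask again in 5s
}
pollMessages();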

Pros:

Dead simple. Just plain HTTP.
Works through any proxy or firewall.
Easy to scale — every request is independent.

Cons:

Wastes bandwidth — most requests return nothing.
Latency — new data sits up to interval seconds before the client sees it.
More requests = more load on the server, even when nothing’s happening.

Long Polling

The client sends a request, but the server doesn’t reply until it has data (or a timeout hits, e.g. 30s). When the client gets a response, it immediately sends another request.

// Client
let lastId = 0; // cursor: id of the newest message we have seen
async function longPoll() {
  try {
    const res = await fetch("/messages?since=" + lastId);
    const data = await res.json();
    if (data.length) {
      renderMessages(data);
      lastId = data[data.length - 1].id;
    }
  } catch (e) {
    await new Promise(r => setTimeout(r, 1000)); // backoff on error
  }
  longPoll(); // immediately reconnect
}
longPoll();

// Server (Express, simplified)
app.get("/messages", async (req, res) => {
  const since = parseInt(req.query.since || "0", 10);
  // Wait for new data, or timeout after 30s
  const messages = await waitForMessagesSince(since, 30_000);
  res.json(messages); // returns [] on timeout
});
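The server above leans on a `waitForMessagesSince` helper that the snippet does not define. One possible in-memory version is sketched below; the message log, the `publish` function, and the event bus are assumptions for illustration, and a real app would back this with a database or a broker.

```javascript
const { EventEmitter } = require("node:events");

// In-memory message log and a bus that wakes up waiting requests.
const messages = [];
const bus = new EventEmitter();
bus.setMaxListeners(0); // many requests may wait at once

function publish(text) {
  const message = { id: messages.length + 1, text };
  messages.push(message);
  bus.emit("message");
  return message;
}

// Resolve with messages newer than `since`, or with [] once timeoutMs elapses.
function waitForMessagesSince(since, timeoutMs) {
  return new Promise((resolve) => {
    const pending = () => messages.filter((m) => m.id > since);
    if (pending().length) return resolve(pending()); // data is already waiting

    const onMessage = () => {
      clearTimeout(timer);
      resolve(pending());
    };
    const timer = setTimeout(() => {
      bus.off("message", onMessage); // stop listening so handlers do not pile up
      resolve([]);
    }, timeoutMs);
    bus.once("message", onMessage);
  });
}

// Usage: the waiter is released as soon as something is published.
waitForMessagesSince(0, 1000).then((batch) => console.log("got", batch.length));
setTimeout(() => publish("hi"), 10);
```

Each waiting request costs one timer and one listener, which is why this style suits event-loop servers.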

Pros:

Near real-time — the server replies the instant new data arrives.
Far fewer empty responses than short polling.

Cons:

The server has to hold open many connections simultaneously. Needs an async/event-loop server (Node, Go, async Python). Not great for thread-per-request servers.
Still has per-request overhead (headers, TLS handshake if no keep-alive).
Reconnection adds a tiny race window where messages could land between requests — protocols use since=lastId cursors to avoid losing them.

Trade-Off Summary

Aspect	Short polling	Long polling
Latency	Up to interval	Near real-time
Server load	Lots of empty replies	Fewer requests, more held connections
Server type	Any	Async (Node, Go, etc.)
Bandwidth	Wasted on empty responses	Efficient when idle
Complexity	Trivial	Slight (timeout + cursor)

When to Pick What

Short polling — low-frequency updates (1 per minute is fine), or when we just need something working in 10 lines of code.
Long polling — chat-like scenarios where we need fast updates but can’t use WebSockets (legacy networks, restrictive proxies).
WebSockets / SSE — anything serious in 2026. Polling is mostly a fallback now.

Common Gotchas

Don’t poll faster than necessary. A 1-second short poll on a busy app crushes the server. Match the interval to how stale the data can be.
Backoff on errors. If the server is down, polling every second from 100k clients amounts to a self-DDoS.
Use a cursor. Always tell the server what we last saw (since=42) so it doesn’t resend the same data.

Interview Tip

If asked “how would we build a chat app without WebSockets,” walk through long polling. Mention the timeout, the cursor, and the need for an async server. Bonus: explain why long polling is harder to scale — each waiting client holds a TCP connection.

gRPC & HTTP/2 Streams

advanced grpc http2 protobuf streams rpc

gRPC is Google’s RPC framework. It uses Protocol Buffers (Protobuf) for the message format and HTTP/2 as the transport.

In simple language: instead of POST /users with JSON, we call a typed function userService.GetUser(id) and get a typed response. The wire format is binary and small. The transport is HTTP/2 streams, so we get multiplexing for free.

Why It Exists

REST + JSON works great for browsers. For service-to-service traffic in a microservices system, it has problems:

JSON is bulky and slow to parse.
Schemas are documented in wikis, not enforced.
Streaming is awkward.

gRPC fixes all three: tight binary encoding, schemas in .proto files, native streaming.

Protobuf Schema

We define services and messages in a .proto file:

syntax = "proto3";

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc ListUsers(ListUsersRequest) returns (stream User);
  rpc UploadAvatars(stream AvatarChunk) returns (UploadResult);
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);
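}

// Note: ListUsersRequest, AvatarChunk, UploadResult, and ChatMessage must be
// declared the same way as the messages below; they are omitted here.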

message GetUserRequest {
  int64 id = 1;
}

message User {
  int64 id = 1;
  string name = 2;
  string email = 3;
}

The protoc compiler generates client and server stubs in our language of choice. Both sides are statically typed against the same schema.
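To get a feel for why the wire format is small, here is a hand-rolled sketch of how a GetUserRequest with id 150 is encoded. In practice the generated stubs do this for us; `encodeVarint` and `encodeGetUserRequest` are written only for illustration.

```javascript
// Encode a non-negative integer as a Protobuf varint (7 bits per byte, low bits first).
function encodeVarint(value) {
  const bytes = [];
  let n = BigInt(value);
  while (n >= 0x80n) {
    bytes.push(Number(n & 0x7fn) | 0x80); // high bit set: more bytes follow
    n >>= 7n;
  }
  bytes.push(Number(n));
  return bytes;
}

// GetUserRequest { id }: field number 1, wire type 0 (varint) gives tag byte 0x08.
// Real proto3 encoders skip a field that holds the default value 0; this sketch does not.
function encodeGetUserRequest(id) {
  const tag = (1 << 3) | 0;
  return Buffer.from([tag, ...encodeVarint(id)]);
}

console.log(encodeGetUserRequest(150).toString("hex"));      // 089601
console.log(Buffer.byteLength(JSON.stringify({ id: 150 }))); // 10
```

Three bytes on the wire versus ten for the equivalent JSON, because the field name is replaced by its number.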

The Four Call Types

gRPC supports four kinds of RPCs, all built on HTTP/2 streams.

1. Unary
   client ──req──> server
   client <──res── server
   (Like a normal function call)

2. Server streaming
   client ──req──> server
   client <──res── server
   client <──res── server
   client <──res── server
   (Server sends many responses for one request)

3. Client streaming
   client ──req──> server
   client ──req──> server
   client ──req──> server
   client <──res── server
   (Client sends many requests, gets one response)

4. Bidirectional streaming
   client ──msg──> server
   client <──msg── server
   client ──msg──> server
   client <──msg── server
   (Both ends send independently)

Each call maps to one HTTP/2 stream; requests and responses travel in opposite directions on that same stream. Bidi streams are full-duplex.

HTTP/2 Streams as the Substrate

A gRPC call is just an HTTP/2 stream with a specific encoding:

Method is always POST.
Path is /<service>/<method> (e.g., /UserService/GetUser).
Body is length-prefixed Protobuf frames.
Trailers carry the gRPC status code.
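The "length-prefixed" part of the body is a 5-byte header in front of every message: one byte for the compressed flag and four bytes for the big-endian length. A sketch with hypothetical helper names:

```javascript
// gRPC message framing: 1-byte compressed flag + 4-byte big-endian length + payload.
function frameMessage(payload, compressed = false) {
  const header = Buffer.alloc(5);
  header.writeUInt8(compressed ? 1 : 0, 0);
  header.writeUInt32BE(payload.length, 1);
  return Buffer.concat([header, payload]);
}

// Split a buffer holding zero or more complete frames back into payloads.
function parseFrames(buf) {
  const payloads = [];
  let offset = 0;
  while (offset + 5 <= buf.length) {
    const length = buf.readUInt32BE(offset + 1);
    payloads.push(buf.subarray(offset + 5, offset + 5 + length));
    offset += 5 + length;
  }
  return payloads;
}

const body = Buffer.from([0x08, 0x96, 0x01]); // Protobuf bytes for a message with field 1 = 150
const framed = frameMessage(body);
console.log(framed.toString("hex")); // 0000000003089601
console.log(parseFrames(Buffer.concat([framed, framed])).length); // 2
```

This framing is what lets a streaming call carry many messages inside one HTTP/2 stream.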

Because HTTP/2 multiplexes streams over a single TCP connection, hundreds of concurrent gRPC calls share one connection with no head-of-line blocking at the HTTP layer. A lost TCP packet still stalls every stream until it is retransmitted.

gRPC vs REST

Aspect	gRPC	REST + JSON
Wire format	Protobuf (binary)	JSON (text)
Schema	Enforced via `.proto`	Optional (OpenAPI)
Streaming	First-class, four flavors	Workarounds (SSE, WS)
Browser support	Needs gRPC-Web proxy	Native
Caching by CDNs	Hard (POST, binary body)	Native (GET, ETag)
Tooling for humans	Needs grpcurl/BloomRPC	curl, browser, Postman
Transport	HTTP/2 only	Any HTTP
Speed	Faster (binary, multiplexed)	Slower

Rule of thumb: gRPC for backend-to-backend, REST/GraphQL for client-to-backend.

A Quick Server (Node)

import grpc from "@grpc/grpc-js";
import protoLoader from "@grpc/proto-loader";

const pkg = grpc.loadPackageDefinition(
  protoLoader.loadSync("user.proto")
);

const server = new grpc.Server();
server.addService(pkg.UserService.service, {
  GetUser: (call, callback) => {
    callback(null, { id: call.request.id, name: "Manish", email: "m@x.com" });
  },
  ListUsers: (call) => {
    // Server streaming
    for (const user of getAllUsers()) call.write(user);
    call.end();
  },
});

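server.bindAsync("0.0.0.0:50051", grpc.ServerCredentials.createInsecure(), (err, port) => {
  if (err) throw err; // e.g. the port is already in use
  console.log(`gRPC server listening on ${port}`); // insecure credentials: local dev only
});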

Common Gotchas

Browser support. Browsers can’t speak raw gRPC. We need gRPC-Web (an Envoy/Connect proxy translates).
Load balancing. Standard L4 load balancers can’t see individual streams. Use L7 LBs (Envoy, Linkerd, gRPC-aware ingress).
Debugging. No curl magic. Use grpcurl, server reflection, or generated clients.
Versioning. Protobuf field numbers are forever. Adding fields is safe; removing a field is safe only if its number is reserved and never reused; renumbering breaks old clients.

Interview Tip

If asked “why would we use gRPC over REST,” lead with three reasons: schema enforcement, performance (binary + HTTP/2 multiplexing), and first-class streaming. Mention that it’s not a great fit for browser clients without gRPC-Web — that nuance shows depth.

CORS Deep Dive

intermediate cors security same-origin preflight browser

CORS (Cross-Origin Resource Sharing) is the browser mechanism that decides whether JavaScript on origin A is allowed to read responses from origin B.

In simple language: by default the browser blocks cross-origin reads. CORS is the server saying “yes, this origin is allowed.”

Same-Origin Policy

Two URLs are the same origin if all three match:

Scheme (https)
Host (api.example.com)
Port (443)

URL A	URL B	Same origin?
`https://example.com`	`https://example.com/path`	yes
`https://example.com`	`http://example.com`	no (scheme)
`https://example.com`	`https://api.example.com`	no (host)
`https://example.com:443`	`https://example.com:8443`	no (port)
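The rows of the table can be checked with the standard URL API, whose `origin` property already combines scheme, host, and port. The helper name `sameOrigin` is ours.

```javascript
// Two URLs share an origin when scheme, host, and port all match.
// URL#origin drops default ports (443 for https, 80 for http) before comparing.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

console.log(sameOrigin("https://example.com", "https://example.com/path"));     // true
console.log(sameOrigin("https://example.com", "http://example.com"));           // false
console.log(sameOrigin("https://example.com", "https://api.example.com"));      // false
console.log(sameOrigin("https://example.com:443", "https://example.com:8443")); // false
console.log(sameOrigin("https://example.com:443", "https://example.com"));      // true
```

The last line shows that an explicit default port does not create a different origin.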

The same-origin policy is enforced by the browser, not the server. From a server’s view, a request from app.example.com looks identical to one from evil.com. The browser is the one refusing to expose the response to JS unless CORS headers say it’s okay.

Important: this only applies to JavaScript-initiated requests. <img>, <script>, <link> tags can load cross-origin resources freely (just JS can’t read their bytes).

Simple vs Preflight Requests

CORS splits requests into two buckets.

Simple requests

Sent directly. The browser checks the response headers afterwards to decide whether to expose it to JS. A request is “simple” only if all of these are true:

Method is GET, HEAD, or POST.
Headers are limited to a small CORS-safe list (Accept, Accept-Language, Content-Language, Content-Type).
Content-Type is one of text/plain, application/x-www-form-urlencoded, multipart/form-data.
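The three rules above can be approximated in a few lines. This is a rough sketch, not the browser's actual algorithm: real browsers apply extra checks (for example on header values), and `needsPreflight` is a name invented here.

```javascript
const SIMPLE_METHODS = new Set(["GET", "HEAD", "POST"]);
const SAFE_HEADERS = new Set(["accept", "accept-language", "content-language", "content-type"]);
const SIMPLE_CONTENT_TYPES = new Set([
  "text/plain",
  "application/x-www-form-urlencoded",
  "multipart/form-data",
]);

// Approximate the decision: does this request trigger an OPTIONS preflight?
function needsPreflight({ method = "GET", headers = {} }) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) return true;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!SAFE_HEADERS.has(lower)) return true;
    if (lower === "content-type") {
      const mime = String(value).split(";")[0].trim().toLowerCase(); // drop "; charset=..."
      if (!SIMPLE_CONTENT_TYPES.has(mime)) return true;
    }
  }
  return false;
}

console.log(needsPreflight({ method: "GET" })); // false
console.log(needsPreflight({ method: "POST", headers: { "Content-Type": "application/json" } })); // true
console.log(needsPreflight({ method: "GET", headers: { Authorization: "Bearer abc" } })); // true
console.log(needsPreflight({ method: "PUT" })); // true
```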

Preflighted requests

For anything else (custom headers, PUT/DELETE, JSON Content-Type), the browser sends an OPTIONS preflight first to ask for permission.

Browser                                  Server
   │                                        │
   │── OPTIONS /api/users ─────────────────>│   "Can I send a PUT with
   │   Origin: https://app.example.com      │    Authorization header?"
   │   Access-Control-Request-Method: PUT   │
   │   Access-Control-Request-Headers: ...  │
   │                                        │
   │<── 204 No Content ─────────────────────│   "Yes, allowed"
   │   Access-Control-Allow-Origin: ...     │
   │   Access-Control-Allow-Methods: ...    │
   │                                        │
   │── PUT /api/users ─────────────────────>│   actual request
   │<── 200 OK ─────────────────────────────│

The preflight is the OPTIONS round-trip. If it fails, the real request is never sent.

Key Response Headers

Access-Control-Allow-Origin — the origin that’s allowed. Either an exact match (https://app.example.com) or *. With credentials, * is forbidden.
Access-Control-Allow-Methods — methods the server accepts (GET, POST, PUT, DELETE).
Access-Control-Allow-Headers — headers the client may send (Authorization, Content-Type, X-Request-Id).
Access-Control-Allow-Credentials — true if the browser may expose the response to JS when the request was sent with credentials (cookies, Authorization).
Access-Control-Max-Age — how long the browser may cache the preflight result (in seconds). High values mean fewer OPTIONS calls, but browsers cap it (Chromium at 2 hours, Firefox at 24 hours).
Access-Control-Expose-Headers — extra response headers that JS is allowed to read (default exposed list is tiny).

A Working Server (Express)

import express from "express";
const app = express();

app.use((req, res, next) => {
  const origin = req.headers.origin;
  const allowed = ["https://app.example.com", "https://admin.example.com"];

  res.setHeader("Vary", "Origin"); // always set: the response depends on the Origin header
  if (allowed.includes(origin)) {
    res.setHeader("Access-Control-Allow-Origin", origin);
    res.setHeader("Access-Control-Allow-Credentials", "true");
  }
  res.setHeader("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS");
  res.setHeader("Access-Control-Allow-Headers", "Authorization, Content-Type");
  res.setHeader("Access-Control-Max-Age", "86400");

  if (req.method === "OPTIONS") return res.status(204).end();
  next();
});

Common Gotchas

* with credentials is illegal. If we want cookies/Authorization, we must echo back a specific origin. Use a whitelist.
Always Vary: Origin when origin is dynamic. Otherwise CDNs cache one origin’s allow header and serve it to everyone.
CORS errors are not the server’s status code. A 403 from the API is a real response. A CORS failure shows up only in the browser console — the network tab might show “(failed) net::ERR_FAILED” or a successful OPTIONS followed by a blocked main request.
Authorization header triggers preflight. Adding a JWT to a GET makes it non-simple.
With fetch, credentials: "include" is what lets cookies cross origins (XMLHttpRequest uses withCredentials = true). The default, same-origin, won’t send them.
Server didn’t break — the browser blocked it. The server processed the request normally and may have logged it. CORS only stops JS from reading the response.

Interview Tip

If asked “explain CORS in one minute”: (1) browsers enforce the same-origin policy, (2) servers opt in via Access-Control-* headers, (3) “non-simple” requests get a preflight OPTIONS first. Bonus points for naming what decides simple vs preflight (method, headers, Content-Type, and whether the body is a ReadableStream), and for mentioning that CORS protects users from malicious sites, not servers from malicious clients.

Network Security

SSL/TLS Handshake

intermediate tls ssl handshake https security encryption

The TLS handshake is how a client and a server agree on a shared encryption key before they start sending real data. It happens at the very start of every HTTPS connection.

In simple language: before sending our password to a website, our browser and the server first do a quick “hello, here’s my ID, here’s how we’ll encrypt things” dance. That dance is the handshake.

Why It Matters

Without TLS, data goes over the wire in plaintext. Anyone on the same WiFi can read it. TLS solves three problems at once:

Encryption — nobody in the middle can read the bytes.
Authentication — we know we’re really talking to bank.com, not a fake.
Integrity — nobody tampered with the bytes in transit.

The handshake sets all of this up.

TLS 1.2 — The Full Handshake (2-RTT)

This is the classic flow. Two round trips before any application data flows.

ClientHello — client says: “Hi, I support these cipher suites and TLS versions, here’s a random number.”
ServerHello + Certificate + ServerKeyExchange + ServerHelloDone — server picks a cipher, sends its X.509 certificate, sends its signed (EC)DHE key exchange params, says “your turn.”
ClientKeyExchange + ChangeCipherSpec + Finished — client verifies the cert, sends its own (EC)DHE public value so both sides can compute the pre-master secret, switches to encrypted mode, sends a Finished message. (With the legacy RSA key exchange there is no ServerKeyExchange; the client instead picks the pre-master secret and encrypts it with the server’s public key.)
ChangeCipherSpec + Finished — server computes the same pre-master secret, derives the same session keys, switches to encrypted mode, sends its own Finished.

Now both sides have the same symmetric session key. Real data starts flowing.

TLS 1.3 — The Modern Handshake (1-RTT)

TLS 1.3 (RFC 8446, 2018) cuts this in half. Cipher suite negotiation is simpler. Key share is sent in the very first message.

ClientHello (with key share + cipher suites) — client sends everything it can in round one.
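ServerHello + EncryptedExtensions + Certificate + CertificateVerify + Finished — server picks the cipher, sends its key share, derives the keys, then sends its cert, a signature over the handshake (CertificateVerify), and Finished. Everything after ServerHello is encrypted, including the certificate.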
Finished — client sends its Finished. Done.

One round trip. And with 0-RTT resumption, returning visitors can send data with the very first packet. That early data can be replayed by an attacker, so it should carry only idempotent requests.

Side by Side

TLS 1.2 (2-RTT)

Client → ClientHello (supported ciphers, random)
Server → ServerHello, Cert, KeyEx, Done
Client → KeyEx, ChangeCipher, Finished
Server → ChangeCipher, Finished
App data flows

TLS 1.3 (1-RTT)

Client → ClientHello + KeyShare (all cipher info upfront)
Server → ServerHello, {Cert, Finished} (cert is already encrypted)
Client → {Finished}
App data flows
0-RTT possible on resumption

Inspect a Real Handshake

# See every step of the handshake against a server
openssl s_client -connect google.com:443 -tls1_3

# Just the cert chain
openssl s_client -connect google.com:443 -showcerts < /dev/null

Why HTTPS Is Fast Now

HTTPS used to have a bad reputation for being slow. Three things changed that:

TLS 1.3 cut the handshake from 2-RTT to 1-RTT.
Session resumption (PSK) skips the certificate exchange and signature checks on reconnects.
HTTP/2 and HTTP/3 multiplex many requests over one TLS session, so we pay the handshake cost once.

In practice, TLS on a modern site costs about one extra round trip per new connection, plus a small CPU cost for encryption.

Common Gotcha

People often confuse the handshake with the ongoing encryption. The handshake uses asymmetric crypto (slow) only to agree on a shared key. After that, all real traffic uses symmetric crypto (fast). We’ll cover that distinction in the next note.

Symmetric vs Asymmetric Encryption

intermediate encryption aes rsa ecc cryptography tls

Encryption comes in two flavors: symmetric (same key on both sides) and asymmetric (one public key, one private key). They solve different problems, and modern protocols like TLS use them together.

Symmetric Encryption

Both sides share one secret key. The same key encrypts and decrypts.

In simple language: it’s like a lock where the same key opens and closes it. Fast, simple, but how do we share the key in the first place without someone intercepting it?

Algorithms: AES (the standard), ChaCha20, 3DES (legacy)
Speed: Very fast. AES-NI hardware can do gigabytes per second.
Problem: Key distribution. We can’t just email the key, because anyone on the path could read it.

# Encrypt a file with AES-256 using openssl
openssl enc -aes-256-cbc -salt -pbkdf2 -in secret.txt -out secret.enc
# Both sides need the same passphrase
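The same idea in Node, as a sketch. Note the swap: this example uses AES-256-GCM rather than the CBC mode in the openssl command, because GCM also detects tampering. The `encrypt` and `decrypt` helpers and the sample message are made up for illustration.

```javascript
const { randomBytes, createCipheriv, createDecipheriv } = require("node:crypto");

// The shared secret: both sides must already hold these same 32 bytes.
const key = randomBytes(32);

function encrypt(plaintext, key) {
  const iv = randomBytes(12); // fresh nonce per message; never reuse one with the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function decrypt({ iv, ciphertext, tag }, key) {
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the key is wrong or the data was altered
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

const sealed = encrypt("transfer 1000 to manish", key);
console.log(decrypt(sealed, key)); // transfer 1000 to manish
```

Anyone holding `key` can both encrypt and decrypt, which is exactly the key distribution problem described above.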

Asymmetric Encryption

Each side has a key pair: a public key (share with anyone) and a private key (never share). Anything encrypted with the public key can only be decrypted with the matching private key.

In simple language: imagine a mailbox with a slot. Anyone can drop a letter in (encrypt with public key), but only the person with the key (private key) can open it.

Algorithms: RSA, ECC (elliptic curve), Diffie-Hellman, Ed25519
Speed: Much slower than symmetric. RSA-2048 is roughly 1000x slower than AES.
Solves: Key exchange and digital signatures. We never need to share a secret beforehand.

# Generate an RSA key pair
openssl genrsa -out private.pem 2048
openssl rsa -in private.pem -pubout -out public.pem

Side by Side

Aspect	Symmetric	Asymmetric
Keys	One shared key	Public + private key pair
Examples	AES, ChaCha20	RSA, ECC, Ed25519
Speed	Very fast	Slow (~1000x slower)
Key size	128/256 bits	2048+ bits (RSA), 256 (ECC)
Used for	Bulk data encryption	Key exchange, signatures
Key distribution	Problem: how to share the key?	Solves the key distribution problem

Why TLS Uses Both

This is the brilliant part. TLS combines them to get the best of both worlds:

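Asymmetric crypto at the start of the connection — to safely agree on a session key. In TLS 1.3 the two sides run an ephemeral Diffie-Hellman exchange, and the server signs the handshake with the private key that matches its certificate, which proves who we are talking to. (Older RSA-based suites instead encrypted the shared secret with the server’s public key.)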
Symmetric crypto for the rest of the conversation — because once both sides have the shared key, AES is fast enough to handle gigabytes of data without slowing things down.

In simple language: we use the slow-but-safe lock to deliver the fast-but-naked-without-it lock. Then we throw away the slow one.

1. Client gets server's public key (from the certificate)
2. Client checks the server's handshake signature with it, and both sides run a key exchange to derive a shared AES key
3. The rest of the session uses AES (fast)
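The hybrid idea can be sketched with Node's crypto module. This is a simplification, not the real TLS key schedule: it skips the certificate and signature step that authenticates the server, and the HKDF salt and label are illustrative values.

```javascript
const crypto = require("node:crypto");

// Each side generates an ephemeral X25519 key pair and sends only the public half.
const client = crypto.generateKeyPairSync("x25519");
const server = crypto.generateKeyPairSync("x25519");

// Diffie-Hellman: own private key + peer public key gives the same secret on both sides.
const clientSecret = crypto.diffieHellman({ privateKey: client.privateKey, publicKey: server.publicKey });
const serverSecret = crypto.diffieHellman({ privateKey: server.privateKey, publicKey: client.publicKey });

// Stretch the shared secret into a 256-bit AES key.
const deriveKey = (secret) =>
  Buffer.from(crypto.hkdfSync("sha256", secret, Buffer.alloc(32), "demo session key", 32));
const clientKey = deriveKey(clientSecret);
const serverKey = deriveKey(serverSecret);

// From here on, only fast symmetric crypto is used.
const iv = crypto.randomBytes(12);
const cipher = crypto.createCipheriv("aes-256-gcm", clientKey, iv);
const ciphertext = Buffer.concat([cipher.update("GET /account", "utf8"), cipher.final()]);
const tag = cipher.getAuthTag();

const decipher = crypto.createDecipheriv("aes-256-gcm", serverKey, iv);
decipher.setAuthTag(tag);
const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");

console.log(clientSecret.equals(serverSecret), plaintext); // true GET /account
```

No secret ever crossed the wire: only the two public keys, the nonce, the ciphertext, and the tag.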

Digital Signatures (Asymmetric, Reversed)

Asymmetric crypto also enables signatures. We sign with our private key, anyone can verify with our public key. This is how certificates work — a CA signs a cert with its private key, browsers verify with the CA’s public key.
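A short sign-and-verify sketch using Ed25519 keys in Node. The message contents are illustrative and only stand in for the data a CA would sign.

```javascript
const { generateKeyPairSync, sign, verify } = require("node:crypto");

const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const message = Buffer.from("CN=bank.com, valid until 2026-01-01"); // illustrative payload
const signature = sign(null, message, privateKey); // only the private key holder can produce this

console.log(verify(null, message, publicKey, signature));                    // true
console.log(verify(null, Buffer.from("CN=evil.com"), publicKey, signature)); // false
```

Changing even one byte of the signed data makes verification fail, which is what stops an attacker from editing a certificate.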

Interview Tip

If asked “is RSA symmetric or asymmetric?” — RSA is asymmetric. If asked “why don’t we use RSA for everything?” — because it’s slow and key sizes balloon. Always mention the hybrid approach in TLS — it’s the cleanest answer.

Certificates & PKI

intermediate certificates pki x509 ca tls https

A TLS certificate is a small file that proves “this public key really belongs to bank.com.” PKI (Public Key Infrastructure) is the whole system of trust that backs that proof up.

In simple language: when our browser connects to a site over HTTPS, the site shows a certificate like an ID card. The browser checks if the ID was signed by someone it trusts (a Certificate Authority).

X.509 — The Cert Format

Almost every TLS certificate in the world uses the X.509 standard. Inside a cert we find:

Subject — who the cert is for (CN=bank.com)
Issuer — who signed it (a CA like Let’s Encrypt)
Public key — the site’s public key
SANs (Subject Alternative Names) — the domains the cert covers (*.bank.com, www.bank.com)
Validity period — Not Before / Not After dates
Signature — the CA’s signature over all of the above
Serial number, fingerprint — unique identifiers

# Inspect a certificate from a live server
openssl s_client -connect bank.com:443 -servername bank.com < /dev/null \
  | openssl x509 -text -noout

# Check just the dates and SANs
openssl x509 -in cert.pem -text -noout | grep -E "DNS:|Not"

CN vs SAN

The old way was to put one domain in the Common Name (CN). Modern browsers (Chrome since 2017) ignore CN and only look at the SAN field. So a cert without SAN entries is rejected, even if CN matches.

Always use SANs. CN is mostly cosmetic now.
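
A rough sketch of how a client might match a hostname against SAN entries. The function name is hypothetical, and the wildcard rule assumed here is the common one: `*` is honored only as the whole left-most label and covers exactly one label.

```python
def san_matches(hostname: str, san_entries: list[str]) -> bool:
    """Return True if hostname is covered by any DNS SAN entry."""
    host_labels = hostname.lower().rstrip(".").split(".")
    for entry in san_entries:
        san_labels = entry.lower().rstrip(".").split(".")
        if len(san_labels) != len(host_labels):
            continue                      # a wildcard never spans extra labels
        if san_labels[0] == "*":
            if san_labels[1:] == host_labels[1:]:
                return True
        elif san_labels == host_labels:
            return True
    return False
```

Under these rules `*.bank.com` covers `www.bank.com` but not `bank.com` itself, which is why certs usually list both the bare domain and the wildcard.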

The Chain of Trust

Browsers don’t trust every CA in the world directly. Instead they trust a small set of root CAs baked into the OS / browser trust store. Real certs are issued by intermediate CAs, which are signed by roots.

Root CA (in browser trust store)
   |
   v signs
Intermediate CA
   |
   v signs
bank.com leaf certificate

When the server sends its cert during the TLS handshake, it sends the leaf + intermediate chain. The browser climbs the chain until it hits a root it already trusts. If every signature checks out, the cert is valid.
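
The climb up the chain can be sketched as a walk over issuer links. Certificates are simplified here to dicts with `subject` and `issuer`, and the CA names are made up. Real validation also verifies each signature, the validity dates, and revocation status, all of which this sketch skips.

```python
def build_chain(leaf, presented, trusted_roots):
    """Walk issuer links from the leaf up to a trusted root.

    Returns the list of subjects on the path, or None if no trusted
    root can be reached from the certificates the server presented.
    """
    by_subject = {cert["subject"]: cert for cert in presented}
    chain = [leaf["subject"]]
    current = leaf
    for _ in range(len(presented) + 1):        # bound the walk to avoid loops
        issuer = current["issuer"]
        if issuer in trusted_roots:
            return chain + [issuer]
        if issuer not in by_subject:
            return None                        # missing intermediate
        current = by_subject[issuer]
        chain.append(current["subject"])
    return None

leaf = {"subject": "bank.com", "issuer": "Example Intermediate CA"}
intermediate = {"subject": "Example Intermediate CA", "issuer": "Example Root CA"}
roots = {"Example Root CA"}
```

Calling `build_chain(leaf, [], roots)` returns `None`, which mirrors a common misconfiguration: a server that sends only the leaf and forgets the intermediate.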

Root CAs

Roots live in the OS or browser trust store. On macOS:

# List trusted root CAs
security find-certificate -a /System/Library/Keychains/SystemRootCertificates.keychain

If a root is compromised or misbehaves (Symantec, DigiNotar), browsers can revoke trust by removing it from the store.

Self-Signed vs CA-Signed

Self-signed: we generate our own cert and sign it ourselves. Browsers show a big scary warning because no public CA vouches for it. Fine for local dev (localhost), not for production.
CA-signed: signed by a public CA. Browsers trust it silently.

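# Generate a self-signed cert for local dev
# (-addext needs OpenSSL 1.1.1+; the SAN entry is what browsers actually match)
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem \
  -sha256 -days 365 -nodes -subj "/CN=localhost" \
  -addext "subjectAltName=DNS:localhost"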

Let’s Encrypt

Let’s Encrypt is a free, automated, public CA. It uses the ACME protocol — a script proves we control the domain (HTTP-01 or DNS-01 challenge), and a cert is issued in seconds. Certs are valid for 90 days, so we set up auto-renewal.

# Get a cert with certbot
sudo certbot --nginx -d example.com -d www.example.com

This single tool is why HTTPS went from “expensive” to “default” in the late 2010s.

Revocation — OCSP and CRL

What if a private key leaks? The cert needs to be revoked before it expires.

CRL (Certificate Revocation List): a big list of revoked serial numbers. Browser downloads it and checks. Old, bandwidth-heavy.
OCSP (Online Certificate Status Protocol): browser asks the CA “is this cert still valid?” in real time. Faster, but adds a round trip and leaks browsing history to the CA.
OCSP Stapling: the server periodically fetches the OCSP response and staples it to the TLS handshake. Browser doesn’t need to call the CA. Best of both worlds.

In practice, browsers also ship pre-built revocation data pushed via browser updates: CRLSets in Chrome, CRLite in Firefox.

Common Gotcha

Certs expire. A lot of outages come from “we forgot to renew the cert.” Always automate renewal (certbot, cert-manager on Kubernetes, ACM on AWS) and set up monitoring/alerts on Not After dates.
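
A small sketch of the alerting side, assuming the Not After string has already been read from the certificate (for example from `openssl x509 -enddate` or Python's `getpeercert()`). The function names and the 30-day threshold are illustrative choices, not a standard.

```python
import ssl

def days_until_expiry(not_after: str, now: float) -> float:
    """Days left before a cert's Not After time.

    not_after uses the certificate date format, e.g.
    "Jan  5 09:34:43 2018 GMT". now is a Unix timestamp
    (pass time.time() in real use).
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now) / 86400

def needs_renewal(not_after: str, now: float, threshold_days: int = 30) -> bool:
    """True when the cert is inside the renewal window."""
    return days_until_expiry(not_after, now) < threshold_days

EXAMPLE_NOT_AFTER = "Jan  5 09:34:43 2018 GMT"
example_expiry = ssl.cert_time_to_seconds(EXAMPLE_NOT_AFTER)
```

Passing `now` explicitly keeps the check easy to test; a monitoring job would call `needs_renewal(not_after, time.time())` on a schedule and alert on `True`.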

Interview Tip

When asked “how does HTTPS know it’s the real site?” — walk through: cert → signed by intermediate → signed by root → root is in browser trust store. That’s the chain of trust. Bonus points for mentioning SAN, OCSP stapling, and Let’s Encrypt.

Common Attacks (DDoS, MITM, Spoofing, Replay)

advanced security ddos mitm spoofing replay attacks

This is a tour of the most common network attacks, from the link layer up to the application layer. We’ll cover what they do, how they work, and how we defend against each.

DDoS — Distributed Denial of Service

The attacker overwhelms our service with so much traffic that real users can’t get in. “Distributed” means traffic comes from thousands of compromised machines (a botnet), so we can’t just block one IP.

DDoS attacks come in three flavors, based on which layer they target:

Volumetric (Layer 3/4)

Floods the pipe with raw bandwidth — UDP floods, ICMP floods, DNS amplification. Measured in Gbps or Tbps. The goal is to saturate the network link before traffic even reaches the server.

Defense: scrubbing services (Cloudflare, AWS Shield, Akamai), large upstream bandwidth, BGP blackholing.

Protocol (Layer 4)

Exploits how TCP/UDP works. SYN flood is the classic — attacker sends millions of TCP SYN packets but never completes the handshake. The server holds half-open connections until its socket table fills up.

Defense: SYN cookies (the server doesn’t allocate state until the third handshake packet arrives; enabled with net.ipv4.tcp_syncookies=1 on Linux), connection rate limits.
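
The idea behind SYN cookies can be sketched as follows: the server derives its initial sequence number from the connection's addresses and a secret, so it can validate the final ACK without having stored anything. The secret, the function names, and the use of HMAC-SHA256 are assumptions for illustration. Real implementations use a different bit layout and also encode the MSS.

```python
import hashlib
import hmac

SECRET = b"rotate-me-regularly"   # hypothetical server-side secret

def make_cookie(src_ip, src_port, dst_ip, dst_port, time_slot):
    """Encode the connection identity into a 32-bit initial sequence number."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}@{time_slot}".encode()
    mac = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")

def ack_is_valid(src_ip, src_port, dst_ip, dst_port, time_slot, ack_number):
    """On the final ACK, recompute the cookie instead of looking up state.

    The client acknowledges our sequence number + 1. The current and
    previous time slots are accepted so a handshake can straddle a
    slot boundary.
    """
    cookie = ((ack_number - 1) % 2**32).to_bytes(4, "big")
    return any(
        hmac.compare_digest(
            cookie,
            make_cookie(src_ip, src_port, dst_ip, dst_port, slot).to_bytes(4, "big"),
        )
        for slot in (time_slot, time_slot - 1)
    )
```

A flood of SYNs that never complete the handshake costs the server one hash per packet and no memory, which is the point of the technique.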

Application (Layer 7)

Looks like real HTTP traffic but specifically targets expensive endpoints. A few thousand requests per second to /search?q=* can take a database down without using much bandwidth.

Defense: rate limiting, WAF rules, bot detection, CAPTCHA, caching.
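
Rate limiting is often implemented as a token bucket. Below is a minimal sketch; the class name and parameters are illustrative, and a production limiter would keep one bucket per client key (IP, API token) in shared storage.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(rate=1, capacity=2)`, two back-to-back requests pass, a third is rejected, and one more is allowed a second later.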

MITM — Man in the Middle

Attacker sits between client and server, reading and possibly modifying traffic. Common on public WiFi.

How it works: the attacker tricks the victim into routing traffic through them — via ARP spoofing on a LAN, a fake WiFi access point, or a rogue cert.

Defense:

HTTPS everywhere — TLS authenticates the server via certificates, so a MITM can’t impersonate the real site without a valid cert.
HSTS — tells browsers “always use HTTPS for this domain, never accept HTTP.”
Certificate pinning — mobile apps pin specific cert or public-key fingerprints, so a cert issued by a compromised CA is rejected even though it chains to a trusted root.
VPNs on untrusted networks.

ARP Spoofing

ARP (Address Resolution Protocol) maps IPs to MAC addresses on a LAN. There’s no authentication. An attacker on the same LAN sends fake ARP replies saying “I’m the router,” and traffic from victims now flows through them.

Defense: static ARP entries (impractical at scale), Dynamic ARP Inspection on managed switches, port security, monitoring tools like arpwatch.

DNS Spoofing / Cache Poisoning

Attacker tricks a DNS resolver into caching a fake mapping (bank.com -> attacker IP). Now every user of that resolver gets sent to the attacker’s server.

The classic technique was the Kaminsky attack (2008) — guessing transaction IDs to inject fake responses.

Defense:

DNSSEC — DNS responses are cryptographically signed by the zone owner.
DNS over HTTPS (DoH) / DNS over TLS (DoT) — encrypts the resolver path.
Random source ports + 0x20 encoding to make spoofing harder.

Replay Attack

Attacker captures a valid encrypted message (say, a payment authorization) and replays it later to make the action happen twice.

The packet is still encrypted — they don’t read it, they just resend it.

Defense:

Nonces — random one-time values in every request; server rejects duplicates.
Timestamps — reject anything older than N seconds.
Sequence numbers — TLS does this internally; every record has an incrementing counter.
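
The nonce and timestamp defenses combine naturally. The sketch below assumes the nonce and timestamp are covered by the message's signature or MAC (otherwise an attacker could simply change them); the class name and the 30-second window are illustrative.

```python
class ReplayGuard:
    """Reject requests that are too old or whose nonce was already seen."""

    def __init__(self, max_age_seconds: float = 30.0):
        self.max_age = max_age_seconds
        self.seen = {}                      # nonce -> timestamp it carried

    def accept(self, nonce: str, timestamp: float, now: float) -> bool:
        # Forget nonces old enough that the timestamp check rejects them anyway.
        self.seen = {n: t for n, t in self.seen.items() if now - t <= self.max_age}
        if abs(now - timestamp) > self.max_age:
            return False                    # stale, or too far in the future
        if nonce in self.seen:
            return False                    # replay
        self.seen[nonce] = timestamp
        return True
```

The timestamp window is what keeps the nonce store bounded: the server only has to remember nonces for as long as their timestamps would still be accepted.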

Quick Reference

Attack       What it does               Defend with
DDoS         Overwhelm with traffic     Scrubbing, rate limit
MITM         Sit between parties        HTTPS, HSTS
ARP Spoof    Fake LAN identity          DAI, switch security
DNS Spoof    Poison DNS cache           DNSSEC, DoH
SYN Flood    Half-open TCP exhaustion   SYN cookies
Replay       Resend captured msg        Nonce, timestamp

Interview Tip

For each attack, an interviewer wants three things: what it does, how it works at the protocol level, and at least one mitigation. Don’t just say “use HTTPS” for everything — show that we understand which layer the attack hits.

Firewalls (Stateful vs Stateless)

intermediate firewall iptables nftables waf security networking

A firewall is a piece of software (or hardware) that decides which packets get through and which get dropped. It sits between two networks and applies a set of rules.

In simple language: a bouncer at the door. Has a guest list (the rules). Anyone not on the list, denied.

Stateless (Packet Filter)

Looks at each packet in isolation. No memory of past packets. Rules match on headers only — source IP, destination IP, source port, destination port, protocol.

ALLOW any -> 1.2.3.4 port 443 (TCP)
DENY any -> any port 22 (TCP)

Pros: very fast, low memory. Cons: can’t distinguish a reply from a fresh attack. To allow returning traffic, we have to open ephemeral ports both ways, which is loose.

Stateful (Connection Tracking)

Keeps a table of active connections. When a packet comes in, the firewall checks: “is this part of an existing connection I already approved?”

If yes, the packet is allowed without re-checking rules. If no, the rules are evaluated.

This is how almost every modern firewall works (iptables conntrack, pf, Windows Firewall, AWS Security Groups).

# iptables stateful rule — allow return traffic for established connections
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow new SSH connections
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -j ACCEPT

# Drop everything else
iptables -A INPUT -j DROP

Pros: smart enough to handle replies automatically, less rule writing, more secure default. Cons: uses memory per connection (state table can be exhausted by SYN floods).
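
A toy model of connection tracking, to make the "replies are handled automatically" point concrete. The class and helper names are hypothetical, and real conntrack also follows TCP state, timeouts, and related flows.

```python
def flow_key(src, sport, dst, dport):
    """Direction-independent identifier for a flow."""
    return frozenset({(src, sport), (dst, dport)})

class StatefulFilter:
    """Accept inbound packets only for allowed ports or tracked flows."""

    def __init__(self, allowed_inbound_ports):
        self.allowed_inbound_ports = set(allowed_inbound_ports)
        self.connections = set()            # the state table

    def outbound(self, src, sport, dst, dport):
        # Record the flow so that replies to it are recognized later.
        self.connections.add(flow_key(src, sport, dst, dport))
        return "ACCEPT"

    def inbound(self, src, sport, dst, dport):
        key = flow_key(src, sport, dst, dport)
        if key in self.connections:
            return "ACCEPT"                 # part of an existing connection
        if dport in self.allowed_inbound_ports:
            self.connections.add(key)
            return "ACCEPT"                 # new connection permitted by a rule
        return "DROP"
```

An inbound packet from a web server to an ephemeral port is dropped when it arrives unsolicited, but accepted once the matching outbound connection has been seen. No rule for the ephemeral port range is needed.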

Application / Layer 7 / WAF

A Web Application Firewall reads inside the HTTP request — headers, URL, body, cookies — and matches against rules like “block if path contains ../” or “block requests with SQL injection patterns.”

WAFs sit in front of web servers (Cloudflare, AWS WAF, ModSecurity). They’re the only firewall that can stop application-layer attacks like SQLi, XSS, or attacks on /login endpoints.

Default-Deny vs Default-Allow

Two philosophies for the catch-all rule at the end:

Default-deny — deny anything not explicitly allowed. Always preferred for production. Adding a service is an explicit decision.
Default-allow — allow anything not explicitly denied. Easier to start with but unsafe — every new service is exposed unless someone remembers to block it.

# Default-deny on iptables INPUT chain
iptables -P INPUT DROP
# Now nothing comes in unless an explicit ACCEPT rule in the chain matches it

iptables vs nftables

iptables — the classic Linux firewall tool. Five tables (filter, nat, mangle, raw, security), chains (INPUT, OUTPUT, FORWARD), rules.
nftables — the modern replacement. Single tool (nft), unified syntax, faster rule evaluation. Default on most modern Linux distros.

# nftables example — allow SSH, HTTP, and HTTPS
# (assumes the inet filter table and its input chain already exist)
nft add rule inet filter input tcp dport { 22, 80, 443 } accept
nft add rule inet filter input ct state established,related accept
nft add rule inet filter input drop

AWS — Security Groups vs NACLs

A common interview question. Both filter traffic, both work in the cloud, but they’re different:

Security Group — stateful, attached to instances/ENIs, only allow rules. Reply traffic is automatic.
Network ACL — stateless, attached to subnets, allow + deny rules with order. Reply traffic needs explicit rules.

Use Security Groups for normal app rules. Use NACLs for blanket subnet-level blocks (e.g., block a known bad IP range).

Common Gotcha

Stateful ≠ immune to floods. A SYN flood can fill up the conntrack table of a stateful firewall. The fix is tuning nf_conntrack_max, enabling SYN cookies, and putting a SYN proxy in front of the firewall, not switching to stateless.

Interview Tip

When asked about firewalls, structure the answer by layer. L3/L4 packet filter (stateless) → L4 stateful firewall → L7 WAF. Each layer catches different attacks. A real production stack uses all three.

HTTP Security Headers (HSTS, CSP, etc.)

intermediate http headers security hsts csp web

HTTP security headers are response headers we set on a web server to tell browsers “behave safely on this site.” They’re free, take a few lines of nginx config, and stop a surprising number of attacks.

In simple language: by default browsers are pretty lax — they’ll let any script run, render any iframe, follow any redirect. These headers lock things down.

HSTS — Strict-Transport-Security

Tells the browser “always use HTTPS for this domain — even if the user types http://.”

Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

max-age — how long the browser remembers (in seconds). 1 year is standard.
includeSubDomains — applies to *.example.com too.
preload — opts into the HSTS preload list baked into browsers, so even the very first visit is HTTPS-only.

Stops: SSL stripping attacks (where a MITM downgrades HTTPS to HTTP).

CSP — Content-Security-Policy

A whitelist for what content the page is allowed to load. Probably the most powerful and the most annoying header to configure.

Content-Security-Policy: default-src 'self'; script-src 'self' https://cdn.example.com; style-src 'self' 'unsafe-inline'; img-src 'self' data: https:; frame-ancestors 'none'

default-src 'self' — by default, only load resources from our own origin.
script-src — JS sources allowed (no inline scripts unless 'unsafe-inline' or a nonce).
frame-ancestors — who can iframe us (replaces X-Frame-Options).

Stops: XSS (rogue inline scripts), clickjacking, data exfiltration, mixed content.

Tip: roll out CSP in report-only mode first to find what breaks.

Content-Security-Policy-Report-Only: default-src 'self'; report-uri /csp-report

X-Content-Type-Options

X-Content-Type-Options: nosniff

Tells the browser “trust the Content-Type header — don’t try to guess.” Without this, IE/older browsers might treat a .txt upload as an HTML or JS file, enabling XSS through MIME sniffing. Always set this.

X-Frame-Options

X-Frame-Options: DENY

Or SAMEORIGIN. Tells browsers “don’t let other sites iframe me.” Stops clickjacking (where an attacker iframes our site and tricks users into clicking buttons they can’t see).

Modern alternative: Content-Security-Policy: frame-ancestors 'none'. Set both for legacy browser support.

Referrer-Policy

Referrer-Policy: strict-origin-when-cross-origin

Controls how much info the Referer header leaks when our users click external links. Common values:

no-referrer — never send Referer at all.
same-origin — only send for same-origin requests (same scheme, host, and port).
strict-origin-when-cross-origin — send full URL same-origin, only origin cross-origin, nothing on HTTPS-to-HTTP. Sensible default.

Stops: leaking session tokens or user data via URL parameters in Referer.

Permissions-Policy

(Used to be called Feature-Policy.) Controls which browser APIs the page can use.

Permissions-Policy: camera=(), microphone=(), geolocation=(self), payment=()

camera=() — disable camera entirely.
geolocation=(self) — only our own page can request location.

Stops: rogue scripts (or compromised third parties) from accessing sensitive APIs.

Other Useful Headers

Cross-Origin-Opener-Policy: same-origin — isolates browsing context, mitigates Spectre.
Cross-Origin-Embedder-Policy: require-corp — required for cross-origin isolation features (SharedArrayBuffer).
Cross-Origin-Resource-Policy: same-origin — controls who can fetch our resources.
X-XSS-Protection: 0 — old IE/Edge XSS filter, disable it (it caused more bugs than it fixed; CSP replaces it).

A Sane Starting Set

For a typical web app, these six headers cover the common cases:

Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Content-Security-Policy: default-src 'self'
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
Referrer-Policy: strict-origin-when-cross-origin
Permissions-Policy: camera=(), microphone=(), geolocation=()

Set them once in nginx / Caddy / your CDN, scan with securityheaders.com, iterate.

Example: Setting Headers in Nginx

add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "DENY" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Content-Security-Policy "default-src 'self'" always;

The always keyword ensures the header is set even on error responses (4xx, 5xx).
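
A quick way to audit a response against the six-header starting set is a small checker like the one below. The function name is hypothetical; the header dict could come from any HTTP client.

```python
RECOMMENDED = {
    "strict-transport-security",
    "content-security-policy",
    "x-content-type-options",
    "x-frame-options",
    "referrer-policy",
    "permissions-policy",
}

def missing_security_headers(response_headers: dict) -> list:
    """Return the recommended headers absent from a response, sorted by name.

    Header names are compared case-insensitively, as HTTP requires.
    """
    present = {name.lower() for name in response_headers}
    return sorted(RECOMMENDED - present)
```

This only checks presence, not whether the values are sensible, so it complements a scanner rather than replacing one.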

Interview Tip

When asked “how do you secure a web app?” — these headers are an easy, concrete answer alongside HTTPS, input validation, and auth. Knowing what each one prevents (CSP → XSS, X-Frame-Options → clickjacking, HSTS → SSL stripping) shows depth.

Performance, Scaling & Debugging

Latency vs Bandwidth vs Throughput

intermediate latency bandwidth throughput performance rtt networking

Latency, bandwidth, and throughput sound similar. They’re not. Mixing them up is the root cause of half the “why is my app slow” debates.

Definitions

Latency — how long it takes for one packet to travel from A to B. Measured in milliseconds.
Bandwidth — the theoretical maximum capacity of the link. Measured in bits per second.
Throughput — the actual rate of data we observe end to end. Always ≤ bandwidth.

The Highway Analogy

Imagine a highway between two cities.

Bandwidth = the number of lanes. More lanes = more cars can travel in parallel.
Latency = how long it takes one car to drive from city A to city B. Fixed by speed and distance.
Throughput = how many cars actually arrive per minute. Depends on lanes, speed limit, traffic jams, accidents.

A 16-lane highway (high bandwidth) with a 5-hour drive (high latency) still delivers a lot of cars per hour because the lanes are wide. But if all the lanes are jammed, throughput drops to a trickle.

Why High Bandwidth ≠ High Throughput

Throughput is bounded by both plus the protocol overhead. TCP, for example, requires acknowledgments. The sender can only send a “window” of unacknowledged data, then must wait for an ACK before sending more.

That window divided by the round-trip time gives us the maximum effective throughput, no matter how big the pipe is.

Throughput ~= Window Size / RTT

If we have 1 Gbps of bandwidth but 200 ms RTT and a 64 KB window, throughput is capped around:

64 KB / 0.2 s = 320 KB/s ~= 2.6 Mbps

Even though the link can do 1000 Mbps. This is the long fat pipe problem.

Bandwidth-Delay Product (BDP)

BDP = bandwidth × RTT. It tells us how much data is “in flight” at any moment.

BDP = 1 Gbps * 100 ms = 1e9 bits/s * 0.1 s = 1e8 bits = 12.5 MB

If our TCP window is smaller than BDP, we can’t fill the pipe. Modern TCP uses window scaling (RFC 7323) to grow the window beyond the original 64 KB cap.
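
The two formulas above are easy to check numerically. In this sketch the function names are illustrative, and 64 KB is taken as 65,536 bytes (the largest window without scaling, rounded up by one byte).

```python
def window_limited_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds

def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return bandwidth_bps * rtt_seconds / 8

capped = window_limited_throughput_bps(64 * 1024, 0.2)   # 64 KB window, 200 ms RTT
needed = bdp_bytes(1e9, 0.1)                             # 1 Gbps link, 100 ms RTT
print(f"{capped / 1e6:.2f} Mbps, window needed: {needed / 1e6:.1f} MB")
```

Comparing `needed` with the configured window shows at a glance whether the window, rather than the link, is the bottleneck.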

RTT — Round-Trip Time

The time for a packet to go to a host and come back. Measured by ping.

# Measure RTT to a server
ping -c 5 google.com

# Continuous monitoring with jitter
mtr google.com

Typical RTTs:

Same datacenter: < 1 ms
Same city: 5-10 ms
Cross-country (US): 50-80 ms
Cross-continent: 100-200 ms
Satellite (GEO): 600+ ms

Why Latency Matters Even When Bandwidth Is Huge

Loading a page often means dozens of small HTTP requests in serial — DNS, TLS handshake, fetch HTML, fetch CSS, fetch JS, etc. Each one pays the RTT cost.

A 100 ms RTT × 10 sequential requests = 1 second of pure waiting, regardless of bandwidth. That’s why CDNs (low latency) often matter more than upgrading the connection (high bandwidth).

This is also why HTTP/2 multiplexing and HTTP/3 (over QUIC) save time — they parallelize the round trips.

Latency Hierarchy (Approximate)

L1 cache          0.5 ns
L2 cache          5 ns
RAM               100 ns
SSD read          150 us
HDD seek          10 ms
Network LAN       0.5 ms
Network WAN       50-150 ms
Mobile (4G/5G)    30-50 ms

Notice the roughly six orders of magnitude between RAM (100 ns) and a typical WAN round trip (~100 ms). That’s why caching matters so much.

Measuring Throughput

# iperf3 - the standard tool for measuring real throughput
# On the server
iperf3 -s
# On the client
iperf3 -c server.example.com -t 30

iperf3 reports actual achievable throughput. Compare with the link’s advertised bandwidth — the gap is overhead, congestion, and protocol limits.

Interview Tip

If asked “would you rather have 10x bandwidth or 1/10 latency?” — for most user-facing workloads (web pages, APIs, gaming), lower latency wins. Bandwidth helps with bulk transfers (video streaming, backups). Always tie the answer to the workload.

CDN & Edge Networks

intermediate cdn edge anycast caching performance networking

A CDN (Content Delivery Network) is a global network of servers that caches our content close to users. Instead of every request hitting our origin server in us-east-1, a user in Mumbai gets served from a Mumbai cache.

In simple language: many copies of our static stuff, spread around the world, so the bytes don’t have to travel halfway around the planet.

This note focuses on the network mechanics — how the routing, caching, and origin protection work. (For the “what is a CDN” view in system design, that’s covered in HLD.)

Edge PoPs

A Point of Presence (PoP) is a CDN’s data center in a particular city. Cloudflare has 300+ PoPs. Akamai has 4000+. AWS CloudFront has 600+.

Each PoP holds:

A cache (fast SSD) for popular content.
Compute for edge functions (Cloudflare Workers, Lambda@Edge).
TLS termination — TLS handshake completes at the edge, not at origin.

The PoP closest to the user (by latency, not just by km) handles the request.

Anycast — How Routing Knows the Closest PoP

This is the magic. The CDN announces the same IP address from every PoP. When a user does DNS for cdn.example.com and gets, say, 172.67.1.1, every PoP in the world is announcing that IP via BGP.

BGP picks the best route to that IP (shortest AS path, subject to each network’s routing policy), which is usually the nearest PoP. A user in Mumbai gets routed to the Mumbai PoP. A user in Tokyo gets routed to the Tokyo PoP. Same IP, different physical machines.

# See how routing differs by location with traceroute
traceroute 172.67.1.1
# From Mumbai - hits Mumbai PoP
# From Tokyo - hits Tokyo PoP

Compare this to unicast (each server has a unique IP), where DNS-based geo-routing is needed and is much sloppier.

Cache Hit vs Miss

When a request reaches a PoP, it checks its cache:

Cache hit — content is in the local SSD. Served in under 10 ms. The origin never sees this request.
Cache miss — content is not cached (first request, or cache expired). The PoP fetches from origin, serves to the user, and stores a copy.

# Cloudflare returns this header so we can see hit rate
CF-Cache-Status: HIT
CF-Cache-Status: MISS
CF-Cache-Status: EXPIRED
CF-Cache-Status: REVALIDATED

Cache hit ratio is the single most important CDN metric. 95%+ is the goal for static assets. Below 80% means cache rules need work.
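
Hit ratio can be computed from a sample of `CF-Cache-Status` values pulled from logs. This sketch assumes `HIT` and `REVALIDATED` count as hits (the body came from cache) while `MISS` and `EXPIRED` required a fetch from origin; the function name is hypothetical.

```python
from collections import Counter

def cache_hit_ratio(statuses):
    """Fraction of requests served from cache, in the range 0.0 to 1.0."""
    counts = Counter(s.upper() for s in statuses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["HIT"] + counts["REVALIDATED"]) / total
```

For example, 19 hits and 1 miss give 0.95, right at the target for static assets.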

Cache Control

The origin tells the CDN how long to cache via Cache-Control headers.

Cache-Control: public, max-age=31536000, immutable

max-age — seconds the response can be cached.
immutable — content will never change (great for hashed asset filenames like app.a8f3.js).
s-maxage — overrides max-age just for shared caches (CDNs).
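
How a shared cache might turn these directives into a TTL can be sketched as below. The function name is hypothetical, and the handling of `private` and `no-store` (never cache in a shared cache) is included for completeness. Real CDNs may apply their own defaults when no directive is present.

```python
def shared_cache_ttl(cache_control: str) -> int:
    """Seconds a shared cache (such as a CDN) may store a response.

    s-maxage takes precedence over max-age; no-store and private
    forbid shared caching, so they yield 0.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    if "no-store" in directives or "private" in directives:
        return 0
    for name in ("s-maxage", "max-age"):
        if directives.get(name, "").isdigit():
            return int(directives[name])
    return 0
```

A header like `max-age=60, s-maxage=3600` therefore lets browsers cache for a minute while the CDN keeps the object for an hour.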

Origin Shielding

Without shielding, every PoP that gets a cache miss hits the origin directly. With 300 PoPs, that’s 300 origin requests for one popular item right after a deployment.

Origin shield designates one regional PoP to be the parent. All other PoPs miss to the shield first. Only the shield can hit the origin. This collapses 300 origin requests into one.

User -> Mumbai PoP (miss)
       -> Singapore Shield (miss)
       -> Origin in us-east-1 (one request)

User -> Tokyo PoP (miss)
       -> Singapore Shield (HIT this time)
       -> No origin request
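
The effect of a shield tier can be simulated in a few lines. The class names are illustrative, and the model ignores expiry and concurrent misses; it only counts how many requests reach the origin when every PoP asks for the same uncached URL.

```python
class Origin:
    """Stand-in for the origin server; counts the requests it receives."""

    def __init__(self):
        self.requests = 0

    def get(self, url):
        self.requests += 1
        return f"content of {url}"

class Cache:
    """A cache tier that fetches from its parent on a miss."""

    def __init__(self, parent):
        self.parent = parent      # another Cache, or the Origin
        self.store = {}

    def get(self, url):
        if url not in self.store:
            self.store[url] = self.parent.get(url)
        return self.store[url]

def origin_hits(pop_count, use_shield):
    """Count origin requests when every PoP requests the same uncached URL."""
    origin = Origin()
    parent = Cache(origin) if use_shield else origin
    pops = [Cache(parent) for _ in range(pop_count)]
    for pop in pops:
        pop.get("/app.a8f3.js")
    return origin.requests
```

With 300 PoPs, `origin_hits(300, False)` counts 300 origin requests and `origin_hits(300, True)` counts one.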

Push vs Pull CDN

Pull CDN (the standard) — we just put content on our origin. The CDN pulls it on first cache miss. Auto-managed. CloudFront, Cloudflare, Fastly all default to pull.
Push CDN — we explicitly upload content to the CDN ahead of time. Used for huge static archives, software downloads, video catalogs where pull-on-miss latency is unacceptable.

For most apps, pull is the right answer. Less ops work.

Signed URLs

For private content (premium video, paid downloads), we don’t want a public URL. Signed URLs include a cryptographic signature and expiry time.

https://cdn.example.com/video.mp4?Expires=1715000000&Signature=abc123...

The CDN validates the signature at the edge. Tampered or expired URLs are rejected without ever touching the origin. Used heavily by AWS S3 + CloudFront, Google Cloud CDN, and video platforms.
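
A generic version of the idea, sketched with an HMAC over the path and expiry. The key, function names, and exact string being signed are assumptions for illustration; each CDN defines its own scheme (CloudFront, for instance, uses RSA-signed policies rather than a shared secret).

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

KEY = b"shared-with-the-cdn"      # hypothetical signing secret

def _signature(path: str, expires: int) -> str:
    message = f"{path}?Expires={expires}".encode()
    return hmac.new(KEY, message, hashlib.sha256).hexdigest()

def sign_url(base: str, path: str, expires: int) -> str:
    """Build a URL carrying its expiry time and a signature over both."""
    query = urlencode({"Expires": expires, "Signature": _signature(path, expires)})
    return f"{base}{path}?{query}"

def is_valid(url: str, now: int) -> bool:
    """Edge-side check: reject expired or tampered URLs."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["Expires"][0])
        signature = params["Signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    expected = _signature(parts.path, expires)
    return hmac.compare_digest(signature.encode(), expected.encode())
```

Because the expiry is part of the signed message, extending it by editing the query string invalidates the signature.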

Cache Invalidation (Purge)

When we deploy new content, we want to evict old versions. Two strategies:

Versioned URLs (best) — app.v123.js → app.v124.js. New URL means new cache key. Old version naturally fades.
Purge API — call the CDN: “drop this URL from all PoPs.” Slower (seconds to minutes to propagate), and rate-limited.

The “name things with a hash” pattern is dramatically better than purging.
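
Build tools usually generate these names automatically. A minimal sketch of the idea follows, with a hypothetical helper name and an 8-character SHA-256 prefix chosen arbitrarily:

```python
import hashlib
from pathlib import PurePosixPath

def hashed_name(filename: str, content: bytes, digits: int = 8) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js"""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:digits]
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

The name changes only when the bytes change, so unchanged assets keep their cache entries across deployments and changed ones get a fresh cache key without any purge.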

What CDNs Do Beyond Caching

Modern CDNs are full-stack edge platforms:

DDoS scrubbing — absorb volumetric attacks at the edge.
WAF — filter malicious requests before they reach origin.
Edge compute — run JS/Wasm at the edge (Cloudflare Workers, Lambda@Edge).
TLS termination — the cert lives at the edge, origin can be HTTP-only inside a VPC.
Image optimization — resize/compress on the fly.

Interview Tip

When asked “how does a user in India get served from an India server?” — the answer is anycast routing (everyone advertises the same IP, BGP picks the shortest path). This single insight separates surface-level CDN knowledge from real understanding.

Load Balancing (L4 vs L7)

intermediate load-balancing l4 l7 nginx haproxy networking

A load balancer sits in front of multiple servers and decides which one handles each request. The two main types — L4 and L7 — operate at different layers of the OSI model and have very different superpowers.

L4 — Transport Layer Load Balancing

L4 means transport layer — TCP and UDP. The LB only sees IPs, ports, and TCP flags. It doesn’t read the bytes inside.

In simple language: it forwards packets like a smart router. “This TCP connection goes to backend B” — and from then on every packet of that connection goes to B.

Examples: AWS NLB, HAProxy in TCP mode, IPVS, Linux LVS.
Speed: Very fast. Can handle millions of connections per second on commodity hardware.
Visibility: None into the application. Can’t route by URL or hostname.
TLS: Pass-through. The LB doesn’t terminate TLS, the backend does.

L7 — Application Layer Load Balancing

L7 means application layer — usually HTTP/HTTPS. The LB parses the request, reads headers, URL, cookies, body.

In simple language: it acts like a smart receptionist. “Requests to /api/v2/... go to the new service. Mobile user-agents go to the mobile pool.”

Examples: AWS ALB, NGINX, HAProxy in HTTP mode, Envoy, Traefik.
Speed: Slower than L4 (parsing has cost), but still tens of thousands of req/s easily.
Visibility: Full HTTP. Can route by path, host, header, cookie.
TLS: Terminates TLS at the LB. Backends can be plain HTTP inside the VPC.

Side by Side

              L4 (Transport)                   L7 (Application)
Role          TCP / UDP forwarding             HTTP-aware routing
Sees          IPs, ports                       Headers, URL, cookies
Speed         Very fast                        Slower (still fast)
Routing       By IP/port only                  Path, host, header
TLS           Pass-through                     Terminates here
Tools         NLB, HAProxy TCP                 ALB, NGINX, Envoy
Use when      Raw TCP, high QPS, non-HTTP      Microservices, path routing, sticky sessions

L7 Superpowers

L7 unlocks features L4 simply cannot do:

Path-based routing — /api/* → api pool, /static/* → static pool.
Host-based routing — api.example.com → one pool, app.example.com → another.
Sticky sessions — same user goes to same backend (via cookie).
Header rewrites — add X-Forwarded-For, normalize hostnames.
TLS termination — cert managed in one place.
Request retries — retry on 502 to a different backend.
Canary deployments — route 1% of traffic to a new version.

Algorithms

How does the LB pick which backend? Common algorithms:

Round Robin — backend 1, 2, 3, 1, 2, 3… Simple. Doesn’t account for load.
Least Connections — pick whichever backend has the fewest active connections. Good for long-lived connections.
IP Hash — hash the client IP, always send to the same backend. Naive sticky-session option.
Weighted — backend A is bigger, give it 60%; B and C get 20% each. Good when servers are heterogeneous.
Least Response Time — pick the backend with the lowest recent latency (p50/p95). Adapts to slow backends automatically; offered by NGINX Plus (least_time) and other modern LBs.
Power of Two Choices — pick two random backends, send to whichever has fewer connections. Fast, near-optimal load distribution.
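
Three of these algorithms sketched side by side. The class names are illustrative, backends are plain strings, and the connection counts are tracked by the balancer itself (a real LB would decrement them when connections close, as `release` does here).

```python
import itertools
import random

class RoundRobin:
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    def __init__(self, backends):
        self.active = {b: 0 for b in backends}   # backend -> open connections

    def pick(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1

class PowerOfTwoChoices(LeastConnections):
    """Needs at least two backends."""

    def __init__(self, backends, rng=None):
        super().__init__(backends)
        self.rng = rng or random.Random()

    def pick(self):
        # Sample two distinct backends and keep the less loaded one.
        a, b = self.rng.sample(list(self.active), 2)
        backend = a if self.active[a] <= self.active[b] else b
        self.active[backend] += 1
        return backend
```

Power of Two Choices avoids scanning every backend on each request, which matters when there are thousands of them, while still steering away from the busiest ones.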

Health Checks

A load balancer is only useful if it stops sending traffic to dead backends. Both L4 and L7 do periodic health checks:

L4 — TCP connect: did the port answer?
L7 — HTTP GET /healthz: did we get a 200?

L7 health checks are stronger because a server can accept TCP but be returning 500s — L4 wouldn’t notice.

Example: NGINX L7 Config

upstream api_backend {
    least_conn;
    server 10.0.1.10:3000 weight=2;
    server 10.0.1.11:3000;
    server 10.0.1.12:3000 backup;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    location /v1/ {
        proxy_pass http://api_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Real-World Stack: Use Both

Big architectures often layer them:

Client -> L4 (NLB, terminates nothing, fast) -> L7 (ALB/NGINX, routes by path, terminates TLS) -> Backends

The L4 layer absorbs raw connections at line rate. The L7 layer does the smart stuff.

Interview Tip

Don’t say “L4 is faster, so always use L4.” Say “L4 is faster but blind. Use L7 when we need HTTP-level features (path routing, sticky sessions, TLS termination); use L4 when we need raw TCP throughput or the protocol isn’t HTTP.” Bonus: mention that ALB is L7, NLB is L4 — comes up constantly in AWS interviews.

Forward Proxy vs Reverse Proxy

intermediate proxy forward-proxy reverse-proxy nginx vpn networking

Both forward and reverse proxies sit between two parties and forward traffic. The difference is who they’re working for — and who they’re hiding.

In simple language: a forward proxy works for the client (hides clients from servers). A reverse proxy works for the server (hides servers from clients).

Forward Proxy

The client deliberately routes its traffic through the proxy. The destination server has no idea who the original client is — it just sees the proxy’s IP.

Who configures it? The client.

Who knows about it? The client. The destination server might not even realize a proxy exists.

Examples:

A corporate proxy that all employees go through (so IT can filter sites and log activity).
A VPN exit node — Netflix sees a US IP instead of our actual location.
Tor — many proxies in series for anonymity.
Squid, a classic forward proxy server.

# Use a forward proxy with curl
curl -x http://proxy.company.com:8080 https://example.com

Reverse Proxy

The client thinks it’s talking to the server directly. The proxy is invisible. Behind the scenes, it forwards to one of many real backend servers.

Who configures it? The server / sysadmin.

Who knows about it? Only the server side. The client just sees the proxy’s IP and thinks it’s the real server.

Examples:

NGINX in front of a Node.js app.
Cloudflare in front of our entire site.
AWS ALB / ELB.
Caddy, Traefik.
API gateways (Kong, Tyk).

# Reverse proxy in nginx
server {
    listen 443 ssl;
    server_name example.com;

    location / {
        proxy_pass http://backend:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Direction Diagram — Who Hides Who

Forward Proxy — hides the CLIENT

Client A --\
Client B ----> Forward Proxy --> Internet / Server
Client C --/

Server only sees the proxy's IP. It has no idea Client A, B, or C exists.

Reverse Proxy — hides the SERVER

                                       /--> Server X
Internet / Client --> Reverse Proxy ------> Server Y
                                       \--> Server Z

Client only sees the proxy's IP. It has no idea Server X, Y, or Z exists.

Why Use a Forward Proxy?

Bypass censorship / geo-restrictions — VPNs are a forward proxy underneath.
Privacy / anonymity — Tor.
Corporate filtering / logging — block social media, log every request.
Caching — older corporate proxies cached web content to save bandwidth.
Authentication enforcement (for example, requiring single sign-on before any request leaves the network).

Why Use a Reverse Proxy?

Load balancing — split traffic across many backend servers.
TLS termination — cert management in one place.
Caching — return cached responses without hitting backend.
Security — backend never directly exposed to the internet.
Rate limiting / WAF — block abuse at the edge.
Compression / response rewriting — gzip once at the proxy.
Hide internal architecture — many microservices behind one URL.
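
Load balancing is the easiest of these to sketch. The version below is a minimal round-robin picker in Python, assuming a fixed list of backend addresses; the class name and the `server-x:3000` style addresses are made up for illustration. Real proxies add health checks, weights, and connection counts on top of this.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends: list[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def next_backend(self) -> str:
        """Return the backend that should receive the next request."""
        return next(self._cycle)
```

Each incoming request would call `next_backend()` once, so traffic spreads evenly as long as requests cost roughly the same.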

A CDN Is a Reverse Proxy

Cloudflare, CloudFront, Fastly are all distributed reverse proxies. Users hit the CDN. The CDN forwards (on miss) to our origin. The user never directly contacts our origin.
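
The "forward on miss" behaviour can be sketched in a few lines of Python. This assumes the origin is represented by a plain function passed in by the caller; `EdgeCache` and `origin_hits` are illustrative names, and the sketch leaves out TTLs, cache-control headers, and invalidation, which any real CDN has to handle.

```python
from typing import Callable


class EdgeCache:
    """Serve from a local store, going to the origin only on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: dict[str, bytes] = {}
        self.origin_hits = 0  # how many requests reached the origin

    def get(self, path: str) -> bytes:
        if path not in self._store:
            # Cache miss: this is the only time the origin is contacted.
            self.origin_hits += 1
            self._store[path] = self._fetch(path)
        return self._store[path]
```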

Common Confusion

People often say “proxy” without saying which kind, and sometimes mean a forward proxy while describing a reverse proxy setup. A quick way to tell them apart:

“I’m using a proxy to access blocked sites” → forward proxy.
“Nginx in front of my app” → reverse proxy.
“Cloudflare protects my site” → reverse proxy.
“Company VPN” → forward proxy.

Interview Tip

The cleanest definition: forward proxy serves the client, reverse proxy serves the server. From there, every example follows. Bonus: mention that a CDN is just a globally-distributed reverse proxy with caching — it ties this concept to a topic interviewers love.

Network Debugging Toolkit

intermediate debugging tools cli tcpdump wireshark networking

When the network is broken, knowing the right tool saves hours. This toolkit covers the vast majority of “why can’t I reach this server” debugging.

ping — Is the host reachable?

The simplest test. Sends ICMP echo requests and prints round-trip time.

ping -c 5 google.com
# Use it for: connectivity, RTT, packet loss

If ping works but TCP doesn’t, the issue is most likely at the firewall or port level, or nothing is listening on that port. Some networks block ICMP, so a failing ping doesn’t always mean the host is down.
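
Because ICMP and TCP can disagree, it helps to test the actual port. Below is a small Python sketch of a TCP reachability check using the standard `socket` module; `tcp_port_open` is an illustrative name and the 3-second default timeout is an arbitrary choice.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A refused connection usually fails immediately, while a firewall that silently drops packets tends to show up as a timeout, so the time the call takes is itself a hint.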

traceroute / mtr — What’s the path?

traceroute shows every hop between us and the destination. mtr is traceroute + ping — continuous, with packet loss per hop. Always prefer mtr.

traceroute google.com
mtr google.com
# Use it for: locating slow / lossy hops, routing issues

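If hop 7 shows 30% loss and that loss continues through every later hop, the problem is probably at that ISP, not our app. Loss at a single hop that disappears further down the path is usually just that router rate-limiting its ICMP replies.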
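
A rough way to apply that to mtr output is to look for loss that starts at some hop and continues all the way to the final one. The Python sketch below assumes the per-hop loss percentages have already been extracted into a list; the function name and the 1% threshold are illustrative choices, not anything mtr defines.

```python
from typing import Optional


def first_persistent_loss_hop(
    loss_by_hop: list[float], threshold: float = 1.0
) -> Optional[int]:
    """Return the 1-based hop where loss starts and carries through to the end.

    loss_by_hop[i] is the loss percentage reported for hop i + 1.
    Returns None when the final hop is clean.
    """
    first = None
    for hop, loss in enumerate(loss_by_hop, start=1):
        if loss >= threshold:
            if first is None:
                first = hop
        else:
            # Loss did not carry through to this hop, so earlier loss was
            # probably a router deprioritising its own replies.
            first = None
    return first
```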

dig / nslookup — DNS resolution

dig is the modern, scriptable DNS tool. nslookup is older but ubiquitous.

# Resolve a record
dig example.com

# Use a specific resolver
dig @8.8.8.8 example.com

# Trace the resolution from root
dig +trace example.com

# Other record types
dig example.com MX
dig example.com TXT
dig example.com NS

When a site “doesn’t load,” DNS is one of the first things to check.
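
It can also help to see what the application itself would resolve. The Python sketch below asks the system resolver through `socket.getaddrinfo`; `resolve` is an illustrative name. Unlike dig, which queries DNS servers directly, this path also honours local sources such as /etc/hosts, so a mismatch between the two is a useful clue.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses the system resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Resolution failed (unknown name, no resolver reachable, ...).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```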

netstat / ss — What’s listening?

ss (socket statistics) replaces the older netstat.

# All listening TCP ports
ss -tlnp

# Active (non-listening) TCP connections, with owning processes
ss -tnp

# Show stats per protocol
ss -s

Use it for: “is my service even listening on port 3000?” or “how many connections does this server have right now?”

lsof -i — Which process owns a port?

# Who's using port 3000?
lsof -i :3000

# All network sockets used by a PID (-a ANDs the filters; without it lsof ORs them)
lsof -a -i -p 12345

Best one-liner for “address already in use” errors.
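
The same error can be detected from code before a service starts. This is a small Python sketch, assuming an IPv4 TCP port on a local address; `port_in_use` is an illustrative name. It only says whether the port is taken, so lsof is still needed to find out which process holds it.

```python
import errno
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails with 'address already in use'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # permission errors etc. are a different problem
        return False
```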

tcpdump — Inspect raw packets

The Swiss army knife. Captures and prints (or saves) packets matching a filter.

# Capture HTTP traffic on eth0
sudo tcpdump -i eth0 'tcp port 80'

# Capture and save to a pcap file (for Wireshark)
sudo tcpdump -i any -w /tmp/capture.pcap 'host 1.2.3.4'

# Filter by host and port
sudo tcpdump -i eth0 -nn 'host 1.2.3.4 and tcp port 443'

# Verbose, dump packet contents in hex and ASCII (payload on 443 is TLS-encrypted)
sudo tcpdump -i eth0 -vvX 'tcp port 443'

Use it when curl fails but you can’t tell if packets are even going out. Capture, then inspect.

Wireshark — GUI for pcap analysis

Open the pcap file from tcpdump in Wireshark for visual analysis. Filter by tcp.port == 443, follow TCP streams, decode TLS handshakes, see retransmissions in red.

wireshark /tmp/capture.pcap

Use it when tcpdump’s text output isn’t enough — long sessions, complex protocols, retransmission analysis.

curl -v — HTTP debugging

curl is the universal HTTP debugger. The -v flag adds the connection details around the response: the resolved address, TCP connect, TLS handshake, and request/response headers.

# Verbose request
curl -v https://example.com

# Show only response headers (sends a HEAD request)
curl -I https://example.com

# Time breakdown of each phase
curl -w "@-" -o /dev/null -s https://example.com <<< '
  time_namelookup:  %{time_namelookup}\n
  time_connect:     %{time_connect}\n
  time_appconnect:  %{time_appconnect}\n
  time_starttransfer: %{time_starttransfer}\n
  time_total:       %{time_total}\n'

# Force HTTP/2 or HTTP/1.1
curl -v --http2 https://example.com
curl -v --http1.1 https://example.com

The timing output is gold for “why is this slow?” investigations — it tells us whether the bottleneck is DNS, TCP, TLS, or the server.
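
One thing to keep in mind is that curl's `time_*` values are cumulative from the start of the request, so each phase's own cost is the difference between neighbouring values. The Python sketch below does that subtraction; `phase_durations` and the output keys are illustrative names, and treating `time_starttransfer` minus `time_appconnect` as "server wait" is an approximation that also includes the time to send the request.

```python
def phase_durations(t: dict[str, float]) -> dict[str, float]:
    """Turn curl's cumulative timings (seconds) into per-phase durations."""
    connect_done = t["time_connect"]
    # time_appconnect is 0 for plain HTTP, where no TLS handshake happens.
    tls_done = t["time_appconnect"] or connect_done
    return {
        "dns": t["time_namelookup"],
        "tcp": connect_done - t["time_namelookup"],
        "tls": tls_done - connect_done,
        "server_wait": t["time_starttransfer"] - tls_done,
        "download": t["time_total"] - t["time_starttransfer"],
    }
```

Whichever phase dominates points at the layer to investigate next.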

openssl s_client — TLS debugging

When HTTPS misbehaves and we need to see the handshake.

# Connect and show cert + handshake
openssl s_client -connect example.com:443 -servername example.com

# Force TLS 1.2 (test compatibility)
openssl s_client -connect example.com:443 -servername example.com -tls1_2

# Check cert expiry
echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null \
  | openssl x509 -noout -dates

Use it for cert errors, “wrong version of TLS,” or to see the cert chain a server actually serves.
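
The `-dates` output is easy to turn into a "days left" number for monitoring. The Python sketch below assumes the usual `notBefore=` / `notAfter=` lines printed by `openssl x509 -noout -dates` and uses the standard library's `ssl.cert_time_to_seconds` to parse the timestamp; `days_until_expiry` is an illustrative name.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(dates_output: str, now: Optional[float] = None) -> float:
    """Parse `openssl x509 -noout -dates` output; return days until notAfter."""
    for line in dates_output.splitlines():
        if line.startswith("notAfter="):
            # Value looks like "Jan  5 09:34:43 2018 GMT".
            expires = ssl.cert_time_to_seconds(line.split("=", 1)[1].strip())
            current = time.time() if now is None else now
            return (expires - current) / 86400
    raise ValueError("no notAfter= line found")
```

A negative result means the certificate has already expired.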

DevTools Network Tab

The browser’s built-in network inspector. Worth its weight in gold for frontend issues.

Waterfall — see request order and timing.
Headers — request/response headers, cookies, cache hits.
Timing — DNS, connection, TLS, TTFB, content download.
Throttling — simulate slow 3G to find slow assets.
Initiator — what triggered each request.

For most “the page is slow” debugging, this is the first stop, not tcpdump.

Quick Cheat Sheet — Symptom to Tool

"Can I reach the host?"          ping, mtr
"Where's the path slow?"          mtr, traceroute
"DNS not resolving?"              dig, nslookup
"Is my service listening?"        ss -tlnp, lsof -i
"Port already in use?"            lsof -i :PORT
"Are packets going out?"          tcpdump
"What's the HTTP response?"       curl -v
"TLS / cert problem?"             openssl s_client
"Page slow in the browser?"       DevTools Network tab
"Need to deeply analyze pcap?"    Wireshark

Interview Tip

Interviewers love asking “the API is slow / unreachable, walk me through how you’d debug it.” A strong answer goes top-down: DevTools / curl → DNS (dig) → ping/mtr → ss / lsof on the server → tcpdump if needed. Knowing which tool answers which question shows real ops experience.