DevOps — Quick Summary

Quick revision: every topic, key terms, and mnemonics for DevOps.

This is a quick revision doc covering all 43 topics in the DevOps collection. Open the linked notes if you want depth — this is meant to re-cement what we already learned.

Linux Fundamentals

Linux Filesystem and Navigation

What it is. Linux follows the FHS (Filesystem Hierarchy Standard). Most distros lay out files in the same predictable directories, so once we learn the layout, any Linux box feels familiar.

Key terms.

/etc — config files (nginx.conf, ssh, cron live here)
/var/log — system and app logs
/home — user home directories
/usr/bin — most user commands
/tmp — temp files (often cleared on reboot, depending on the distro)
/proc — virtual filesystem with process/system info

Commands.

pwd; ls -la; cd ~; cd -
cat / head -20 / tail -f file.log
grep -i "error" app.log; grep -rn "TODO" src/
find . -name "*.log" -mtime -1
ps aux | grep '[n]ginx' | wc -l   # bracket trick keeps grep from counting itself
echo "x" >> file; sort data > sorted; cmd 2> err.log
awk '{print $1, $3}' access.log

Remember. Config in /etc, logs in /var/log, our stuff in /home. Pipes are an assembly line. > overwrites, >> appends.

File Permissions and Ownership

What it is. Every file has owner/group/others, each with read/write/execute toggles. A 3x3 grid is the entire system.

Key terms.

r=4, w=2, x=1 — octal values, add per group
755 — scripts/dirs (rwxr-xr-x)
644 — regular files (rw-r--r--)
600 — secrets (only owner)
SUID (4xxx) — file runs as owner (e.g. passwd)
SGID (2xxx) — dir’s new files inherit group
Sticky bit (1xxx) — only file owner can delete (/tmp)

Commands.

chmod 755 deploy.sh; chmod +x file
chmod u=rwx,g=rx,o= dir/
chown manish:developers file.txt
chown -R user:group /var/www
umask 0022   # files=644, dirs=755

Remember. rwx → 421, add them per group. 755/644/600 covers 95% of real-world cases.
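
The octal arithmetic can be checked with a few lines of Python. This is a minimal sketch; the helper names octal_to_symbolic and apply_umask are ours, not part of any standard tool.

```python
def octal_to_symbolic(mode: str) -> str:
    """Turn a 3-digit octal mode such as "755" into "rwxr-xr-x"."""
    out = []
    for digit in mode:
        value = int(digit, 8)
        out.append("r" if value & 4 else "-")
        out.append("w" if value & 2 else "-")
        out.append("x" if value & 1 else "-")
    return "".join(out)


def apply_umask(base: int, umask: int) -> str:
    """Clear the umask bits from a base mode (0o666 for files, 0o777 for dirs)."""
    return format(base & ~umask, "03o")


for mode in ("755", "644", "600"):
    print(mode, octal_to_symbolic(mode))
print("file with umask 022:", apply_umask(0o666, 0o022))
print("dir with umask 022:", apply_umask(0o777, 0o022))
```

Each digit is just the sum of 4, 2, and 1 for one of owner, group, and others, which is why three digits describe the whole 3x3 grid.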

Process Management

What it is. Every running program is a process. We view them, signal them, and let systemd babysit them.

Key terms.

PID — process ID
Process — independent program, own memory; Thread — runs inside a process, shared memory
SIGTERM (15) — graceful “please stop”
SIGKILL (9) — instant death, can’t be caught
SIGHUP (1) — hangup; daemons conventionally treat it as “reload config”
Zombie — child finished but parent didn’t reap exit status
systemd — modern Linux service manager

Commands.

ps aux | grep nginx
top / htop
kill 1234        # SIGTERM
kill -9 1234     # SIGKILL (last resort)
nohup ./job.sh > out.log 2>&1 &
systemctl start|stop|restart|reload|status|enable nginx
journalctl -u nginx -f --since "1 hour ago"

Remember. Always try SIGTERM first, SIGKILL only when stuck. systemctl enable survives reboot, start runs now.
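
The "SIGTERM first, SIGKILL only when stuck" rule could look like this in code. A sketch assuming a POSIX system; stop_gracefully and the 5-second grace period are our own illustrative choices.

```python
import signal
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Send SIGTERM, wait, and fall back to SIGKILL only if the process is stuck."""
    proc.terminate()                      # SIGTERM: polite request to exit
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL: cannot be caught or ignored
        return proc.wait()


if __name__ == "__main__":
    # Stand-in workload: a child Python process that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    code = stop_gracefully(child)
    print("exit status:", code)           # a negative value means "killed by that signal number"
```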

Shell Scripting Essentials

What it is. Shell scripts automate sequences of commands. Every DevOps engineer writes them daily.

Key terms.

Shebang — #!/bin/bash at top of file
$1 $2 $# — args and arg count
$? — last exit code (0 = success)
$(cmd) — command substitution
set -euo pipefail — safe-script holy trinity (exit on error, unset vars error, pipe failures)
[[ -f file ]] — modern test syntax
local — function-scoped variable

Code.

#!/bin/bash
set -euo pipefail
log() { echo "[$(date '+%H:%M')] $1"; }
for svc in nginx postgresql; do
  if systemctl is-active --quiet "$svc"; then
    log "OK $svc"
  else
    log "DOWN $svc"; systemctl restart "$svc"
  fi
done

Remember. No spaces around =. Always quote "$var". set -euo pipefail saves us from silent bugs.

Package Management and System Services

What it is. apt/yum install software, systemctl manages services, cron schedules tasks.

Key terms.

apt — Debian/Ubuntu; yum/dnf — RHEL/CentOS/Fedora
apt remove — keeps config files; apt purge — wipes config too
Cron format — min hour day month weekday
*/5 — every 5 units of that field (every 5 minutes in the first position); 0 2 * * * — daily at 2 AM

Commands.

sudo apt update && sudo apt install nginx
sudo systemctl enable --now nginx
crontab -e   # */5 * * * * /opt/cron.sh >> /var/log/cron.log 2>&1
journalctl -u nginx --since "30 min ago" -f

Remember. Always apt update before install. After install: enable then start. Use crontab.guru to write cron expressions.
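
To see how the five cron fields are read, here is a simplified matcher. It is a sketch under assumptions: cron_matches and field_matches are hypothetical helpers, all five fields are ANDed together (real cron treats day-of-month and day-of-week specially when both are restricted), and the */n handling is only exact for fields that start at 0 (minute, hour).

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field: "*", "*/n", "a-b", "a,b,c", or a plain number."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Check a 5-field expression (min hour day month weekday) against a time."""
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = (when.weekday() + 1) % 7   # cron: 0 = Sunday; Python: 0 = Monday
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )


print(cron_matches("*/5 * * * *", datetime(2024, 1, 1, 2, 10)))  # minute 10 is a multiple of 5
print(cron_matches("0 2 * * *", datetime(2024, 1, 1, 2, 0)))     # exactly 02:00
print(cron_matches("0 2 * * *", datetime(2024, 1, 1, 2, 1)))     # one minute past 02:00
```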

Networking Essentials

OSI Model and TCP/IP

What it is. Networking organized as layers, each with one job. OSI = 7 (theoretical), TCP/IP = 4 (practical).

Key terms.

OSI layers — Physical, Data Link, Network, Transport, Session, Presentation, Application
TCP/IP layers — Network Access, Internet, Transport, Application
Encapsulation — each layer wraps data with its own header (segment → packet → frame)
Devices — routers (L3), switches (L2), hubs (L1)

Mnemonic. OSI top-down: “All People Seem To Need Data Processing” (Application, Presentation, Session, Transport, Network, Data Link, Physical).

Remember. Layer 3 = routing/IP issues. Layer 4 = TCP/firewall/ports. Layer 7 = application/proxy errors. Pick the layer to debug.

DNS and Domain Resolution

What it is. DNS is the phone book of the internet — translates google.com to 142.250.80.46.

Key terms.

A — domain → IPv4
AAAA — domain → IPv6
CNAME — alias to another domain (no root domain!)
MX — mail server (with priority)
TXT — arbitrary text (SPF, DKIM, verification)
NS — authoritative nameservers
TTL — cache duration in seconds
Recursive resolver — ISP/Cloudflare/Google DNS that does the lookup work

Resolution flow. Browser cache → OS cache → recursive resolver → root → TLD (.com) → authoritative → answer.

Commands.

dig pman47.cc +short
dig @8.8.8.8 pman47.cc MX
dig pman47.cc +trace
nslookup -type=MX example.com

Remember. Lower TTL before migration. CNAMEs can’t sit at the root domain. /etc/hosts overrides DNS locally.
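
Why lowering the TTL before a migration helps becomes clearer with a toy cache. This is only a sketch of the idea, not how any real resolver is implemented; TtlCache is a hypothetical class and 192.0.2.10 is a documentation-range address.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Tiny record cache: an answer is reused until its TTL (in seconds) runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._records: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl: float) -> None:
        self._records[name] = (address, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._records.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._records[name]       # expired: the caller must ask upstream again
            return None
        return address


cache = TtlCache()
cache.put("example.com", "192.0.2.10", ttl=300)
print(cache.get("example.com"))           # served from cache while the TTL has not expired
```

With a 300-second TTL, clients can keep using the old address for up to five minutes after we change the record, so a shorter TTL ahead of the switch shrinks that window.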

HTTP, HTTPS, and TLS

What it is. HTTP = how browsers and servers talk. HTTPS = HTTP wrapped in TLS encryption.

Key terms.

Idempotent — same call N times = same result (GET/PUT/DELETE yes, POST/PATCH no)
Status families — 1xx info, 2xx success, 3xx redirect, 4xx client, 5xx server
401 vs 403 — 401 “who are you”, 403 “I know you, you can’t do this”
502 vs 504 — 502 upstream sent an invalid response (or was unreachable), 504 upstream timed out
TLS handshake — ClientHello + key share → ServerHello + cert → encrypted (TLS 1.3 = 1 RTT)
Certificate — proves identity, signed by CA
HTTP/2 — multiplexing over one connection; HTTP/3 — over QUIC/UDP

Status code cheatsheet.

Code	Meaning
200	OK
201	Created
204	No Content
301	Moved Permanently
302	Found (temporary)
304	Not Modified
400	Bad Request
401	Unauthorized (not logged in)
403	Forbidden (not allowed)
404	Not Found
429	Too Many Requests
500	Internal Server Error
502	Bad Gateway
503	Service Unavailable
504	Gateway Timeout

Remember. POST and PATCH aren’t guaranteed idempotent; PUT and DELETE change state but are idempotent, so repeating them is safe. Let’s Encrypt + Caddy = free auto HTTPS.
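
Status families and idempotency are easy to encode. In this sketch the retry rule (retry only idempotent methods on 502/503/504) is our own illustration of why idempotency matters, not something prescribed by the notes above; status_family and safe_to_retry are hypothetical helpers.

```python
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}

FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirect",
    4: "client error",
    5: "server error",
}


def status_family(code: int) -> str:
    """Map a status code to its family using the first digit."""
    return FAMILIES.get(code // 100, "unknown")


def safe_to_retry(method: str, code: int) -> bool:
    """Assumed policy: retry only idempotent requests that hit a transient gateway/server status."""
    return method.upper() in IDEMPOTENT_METHODS and code in {502, 503, 504}


print(status_family(404))
print(safe_to_retry("PUT", 503))
print(safe_to_retry("POST", 503))
```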

TCP vs UDP

What it is. Two transport protocols. TCP is reliable, UDP is fast.

Key terms.

TCP — connection-oriented, ordered, guaranteed delivery, retransmits, flow control
UDP — connectionless, no guarantees, 8-byte header, fire and forget
Three-way handshake — SYN → SYN-ACK → ACK
Four-way teardown — FIN → ACK → FIN → ACK
Window size — flow control buffer (receiver tells sender to slow)
Well-known ports — 0-1023 (binding needs root); ephemeral 49152-65535 (IANA range; Linux defaults to 32768-60999)

Common ports. 22 SSH, 53 DNS, 80 HTTP, 443 HTTPS, 3306 MySQL, 5432 Postgres, 6379 Redis.

Remember. Web/email/SSH/DB → TCP. Video/gaming/VoIP and most DNS queries → UDP (DNS falls back to TCP for large responses and zone transfers). HTTP/3 runs over QUIC, which layers reliability on top of UDP.

Load Balancing

What it is. Distributes traffic across multiple servers for scalability + HA.

Key terms.

L4 — routes by IP+port, fast, no HTTP awareness
L7 — routes by URL/headers/cookies, slower, smart
Round Robin — turn by turn
Weighted RR — bigger servers get more
Least Connections — to whoever is least busy
IP Hash — same client → same server (sticky)
Health checks — active (LB pings) vs passive (LB watches errors)
Sticky sessions — same user → same server (avoid; use Redis instead)

Tools. Nginx, HAProxy, Caddy, AWS ALB (L7) / NLB (L4), Traefik.

Remember. Most web traffic uses L7. Avoid sticky sessions — push state to Redis.
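
Three of the algorithms above fit in a few lines each. A minimal sketch with hypothetical function names; real load balancers also handle health checks, weights, and server-list changes, which are left out here.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Hand out servers turn by turn, forever."""
    return itertools.cycle(servers)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest open connections."""
    return min(active, key=lambda name: active[name])


def ip_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a fixed server (stable while the server list is unchanged)."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


pool = ["app-1", "app-2", "app-3"]
turns = round_robin(pool)
print([next(turns) for _ in range(4)])
print(least_connections({"app-1": 5, "app-2": 2, "app-3": 9}))
print(ip_hash("203.0.113.7", pool) == ip_hash("203.0.113.7", pool))
```

A cryptographic hash is used instead of Python's built-in hash() because the built-in is randomized per process for strings, which would break stickiness across restarts.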

Networking Tools and Troubleshooting

What it is. The toolbox for “the site is down” diagnosis.

Key terms.

curl — Swiss-army HTTP tool (-v verbose, -I headers, -L follow redirects)
ping — ICMP reachability test
traceroute — every hop on the path
ss / netstat — what’s listening on ports
tcpdump — raw packet capture
iptables / ufw — firewall

Commands.

curl -v -I https://example.com
ping -c 4 google.com; traceroute -n google.com
ss -tlnp | grep :80
sudo tcpdump -i any port 443 -A -tttt
sudo ufw allow 22/tcp; sudo ufw enable

Debugging workflow. ping → dig → ss/curl → service logs (journalctl -u, docker logs) → resources (top, df -h, free -h).

Remember. Always curl -v first. Full disk = silent death (df -h early in any debug).

Docker & Containers

Containers vs Virtual Machines

What it is. VMs run a full OS on a hypervisor. Containers share the host kernel via namespaces + cgroups.

Key terms.

Hypervisor — slices hardware among VMs (heavy, GBs, minutes to boot)
Namespaces — give container its own view (process, network, mount, user)
cgroups — limit CPU, memory, I/O per container
Container — milliseconds to start, MBs in size, shares host kernel

Remember. VM = whole apartment. Container = room in a co-living space. Modern stacks run containers inside VMs.

Docker Images and Layers

What it is. An image is a stack of read-only layers. Container = image + thin writable layer.

Key terms.

Layer — filesystem-changing instructions (RUN, COPY, ADD) each add one layer; other instructions only add metadata (cached, sharable)
Registry — Docker Hub, GHCR, ECR, GCR, ACR
Tag — mutable label (nginx:alpine)
Digest — immutable SHA256 hash (nginx@sha256:abc...)
latest tag — dangerous in production; pin versions or digests

Commands.

docker history nginx:alpine
docker pull ghcr.io/pman47/gyaan:latest
docker tag my-app:latest ghcr.io/me/my-app:v1
docker push ghcr.io/me/my-app:v1
docker images --digests

Remember. Image = class. Container = object. Layers cache by hash; same base layer is stored once on disk.

Dockerfile Best Practices

What it is. Writing efficient, secure, cache-friendly image builds.

Key terms.

FROM, WORKDIR, COPY, RUN, ENV, EXPOSE, CMD, ENTRYPOINT — core instructions
CMD — default command, fully overridable
ENTRYPOINT — fixed verb, args appended
Multi-stage build — build in one image, copy artifacts to a smaller runtime image
.dockerignore — exclude junk from COPY .
Cache invalidation — change one instruction = that layer and every instruction after it rebuilds

Code.

FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

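# The stock nginx image has no "node" user and binds port 80 as root.
# One non-root option: the nginxinc/nginx-unprivileged image, which listens on 8080.
FROM nginxinc/nginx-unprivileged:1.27-alpine
COPY --from=builder /app/dist /usr/share/nginx/html
EXPOSE 8080
CMD ["nginx", "-g", "daemon off;"]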

Remember. Copy package.json BEFORE source code so deps cache survives code changes. Never bake secrets into images. Run as non-root.

Docker Networking

What it is. How containers talk to each other and the outside.

Key terms.

bridge — default; custom bridge — adds DNS by name
host — no isolation, container shares host net
none — zero networking
overlay — multi-host (Swarm/orchestration)
Port mapping -p HOST:CONTAINER

Commands.

docker network create my-net
docker run -d --name db --network my-net postgres:16
docker run -d --name app --network my-net -p 8080:3000 my-app
# app reaches db via hostname "db:5432"
docker network ls; docker network inspect my-net

Remember. Use custom bridge networks 90% of the time — they give DNS for free. Containers on the same network reach each other by name.

Docker Volumes and Storage

What it is. Containers are ephemeral. Volumes persist data.

Key terms.

Volumes — Docker-managed, in /var/lib/docker/volumes/, best for prod
Bind mounts — host path mounted in, best for dev (live-reload code)
tmpfs — in-memory only, for sensitive temp data

Commands.

docker volume create pg-data
docker run -d -v pg-data:/var/lib/postgresql/data postgres:16
docker run -d -v "$(pwd)":/app node:20  # bind mount (quoted so paths with spaces work)

Remember. Database data → named volumes (always). Source code in dev → bind mounts. tmpfs for secrets you don’t want on disk.

Docker Compose

What it is. Define multi-container apps in one YAML file. One command to start/stop everything.

Key terms.

services — each container
depends_on — start order (only waits for container start, not service ready)
profiles — optionally run debug/dev services
Service name = DNS hostname on default network

Code.

services:
  api:
    build: .
    ports: ["3000:3000"]
    environment:
      DATABASE_URL: postgresql://app:secret@db:5432/myapp
    depends_on: [db]
    restart: unless-stopped
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
    volumes:
      - pg-data:/var/lib/postgresql/data
volumes:
  pg-data:

Commands. docker compose up -d, down, logs -f, exec, ps, restart.

Remember. Service name IS the hostname. depends_on doesn’t wait for the app inside — use healthchecks for that.

Container Debugging and Commands

What it is. When containers crash or misbehave, here’s the toolbox.

Key terms.

docker ps -a — all containers including stopped
docker logs -f --tail 50 — recent logs
docker exec -it <c> sh — shell inside running container
docker inspect — full JSON state
Exit code 137 — killed by SIGKILL (128+9): OOMKilled, or docker stop after the grace period ran out
Exit code 139 — segfault
docker stats — live CPU/memory

Crash workflow. docker ps -a → docker logs → docker inspect --format='{{.State.ExitCode}}' → docker run -it --entrypoint sh image:tag.

Cleanup. docker system prune -a --volumes (nuclear). docker system df (usage).

Remember. 137 = SIGKILL, most often OOM. Confirm with docker inspect (State.OOMKilled), then raise the memory limit (e.g. docker run -m 512m). Always check logs before guessing.
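
Exit codes above 128 follow the shell convention "128 + signal number", which is where 137 and 139 come from. A small decoder, assuming a Linux host for the signal names; explain_exit_code is a hypothetical helper.

```python
import signal


def explain_exit_code(code: int) -> str:
    """Codes above 128 conventionally mean "killed by signal (code - 128)"."""
    if code == 0:
        return "success"
    if code > 128:
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = f"signal {number}"
        return f"killed by {name}"
    return f"application error ({code})"


for code in (0, 1, 137, 139, 143):
    print(code, "->", explain_exit_code(code))
```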

Docker cheatsheet

Command	Purpose
`docker run -d -p 8080:80 nginx`	run detached, port-mapped
`docker exec -it <c> sh`	shell into running container
`docker logs -f --tail 100 <c>`	follow logs
`docker inspect <c>`	full state JSON
`docker stats`	live CPU/mem
`docker system prune -a`	clean unused stuff
`docker compose up -d --build`	start + rebuild

Kubernetes

Kubernetes Architecture

What it is. Container orchestrator. We declare desired state, K8s makes it happen.

Key terms.

Control plane — the brain (API Server, etcd, Scheduler, Controller Manager)
API Server — front door for every kubectl/component
etcd — distributed KV store, holds entire cluster state
Scheduler — picks which node runs each Pod
Controller Manager — control loops fixing drift
Worker node — runs kubelet, kube-proxy, container runtime
kubelet — agent that talks to API server, manages Pods on its node
kube-proxy — sets iptables/IPVS for Service traffic
CRI — Container Runtime Interface (containerd, CRI-O)

Remember. Control plane = brain, workers = hands. Everything goes through the API server. Components watch and react rather than calling each other directly.
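
The "declare desired state, control loops fix drift" idea can be sketched as a pure function. This is a toy model with a hypothetical reconcile helper working on replica counts; real controllers watch the API server and act on full objects.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[str]:
    """Compare desired vs actual replica counts and list the corrective actions."""
    actions = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)
        if have < want:
            actions.append(f"start {want - have} x {name}")
        elif have > want:
            actions.append(f"stop {have - want} x {name}")
    # Anything running that nobody asked for gets stopped.
    for name in sorted(set(actual) - set(desired)):
        actions.append(f"stop {actual[name]} x {name}")
    return actions


print(reconcile({"web": 3, "api": 2}, {"web": 1, "api": 2, "old": 1}))
print(reconcile({"web": 3}, {"web": 3}))   # already converged, nothing to do
```

Running the loop again after the actions are applied yields an empty list, which is the steady state a controller keeps returning to.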

Pods and Workloads

What it is. Pod = smallest deployable unit (1+ containers sharing network/storage). We use higher-level workloads.

Key terms.

Pod phases — Pending, Running, Succeeded, Failed, Unknown
Init container — runs before main container starts
Sidecar — helper container alongside main app (proxy, log shipper)
Deployment — stateless apps with rolling updates + rollback
ReplicaSet — Deployment uses this internally to keep N Pods alive
StatefulSet — stable hostnames, persistent storage, ordered (databases, queues)
DaemonSet — one Pod per node (log/metric agents)
Job / CronJob — run-to-completion / scheduled

Code.

apiVersion: apps/v1
kind: Deployment
metadata: { name: my-app }
spec:
  replicas: 3
  selector: { matchLabels: { app: my-app } }
  template:
    metadata: { labels: { app: my-app } }
    spec:
      containers:
        - name: app
          image: my-app:v2

Remember. Almost never create Pods directly. Deployment for stateless, StatefulSet for databases, DaemonSet for per-node, Job for batch.

Services and Networking

What it is. Pods are ephemeral with changing IPs. Services give stable endpoints.

Key terms.

ClusterIP — internal only (default)
NodePort — open static port (30000-32767) on every node
LoadBalancer — provisions cloud LB (one per service = $$$)
ExternalName — CNAME alias to external DNS
Selector — labels match Pods to Service
CoreDNS — <svc>.<ns>.svc.cluster.local
Ingress — HTTP routing by host/path through one LB
Ingress Controller — actual proxy (nginx-ingress, Traefik); Ingress = config

Remember. ClusterIP = internal default. Ingress = one cheap entry point routing to many services. LoadBalancer = expensive per-service cloud LB.

ConfigMaps and Secrets

What it is. Separate config from container images.

Key terms.

ConfigMap — non-sensitive key-value config
Secret — sensitive data, base64-encoded (NOT encrypted by default!)
Inject as env vars (need restart) or volume mounts (auto-update)
immutable: true — locks ConfigMap/Secret after creation

Code.

envFrom:
  - configMapRef: { name: app-config }
  - secretRef: { name: db-credentials }

Remember. Secrets are encoded, not encrypted. Real security needs encryption at rest in etcd or external (Vault, Sealed Secrets).
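
"Encoded, not encrypted" is easy to demonstrate: base64 needs no key to reverse. The value "secret" is just a placeholder.

```python
import base64

encoded = base64.b64encode(b"secret").decode()   # the form stored in a Secret's data field
decoded = base64.b64decode(encoded).decode()     # anyone who can read the object can do this

print(encoded)
print(decoded)
```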

Persistent Volumes and Storage

What it is. Storage that survives Pod restarts.

Key terms.

PV — actual disk (EBS, PD, NFS)
PVC — request for storage (“I need 10Gi”)
StorageClass — defines how to dynamically provision
Access modes — RWO (one node R/W), ROX (many R), RWX (many R/W, needs NFS/EFS)
Reclaim policy — Delete (auto cleanup) or Retain (keep data)
volumeClaimTemplates — each StatefulSet replica gets its own PVC

Remember. Block storage (EBS, PD) is usually RWO only. RWX needs NFS-style. StatefulSet + volumeClaimTemplates = standard DB pattern.

Resource Management and Scaling

What it is. Tell K8s how much CPU/memory we need so nothing starves.

Key terms.

Requests — guaranteed minimum (used for scheduling)
Limits — hard ceiling
CPU over limit — throttled (slow but alive)
Memory over limit — OOMKilled (terminated)
QoS — Guaranteed (req=lim), Burstable (req<lim), BestEffort (none) — eviction order BestEffort first
HPA — scales replica count by CPU/memory/custom
VPA — right-sizes Pod requests
Cluster Autoscaler — adds/removes nodes
LimitRange — defaults per container; ResourceQuota — caps per namespace

Code.

resources:
  requests: { cpu: "250m", memory: "128Mi" }
  limits:   { cpu: "500m", memory: "256Mi" }

Remember. CPU throttle, memory kill. Always set requests + limits in prod. 1 CPU = 1 core, 1000m = 1 core.
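
The QoS rules and the millicore notation can be written down as a short sketch. Assumptions: a single-container Pod, quantities compared as strings (so "500m" and "0.5" would count as different), and hypothetical helper names parse_cpu and qos_class.

```python
from typing import Dict


def parse_cpu(value: str) -> float:
    """Convert "250m" to 0.25 cores and "1" to 1.0 cores."""
    return int(value[:-1]) / 1000 if value.endswith("m") else float(value)


def qos_class(requests: Dict[str, str], limits: Dict[str, str]) -> str:
    """QoS class for a single-container Pod."""
    if not requests and not limits:
        return "BestEffort"
    effective = {**limits, **requests}   # an unset request defaults to the limit
    has_both_limits = "cpu" in limits and "memory" in limits
    if has_both_limits and all(effective[key] == limits[key] for key in ("cpu", "memory")):
        return "Guaranteed"
    return "Burstable"


print(parse_cpu("250m"))
print(qos_class({"cpu": "250m", "memory": "128Mi"}, {"cpu": "500m", "memory": "256Mi"}))
print(qos_class({}, {}))
```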

RBAC and Security

What it is. Lock down who/what can do what.

Key terms.

Role — namespace-scoped permissions
RoleBinding — binds Role to subjects
ClusterRole / ClusterRoleBinding — cluster-wide
ServiceAccount — identity for Pods (vs Users for humans)
Pod Security Standards — Privileged / Baseline / Restricted
SecurityContext — runAsNonRoot, readOnlyRootFilesystem, drop capabilities
NetworkPolicy — Pod-level firewall (needs CNI like Calico/Cilium)
Default deny pattern — start with podSelector: {} ingress block, then allow

Remember. Principle of least privilege. Default-deny NetworkPolicy + drop ALL capabilities + non-root + read-only FS = serious hardening. RBAC is purely additive: there are no deny rules, so anything not explicitly allowed is denied.

kubectl cheatsheet

Command	Purpose
`kubectl get pods -n ns`	list pods
`kubectl describe pod <p>`	full pod state
`kubectl logs -f <p> -c <ctnr>`	follow logs
`kubectl exec -it <p> -- sh`	shell in pod
`kubectl apply -f file.yaml`	declarative apply
`kubectl rollout status deploy/x`	watch rollout
`kubectl rollout undo deploy/x`	rollback
`kubectl autoscale deploy x --min=2 --max=10 --cpu-percent=70`	quick HPA
`kubectl auth can-i list pods --as=...`	check RBAC

CI/CD & GitOps

CI/CD Fundamentals

What it is. Automate build/test/deploy. CI = catches bugs early. CD = ship safely + often.

Key terms.

CI — every push triggers automated build + test
Continuous Delivery — always deployable, human clicks deploy
Continuous Deployment — every passing change goes to prod automatically
Pipeline — stages: code → build → test → scan → deploy

Remember. Delivery = “we can deploy anytime.” Deployment = “we do deploy every time.” Smaller diffs are easier to debug.

Pipeline Design

What it is. Structuring fast, reliable pipelines.

Key terms.

Stages — Lint → Build → Unit Test → Integration → Security → Deploy
Artifacts — outputs passed between stages (no rebuilding)
Caching — node_modules, Docker layers, .m2
Parallel jobs — run independent stages concurrently
Matrix builds — test multiple versions/OSes in parallel

Code.

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci && npm run lint
  test:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
  deploy:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # deploy.sh lives in the repo, so check it out first
      - run: ./deploy.sh

Remember. Fastest checks first. Keep total under 10 min — beyond that nobody waits. Cache aggressively.
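
The needs: keys form a dependency graph, and a runner has to order jobs so that every dependency finishes first. A sketch using Python's standard graphlib (3.9+); the job names mirror the workflow above.

```python
from graphlib import TopologicalSorter

# job -> set of jobs it needs (same shape as the "needs:" keys in a workflow)
needs = {"lint": set(), "test": {"lint"}, "deploy": {"test"}}

order = list(TopologicalSorter(needs).static_order())
print(order)
```

Jobs with no path between them in this graph are the ones a CI system is free to run in parallel.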

Deployment Strategies

What it is. Safe ways to ship code with rollback paths.

Key terms.

Rolling — replace instances one at a time (K8s default, both versions briefly live)
Blue-Green — two envs, flip LB. Instant rollback. 2x cost.
Canary — 5% → 25% → 100% based on metrics. Safest. Needs observability.
Recreate — kill old, start new. Has downtime. Dev only.
Feature flags — deploy code disabled, toggle for users without redeploy

Strategy comparison.

Strategy	Downtime	Rollback	Cost	When
Recreate	Yes	Slow	1x	Dev/staging
Rolling	None	Slow	1x	Default for K8s
Blue-Green	None	Instant	2x	Need fast rollback
Canary	None	Fast	~1.1x	High-traffic, good monitoring

Remember. Default to rolling. Use canary when stakes are high and metrics are good.
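
A canary rollout is essentially a small state machine driven by metrics. In this sketch the 1% error-rate threshold and the name next_canary_step are assumptions for illustration; the traffic steps are the 5% → 25% → 100% progression from the key terms.

```python
STEPS = [5, 25, 100]   # traffic percentages for the canary


def next_canary_step(current: int, error_rate: float, max_error_rate: float = 0.01) -> int:
    """Return the next traffic percentage, or 0 to signal a rollback."""
    if error_rate > max_error_rate:
        return 0
    later = [step for step in STEPS if step > current]
    return later[0] if later else current


print(next_canary_step(5, 0.001))    # healthy: widen the rollout
print(next_canary_step(25, 0.05))    # error rate too high: roll back
```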

GitOps and ArgoCD

What it is. Git is the source of truth. Cluster pulls desired state from Git.

Key terms.

Push — pipeline kubectl-applies (needs cluster creds)
Pull — agent inside cluster watches Git (more secure)
ArgoCD Application — CRD pointing at Git repo + path → cluster + namespace
Auto-sync — applies on Git change
Prune — deletes resources removed from Git
Self-heal — reverts manual cluster changes
Drift detection — actual vs Git state

Remember. “If it’s not in Git, it doesn’t exist.” Every change = PR. Audit log = git log.
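
Auto-sync, prune, and drift detection all come down to diffing Git against the cluster. A toy sketch: plan_sync is a hypothetical function and manifests are reduced to plain dicts, which is far simpler than what ArgoCD actually compares.

```python
from typing import Dict, List


def plan_sync(git: Dict[str, dict], cluster: Dict[str, dict], prune: bool = True) -> List[str]:
    """Diff desired state (Git) against live state (cluster) and list sync actions."""
    actions = []
    for name, manifest in sorted(git.items()):
        if name not in cluster:
            actions.append(f"create {name}")
        elif cluster[name] != manifest:
            actions.append(f"update {name}")     # drift: the live object differs from Git
    if prune:
        for name in sorted(set(cluster) - set(git)):
            actions.append(f"delete {name}")     # removed from Git, so removed from the cluster
    return actions


git_state = {"deploy/web": {"replicas": 3}, "svc/web": {"port": 80}}
live_state = {"deploy/web": {"replicas": 5}, "cm/old": {}}
print(plan_sync(git_state, live_state))
```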

Artifact Management and Registries

What it is. Store and version build outputs.

Key terms.

Artifact — built output (image, JAR, npm package, binary, Helm chart)
Registry — Docker Hub, GHCR, ECR, GCR, GAR, ACR, Harbor
Tagging — semver (v1.2.3), Git SHA, never latest in prod
Trivy — open-source vulnerability scanner
Cosign — image signing (Sigstore)

Code.

docker build -t ghcr.io/me/app:v1.2.3 -t ghcr.io/me/app:$(git rev-parse --short HEAD) .
docker push ghcr.io/me/app:v1.2.3
docker push ghcr.io/me/app:$(git rev-parse --short HEAD)
trivy image --exit-code 1 --severity CRITICAL ghcr.io/me/app:v1.2.3
cosign sign ghcr.io/me/app:v1.2.3

Remember. Tag with both semver AND git SHA. Never latest in prod manifests. Scan in CI; sign before deploy.

Cloud & Infrastructure

Cloud Computing Models

What it is. Spectrum from “we manage almost everything above the hardware” (IaaS) to “we manage nothing but our data and settings” (SaaS).

Key terms.

IaaS — VMs, networking (EC2, Compute Engine, Droplets)
PaaS — runtime managed (Heroku, App Engine, Railway)
Serverless — function-as-a-service (Lambda, Cloud Functions, Workers)
SaaS — finished software (Gmail, Slack, GitHub)
Shared responsibility — provider secures the cloud, we secure what’s IN the cloud
Multi-cloud — multiple providers; Hybrid — on-prem + cloud

Remember. Higher in the stack = less control, less ops. Real architectures mix all four.

VPC and Network Architecture

What it is. Our isolated private network in the cloud.

Key terms.

VPC (AWS) / VPC Network (GCP) / VNet (Azure)
Public subnet — has Internet Gateway route, public IPs allowed (LB, bastion, NAT)
Private subnet — no direct internet, outbound via NAT (apps, DBs)
Internet Gateway — door to public internet
NAT Gateway — private subnet → outbound only
Route Table — rules per subnet
Security Group — instance-level, stateful (response auto-allowed)
NACL — subnet-level, stateless (must allow both directions)
VPC Peering — private connection between VPCs

Remember. SG = stateful = apartment door. NACL = stateless = building gate. Public for LB+NAT, private for apps+DB.

IAM and Access Management

What it is. Who can do what on which resources.

Key terms.

User — human; Group — collection of users
Role — identity with temporary credentials, assumable only by principals its trust policy allows
Policy — JSON with Effect/Action/Resource
Principal — who’s making the request
Least privilege — give the minimum needed
Assume role — apps get temp credentials (Instance Profile / Service Account / Managed Identity)
MFA — required for humans, especially root
Audit — CloudTrail (AWS) / Cloud Audit Logs (GCP)

Code.

{ "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:ListBucket"],
  "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"] }

Remember. Deny always wins. Never Action: "*" for app roles. Use roles, not access keys, in code. MFA on the root account, then lock it away.
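
The evaluation order (explicit Deny, then Allow, otherwise implicit deny) can be sketched as below. This is a simplification, not the real AWS evaluation engine: is_allowed is hypothetical, wildcards use shell-style matching, and conditions, principals, and case-insensitive action names are ignored. The Deny statement on the private/ prefix is added for illustration.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def is_allowed(statements: List[Dict], action: str, resource: str) -> bool:
    """Explicit Deny wins, then any matching Allow, otherwise implicit deny."""
    allowed = False
    for stmt in statements:
        action_hit = any(fnmatchcase(action, pattern) for pattern in stmt["Action"])
        resource_hit = any(fnmatchcase(resource, pattern) for pattern in stmt["Resource"])
        if action_hit and resource_hit:
            if stmt["Effect"] == "Deny":
                return False
            allowed = True
    return allowed


policy = [
    {"Effect": "Allow",
     "Action": ["s3:GetObject", "s3:ListBucket"],
     "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]},
    {"Effect": "Deny",                      # illustrative extra statement
     "Action": ["s3:*"],
     "Resource": ["arn:aws:s3:::my-bucket/private/*"]},
]

print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::my-bucket/a.txt"))
print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::my-bucket/private/x"))
print(is_allowed(policy, "s3:PutObject", "arn:aws:s3:::my-bucket/a.txt"))
```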

Cloud Storage and Databases

What it is. Different storage types for different jobs.

Key terms.

Object (S3, GCS, Blob) — files via HTTP, unlimited, cheapest. For uploads, backups, static assets.
Block (EBS, PD, Managed Disks) — virtual disk, one VM, fast IOPS. For OS, databases.
File (EFS, Filestore, Files) — shared NFS, multiple VMs.
Managed DB — RDS/Cloud SQL (SQL), DynamoDB/Firestore/Cosmos (NoSQL).
Cache — ElastiCache, Memorystore (Redis/Memcached).

Remember. S3 for files, EBS for disks, RDS for SQL, Redis for hot reads. Wrong choice = expensive and slow.

Serverless and Managed Services

What it is. Functions triggered by events. Pay per invocation. Scale to zero.

Key terms.

Lambda / Cloud Functions / Workers — code runners
API Gateway — HTTP front door for Lambda
SQS — queue (decouple, retry)
SNS — pub/sub fan-out
EventBridge — event router
Cold start — first invocation = 100ms-2s spin up
Limits — Lambda max 15 min execution

When YES. Sporadic workloads, event processing, cron jobs, variable traffic APIs.

When NO. Latency-critical APIs (cold starts), long jobs (>15 min), steady high throughput (containers cheaper).

Remember. Free when idle, expensive at huge scale. Watch out for cold starts and vendor lock-in.

Infrastructure as Code

IaC Concepts and Benefits

What it is. Infrastructure defined in code, stored in Git, applied by tools.

Key terms.

Declarative — describe end state (Terraform, CloudFormation)
Imperative — describe steps (bash, AWS CLI)
Idempotency — running twice = same result
Reproducibility — same code → same infra
Tools: Terraform, Pulumi, CloudFormation, Ansible

Remember. Declarative = ordering food. Imperative = giving cooking instructions. Most modern IaC is declarative.

Terraform Fundamentals

What it is. Declarative IaC tool by HashiCorp. Uses HCL.

Key terms.

Provider — plugin (aws, google, azurerm, cloudflare)
Resource — actual thing (aws_s3_bucket)
Variable — input (with type + default)
Output — exposed value
Data source — read existing resource
Workflow — init → plan → apply → destroy

Code.

provider "aws" { region = "ap-south-1" }
variable "bucket_name" { type = string }
resource "aws_s3_bucket" "assets" {
  bucket = var.bucket_name
  tags = { ManagedBy = "terraform" }
}
output "bucket_arn" { value = aws_s3_bucket.assets.arn }

Remember. ALWAYS read terraform plan before apply. resource creates, data reads.

Terraform State and Modules

What it is. State file maps config → real resources. Modules = reusable packages.

Key terms.

State — terraform.tfstate JSON, Terraform’s memory
Local state — fine for solo, broken for teams
Remote backend — S3 + DynamoDB (lock) standard
State locking — prevents concurrent apply
Drift detection — state vs reality
Workspaces — separate state per env (dev/stage/prod)
Module — directory with main.tf/variables.tf/outputs.tf
Reference: module "x" { source = "./mod" ... }

State commands. terraform state list / show / rm / mv.

Remember. Remote backend with locking = day-1 setup. Modules = stop copy-pasting. state rm forgets but doesn’t delete.

Ansible Basics

What it is. Agentless config management over SSH. Configures servers AFTER they exist.

Key terms.

Inventory — list of hosts (grouped)
Playbook — YAML of tasks for hosts
Module — built-in action (apt, copy, service, template, user, docker_container)
Handler — task triggered by notify: (e.g., restart nginx)
Role — packaged tasks/files/templates/handlers/defaults
Idempotent — state: present, state: started — does nothing if already so
Galaxy — npm-like for roles

Code.

- name: Setup web
  hosts: webservers
  become: true
  tasks:
    - apt: { name: nginx, state: present, update_cache: true }
    - copy: { src: nginx.conf, dest: /etc/nginx/nginx.conf }
      notify: restart nginx
    - service: { name: nginx, state: started, enabled: true }
  handlers:
    - name: restart nginx
      service: { name: nginx, state: restarted }

Remember. Terraform builds the house, Ansible furnishes it. No agents needed — just SSH.

Observability & Reliability

Monitoring and Alerting

What it is. Watch infra + apps, alert on real problems before users notice.

Key terms.

Counter — only goes up (total requests)
Gauge — up and down (current memory)
Histogram — distribution (p50, p95, p99 latency)
Summary — quantiles computed client-side (cheap to query, but can’t be aggregated across instances)
Prometheus — pull-based TSDB, scrapes /metrics, queries with PromQL
Grafana — dashboards
Alertmanager — routes alerts to Slack/PagerDuty/email
USE method (infra) — Utilization, Saturation, Errors
RED method (services) — Rate, Errors, Duration

PromQL.

rate(http_requests_total[5m])
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

Alert rules. Alert on symptoms not causes. Every alert actionable. Severities: critical/warning/info. Include runbook links.

Remember. USE for infra, RED for services. Alert fatigue is the real enemy — fewer good alerts > many noisy ones.
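
What rate() and histogram_quantile() compute can be approximated by hand. This sketch assumes finite, cumulative buckets and ignores counter resets and the +Inf bucket that Prometheus handles; per_second_rate and bucket_quantile are hypothetical names.

```python
from typing import List, Tuple


def per_second_rate(first: Tuple[float, float], last: Tuple[float, float]) -> float:
    """Counter increase per second between two (timestamp, value) samples."""
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


def bucket_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets by linear interpolation."""
    total = buckets[-1][1]
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            span = count - lower_count
            if span == 0:
                return upper_bound
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / span
        lower_bound, lower_count = upper_bound, count
    return buckets[-1][0]


# 300 more requests over 300 seconds
print(per_second_rate((0, 100), (300, 400)))
# 50 requests took <= 0.1s, 90 took <= 0.5s, all 100 took <= 1.0s
print(bucket_quantile(0.95, [(0.1, 50), (0.5, 90), (1.0, 100)]))
```

The p95 lands inside the 0.5s to 1.0s bucket, so the estimate is only as precise as the bucket boundaries we chose.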

Logging and Log Aggregation

What it is. Centralize all logs, structured as JSON, searchable.

Key terms.

Structured logging — JSON with consistent fields
Log levels — DEBUG (dev only), INFO (normal), WARN (handled), ERROR (needs attention), FATAL (crash)
ELK — Elasticsearch (store), Logstash (collect/parse), Kibana (UI)
EFK — replaces Logstash with Fluentd (lighter, K8s-native)
Correlation ID / Trace ID — unique ID per request, threaded through every service
Log rotation — logrotate, Docker --log-opt max-size
ILM — Elasticsearch index lifecycle (hot → cold → delete)

Remember. Always log structured JSON. Generate correlation ID at entry, pass it through every call. Hot logs in ES, cold in S3/Glacier.
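
Structured logging with a correlation ID might look like the sketch below. The field names (ts, level, message, correlation_id) and the helpers are our own choices, not a standard schema.

```python
import json
import uuid
from datetime import datetime, timezone


def new_correlation_id() -> str:
    """Generate once at the entry point and pass it to every downstream call."""
    return uuid.uuid4().hex


def log_line(level: str, message: str, correlation_id: str, **fields) -> str:
    """Build one structured log line as JSON with consistent field names."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        "correlation_id": correlation_id,
        **fields,
    }
    return json.dumps(record)


request_id = new_correlation_id()
print(log_line("INFO", "order received", request_id, service="api"))
print(log_line("ERROR", "payment failed", request_id, service="billing", order_id=42))
```

Because both lines share the same correlation_id, one search in the log store pulls up the whole request path across services.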

Secrets Management and TLS

What it is. Keep passwords/tokens out of code; encrypt traffic.

Key terms.

Env vars — fine for dev, leaky for prod (visible in /proc, logs)
Vault — HashiCorp’s secret store, audit + RBAC + dynamic secrets
Dynamic secrets — Vault creates short-lived DB users on demand
Cloud secrets — AWS Secrets Manager, GCP Secret Manager, Azure Key Vault
Sealed Secrets — encrypt secrets in Git, only cluster decrypts
TLS — encrypts data in transit; cert + private key + CA
cert-manager — auto-issues + renews Let’s Encrypt certs in K8s
Caddy — auto-HTTPS web server
mTLS — both sides present certs (service mesh: Istio, Linkerd)

Remember. Three rules: never in code, always encrypted, rotate regularly. Short-lived > rotation. Caddy/cert-manager remove cert-renewal headaches.

High Availability and Disaster Recovery

What it is. Stay running through failures. Recover from disasters.

Key terms.

SPOF — single point of failure (find them, eliminate them)
Active-Active — all nodes serve traffic
Active-Passive — standby takes over (DB primary-replica)
RPO — Recovery Point Objective = how much data we can lose
RTO — Recovery Time Objective = how fast we recover
Backups — full / incremental / differential
Liveness probe — is process alive? (restart on fail)
Readiness probe — can it serve traffic? (remove from LB on fail)
Chaos engineering — intentionally break things in controlled ways (Chaos Monkey)

RPO/RTO mnemonic. RPO is “how far back” (data loss tolerance). RTO is “how long down” (downtime tolerance). Lower = pricier.

Remember. Untested backup is not a backup. Test restore drills. Health checks + auto-failover = invisible recovery. Multi-region = serious HA.
