Gyaan

Threading vs Multiprocessing


Python gives us two ways to run things at the same time: threads (concurrency) and processes (parallelism). They sound similar, but they work very differently under the hood.

Concurrency vs Parallelism

  • Concurrency — multiple tasks making progress by switching between them (like juggling)
  • Parallelism — multiple tasks literally running at the same time on different CPU cores

Threads give us concurrency. Processes give us true parallelism.

Threading (Shared Memory)

  • One process containing multiple threads (Thread 1, Thread 2, Thread 3)
  • All threads share the same memory: shared variables, one GIL lock
  • Good for: I/O-bound tasks (network, files)

Multiprocessing (Separate Memory)

  • Several processes (Process 1, Process 2, Process 3), each fully isolated
  • Each process has its own memory and its own GIL
  • Good for: CPU-bound tasks (math, processing)

The GIL Problem

Python has a Global Interpreter Lock (GIL) — a mutex that lets only one thread execute Python bytecode at a time. This means threads can’t truly run Python code in parallel.

So why use threads at all? Because when a thread is waiting for I/O (network response, file read, database query), it releases the GIL. Other threads can run during that wait time. That’s why threads are great for I/O-bound work.

For CPU-heavy work (number crunching, image processing), threads don’t help because the GIL blocks parallel execution. That’s when we reach for multiprocessing — each process has its own GIL.
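One way to see this in action is to run the same CPU-bound function under a thread pool and a process pool and compare wall time. A minimal sketch using the standard library's concurrent.futures (cpu_task and run_with are illustrative names; exact timings vary by machine):

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_task(n):
    # pure-Python arithmetic: the GIL is held for the whole loop
    return sum(i * i for i in range(n))

def run_with(executor_cls, jobs=2, n=2_000_000):
    start = time.perf_counter()
    with executor_cls(max_workers=jobs) as ex:
        list(ex.map(cpu_task, [n] * jobs))
    return time.perf_counter() - start

if __name__ == "__main__":
    # two CPU-bound jobs: threads serialize on the GIL,
    # processes actually run in parallel
    print(f"threads:   {run_with(ThreadPoolExecutor):.2f}s")
    print(f"processes: {run_with(ProcessPoolExecutor):.2f}s")
```

On a multi-core machine the thread version takes roughly twice as long as a single task, while the process version stays close to single-task time.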

Threading Basics

import threading
import time

def download(url):
    print(f"Downloading {url}...")
    time.sleep(2)  # simulating network I/O
    print(f"Done: {url}")

# Create and start threads
t1 = threading.Thread(target=download, args=("page1.html",))
t2 = threading.Thread(target=download, args=("page2.html",))
t1.start()
t2.start()

# Wait for both to finish
t1.join()
t2.join()
print("All downloads complete")  # takes ~2s, not ~4s
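For more than a handful of downloads, the standard library's ThreadPoolExecutor saves us from managing Thread objects by hand. A sketch of the same idea (download here returns a string instead of printing):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def download(url):
    time.sleep(2)  # simulating network I/O; the GIL is released here
    return f"Done: {url}"

urls = ["page1.html", "page2.html", "page3.html"]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map yields results in input order
    for result in pool.map(download, urls):
        print(result)
# total wall time is still ~2s: all three sleeps overlap
```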

Multiprocessing Basics

import multiprocessing

def crunch_numbers(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    p1 = multiprocessing.Process(target=crunch_numbers, args=(10_000_000,))
    p2 = multiprocessing.Process(target=crunch_numbers, args=(10_000_000,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

The if __name__ == "__main__" guard is required on platforms that start new processes with the "spawn" method (Windows, and macOS since Python 3.8): each child re-imports the module, and without the guard it would re-run the process-creation code and spawn processes endlessly.
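Note that Process alone throws away crunch_numbers' return value. When we want the results back, multiprocessing.Pool (also standard library) collects them for us; a minimal sketch:

```python
import multiprocessing

def crunch_numbers(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        # map splits the inputs across worker processes
        # and gathers the return values in input order
        results = pool.map(crunch_numbers, [10, 20])
    print(results)  # [285, 2470]
```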

Sharing Data Between Processes

Since processes have separate memory, we use Queue or Pipe to communicate.

from multiprocessing import Process, Queue

def worker(q, data):
    result = sum(data)
    q.put(result)  # send result back

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q, [1, 2, 3, 4, 5]))
    p.start()
    result = q.get()  # blocks until result is available
    p.join()
    print(result)  # 15
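Pipe, the other option mentioned above, gives two connected endpoints instead of a shared queue; a sketch along the same lines:

```python
from multiprocessing import Process, Pipe

def worker(conn):
    conn.send(sum([1, 2, 3, 4, 5]))  # push the result through the pipe
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # two connected endpoints
    p = Process(target=worker, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # 15 -- blocks until the child sends
    p.join()
```

Pipe is lighter than Queue but connects exactly two endpoints; Queue is the safer choice when several processes produce or consume.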

When to Use Which

Scenario            Use              Why
Downloading files   Threading        I/O-bound; threads release the GIL during waits
API calls           Threading        Network I/O, same reason
Image processing    Multiprocessing  CPU-bound; needs true parallelism
Data crunching      Multiprocessing  CPU-bound; bypasses the GIL
Simple scripting    Neither          Keep it simple until we need speed

In simple terms: threads are like one chef switching between tasks in one kitchen, while processes are like multiple chefs, each with their own kitchen. Threads share everything (fast but tricky); processes are isolated (safe but heavier).