Python gives us two ways to run things at the same time: threads (concurrency) and processes (parallelism). They sound similar, but they work very differently under the hood.
Concurrency vs Parallelism
- Concurrency — multiple tasks making progress by switching between them (like juggling)
- Parallelism — multiple tasks literally running at the same time on different CPU cores
Threads give us concurrency. Processes give us true parallelism.
The GIL Problem
CPython, the standard Python interpreter, has a Global Interpreter Lock (GIL): a mutex that lets only one thread execute Python bytecode at a time. This means threads can’t truly run Python code in parallel, no matter how many CPU cores are available.
So why use threads at all? Because when a thread is waiting for I/O (network response, file read, database query), it releases the GIL. Other threads can run during that wait time. That’s why threads are great for I/O-bound work.
For CPU-heavy work (number crunching, image processing), threads don’t help because the GIL blocks parallel execution. That’s when we reach for multiprocessing — each process has its own GIL.
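To see the difference on your own machine, a rough timing sketch like the one below can help. It uses `concurrent.futures`, which is a standard-library convenience layer over threads and processes rather than the raw `threading` and `multiprocessing` APIs covered in this post. The names `sum_of_squares` and `timed` are made up for illustration, and the actual timings will depend on your hardware and Python build, so none are shown here.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def sum_of_squares(n):
    # Pure-Python CPU-bound work: the running thread holds the GIL.
    return sum(i * i for i in range(n))


def timed(executor_cls, jobs):
    # Run every job on a pool of the given type; return (results, seconds).
    start = time.perf_counter()
    with executor_cls(max_workers=len(jobs)) as pool:
        results = list(pool.map(sum_of_squares, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [1_000_000] * 4
    _, thread_time = timed(ThreadPoolExecutor, jobs)
    _, process_time = timed(ProcessPoolExecutor, jobs)
    print(f"threads:   {thread_time:.2f}s")
    print(f"processes: {process_time:.2f}s")
```

On a multi-core machine with a standard CPython build, the process pool would typically finish this CPU-bound workload faster than the thread pool, although process startup cost can hide the gain for very small jobs.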
Threading Basics
```python
import threading
import time


def download(url):
    print(f"Downloading {url}...")
    time.sleep(2)  # simulating network I/O
    print(f"Done: {url}")


# Create and start threads
t1 = threading.Thread(target=download, args=("page1.html",))
t2 = threading.Thread(target=download, args=("page2.html",))
t1.start()
t2.start()

# Wait for both to finish
t1.join()
t2.join()
print("All downloads complete")  # takes ~2s, not ~4s
```
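When there are more than a couple of tasks, or when we want the results back, managing `Thread` objects by hand gets tedious. One possible alternative is `ThreadPoolExecutor` from the standard library's `concurrent.futures` module. The sketch below assumes a hypothetical `fake_download` function that sleeps instead of touching the network, and `download_all` is likewise an illustrative name.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    # Hypothetical stand-in for a real network call.
    time.sleep(0.2)  # the GIL is released while the thread sleeps
    return f"contents of {url}"


def download_all(urls, max_workers=4):
    # pool.map yields results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, urls))


pages = download_all(["page1.html", "page2.html", "page3.html"])
print(pages)
```

The `with` block waits for all submitted work to finish before it exits, so there is no explicit `join()` to remember.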
Multiprocessing Basics
```python
import multiprocessing


def crunch_numbers(n):
    # The return value is discarded when this runs as a Process target;
    # use a Queue or Pipe to send results back to the parent.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    p1 = multiprocessing.Process(target=crunch_numbers, args=(10_000_000,))
    p2 = multiprocessing.Process(target=crunch_numbers, args=(10_000_000,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
```
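The if __name__ == "__main__" guard is needed wherever child processes are started with the "spawn" method, which is the default on Windows and macOS. Each spawned child re-imports the main module, so without the guard the process-creating code would run again inside every child. Modern Python detects this and raises a RuntimeError instead of spawning endlessly, but either way the program does not work.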
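A bare `Process` does not hand the target function's return value back to the parent. If the goal is simply to run one function over several inputs and collect the results, `multiprocessing.Pool` is one option. This is a minimal sketch that reuses the `crunch_numbers` function from above; the pool size of 2 and the smaller input sizes are arbitrary choices for the example.

```python
from multiprocessing import Pool


def crunch_numbers(n):
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    # Two worker processes, each with its own interpreter and GIL.
    with Pool(processes=2) as pool:
        totals = pool.map(crunch_numbers, [1_000_000, 1_000_000])
    print(totals)
```

`pool.map` blocks until every input has been processed and returns the results in input order.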
Sharing Data Between Processes
Since each process has its own memory, a child cannot simply update a variable in the parent. Instead, we pass data between them with a multiprocessing Queue or Pipe.
```python
from multiprocessing import Process, Queue


def worker(q, data):
    result = sum(data)
    q.put(result)  # send result back


if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q, [1, 2, 3, 4, 5]))
    p.start()
    result = q.get()  # blocks until result is available
    p.join()
    print(result)  # 15
```
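The same exchange can be sketched with a `Pipe`, which connects exactly two endpoints. This version mirrors the Queue example; `parent_conn` and `child_conn` are just conventional names for the two ends.

```python
from multiprocessing import Pipe, Process


def worker(conn, data):
    conn.send(sum(data))  # send the result to the other end of the pipe
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn, [1, 2, 3, 4, 5]))
    p.start()
    result = parent_conn.recv()  # blocks until the child sends something
    p.join()
    print(result)  # 15
```

A Pipe is a reasonable fit for one-to-one communication, while a Queue can be shared by many producers and consumers.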
When to Use Which
| Scenario | Use | Why |
|---|---|---|
| Downloading files | Threading | I/O-bound, threads release the GIL during waits |
| API calls | Threading | Network I/O, same reason |
| Image processing | Multiprocessing | CPU-bound, needs true parallelism |
| Data crunching | Multiprocessing | CPU-bound, bypasses the GIL |
| Simple scripting | Neither | Keep it simple until we need speed |
In simple terms, threads are like one chef switching between tasks in a single kitchen, while processes are like multiple chefs, each with their own kitchen. Threads share memory (fast but tricky to keep safe); processes are isolated (safer, but heavier to start and to communicate with).
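To make the "tricky" part of shared memory concrete, here is a small sketch of several threads updating one variable. It assumes a module-level `counter` and a hypothetical `add_many` function; the `threading.Lock` ensures that only one thread performs the read-modify-write at a time, so no updates are lost.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        with lock:  # only one thread updates counter at a time
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

Processes avoid this class of bug by default because they do not share variables, which is the "safe but heavier" side of the trade-off.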