The concurrent.futures module is Python’s high-level, “batteries included” way to run tasks in parallel. Instead of manually creating threads or processes, we use executor pools that manage everything for us.
Think of it like a task queue with a pool of workers. We submit jobs, and the pool assigns them to available workers.
ThreadPoolExecutor
This creates a pool of threads. Perfect for I/O-bound tasks — downloading files, making API calls, reading from databases.
from concurrent.futures import ThreadPoolExecutor
import time
def download(url):
    time.sleep(2)  # simulating network I/O
    return f"Downloaded {url}"
# Pool of 3 threads handling 5 tasks
with ThreadPoolExecutor(max_workers=3) as executor:
    urls = ["page1", "page2", "page3", "page4", "page5"]
    results = executor.map(download, urls)
    for result in results:
        print(result)  # takes ~4s total (2 batches), not ~10s
The with statement ensures the pool shuts down cleanly when we’re done. No need to manually join threads.
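For intuition, the with block is roughly equivalent to managing the pool lifecycle by hand with shutdown(); a minimal sketch (the greet function is just for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def greet(name):
    return f"Hello, {name}"

# What the with statement does for us, spelled out
executor = ThreadPoolExecutor(max_workers=2)
try:
    future = executor.submit(greet, "world")
    print(future.result())  # prints "Hello, world"
finally:
    # wait=True blocks until every pending task has finished
    executor.shutdown(wait=True)
```

In practice, prefer the with form: it guarantees shutdown even when an exception escapes the block.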
ProcessPoolExecutor
Same API, but uses processes instead of threads, which sidesteps the GIL. Perfect for CPU-bound work — number crunching, image processing, data transformation.
from concurrent.futures import ProcessPoolExecutor
def crunch(n):
    return sum(i * i for i in range(n))
if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        numbers = [10_000_000, 20_000_000, 30_000_000]
        results = executor.map(crunch, numbers)
        for result in results:
            print(result)
The only difference is we swap ThreadPoolExecutor for ProcessPoolExecutor. The rest of the code stays the same. That’s the beauty of this module.
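To see how interchangeable the two executors are, we can even parameterize over the pool class; a small sketch (run_batch and square are illustrative names, not part of the module):

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def square(n):
    return n * n

def run_batch(executor_cls, data):
    # The exact same code path works for threads or processes
    with executor_cls(max_workers=2) as executor:
        return list(executor.map(square, data))

if __name__ == "__main__":
    print(run_batch(ThreadPoolExecutor, [1, 2, 3]))   # threads
    print(run_batch(ProcessPoolExecutor, [1, 2, 3]))  # processes
```

Both calls print [1, 4, 9]; only the worker mechanism underneath changes.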
submit() and Future Objects
map() is great for bulk operations, but submit() gives us more control. It returns a Future object — a promise that a result will be available later.
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(url):
    time.sleep(1)
    return f"Data from {url}"
with ThreadPoolExecutor(max_workers=3) as executor:
    future = executor.submit(fetch, "api.com/users")
    # We can do other stuff here while it's running
    print("Working on other things...")
    # Now get the result (blocks until ready)
    result = future.result()
    print(result)
A Future has some handy methods:
- result() — blocks and returns the result (or raises the exception)
- done() — returns True if the task has finished
- cancel() — tries to cancel the task (only works if it hasn’t started)
- exception() — returns the exception if one occurred
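A small self-contained sketch showing these methods in action (the slow and broken tasks are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow():
    time.sleep(0.5)
    return 42

def broken():
    raise RuntimeError("boom")

with ThreadPoolExecutor(max_workers=1) as executor:
    ok = executor.submit(slow)
    bad = executor.submit(broken)
    maybe = executor.submit(slow)

    print(maybe.cancel())   # True: the single worker is busy, so it never started
    print(ok.done())        # False while the task is still sleeping
    print(ok.result())      # blocks until finished, then prints 42
    print(ok.done())        # True once the result is in

    # exception() returns the stored exception instead of raising it
    print(bad.exception())  # a RuntimeError, not a crash
```

Note that cancel() only succeeds here because max_workers=1 keeps the third task queued behind the first.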
as_completed(): Results As They Arrive
By default, map() returns results in the order we submitted them. But what if we want results as soon as they’re ready? That’s what as_completed() does.
from concurrent.futures import ThreadPoolExecutor, as_completed
import time
def fetch(url, delay):
    time.sleep(delay)
    return f"{url} (took {delay}s)"
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {
        executor.submit(fetch, "fast.com", 1): "fast",
        executor.submit(fetch, "slow.com", 3): "slow",
        executor.submit(fetch, "medium.com", 2): "medium",
    }
    # Results arrive in completion order, not submission order
    for future in as_completed(futures):
        tag = futures[future]
        print(f"{tag}: {future.result()}")
# Output: fast, medium, slow (fastest first)
Error Handling
When a task raises an exception, it gets stored in the Future. Calling result() re-raises it.
from concurrent.futures import ThreadPoolExecutor
def risky_task(n):
    if n == 0:
        raise ValueError("Can't process zero!")
    return 100 / n
with ThreadPoolExecutor() as executor:
    futures = [executor.submit(risky_task, n) for n in [5, 0, 10]]
    for future in futures:
        try:
            print(future.result())
        except ValueError as e:
            print(f"Error: {e}")
When to Use This Over Raw threading/multiprocessing
- Use concurrent.futures when we just need to parallelize a batch of similar tasks. It’s cleaner and handles the pool lifecycle for us.
- Use raw threading when we need fine-grained control over threads (custom synchronization, daemon threads, etc.).
- Use raw multiprocessing when we need shared memory, custom IPC, or complex process management.
In simple language, concurrent.futures is the “I just want to run a bunch of things faster” module. We don’t need to think about thread management, pool cleanup, or synchronization. We submit tasks, get results.