Gyaan

concurrent.futures

advanced concurrent futures thread-pool process-pool

The concurrent.futures module is Python’s high-level, “batteries included” way to run tasks concurrently. Instead of manually creating threads or processes, we use executor pools that manage everything for us.

Think of it like a task queue with a pool of workers. We submit jobs, and the pool assigns them to available workers.

ThreadPoolExecutor

This creates a pool of threads. Because Python releases the GIL during blocking I/O, threads are perfect for I/O-bound tasks — downloading files, making API calls, reading from databases.

from concurrent.futures import ThreadPoolExecutor
import time

def download(url):
    time.sleep(2)  # simulating network I/O
    return f"Downloaded {url}"

# Pool of 3 threads handling 5 tasks
with ThreadPoolExecutor(max_workers=3) as executor:
    urls = ["page1", "page2", "page3", "page4", "page5"]
    results = executor.map(download, urls)
    for result in results:
        print(result)  # takes ~4s total (2 batches), not ~10s

The with statement ensures the pool shuts down cleanly when we’re done. No need to manually join threads.
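For reference, here is roughly what the with statement is doing for us behind the scenes — a sketch you would rarely write by hand, using a made-up greet function:

```python
from concurrent.futures import ThreadPoolExecutor

def greet(name):
    return f"Hello, {name}"

# Manual equivalent of the `with` block above
executor = ThreadPoolExecutor(max_workers=3)
try:
    results = list(executor.map(greet, ["a", "b", "c"]))
finally:
    # wait=True blocks until all pending tasks finish, then frees the threads
    executor.shutdown(wait=True)

print(results)  # ['Hello, a', 'Hello, b', 'Hello, c']
```

The context manager version is shorter and can’t forget the shutdown, which is why it’s the idiomatic form.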

ProcessPoolExecutor

Same API, but uses processes instead of threads. Each process gets its own interpreter and its own GIL, so this is the right tool for CPU-bound work — number crunching, image processing, data transformation.

from concurrent.futures import ProcessPoolExecutor

def crunch(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        numbers = [10_000_000, 20_000_000, 30_000_000]
        results = executor.map(crunch, numbers)
        for result in results:
            print(result)

The only real changes are swapping ThreadPoolExecutor for ProcessPoolExecutor and adding the if __name__ == "__main__" guard. The guard matters: on platforms that spawn worker processes (Windows, and macOS by default), each worker re-imports the main module, and without the guard it would try to create pools of its own. Everything else stays the same — that’s the beauty of this module.

submit() and Future Objects

map() is great for bulk operations, but submit() gives us more control. It returns a Future object — a promise that a result will be available later.

from concurrent.futures import ThreadPoolExecutor
import time

def fetch(url):
    time.sleep(1)
    return f"Data from {url}"

with ThreadPoolExecutor(max_workers=3) as executor:
    future = executor.submit(fetch, "api.com/users")

    # We can do other stuff here while it's running
    print("Working on other things...")

    # Now get the result (blocks until ready)
    result = future.result()
    print(result)

A Future has some handy methods:

  • result(timeout=None) — blocks until the task finishes, then returns the result (or re-raises the exception)
  • done() — returns True if the task has finished or been cancelled
  • cancel() — tries to cancel the task; only succeeds if it hasn’t started running yet
  • exception() — blocks until the task finishes, then returns the exception if one was raised, or None
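A quick sketch exercising these methods (slow_add is a made-up stand-in for real work). With a single worker, the second task is still queued when we cancel it:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_add(a, b):
    time.sleep(0.5)
    return a + b

with ThreadPoolExecutor(max_workers=1) as executor:
    first = executor.submit(slow_add, 1, 2)
    second = executor.submit(slow_add, 3, 4)  # queued behind `first`

    print(first.done())       # False — the worker is still sleeping
    print(second.cancel())    # True — it hadn't started yet
    print(first.result())     # 3 — blocks until the worker finishes
    print(first.exception())  # None — no exception was raised
```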

as_completed(): Results As They Arrive

By default, map() returns results in the order we submitted them. But what if we want results as soon as they’re ready? That’s what as_completed() does.

from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fetch(url, delay):
    time.sleep(delay)
    return f"{url} (took {delay}s)"

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {
        executor.submit(fetch, "fast.com", 1): "fast",
        executor.submit(fetch, "slow.com", 3): "slow",
        executor.submit(fetch, "medium.com", 2): "medium",
    }

    # Results arrive in completion order, not submission order
    for future in as_completed(futures):
        tag = futures[future]
        print(f"{tag}: {future.result()}")
    # Output: fast, medium, slow (fastest first)
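A related tool from the same module is wait(), which blocks until a chosen condition holds — for instance, until the first future completes. A sketch (the URLs are made up):

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
import time

def fetch(url, delay):
    time.sleep(delay)
    return url

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [
        executor.submit(fetch, "fast.com", 0.1),
        executor.submit(fetch, "slow.com", 1.0),
    ]
    # Block only until the first task finishes
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    print(next(iter(done)).result())  # "fast.com"
    print(len(pending))               # 1
```

Other conditions are FIRST_EXCEPTION and ALL_COMPLETED (the default), so wait() covers the cases where as_completed()’s one-at-a-time loop isn’t the right shape.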

Error Handling

When a task raises an exception, it gets stored in the Future. Calling result() re-raises it.

from concurrent.futures import ThreadPoolExecutor

def risky_task(n):
    if n == 0:
        raise ValueError("Can't process zero!")
    return 100 / n

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(risky_task, n) for n in [5, 0, 10]]

    for future in futures:
        try:
            print(future.result())
        except ValueError as e:
            print(f"Error: {e}")
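Note that map() handles failures differently: the exception surfaces only when iteration reaches the failing result, and it ends the iteration — any results after the failure are lost. A sketch reusing the same risky_task:

```python
from concurrent.futures import ThreadPoolExecutor

def risky_task(n):
    if n == 0:
        raise ValueError("Can't process zero!")
    return 100 / n

collected = []
with ThreadPoolExecutor() as executor:
    try:
        for value in executor.map(risky_task, [5, 0, 10]):
            collected.append(value)
    except ValueError as e:
        print(f"map() stopped at: {e}")

print(collected)  # [20.0] — the result for 10 is never seen
```

If we need per-task error handling, submit() with individual Futures (as above) is the better fit.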

When to Use This Over Raw threading/multiprocessing

  • Use concurrent.futures when we just need to parallelize a batch of similar tasks. It’s cleaner and handles the pool lifecycle for us.
  • Use raw threading when we need fine-grained control over threads (custom synchronization, daemon threads, etc.).
  • Use raw multiprocessing when we need shared memory, custom IPC, or complex process management.

In short, concurrent.futures is the “I just want to run a bunch of things faster” module. We don’t need to think about thread management, pool cleanup, or synchronization — we submit tasks and get results.