Gyaan

Generators and Iterators

intermediate generators iterators yield lazy-evaluation

In simple language, a generator is a function that can pause and resume. Instead of returning all values at once, it yields them one at a time. This makes generators incredibly memory-efficient for large datasets.

But to understand generators, we need to start with iterators — the protocol they’re built on.

The Iterator Protocol

Any object in Python is an iterator if it implements two methods:

  • __iter__() — returns the iterator object itself
  • __next__() — returns the next value, raises StopIteration when done

# Under the hood, a for loop does this:
nums = [1, 2, 3]
it = iter(nums)       # calls nums.__iter__()
next(it)              # 1 — calls it.__next__()
next(it)              # 2
next(it)              # 3
next(it)              # raises StopIteration
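
The for loop itself is just this protocol wrapped in a while loop. A rough sketch of what the loop expands to:

```python
nums = [1, 2, 3]
it = iter(nums)                    # calls nums.__iter__()
result = []
while True:
    try:
        result.append(next(it))    # calls it.__next__()
    except StopIteration:
        break                      # the for loop ends silently here
print(result)  # [1, 2, 3]
```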

Building an Iterator with a Class

We can create custom iterators, but it takes some boilerplate.

class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        val = self.current
        self.current -= 1
        return val

for n in Countdown(3):
    print(n)  # 3, 2, 1
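
One subtlety: because __iter__ returns self, a Countdown instance is its own iterator, so it can only be looped over once. A quick sketch (repeating the class so it runs standalone):

```python
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        val = self.current
        self.current -= 1
        return val

cd = Countdown(3)
first = list(cd)   # [3, 2, 1]
second = list(cd)  # [] — cd is exhausted; self.current never resets
```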

That’s a lot of code for something simple. Generators make this much easier.

Generator Functions with yield

A generator function looks like a normal function but uses yield instead of return. Each time we call next(), it runs until the next yield, pauses, and gives us the value.

Generator Lifecycle

  • Created: gen = func() builds the generator object; no code runs yet
  • Running: next() executes the body until it reaches a yield
  • Suspended: paused at the yield, with the value handed to the caller; the next call to next() resumes from that exact spot
  • Completed: the function returns or falls off the end, raising StopIteration

def countdown(n):
    while n > 0:
        yield n    # pause here, give n to the caller
        n -= 1     # resume here on next call

gen = countdown(3)
next(gen)  # 3
next(gen)  # 2
next(gen)  # 1
next(gen)  # raises StopIteration

# Or just use a for loop
for n in countdown(3):
    print(n)  # 3, 2, 1
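
We can watch these lifecycle states directly with inspect.getgeneratorstate from the standard library:

```python
from inspect import getgeneratorstate

def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
state_new = getgeneratorstate(gen)     # 'GEN_CREATED' — not started yet
next(gen)
state_paused = getgeneratorstate(gen)  # 'GEN_SUSPENDED' — paused at yield
list(gen)                              # drain the remaining values
state_done = getgeneratorstate(gen)    # 'GEN_CLOSED' — completed
```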

Generator Expressions

Generator expressions look just like list comprehensions but use parentheses instead of brackets. Rather than building the whole list up front, they produce values lazily, one at a time.

# List comprehension — builds entire list in memory
squares_list = [x ** 2 for x in range(1_000_000)]

# Generator expression — produces one value at a time
squares_gen = (x ** 2 for x in range(1_000_000))

# Perfect for passing to functions
sum(x ** 2 for x in range(1_000_000))  # no intermediate list is built
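
Laziness also lets consumers stop early. Here any() pulls squares one at a time and stops at the first match, leaving the rest uncomputed:

```python
squares = (x ** 2 for x in range(1_000_000))

# any() stops as soon as 11 ** 2 = 121 exceeds 100
found = any(s > 100 for s in squares)

# The generator is merely paused, not consumed
leftover = next(squares)  # 12 ** 2 = 144
```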

Memory Benefits

This is the biggest win. A list of 10 million integers occupies hundreds of megabytes, because every element exists in memory at once. A generator that yields the same 10 million values stays a few hundred bytes: it stores only its current state and computes each value on demand.

# This creates a massive list in memory
big_list = [x for x in range(10_000_000)]

# This uses almost no memory
big_gen = (x for x in range(10_000_000))
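
sys.getsizeof makes the difference concrete. Note that it measures only the container itself, not the elements, and the range is shrunk here to keep the demo quick:

```python
import sys

big_list = [x for x in range(1_000_000)]
big_gen = (x for x in range(1_000_000))

list_size = sys.getsizeof(big_list)  # several megabytes
gen_size = sys.getsizeof(big_gen)    # a couple hundred bytes
```

The generator's size stays constant no matter how large the range, because it holds only its paused state, never the values themselves.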

send() and close()

We can send values back into a generator and close it early.

def accumulator():
    total = 0
    while True:
        value = yield total
        if value is None:  # a plain next() sends None, so treat it as "stop"
            break
        total += value

gen = accumulator()
next(gen)          # 0 — prime the generator
gen.send(10)       # 10
gen.send(20)       # 30
gen.close()        # stop the generator
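
close() works by raising GeneratorExit at the paused yield, which means try/finally cleanup inside the generator still runs — a small sketch:

```python
cleanup_log = []

def reader():
    try:
        while True:
            yield "line"
    finally:
        # runs when close() raises GeneratorExit at the paused yield
        cleanup_log.append("closed")

gen = reader()
first = next(gen)   # 'line' — advance to the first yield
gen.close()         # triggers the finally block
```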

yield from

When a generator needs to yield all values from another iterable, yield from is cleaner than a loop.

def chain(*iterables):
    for it in iterables:
        yield from it  # same as: for item in it: yield item

list(chain([1, 2], [3, 4], [5, 6]))
# [1, 2, 3, 4, 5, 6]
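
yield from does more than loop: it also forwards send() and throw() to the inner generator, and it captures the inner generator's return value — a sketch:

```python
def subtotal(items):
    total = 0
    for item in items:
        yield item
        total += item
    return total  # becomes the value of the yield from expression

def report(items):
    total = yield from subtotal(items)  # delegation plus captured return
    yield f"total: {total}"

result = list(report([1, 2, 3]))
# [1, 2, 3, 'total: 6']
```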

Infinite Sequences

Generators can produce values forever since they only compute on demand.

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take first 10 fibonacci numbers
from itertools import islice
list(islice(fibonacci(), 10))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
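
itertools pairs naturally with infinite generators. Besides islice, takewhile consumes values only until a condition fails:

```python
from itertools import takewhile

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Every fibonacci number below 100
small = list(takewhile(lambda n: n < 100, fibonacci()))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```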

In simple language, generators are lazy functions. They do the minimum work possible, computing values only when asked. When we’re dealing with large data or infinite sequences, generators are the way to go.