In simple language, a generator is a function that can pause and resume. Instead of returning all values at once, it yields them one at a time. This makes generators incredibly memory-efficient for large datasets.
But to understand generators, we need to start with iterators — the protocol they’re built on.
The Iterator Protocol
Any object in Python is an iterator if it implements two methods:
__iter__() — returns the iterator object itself
__next__() — returns the next value, raising StopIteration when done
# Under the hood, a for loop does this:
nums = [1, 2, 3]
it = iter(nums) # calls nums.__iter__()
next(it) # 1 — calls it.__next__()
next(it) # 2
next(it) # 3
next(it) # raises StopIteration
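The loop above can be written out in full, including the StopIteration handling that a for loop performs for us behind the scenes (variable names here are just for illustration):

```python
# Rough equivalent of: for n in nums: collected.append(n)
nums = [1, 2, 3]
it = iter(nums)                      # calls nums.__iter__()
collected = []
while True:
    try:
        collected.append(next(it))   # calls it.__next__()
    except StopIteration:            # raised when the iterator is exhausted
        break
print(collected)  # [1, 2, 3]
```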
Building an Iterator with a Class
We can create custom iterators, but it takes some boilerplate.
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        val = self.current
        self.current -= 1
        return val

for n in Countdown(3):
    print(n)  # 3, 2, 1
That’s a lot of code for something simple. Generators make this much easier.
Generator Functions with yield
A generator function looks like a normal function but uses yield instead of return. Each time we call next(), it runs until the next yield, pauses, and gives us the value.
def countdown(n):
    while n > 0:
        yield n  # pause here, give n to the caller
        n -= 1   # resume here on next call

gen = countdown(3)
next(gen)  # 3
next(gen)  # 2
next(gen)  # 1
next(gen)  # raises StopIteration

# Or just use a for loop
for n in countdown(3):
    print(n)  # 3, 2, 1
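One caveat worth noting: a generator is single-pass. Once it raises StopIteration it stays exhausted, so iterating it a second time yields nothing:

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
first = list(gen)    # [3, 2, 1]
second = list(gen)   # [] (the generator is already exhausted)
```

If you need the values again, call the generator function again to get a fresh generator.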
Generator Expressions
Just like list comprehensions but with parentheses. They produce values lazily.
# List comprehension — builds entire list in memory
squares_list = [x ** 2 for x in range(1_000_000)]
# Generator expression — produces one value at a time
squares_gen = (x ** 2 for x in range(1_000_000))
# Perfect for passing to functions
sum(x ** 2 for x in range(1_000_000)) # no extra memory
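Laziness also lets consuming functions short-circuit. As a small sketch (the square helper and its call counter are just for illustration), any() stops pulling values from the generator expression as soon as it sees a True:

```python
calls = []

def square(x):
    calls.append(x)   # track how many values were actually computed
    return x * x

# any() short-circuits at the first True, so only a handful of the
# million possible squares are ever computed
found = any(square(x) > 100 for x in range(1_000_000))
print(found, len(calls))  # True 12  (x = 0 through 11)
```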
Memory Benefits
This is the biggest win. A list of 10 million integers takes hundreds of megabytes of memory, because every element has to exist at once. A generator that yields the same 10 million values takes only a few hundred bytes, because it stores its current state rather than its output.
# This creates a massive list in memory
big_list = [x for x in range(10_000_000)]
# This uses almost no memory
big_gen = (x for x in range(10_000_000))
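To see the difference concretely, sys.getsizeof reports the size of the container objects themselves. A smaller range is used here to keep the demo quick; exact numbers vary by Python version and platform:

```python
import sys

big_list = [x for x in range(1_000_000)]
big_gen = (x for x in range(1_000_000))

# The list object alone is several megabytes (not counting the ints
# it references); the generator object is a couple hundred bytes
print(sys.getsizeof(big_list))
print(sys.getsizeof(big_gen))
```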
send() and close()
We can send values back into a generator and close it early.
def accumulator():
    total = 0
    while True:
        value = yield total
        if value is None:
            break
        total += value

gen = accumulator()
next(gen)     # 0 — prime the generator
gen.send(10)  # 10
gen.send(20)  # 30
gen.close()   # stop the generator
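Under the hood, close() raises GeneratorExit at the paused yield, so a try/finally inside the generator can release resources before it shuts down. A minimal sketch (the log list is just there to observe the order of events):

```python
log = []

def reader():
    log.append("open")          # runs when the generator first starts
    try:
        while True:
            yield "data"
    finally:
        log.append("cleanup")   # runs when close() injects GeneratorExit

gen = reader()
next(gen)      # advances to the first yield
gen.close()    # triggers the finally block
print(log)     # ['open', 'cleanup']
```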
yield from
When a generator needs to yield all values from another iterable, yield from is cleaner than a loop.
def chain(*iterables):
    for it in iterables:
        yield from it  # same as: for item in it: yield item

list(chain([1, 2], [3, 4], [5, 6]))
# [1, 2, 3, 4, 5, 6]
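Beyond being shorter, yield from also forwards the subgenerator's return value to the delegating generator, which a plain loop cannot do. A sketch of that (subtotal and report are hypothetical names):

```python
def subtotal(items):
    total = 0
    for x in items:
        yield x
        total += x
    return total                  # becomes the value of `yield from`

def report(*groups):
    grand_total = 0
    for g in groups:
        grand_total += yield from subtotal(g)
    yield grand_total             # yield the combined total last

flat = list(report([1, 2], [3, 4]))
print(flat)  # [1, 2, 3, 4, 10]
```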
Infinite Sequences
Generators can produce values forever since they only compute on demand.
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take the first 10 Fibonacci numbers
from itertools import islice
list(islice(fibonacci(), 10))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
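islice cuts off after a fixed count; to instead stop on a condition, itertools.takewhile can bound the same infinite generator:

```python
from itertools import takewhile

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# All Fibonacci numbers below 100; takewhile stops pulling values
# at the first one that fails the predicate
fibs = list(takewhile(lambda n: n < 100, fibonacci()))
print(fibs)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```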
In simple language, generators are lazy functions. They do the minimum work possible, computing values only when asked. When we’re dealing with large data or infinite sequences, generators are the way to go.