Beautiful Iterators
I love how beautiful and clear Python’s syntax is compared to many other programming languages. Let’s take the humble for-in loop, for example. It speaks to Python’s beauty that you can read a Pythonic loop like this, as if it was an English sentence:
numbers = [1, 2, 3]
for n in numbers:
print(n)
But how do Python’s elegant loop constructs work behind the scenes? How does the loop fetch individual elements from the object it is looping over ?And, how can you support the same programming style in your own Python objects?
You’ll find the answers to these questions in Python’s iterator protocol: Objects that support the __iter__
and __next__
dunder methods automatically work with for-in loops.
But let’s take things step by step. Just like decorators, iterators and their related techniques can appear quite arcane and complicated on first glance. So, we’ll ease into them.
In this chapter you’ll see how to write several Python classes that support the iterator protocol. They’ll serve as “non-magical” examples and test implementations you can build upon and deepen your understanding with.
We’ll focus on the core mechanics of iterators in Python 3 first and leave out any unnecessary complications, so you can see clearly how iterators behave at the fundamental level.
I’ll tie each example back to the for-in loop question we started out with. And, at the end of this chapter we’ll go over some differences that exist between Python 2 and 3 when it comes to iterators.
Ready? Let’s jump right in!
Iterating Forever
We’ll begin by writing a class that demonstrates the bare-bones iterator protocol. The example I’m using here might look different from the examples you’ve seen in other iterator tutorials, but bear with me. I think doing it this way gives you a more applicable understanding of how iterators work in Python.
Over the next few paragraphs we’re going to implement a class called Repeater that can be iterated over with a for-in loop, like so:
repeater = Repeater('Hello')
for item in repeater:
print(item)
Like its name suggests, instances of this Repeater class will repeatedly return a single value when iterated over. So the above example code would forever print the string 'Hello' to the console.
这个类的实例在迭代时将重复返回单个值。
To start with the implementation, we’ll first define and flesh out the Repeater class:
class Repeater:
def __init__(self, value):
self.value = value
def __iter__(self):
return RepeaterIterator(self)
On first inspection, Repeater looks like a bog-standard Python class. But notice how it also includes the __iter__
dunder method.
What’s the RepeaterIterator object we’re creating and returning from __iter__
? It’s a helper class we also need to define for our for-in iteration example to work:
class RepeaterIterator:
def __init__(self, source):
self.source = source
def __next__(self):
return self.source.value
Again, RepeaterIterator looks like a straightforward Python class, but you might want to take note of the following two things:
- In the
__init__
method, we link each RepeaterIterator instance to the Repeater object that created it. That way we can hold onto the “source” object that’s being iterated over.
2.In
RepeaterIterator.__next__
, we reach back into the “source” Repeater instance and return the value associated with it.
In this code example, Repeater and RepeaterIterator are working together to support Python’s iterator protocol. The two dunder methods we defined, __iter__
and __next__
, are the keys to making a Python object iterable.
在上面代码示例中 Repeater和RepeaterIterator一起配合去支撑python的迭代协议。而iter和next是使得python对象可迭代的关键。
We’ll take a closer look at these two methods and how they work together after some hands-on experimentation with the code we’ve got so far.
Let’s confirm that this two-class setup really made Repeater objects compatible with for-in loop iteration. To do that we’ll first create an instance of Repeater that would return the string 'Hello' indefinitely:
>>> repeater = Repeater('Hello')
And now we’re going to try iterating over this repeater object with a for-in loop. What’s going to happen when you run the following code snippet?
>>> for item in repeater:
... print(item)
Right on! You’ll see 'Hello' printed to the screen…a lot. Repeater keeps on returning the same string value, and so, this loop will never complete. Our little program is doomed to forever print 'Hello' to the console:
Hello
Hello
Hello
Hello
Hello
...
But congratulations—you just wrote a working iterator in Python and used it with a for-in loop. The loop may not terminate yet…but so far, so good!
Next up, we’ll tease this example apart to understand how the __iter__
and __next__
methods work together to make a Python object iterable.
我们开始要拆分一下。
Pro tip: If you ran the last example inside a Python REPL session or from the terminal, and you want to stop it, hit Ctrl + C a few times to break out of the infinite loop.
How do for-in loops work in Python?
At this point we’ve got our Repeater class that apparently supports the iterator protocol, and we just ran a for-in loop to prove it:
repeater = Repeater('Hello')
for item in repeater:
print(item)
Now, what does this for-in loop really do behind the scenes? How does it communicate with the repeater object to fetch new elements from it?
To dispel some of that “magic,” we can expand this loop into a slightly longer code snippet that gives the same result:
repeater = Repeater('Hello')
iterator = repeater.__iter__()
while True:
item = iterator.__next__()
print(item)
As you can see, the for-in was just syntactic sugar for a simple while loop: It first prepared the repeater object for iteration by calling its __iter__
method. This returned the actual iterator object.
- After that, the loop repeatedly called the iterator object’s
__next__
method to retrieve values from it.
If you’ve ever worked with database cursors, this mental model will seem familiar: We first initialize the cursor and prepare it for reading, and then we can fetch data from it into local variables as needed, one element at a time.
作者在这里用数据库鼠标指针的例子进行了类比,每一次鼠标的点击我们就从数据库中取出来一个数据。
Because there’s never more than one element “in flight,” this approach is highly memory-efficient. Our Repeater class provides an infinite sequence of elements and we can iterate over it just fine. Emulating the same thing with a Python list would be impossible—there’s no way we could create a list with an infinite number of elements in the first place. This makes iterators a very powerful concept.
其实这里说的也比较清楚了,就是每一次取一个数,完事看看下一个有没有,如果有的话就继续,没有的话就完事了。这样就像在爬虫中有“下一页”的按钮一样,如有有下一页的按钮,那么我们就继续向后一页爬虫,如果没有的话就到此为止。不同的就是像列表一样,将所有的网页链接都获取到,然后从列表中去获取网页链接爬虫。但是不同的内容页数不同,相对应的列表的大小等就不一样了,这时候列表的方法就有了局限性,在实际的运行效率方面也就没有那么好了。
On more abstract terms, iterators provide a common interface that allows you to process every element of a container while being completely isolated from the container’s internal structure.
在更加抽象的层面上,迭代器提供了一种公共的接口,允许我们去运行容器中的每一个元素,而与容器内部的结构完全的隔离。
Whether you’re dealing with a list of elements, a dictionary, an infinite sequence like the one provided by our Repeater class, or another sequence type—all of that is just an implementation detail. Every single one of these objects can be traversed in the same way with the power of iterators.
这些序列化对象中的每一个都可以用迭代器的强大功能以相同的方式进行遍历。
And as you’ve seen, there’s nothing special about for-in loops in Python. If you peek behind the curtain, it all comes down to calling the right dunder methods at the right time.
In fact, you can manually “emulate” how the loop uses the iterator protocol in a Python interpreter session:
>>> repeater = Repeater('Hello')
>>> iterator = iter(repeater)
>>> next(iterator)
'Hello'
>>> next(iterator)
'Hello'
>>> next(iterator)
'Hello'
...
我们还可以手动操作迭代器的循环。
This gives the same result—an infinite stream of hellos. Every time you call next(), the iterator hands out the same greeting again.
By the way, I took the opportunity here to replace the calls to __iter__
and __next__
with calls to Python’s built-in functions, iter() and next().
Internally, these built-ins invoke the same dunder methods, but they make this code a little prettier and easier to read by providing a clean “facade” to the iterator protocol.
Python offers these facades for other functionality as well. For example, len(x) is a shortcut for calling x.__len__
. Similarly, calling iter(x) invokes x.__iter__
and calling next(x) invokes x.__next__
.
Generally, it’s a good idea to use the built-in facade functions rather than directly accessing the dunder methods implementing a protocol. It just makes the code a little easier to read.
A Simpler Iterator Class
Up until now, our iterator example consisted of two separate classes, Repeater and RepeaterIterator. They corresponded directly to the two phases used by Python’s iterator protocol:
First, setting up and retrieving the iterator object with an iter() call, and then repeatedly fetching values from it via next().
Many times both of these responsibilities can be shouldered by a single class. Doing this allows you to reduce the amount of code necessary to write a class-based iterator.
I chose not to do this with the first example in this chapter because it mixes up the cleanliness of the mental model behind the iterator protocol. But now that you’ve seen how to write a class-based iterator the longer and more complicated way, let’s take a minute to simplify what we’ve got so far.
Remember why we needed the RepeaterIterator class again? We needed it to host the __next__
method for fetching new values from the iterator. But it doesn’t really matter where __next__
is defined. In the iterator protocol, all that matters is that __iter__
returns any object with a __next__
method on it.
So here’s an idea: RepeaterIterator returns the same value over and over, and it doesn’t have to keep track of any internal state. What if we added the __next__
method directly to the Repeater class instead?
That way we could get rid of RepeaterIterator altogether and implement an iterable object with a single Python class. Let’s try it out! Our new and simplified iterator example looks as follows:
class Repeater:
def __init__(self, value):
self.value = value
def __iter__(self):
return self
def __next__(self):
return self.value
We just went from two separate classes and 10 lines of code to just one class and 7 lines of code. Our simplified implementation still supports the iterator protocol just fine:
>>> repeater = Repeater('Hello')
>>> for item in repeater:
... print(item)
Hello
Hello
Hello
...
Streamlining a class-based iterator like that often makes sense. In fact, most Python iterator tutorials start out that way. But I always felt that explaining iterators with a single class from the get-go hides the underlying principles of the iterator protocol—and thus makes it more difficult to understand.
Who Wants to Iterate Forever
At this point you should have a pretty good understanding of how iterators work in Python. But so far we’ve only implemented iterators that keep on iterating forever.
Clearly, infinite repetition isn’t the main use case for iterators in Python. In fact, when you look back all the way to the beginning of this chapter, I used the following snippet as a motivating example:
numbers = [1, 2, 3]
for n in numbers:
print(n)
You’ll rightfully expect this code to print the numbers 1, 2, and 3 and then stop. And you probably wouldn’t expect it to go on spamming your terminal window by printing “3” forever until you mash Ctrl+C a few times in a wild panic…
And so, it’s time to find out how to write an iterator that eventually stops generating new values instead of iterating forever because that’s what Python objects typically do when we use them in a for-in loop.
是时候给迭代器添加一个终止条件了。
We’ll now write another iterator class that we’ll call BoundedRepeater. It’ll be similar to our previous Repeater example, but this time we’ll want it to stop after a predefined number of repetitions.
这个例子就是要写一个终止条件的例子了。
Let’s think about this for a bit. How do we do this? How does an iterator signal that it’s exhausted and out of elements to iterate over? Maybe you’re thinking, “Hmm, we could just return None from the __next__
method.”
And that’s not a bad idea—but the trouble is, what are we going to do if we want some iterators to be able to return None as an acceptable value?
Let’s see what other Python iterators do to solve this problem. I’m going to construct a simple container, a list with a few elements, and then I’ll iterate over it until it runs out of elements to see what happens:
>>> my_list = [1, 2, 3]
>>> iterator = iter(my_list)
>>> next(iterator)
1
>>> next(iterator)
2
>>> next(iterator)
3
Careful now! We’ve consumed all of the three available elements in the list. Watch what happens if I call next on the iterator again:
>>> next(iterator)
StopIteration
Aha! It raises a StopIteration exception to signal we’ve exhausted all of the available values in the iterator.
That’s right: Iterators use exceptions to structure control flow. To signal the end of iteration, a Python iterator simply raises the built-in
StopIteration exception.
If I keep requesting more values from the iterator, it’ll keep raising StopIteration exceptions to signal that there are no more values available to iterate over:
>>> next(iterator)
StopIteration
>>> next(iterator)
StopIteration
...
如果没有数了还持续的next,这样会持续的抛出错误。
Python iterators normally can’t be “reset”—once they’re exhausted they’re supposed to raise StopIteration every time next() is called on them. To iterate a new you’ll need to request a fresh iterator object with the iter() function.
Now we know everything we need to write our BoundedRepeater class that stops iterating after a set number of repetitions:
class BoundedRepeater:
def __init__(self, value, max_repeats):
self.value = value
self.max_repeats = max_repeats
self.count = 0
def __iter__(self):
return self
def __next__(self):
if self.count >= self.max_repeats:
raise StopIteration
self.count += 1
return self.value
This gives us the desired result. Iteration stops after the number of repetitions defined in the max_repeats parameter:
>>> repeater = BoundedRepeater('Hello', 3)
>>> for item in repeater:
print(item)
Hello
Hello
Hello
If we rewrite this last for-in loop example to take away some of the syntactic sugar, we end up with the following expanded code snippet:
repeater = BoundedRepeater('Hello', 3)
iterator = iter(repeater)
while True:
try:
item = next(iterator)
except StopIteration:
break
print(item)
Every time next() is called in this loop, we check for a StopIteration exception and break the while loop if necessary.
Being able to write a three-line for-in loop instead of an eight-line while loop is quite a nice improvement. It makes the code easier to read and more maintainable. And this is another reason why iterators in Python are such a powerful tool.
Python 2.x Compatibility
All the code examples I showed here were written in Python 3. There’s a small but important difference between Python 2 and 3 when it comes to implementing class-based iterators:
- In Python 3, the method that retrieves the next value from an iterator is called
__next__
. - In Python 2, the same method is called next (no underscores).
在python2中next没有下划线。
This naming difference can lead to some trouble if you’re trying to write class-based iterators that should work on both versions of Python. Luckily, there’s a simple approach you can take to work around this difference.
Here’s an updated version of the InfiniteRepeater class that will work on both Python 2 and Python 3:
class InfiniteRepeater(object):
def __init__(self, value):
self.value = value
def __iter__(self):
return self
def __next__(self):
return self.value
# Python 2 compatibility:
def next(self):
return self.__next__()
To make this iterator class compatible with Python 2, I’ve made two small changes to it:
First, I added a next method that simply calls the original __next__
and forwards its return value. This essentially creates an alias for the existing __next__
implementation so that Python 2 finds it. That way we can support both versions of Python while still keeping all of the actual implementation details in one place.
And second, I modified the class definition to inherit from object in order to ensure we’re creating a new-style class on Python 2. This has nothing to do with iterators specifically, but it’s a good practice nonetheless.
Key Takeaways
- Iterators provide a sequence interface to Python objects that’s memory efficient and considered Pythonic. Behold the beauty of the for-in loop!
- To support iteration an object needs to implement the iterator protocol by providing the
__iter__
and__next__
dunder methods. - Class-based iterators are only one way to write iterable objects in Python. Also consider generators and generator expressions.