Python Tricks - Looping & Iteration (3)

Beautiful Iterators

I love how beautiful and clear Python’s syntax is compared to many other programming languages. Let’s take the humble for-in loop, for example. It speaks to Python’s beauty that you can read a Pythonic loop like this, as if it were an English sentence:

numbers = [1, 2, 3]
for n in numbers:
  print(n)

But how do Python’s elegant loop constructs work behind the scenes? How does the loop fetch individual elements from the object it is looping over? And how can you support the same programming style in your own Python objects?

You’ll find the answers to these questions in Python’s iterator protocol: Objects that support the __iter__ and __next__ dunder methods automatically work with for-in loops.

But let’s take things step by step. Just like decorators, iterators and their related techniques can appear quite arcane and complicated at first glance. So, we’ll ease into them.

In this chapter you’ll see how to write several Python classes that support the iterator protocol. They’ll serve as “non-magical” examples and test implementations you can build upon and deepen your understanding with.

We’ll focus on the core mechanics of iterators in Python 3 first and leave out any unnecessary complications, so you can see clearly how iterators behave at the fundamental level.

I’ll tie each example back to the for-in loop question we started out with. And, at the end of this chapter we’ll go over some differences that exist between Python 2 and 3 when it comes to iterators.

Ready? Let’s jump right in!

Iterating Forever

We’ll begin by writing a class that demonstrates the bare-bones iterator protocol. The example I’m using here might look different from the examples you’ve seen in other iterator tutorials, but bear with me. I think doing it this way gives you a more applicable understanding of how iterators work in Python.

Over the next few paragraphs we’re going to implement a class called Repeater that can be iterated over with a for-in loop, like so:

repeater = Repeater('Hello')
for item in repeater:
  print(item)

As its name suggests, instances of this Repeater class will repeatedly return a single value when iterated over. So the above example code would forever print the string 'Hello' to the console.

To start with the implementation, we’ll first define and flesh out the Repeater class:

class Repeater:
  def __init__(self, value):
    self.value = value
  
  def __iter__(self):
    return RepeaterIterator(self)

On first inspection, Repeater looks like a bog-standard Python class. But notice how it also includes the __iter__ dunder method.

What’s the RepeaterIterator object we’re creating and returning from __iter__? It’s a helper class we also need to define for our for-in iteration example to work:

class RepeaterIterator:
  def __init__(self, source):
    self.source = source

  def __next__(self):
    return self.source.value

Again, RepeaterIterator looks like a straightforward Python class, but you might want to take note of the following two things:

  1. In the __init__ method, we link each RepeaterIterator instance to the Repeater object that created it. That way we can hold onto the “source” object that’s being iterated over.

  2. In RepeaterIterator.__next__, we reach back into the “source” Repeater instance and return the value associated with it.

In this code example, Repeater and RepeaterIterator are working together to support Python’s iterator protocol. The two dunder methods we defined, __iter__ and __next__, are the keys to making a Python object iterable.

We’ll take a closer look at these two methods and how they work together after some hands-on experimentation with the code we’ve got so far.

Let’s confirm that this two-class setup really made Repeater objects compatible with for-in loop iteration. To do that we’ll first create an instance of Repeater that would return the string 'Hello' indefinitely:

>>> repeater = Repeater('Hello')

And now we’re going to try iterating over this repeater object with a for-in loop. What’s going to happen when you run the following code snippet?

>>> for item in repeater:
...   print(item)

Right on! You’ll see 'Hello' printed to the screen…a lot. Repeater keeps on returning the same string value, and so, this loop will never complete. Our little program is doomed to forever print 'Hello' to the console:

Hello
Hello
Hello
Hello
Hello
...

But congratulations—you just wrote a working iterator in Python and used it with a for-in loop. The loop may not terminate yet…but so far, so good!

Next up, we’ll tease this example apart to understand how the __iter__ and __next__ methods work together to make a Python object iterable.

Pro tip: If you ran the last example inside a Python REPL session or from the terminal, and you want to stop it, hit Ctrl + C a few times to break out of the infinite loop.
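If you would rather not start an endless loop at all, the standard library’s itertools.islice can pull a fixed number of items from a Repeater and then stop. The sketch below repeats the two classes from above so that it runs on its own; islice is an addition for the demo, not part of the chapter’s design.

```python
from itertools import islice


class Repeater:
    """Iterable: hands out a fresh RepeaterIterator on each iter() call."""

    def __init__(self, value):
        self.value = value

    def __iter__(self):
        return RepeaterIterator(self)


class RepeaterIterator:
    """Iterator: keeps a reference to its source and repeats its value."""

    def __init__(self, source):
        self.source = source

    def __next__(self):
        return self.source.value


# islice stops after three items, so this terminates without Ctrl + C.
first_three = list(islice(Repeater('Hello'), 3))
print(first_three)  # ['Hello', 'Hello', 'Hello']
```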

How do for-in loops work in Python?

At this point we’ve got our Repeater class that apparently supports the iterator protocol, and we just ran a for-in loop to prove it:

repeater = Repeater('Hello')
for item in repeater:
  print(item)

Now, what does this for-in loop really do behind the scenes? How does it communicate with the repeater object to fetch new elements from it?

To dispel some of that “magic,” we can expand this loop into a slightly longer code snippet that gives the same result:

repeater = Repeater('Hello')
iterator = repeater.__iter__()
while True:
  item = iterator.__next__()
  print(item)

As you can see, the for-in was just syntactic sugar for a simple while loop:

  • It first prepared the repeater object for iteration by calling its __iter__ method. This returned the actual iterator object.
  • After that, the loop repeatedly called the iterator object’s __next__ method to retrieve values from it.
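One way to watch this sequence happen is to record every dunder call. The sketch below uses hypothetical TracingRepeater and TracingRepeaterIterator classes, a copy of the two-class Repeater that appends the method name to a list each time it runs, and breaks out of the for-in loop after three items:

```python
class TracingRepeater:
    """Hypothetical Repeater variant that logs which dunder methods run."""

    def __init__(self, value):
        self.value = value
        self.calls = []  # names of the dunder methods, in call order

    def __iter__(self):
        self.calls.append('__iter__')
        return TracingRepeaterIterator(self)


class TracingRepeaterIterator:
    def __init__(self, source):
        self.source = source

    def __next__(self):
        self.source.calls.append('__next__')
        return self.source.value


repeater = TracingRepeater('Hello')
items = []
for item in repeater:
    items.append(item)
    if len(items) == 3:
        break  # stop the otherwise infinite loop

print(repeater.calls)  # ['__iter__', '__next__', '__next__', '__next__']
```

__iter__ runs exactly once when the loop starts, and __next__ runs once per item.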

If you’ve ever worked with database cursors, this mental model will seem familiar: We first initialize the cursor and prepare it for reading, and then we can fetch data from it into local variables as needed, one element at a time.

Because there’s never more than one element “in flight,” this approach is highly memory-efficient. Our Repeater class provides an infinite sequence of elements and we can iterate over it just fine. Emulating the same thing with a Python list would be impossible—there’s no way we could create a list with an infinite number of elements in the first place. This makes iterators a very powerful concept.
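This part is explained quite clearly: you fetch one item at a time and keep going for as long as there is a next item; once there isn’t, you stop. It is like the “next page” button in a web scraper: as long as there is a next-page button, you keep scraping the following page, and when there isn’t one, you stop. The list-based alternative is to collect every page link up front and then scrape the pages from that list. But different content has different numbers of pages, so the size of that list varies, which limits the list approach and makes it less efficient in practice.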
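To put a rough number on the memory argument, sys.getsizeof can compare a fully built list with an iterator over the same range. This assumes CPython; getsizeof reports only the container object itself (not the ints it points to), and exact sizes vary by version and platform, so treat the figures as an illustration. itertools.count stands in for an infinite sequence here:

```python
import sys
from itertools import count, islice

n = 1_000_000
as_list = list(range(n))      # stores one million references up front
as_iterator = iter(range(n))  # stores only the range bounds and a position

print(sys.getsizeof(as_list))      # several megabytes on 64-bit CPython
print(sys.getsizeof(as_iterator))  # a few dozen bytes

# count() is an infinite iterator: list(count()) would never finish,
# but we can still pull a bounded prefix from it.
first_five = list(islice(count(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```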

In more abstract terms, iterators provide a common interface that allows you to process every element of a container while being completely isolated from the container’s internal structure.

Whether you’re dealing with a list of elements, a dictionary, an infinite sequence like the one provided by our Repeater class, or another sequence type—all of that is just an implementation detail. Every single one of these objects can be traversed in the same way with the power of iterators.
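As a small illustration, the hypothetical take() helper below knows nothing about the container it receives; it relies only on the iterator protocol (through itertools.islice), so the same call works for a list, a dict, a string, and the infinite two-class Repeater from earlier:

```python
from itertools import islice


class Repeater:
    def __init__(self, value):
        self.value = value

    def __iter__(self):
        return RepeaterIterator(self)


class RepeaterIterator:
    def __init__(self, source):
        self.source = source

    def __next__(self):
        return self.source.value


def take(iterable, n):
    """Hypothetical helper: first n items of any iterable, whatever its layout."""
    return list(islice(iterable, n))


print(take([10, 20, 30], 2))       # [10, 20]
print(take({'a': 1, 'b': 2}, 2))   # ['a', 'b']  (iterating a dict yields keys)
print(take('xyz', 2))              # ['x', 'y']
print(take(Repeater('Hello'), 2))  # ['Hello', 'Hello']
```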

And as you’ve seen, there’s nothing special about for-in loops in Python. If you peek behind the curtain, it all comes down to calling the right dunder methods at the right time.

In fact, you can manually “emulate” how the loop uses the iterator protocol in a Python interpreter session:

>>> repeater = Repeater('Hello')
>>> iterator = iter(repeater)
>>> next(iterator)
'Hello'
>>> next(iterator)
'Hello'
>>> next(iterator)
'Hello'
...


This gives the same result—an infinite stream of hellos. Every time you call next(), the iterator hands out the same greeting again.

By the way, I took the opportunity here to replace the calls to __iter__ and __next__ with calls to Python’s built-in functions, iter() and next().

Internally, these built-ins invoke the same dunder methods, but they make this code a little prettier and easier to read by providing a clean “facade” to the iterator protocol.

Python offers these facades for other functionality as well. For example, len(x) is a shortcut for calling x.__len__. Similarly, calling iter(x) invokes x.__iter__ and calling next(x) invokes x.__next__.

Generally, it’s a good idea to use the built-in facade functions rather than directly accessing the dunder methods implementing a protocol. It just makes the code a little easier to read.
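Here is a short sketch of those facades in action, using a hypothetical Playlist class that implements __len__ and __iter__ (its __iter__ simply hands back the underlying list’s iterator):

```python
class Playlist:
    """Hypothetical container used only to demonstrate the facade built-ins."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        return len(self.songs)

    def __iter__(self):
        # Delegate to the list's own iterator object.
        return iter(self.songs)


playlist = Playlist(['intro', 'verse', 'chorus'])
print(len(playlist))       # 3, via Playlist.__len__
iterator = iter(playlist)  # via Playlist.__iter__
print(next(iterator))      # intro, via the list iterator's __next__
```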

A Simpler Iterator Class

Up until now, our iterator example consisted of two separate classes, Repeater and RepeaterIterator. They corresponded directly to the two phases used by Python’s iterator protocol:

First, setting up and retrieving the iterator object with an iter() call, and then repeatedly fetching values from it via next().

Many times both of these responsibilities can be shouldered by a single class. Doing this allows you to reduce the amount of code necessary to write a class-based iterator.

I chose not to do this with the first example in this chapter because it mixes up the cleanliness of the mental model behind the iterator protocol. But now that you’ve seen how to write a class-based iterator the longer and more complicated way, let’s take a minute to simplify what we’ve got so far.

Remember why we needed the RepeaterIterator class again? We needed it to host the __next__ method for fetching new values from the iterator. But it doesn’t really matter where __next__ is defined. In the iterator protocol, all that matters is that __iter__ returns any object with a __next__ method on it.
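The iter() built-in actually enforces this rule. In the sketch below, the hypothetical Broken class returns an int from __iter__, which iter() rejects with a TypeError, while the hypothetical Letters class returns an existing string iterator and works fine, because that object has a __next__ method:

```python
class Broken:
    """Hypothetical iterable whose __iter__ returns something without __next__."""

    def __iter__(self):
        return 42  # ints have no __next__ method


class Letters:
    """Hypothetical iterable that returns an existing iterator from __iter__."""

    def __iter__(self):
        return iter('abc')  # a str iterator has __next__, which is all that matters


try:
    iter(Broken())
except TypeError as exc:
    print('TypeError:', exc)  # iter() refuses the non-iterator result

print(list(Letters()))  # ['a', 'b', 'c']
```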

So here’s an idea: RepeaterIterator returns the same value over and over, and it doesn’t have to keep track of any internal state. What if we added the __next__ method directly to the Repeater class instead?

That way we could get rid of RepeaterIterator altogether and implement an iterable object with a single Python class. Let’s try it out! Our new and simplified iterator example looks as follows:

class Repeater:
  def __init__(self, value):
    self.value = value

  def __iter__(self):
    return self

  def __next__(self):
    return self.value

We just went from two separate classes and 10 lines of code to just one class and 7 lines of code. Our simplified implementation still supports the iterator protocol just fine:

>>> repeater = Repeater('Hello')
>>> for item in repeater:
...   print(item)

Hello
Hello
Hello
...
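A quick way to see what changed: with the single-class version, iter() hands back the very same object, so the iterable and its iterator are one and the same. (itertools.islice is used only to keep the demo finite.)

```python
from itertools import islice


class Repeater:
    def __init__(self, value):
        self.value = value

    def __iter__(self):
        return self  # the object is its own iterator

    def __next__(self):
        return self.value


repeater = Repeater('Hello')
print(iter(repeater) is repeater)  # True
print(list(islice(repeater, 3)))   # ['Hello', 'Hello', 'Hello']
```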

Streamlining a class-based iterator like that often makes sense. In fact, most Python iterator tutorials start out that way. But I always felt that explaining iterators with a single class from the get-go hides the underlying principles of the iterator protocol—and thus makes it more difficult to understand.

Who Wants to Iterate Forever

At this point you should have a pretty good understanding of how iterators work in Python. But so far we’ve only implemented iterators that keep on iterating forever.

Clearly, infinite repetition isn’t the main use case for iterators in Python. In fact, when you look back all the way to the beginning of this chapter, I used the following snippet as a motivating example:

numbers = [1, 2, 3]
for n in numbers:
  print(n)

You’ll rightfully expect this code to print the numbers 1, 2, and 3 and then stop. And you probably wouldn’t expect it to go on spamming your terminal window by printing “3” forever until you mash Ctrl+C a few times in a wild panic…

And so, it’s time to find out how to write an iterator that eventually stops generating new values instead of iterating forever, because that’s what Python objects typically do when we use them in a for-in loop.

We’ll now write another iterator class that we’ll call BoundedRepeater. It’ll be similar to our previous Repeater example, but this time we’ll want it to stop after a predefined number of repetitions.

Let’s think about this for a bit. How do we do this? How does an iterator signal that it’s exhausted and out of elements to iterate over? Maybe you’re thinking, “Hmm, we could just return None from the __next__ method.”

And that’s not a bad idea—but the trouble is, what are we going to do if we want some iterators to be able to return None as an acceptable value?
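To see the problem concretely, here is a hypothetical NoneTerminatedIterator that signals the end by returning None, plus a consumer that stops at the first None. As soon as the data itself contains None, the consumer stops early and silently drops values:

```python
class NoneTerminatedIterator:
    """Hypothetical iterator that (naively) signals 'done' by returning None."""

    def __init__(self, items):
        self.items = list(items)
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.items):
            return None  # naive end-of-data signal
        item = self.items[self.index]
        self.index += 1
        return item


def consume(iterator):
    """Collect values until the None sentinel shows up."""
    values = []
    while True:
        value = next(iterator)
        if value is None:
            break
        values.append(value)
    return values


print(consume(NoneTerminatedIterator([1, 2, 3])))     # [1, 2, 3]
print(consume(NoneTerminatedIterator([1, None, 3])))  # [1]  (3 is silently lost)
```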

Let’s see what other Python iterators do to solve this problem. I’m going to construct a simple container, a list with a few elements, and then I’ll iterate over it until it runs out of elements to see what happens:

>>> my_list = [1, 2, 3]
>>> iterator = iter(my_list)
>>> next(iterator)
1
>>> next(iterator)
2
>>> next(iterator)
3

Careful now! We’ve consumed all of the three available elements in the list. Watch what happens if I call next on the iterator again:

>>> next(iterator)
StopIteration

Aha! It raises a StopIteration exception to signal we’ve exhausted all of the available values in the iterator.

That’s right: Iterators use exceptions to structure control flow. To signal the end of iteration, a Python iterator simply raises the built-in StopIteration exception.

If I keep requesting more values from the iterator, it’ll keep raising StopIteration exceptions to signal that there are no more values available to iterate over:

>>> next(iterator)
StopIteration
>>> next(iterator)
StopIteration
...
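The REPL output above is abbreviated; in a real session each call prints a traceback ending in StopIteration. When you drive an iterator by hand, you can catch that exception, or pass next() its optional second argument, a default value it returns instead of raising:

```python
iterator = iter([1, 2, 3])

# With a default, next() never raises StopIteration.
values = [next(iterator, 'done') for _ in range(5)]
print(values)  # [1, 2, 3, 'done', 'done']

# Without a default, the exhausted iterator raises StopIteration again.
try:
    next(iterator)
except StopIteration:
    print('iterator is exhausted')
```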


Python iterators normally can’t be “reset”: once they’re exhausted, they’re supposed to raise StopIteration every time next() is called on them. To iterate anew, you’ll need to request a fresh iterator object with the iter() function.
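A short sketch of that behavior with a plain list: the first iterator stays exhausted, while a new iter() call starts again from the beginning.

```python
my_list = [1, 2, 3]

iterator = iter(my_list)
first_pass = list(iterator)   # consumes every element
second_pass = list(iterator)  # the same iterator is already exhausted
print(first_pass, second_pass)  # [1, 2, 3] []

fresh_pass = list(iter(my_list))  # a brand-new iterator starts over
print(fresh_pass)  # [1, 2, 3]
```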

Now we know everything we need to write our BoundedRepeater class that stops iterating after a set number of repetitions:

class BoundedRepeater:
  def __init__(self, value, max_repeats):
    self.value = value
    self.max_repeats = max_repeats
    self.count = 0

  def __iter__(self):
    return self

  def __next__(self):
    if self.count >= self.max_repeats:
      raise StopIteration
    self.count += 1
    return self.value

This gives us the desired result. Iteration stops after the number of repetitions defined in the max_repeats parameter:

>>> repeater = BoundedRepeater('Hello', 3)
>>> for item in repeater:
...   print(item)

Hello
Hello
Hello
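One consequence of BoundedRepeater being its own iterator is that its count lives on the instance, so a given object can be looped over only once; a second loop over the same object produces nothing. One possible fix, sketched below with a hypothetical ReusableBoundedRepeater, goes back to the two-class idea from earlier: its __iter__ returns a fresh BoundedRepeater each time.

```python
class BoundedRepeater:
    def __init__(self, value, max_repeats):
        self.value = value
        self.max_repeats = max_repeats
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.max_repeats:
            raise StopIteration
        self.count += 1
        return self.value


class ReusableBoundedRepeater:
    """Hypothetical iterable: every iter() call gets its own BoundedRepeater."""

    def __init__(self, value, max_repeats):
        self.value = value
        self.max_repeats = max_repeats

    def __iter__(self):
        return BoundedRepeater(self.value, self.max_repeats)


once = BoundedRepeater('Hello', 3)
print(list(once), list(once))  # ['Hello', 'Hello', 'Hello'] []

reusable = ReusableBoundedRepeater('Hello', 3)
print(list(reusable), list(reusable))  # both lists contain three 'Hello's
```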

If we rewrite this last for-in loop example to take away some of the syntactic sugar, we end up with the following expanded code snippet:

repeater = BoundedRepeater('Hello', 3)
iterator = iter(repeater)
while True:
  try:
    item = next(iterator)
  except StopIteration:
    break
  print(item)

Every time next() is called in this loop, we check for a StopIteration exception and break the while loop if necessary.
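The same expansion can be packaged as a hypothetical for_each() helper that roughly mimics `for item in iterable: action(item)`. It is a simplified model (a real for-in loop is handled by the interpreter itself and also supports an else clause), but note that the try block wraps only the next() call, so a StopIteration raised inside action would not be silently swallowed:

```python
class BoundedRepeater:
    def __init__(self, value, max_repeats):
        self.value = value
        self.max_repeats = max_repeats
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.max_repeats:
            raise StopIteration
        self.count += 1
        return self.value


def for_each(iterable, action):
    """Hypothetical helper: roughly what `for item in iterable: action(item)` does."""
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break  # the iterator is exhausted, so leave the loop
        action(item)


collected = []
for_each(BoundedRepeater('Hello', 3), collected.append)
for_each([1, 2], collected.append)
print(collected)  # ['Hello', 'Hello', 'Hello', 1, 2]
```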

Being able to write a three-line for-in loop instead of an eight-line while loop is quite a nice improvement. It makes the code easier to read and more maintainable. And this is another reason why iterators in Python are such a powerful tool.

Python 2.x Compatibility

All the code examples I showed here were written in Python 3. There’s a small but important difference between Python 2 and 3 when it comes to implementing class-based iterators:

  • In Python 3, the method that retrieves the next value from an iterator is called __next__.
  • In Python 2, the same method is called next (no underscores).

This naming difference can lead to some trouble if you’re trying to write class-based iterators that should work on both versions of Python. Luckily, there’s a simple approach you can take to work around this difference.

Here’s an updated version of our single-class Repeater, renamed InfiniteRepeater, that will work on both Python 2 and Python 3:

class InfiniteRepeater(object):
  def __init__(self, value):
    self.value = value

  def __iter__(self):
    return self

  def __next__(self):
    return self.value

  # Python 2 compatibility:
  def next(self):
    return self.__next__()

To make this iterator class compatible with Python 2, I’ve made two small changes to it:

First, I added a next method that simply calls the original __next__ and forwards its return value. This essentially creates an alias for the existing __next__ implementation so that Python 2 finds it. That way we can support both versions of Python while still keeping all of the actual implementation details in one place.

And second, I modified the class definition to inherit from object in order to ensure we’re creating a new-style class on Python 2. This has nothing to do with iterators specifically, but it’s a good practice nonetheless.
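A common alternative to the wrapper method (not what the text above uses) is to bind the existing function under the Python 2 name with a class-level assignment, next = __next__, so both names refer to the same function object. The sketch below was checked with Python 3 in mind; the alias is only there for Python 2, which looks up a method named next:

```python
class InfiniteRepeater(object):
    def __init__(self, value):
        self.value = value

    def __iter__(self):
        return self

    def __next__(self):
        return self.value

    # Python 2 compatibility: reuse the same function under the old name.
    next = __next__


repeater = InfiniteRepeater('Hello')
print(next(repeater))   # Hello  (the Python 3 built-in calls __next__)
print(repeater.next())  # Hello  (the alias Python 2 would use)
```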

Key Takeaways
  • Iterators provide a sequence interface to Python objects that’s memory efficient and considered Pythonic. Behold the beauty of the for-in loop!
  • To support iteration an object needs to implement the iterator protocol by providing the __iter__ and __next__ dunder methods.
  • Class-based iterators are only one way to write iterable objects in Python. Also consider generators and generator expressions.
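For comparison, here is a rough sketch of the same two behaviors written as generators, which the takeaway above mentions but this chapter does not cover; the function names are illustrative. A generator function returns an iterator whose __next__ runs the body up to the next yield, and falling off the end of the function raises StopIteration automatically:

```python
from itertools import islice


def repeater(value):
    """Generator counterpart of the infinite Repeater class."""
    while True:
        yield value


def bounded_repeater(value, max_repeats):
    """Generator counterpart of BoundedRepeater."""
    for _ in range(max_repeats):
        yield value


print(list(islice(repeater('Hello'), 3)))  # ['Hello', 'Hello', 'Hello']
print(list(bounded_repeater('Hi', 2)))     # ['Hi', 'Hi']

# A generator expression can express the bounded case in one line.
print(list('Hey' for _ in range(2)))       # ['Hey', 'Hey']
```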