Item 8: Use zip to Process Iterators in Parallel

Often in Python you find yourself with many lists of related objects. List comprehensions make it easy to take a source list and get a derived list by applying an expression (see Item 27: “Use Comprehensions Instead of map and filter”):

names = ['Cecilia', 'Lise', 'Marie']
counts = [len(n) for n in names]
print(counts)

>>>
[7, 4, 5]

The items in the derived list are related to the items in the source list by their indexes. To iterate over both lists in parallel, I can iterate over the length of the names source list:

longest_name = None
max_count = 0

for i in range(len(names)):
    count = counts[i]
    if count > max_count:
        longest_name = names[i]
        max_count = count
print(longest_name)

>>>
Cecilia

The problem is that this whole loop statement is visually noisy. The indexes into names and counts make the code hard to read. Indexing into the lists by the loop index i happens twice. Using enumerate (see Item 7: “Prefer enumerate Over range”) improves this slightly, but it’s still not ideal:

longest_name = None
max_count = 0

for i, name in enumerate(names):
    count = counts[i]
    if count > max_count:
        longest_name = name
        max_count = count

To make this code clearer, Python provides the zip built-in function. zip wraps two or more iterators with a lazy generator. The zip generator yields tuples containing the next value from each iterator. These tuples can be unpacked directly within a for statement (see Item 6: “Prefer Multiple Assignment Unpacking Over Indexing”). The resulting code is much cleaner than the code for indexing into multiple lists:

longest_name = None
max_count = 0

for name, count in zip(names, counts):
    if count > max_count:
        longest_name = name
        max_count = count
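
To make the tuples that zip yields more concrete, here is a minimal sketch that materializes them with list. The names and counts lists repeat the example data above; the third list, is_even, is hypothetical and is only there to show zip wrapping more than two inputs.

```python
names = ['Cecilia', 'Lise', 'Marie']
counts = [len(n) for n in names]
# Hypothetical third list, added only for this illustration
is_even = [c % 2 == 0 for c in counts]

# list() forces the lazy zip object to produce all of its tuples
pairs = list(zip(names, counts))
print(pairs)  # [('Cecilia', 7), ('Lise', 4), ('Marie', 5)]

# With three inputs, each tuple unpacks into three loop variables
for name, count, even in zip(names, counts, is_even):
    print(name, count, even)
```

Calling list on zip is fine for short inputs like these, but it gives up the laziness, so I would only do it for inspection or debugging.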

zip consumes the iterators it wraps one item at a time, which means it can be used with infinitely long inputs without risk of a program using too much memory and crashing.
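
A small sketch of that laziness, assuming the names list from above: itertools.count is an endless counter, yet pairing it with a finite list still terminates, because zip pulls only one item at a time and stops once names runs out.

```python
import itertools

names = ['Cecilia', 'Lise', 'Marie']

# itertools.count(1) yields 1, 2, 3, ... without end
numbered = []
for position, name in zip(itertools.count(1), names):
    numbered.append((position, name))

print(numbered)  # [(1, 'Cecilia'), (2, 'Lise'), (3, 'Marie')]
```

At no point is the infinite counter turned into a list, so memory use stays constant no matter how long the inputs are.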

However, beware of zip’s behavior when the input iterators are of different lengths. For example, say that I add another item to names above but forget to update counts. Running zip on the two input lists will have an unexpected result:

names.append('Rosalind')
for name, count in zip(names, counts):
    print(name)

>>>
Cecilia
Lise
Marie

The new item for 'Rosalind' isn’t there. Why not? This is just how zip works. It keeps yielding tuples until any one of the wrapped iterators is exhausted. Its output is as long as its shortest input. This approach works fine when you know that the iterators are of the same length, which is often the case for derived lists created by list comprehensions.
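
When I do expect equal lengths, one way to catch a mismatch early is to check before zipping. The sketch below uses a hypothetical helper, zip_same_length, that is not part of the original text and only works for inputs that support len. On Python 3.10 and later, passing strict=True to zip performs a similar check; the helper avoids depending on that version.

```python
def zip_same_length(first, second):
    """Hypothetical helper: zip two sequences, refusing unequal lengths."""
    if len(first) != len(second):
        raise ValueError(
            f'length mismatch: {len(first)} != {len(second)}')
    return zip(first, second)


names = ['Cecilia', 'Lise', 'Marie', 'Rosalind']
counts = [7, 4, 5]

# Plain zip stops silently at the shorter input
print(len(list(zip(names, counts))))  # 3

# The helper turns the silent truncation into an explicit error
try:
    zip_same_length(names, counts)
except ValueError as error:
    message = str(error)
    print(message)  # length mismatch: 4 != 3
```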

But in many other cases, the truncating behavior of zip is surprising and bad. If you don’t expect the lengths of the lists passed to zip to be equal, consider using the zip_longest function from the itertools built-in module instead:

import itertools
for name, count in itertools.zip_longest(names, counts):
    print(f'{name}: {count}')

>>>
Cecilia: 7
Lise: 4
Marie: 5
Rosalind: None

zip_longest replaces missing values—the length of the string 'Rosalind' in this case—with whatever fillvalue is passed to it, which defaults to None.
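
As a brief sketch of the fillvalue parameter, the example below pads the missing count with 0 instead of None. The choice of 0 is arbitrary and only for illustration; the lists mirror the mismatched example above.

```python
import itertools

names = ['Cecilia', 'Lise', 'Marie', 'Rosalind']
counts = [7, 4, 5]

# fillvalue=0 is an arbitrary choice for this illustration
padded = list(itertools.zip_longest(names, counts, fillvalue=0))
print(padded)
# [('Cecilia', 7), ('Lise', 4), ('Marie', 5), ('Rosalind', 0)]
```

A fill value that could also be a legitimate data value, as 0 could here, hides the fact that something was missing, so the default None is often the safer signal.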

**Things to Remember**

✦ The zip built-in function can be used to iterate over multiple iterators in parallel.
✦ zip creates a lazy generator that produces tuples, so it can be used on infinitely long inputs.
✦ zip truncates its output silently to the shortest iterator if you supply it with iterators of different lengths.
✦ Use the zip_longest function from the itertools built-in module if you want to use zip on iterators of unequal lengths without truncation.
