Item 15: Be Cautious When Relying on dict Insertion Ordering

In Python 3.5 and before, iterating over a dict would return keys in arbitrary order. The order of iteration would not match the order in which the items were inserted. For example, here I create a dictionary mapping animal names to their corresponding baby names and then print it out (see Item 75: “Use repr Strings for Debugging Output” for how this works):

# Python 3.5
baby_names = {
    'cat': 'kitten',
    'dog': 'puppy',
}
print(baby_names)

>>>
{'dog': 'puppy', 'cat': 'kitten'}

When I created the dictionary, the keys were in the order 'cat', 'dog', but when I printed it, the keys were in the reverse order 'dog', 'cat'. This behavior is surprising, makes it harder to reproduce test cases, increases the difficulty of debugging, and is especially confusing to newcomers to Python.

This happened because the dictionary type previously implemented its hash table algorithm with a combination of the hash built-in function and a random seed that was assigned when the Python interpreter started. Together, these behaviors caused dictionary orderings to not match insertion order and to randomly shuffle between program executions.
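
The per-process seed can still be observed in current Python versions for string hashes, even though dict ordering no longer depends on it. Here is a minimal sketch, assuming the `PYTHONHASHSEED` environment variable is used to pin the seed; the helper name `hash_in_fresh_interpreter` is hypothetical and exists only for this illustration:

```python
import os
import subprocess
import sys

def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new interpreter started
    with PYTHONHASHSEED set to the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, '-c', f'print(hash({text!r}))'],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)

first = hash_in_fresh_interpreter('cat', seed=1)
second = hash_in_fresh_interpreter('cat', seed=2)
again = hash_in_fresh_interpreter('cat', seed=1)

print(first == again)   # Same seed, same hash value
print(first != second)  # Different seed, almost certainly a different value
```

When the variable is unset, the interpreter picks a fresh seed at startup, which is why the old hash-ordered dictionaries could shuffle from one run to the next.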

Starting with Python 3.6, and officially part of the Python specification in version 3.7, dictionaries will preserve insertion order. Now, this code will always print the dictionary in the same way it was originally created by the programmer:

baby_names = {
    'cat': 'kitten',
    'dog': 'puppy',
}
print(baby_names)

>>>
{'cat': 'kitten', 'dog': 'puppy'}
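
It may help to see what "insertion order" means once a dictionary is modified after creation. The following sketch reuses the `baby_names` example and shows two cases: reassigning an existing key, and deleting a key before adding it back:

```python
baby_names = {'cat': 'kitten', 'dog': 'puppy'}

# Updating an existing key changes its value but not its position.
baby_names['cat'] = 'kitty'
order_after_update = list(baby_names)
print(order_after_update)

# Deleting a key and adding it again moves it to the end.
del baby_names['cat']
baby_names['cat'] = 'kitten'
order_after_reinsert = list(baby_names)
print(order_after_reinsert)
```

The first print shows `['cat', 'dog']` and the second shows `['dog', 'cat']`: position is determined by when a key was first added, not by when its value was last changed.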

With Python 3.5 and earlier, all methods provided by dict that relied on iteration order, including keys, values, items, and popitem, would similarly demonstrate this random-looking behavior:

# Python 3.5
print(list(baby_names.keys()))
print(list(baby_names.values()))
print(list(baby_names.items()))
print(baby_names.popitem())  # Randomly chooses an item

>>>
['dog', 'cat']
['puppy', 'kitten']
[('dog', 'puppy'), ('cat', 'kitten')]
('dog', 'puppy')

These methods now provide consistent insertion ordering that you can rely on when you write your programs:

print(list(baby_names.keys()))
print(list(baby_names.values()))
print(list(baby_names.items()))
print(baby_names.popitem())  # Last item inserted

>>>
['cat', 'dog']
['kitten', 'puppy']
[('cat', 'kitten'), ('dog', 'puppy')]
('dog', 'puppy')
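
Because popitem now removes the most recently inserted item, repeatedly calling it drains a dictionary in last-in, first-out order. A small sketch, with an extra 'goose' entry added purely for illustration:

```python
baby_names = {
    'cat': 'kitten',
    'dog': 'puppy',
    'goose': 'gosling',
}

popped = []
while baby_names:
    # Each call removes and returns the last remaining inserted pair.
    popped.append(baby_names.popitem())

print(popped)
```

This prints the pairs in reverse insertion order: goose, then dog, then cat.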

There are many repercussions of this change on other Python features that are dependent on the dict type and its specific implementation.

Keyword arguments to functions—including the **kwargs catch-all parameter; see Item 23: “Provide Optional Behavior with Keyword Arguments”—previously would come through in seemingly random order, which can make it harder to debug function calls:

# Python 3.5
def my_func(**kwargs):
    for key, value in kwargs.items():
        print('%s = %s' % (key, value))

my_func(goose='gosling', kangaroo='joey')

>>>
kangaroo = joey
goose = gosling

Now, the order of keyword arguments is always preserved to match how the programmer originally called the function:

def my_func(**kwargs):
    for key, value in kwargs.items():
        print(f'{key} = {value}')

my_func(goose='gosling', kangaroo='joey')

>>>
goose = gosling
kangaroo = joey
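
One consequence is that a function can treat the caller's keyword order as meaningful. The sketch below uses a hypothetical helper named `describe` that joins its keyword arguments in the order they were passed:

```python
def describe(**kwargs):
    """Join keyword arguments in the order the caller passed them."""
    return ', '.join(f'{key}={value}' for key, value in kwargs.items())

first = describe(goose='gosling', kangaroo='joey')
second = describe(kangaroo='joey', goose='gosling')
print(first)
print(second)
```

The two calls produce `goose=gosling, kangaroo=joey` and `kangaroo=joey, goose=gosling` respectively, so swapping the arguments at the call site swaps the output.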

Classes also use the dict type for their instance dictionaries. In previous versions of Python, object fields would show the randomizing behavior:

# Python 3.5
class MyClass:
    def __init__(self):
        self.alligator = 'hatchling'
        self.elephant = 'calf'

a = MyClass()
for key, value in a.__dict__.items():
    print('%s = %s' % (key, value))

>>>
elephant = calf
alligator = hatchling

Again, you can now assume that the order of assignment for these instance fields will be reflected in __dict__:

class MyClass:
    def __init__(self):
        self.alligator = 'hatchling'
        self.elephant = 'calf'

a = MyClass()
for key, value in a.__dict__.items():
    print(f'{key} = {value}')

>>>
alligator = hatchling
elephant = calf
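
Attributes added after construction follow the same rule. As an illustration, here is a variant of `MyClass` with a `__repr__` that lists fields in assignment order using the built-in `vars` function; the `__repr__` method and the `zebra` attribute are additions for this sketch only:

```python
class MyClass:
    def __init__(self):
        self.alligator = 'hatchling'
        self.elephant = 'calf'

    def __repr__(self):
        # vars(self) returns the instance __dict__, in assignment order.
        fields = ', '.join(
            f'{key}={value!r}' for key, value in vars(self).items())
        return f'MyClass({fields})'

a = MyClass()
a.zebra = 'foal'  # Assigned last, so it is listed last
print(a)
```

This prints `MyClass(alligator='hatchling', elephant='calf', zebra='foal')`.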

The way that dictionaries preserve insertion ordering is now part of the Python language specification. For the language features above, you can rely on this behavior and even make it part of the APIs you design for your classes and functions.

Note

For a long time the collections built-in module has had an OrderedDict class that preserves insertion ordering. Although this class’s behavior is similar to that of the standard dict type (since Python 3.7), the performance characteristics of OrderedDict are quite different. If you need to handle a high rate of key insertions and popitem calls (e.g., to implement a least-recently-used cache), OrderedDict may be a better fit than the standard Python dict type (see Item 70: “Profile Before Optimizing” on how to make sure you need this).
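
To make the least-recently-used cache mentioned above concrete, here is a minimal sketch built on OrderedDict. The `LRUCache` class and its `get` and `put` methods are hypothetical names for this illustration; it relies on the `move_to_end` method and the `last=False` argument to `popitem`, both of which OrderedDict provides and the plain dict does not:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny least-recently-used cache (illustrative sketch only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # Mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # Evict least recently used

cache = LRUCache(capacity=2)
cache.put('cat', 'kitten')
cache.put('dog', 'puppy')
cache.get('cat')               # 'cat' becomes most recently used
cache.put('goose', 'gosling')  # Evicts 'dog'
print(list(cache.data))
```

This prints `['cat', 'goose']`. Whether this actually outperforms a plain dict for a given workload is something to measure rather than assume.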

However, you shouldn’t always assume that insertion ordering behavior will be present when you’re handling dictionaries. Python makes it easy for programmers to define their own custom container types that emulate the standard protocols matching list, dict, and other types (see Item 43: “Inherit from collections.abc for Custom Container Types”). Python is not statically typed, so most code relies on duck typing—where an object’s behavior is its de facto type—instead of rigid class hierarchies. This can result in surprising gotchas.

For example, say that I’m writing a program to show the results of a contest for the cutest baby animal. Here, I start with a dictionary containing the total vote count for each one:

votes = {
    'otter': 1281,
    'polar bear': 587,
    'fox': 863,
}

I define a function to process this voting data and save the rank of each animal name into a provided empty dictionary. In this case, the dictionary could be the data model that powers a UI element:

def populate_ranks(votes, ranks):
    names = list(votes.keys())
    names.sort(key=votes.get, reverse=True)
    for i, name in enumerate(names, 1):
        ranks[name] = i

I also need a function that will tell me which animal won the contest. This function works by assuming that populate_ranks will assign the contents of the ranks dictionary in ascending order, meaning that the first key must be the winner:

def get_winner(ranks):
    return next(iter(ranks))

Here, I can confirm that these functions work as designed and deliver the result that I expected:

ranks = {}
populate_ranks(votes, ranks)
print(ranks)
winner = get_winner(ranks)
print(winner)

>>>
{'otter': 1, 'fox': 2, 'polar bear': 3}
otter

Now, imagine that the requirements of this program have changed. The UI element that shows the results should be in alphabetical order instead of rank order. To accomplish this, I can use the collections.abc built-in module to define a new dictionary-like class that iterates its contents in alphabetical order:

from collections.abc import MutableMapping

class SortedDict(MutableMapping):
    def __init__(self):
        self.data = {}

    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value

    def __delitem__(self, key):
        del self.data[key]

    def __iter__(self):
        keys = list(self.data.keys())
        keys.sort()
        for key in keys:
            yield key

    def __len__(self):
        return len(self.data)

I can use a SortedDict instance in place of a standard dict with the functions from before and no errors will be raised since this class conforms to the protocol of a standard dictionary. However, the result is incorrect:

sorted_ranks = SortedDict()
populate_ranks(votes, sorted_ranks)
print(sorted_ranks.data)
winner = get_winner(sorted_ranks)
print(winner)

>>>
{'otter': 1, 'fox': 2, 'polar bear': 3}
fox

The problem here is that the implementation of get_winner assumes that the dictionary’s iteration is in insertion order to match populate_ranks. This code is using SortedDict instead of dict, so that assumption is no longer true. Thus, the value returned for the winner is 'fox', which is alphabetically first.

There are three ways to mitigate this problem. First, I can reimplement the get_winner function to no longer assume that the ranks dictionary has a specific iteration order. This is the most conservative and robust solution:

def get_winner(ranks):
    for name, rank in ranks.items():
        if rank == 1:
            return name

winner = get_winner(sorted_ranks)
print(winner)

>>>
otter
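
The same order-independent idea can also be written with the built-in `min` function. This is only an alternative sketch, not the approach used above: the function name `get_winner_by_min` is hypothetical, the SortedDict here is a condensed copy so the snippet runs on its own, and unlike the loop version it raises ValueError on an empty mapping instead of returning None:

```python
from collections.abc import MutableMapping

class SortedDict(MutableMapping):
    """Condensed copy of the alphabetically iterating mapping."""
    def __init__(self):
        self.data = {}
    def __getitem__(self, key):
        return self.data[key]
    def __setitem__(self, key, value):
        self.data[key] = value
    def __delitem__(self, key):
        del self.data[key]
    def __iter__(self):
        return iter(sorted(self.data))
    def __len__(self):
        return len(self.data)

def get_winner_by_min(ranks):
    # Pick the key with the smallest rank value, whatever order
    # the mapping happens to iterate in.
    return min(ranks, key=ranks.get)

sorted_ranks = SortedDict()
sorted_ranks['otter'] = 1
sorted_ranks['fox'] = 2
sorted_ranks['polar bear'] = 3

winner_sorted = get_winner_by_min(sorted_ranks)
winner_plain = get_winner_by_min({'fox': 2, 'otter': 1})
print(winner_sorted)
print(winner_plain)
```

Both calls print `otter`, regardless of whether the mapping iterates alphabetically or in insertion order.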

The second approach is to add an explicit check to the top of the function to ensure that the type of ranks matches my expectations, and to raise an exception if not. This solution likely has better runtime performance than the more conservative approach:

def get_winner(ranks):
    if not isinstance(ranks, dict):
        raise TypeError('must provide a dict instance')
    return next(iter(ranks))

get_winner(sorted_ranks)

>>>
Traceback ...
TypeError: must provide a dict instance
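
One caveat worth keeping in mind: an isinstance check accepts subclasses of dict, and a subclass is free to override iteration. The sketch below uses a hypothetical `ReversedDict` subclass and a hypothetical stricter variant, `get_winner_exact`, to show the difference. The stricter check is a trade-off, since it would also reject well-behaved subclasses such as OrderedDict:

```python
class ReversedDict(dict):
    """Hypothetical dict subclass that iterates in reverse insertion order."""
    def __iter__(self):
        return reversed(list(super().__iter__()))

def get_winner(ranks):
    if not isinstance(ranks, dict):
        raise TypeError('must provide a dict instance')
    return next(iter(ranks))

def get_winner_exact(ranks):
    # Stricter variant: only the built-in dict type itself is accepted.
    if type(ranks) is not dict:
        raise TypeError('must provide an exact dict instance')
    return next(iter(ranks))

ranks = ReversedDict()
ranks['otter'] = 1
ranks['fox'] = 2

loose_result = get_winner(ranks)  # Passes the check, wrong answer
print(loose_result)

try:
    get_winner_exact(ranks)
except TypeError as e:
    strict_error = str(e)
    print(strict_error)
```

Here the isinstance version returns `fox`, while the exact-type version raises TypeError.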

The third alternative is to use type annotations to enforce that the value passed to get_winner is a dict instance and not a MutableMapping with dictionary-like behavior (see Item 90: “Consider Static Analysis via typing to Obviate Bugs”). Here, I run the mypy tool in strict mode on an annotated version of the code above:

from typing import Dict, MutableMapping

def populate_ranks(votes: Dict[str, int],
                   ranks: Dict[str, int]) -> None:
    names = list(votes.keys())
    names.sort(key=votes.get, reverse=True)
    for i, name in enumerate(names, 1):
        ranks[name] = i

def get_winner(ranks: Dict[str, int]) -> str:
    return next(iter(ranks))

class SortedDict(MutableMapping[str, int]):
    ...

votes = {
    'otter': 1281,
    'polar bear': 587,
    'fox': 863,
}
sorted_ranks = SortedDict()
populate_ranks(votes, sorted_ranks)
print(sorted_ranks.data)
winner = get_winner(sorted_ranks)
print(winner)

$ python3 -m mypy --strict example.py
.../example.py:48: error: Argument 2 to "populate_ranks" has incompatible type "SortedDict"; expected "Dict[str, int]"
.../example.py:50: error: Argument 1 to "get_winner" has incompatible type "SortedDict"; expected "Dict[str, int]"

This correctly detects the mismatch between the dict and MutableMapping types and flags the incorrect usage as an error. This solution provides the best mix of static type safety and runtime performance.
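
On Python 3.9 and later, the built-in dict type can be subscripted directly in annotations, so the same signatures can be written without importing Dict from typing. This is a sketch of that spelling only; it assumes Python 3.9 or newer and makes no claim about what any particular type checker version reports for it:

```python
def populate_ranks(votes: dict[str, int],
                   ranks: dict[str, int]) -> None:
    names = list(votes.keys())
    names.sort(key=votes.get, reverse=True)
    for i, name in enumerate(names, 1):
        ranks[name] = i

def get_winner(ranks: dict[str, int]) -> str:
    # Still relies on insertion order, which the annotation
    # restricts to real dict instances for static analysis.
    return next(iter(ranks))

votes = {
    'otter': 1281,
    'polar bear': 587,
    'fox': 863,
}
ranks: dict[str, int] = {}
populate_ranks(votes, ranks)
winner = get_winner(ranks)
print(winner)
```

With a plain dict passed in, this prints `otter`. The annotations themselves are not enforced at runtime; they only matter to a static analysis tool.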

Things to Remember

✦ Since Python 3.7, you can rely on the fact that iterating a dict instance’s contents will occur in the same order in which the keys were initially added.
✦ Python makes it easy to define objects that act like dictionaries but that aren’t dict instances. For these types, you can’t assume that insertion ordering will be preserved.
✦ There are three ways to be careful about dictionary-like classes: Write code that doesn’t rely on insertion ordering, explicitly check for the dict type at runtime, or require dict values using type annotations and static analysis.
