Item 17: Prefer defaultdict Over setdefault to Handle Missing Items in Internal State

When working with a dictionary that you didn’t create, there are a variety of ways to handle missing keys (see Item 16: “Prefer get Over in and KeyError to Handle Missing Dictionary Keys”). Although using the get method is a better approach than using in expressions and KeyError exceptions, for some use cases setdefault appears to be the shortest option.

For example, say that I want to keep track of the cities I’ve visited in countries around the world. Here, I do this by using a dictionary that maps country names to a set instance containing corresponding city names:

visits = {
    'Mexico': {'Tulum', 'Puerto Vallarta'},
    'Japan': {'Hakone'},
}

I can use the setdefault method to add new cities to the sets, whether the country name is already present in the dictionary or not. This approach is much shorter than achieving the same behavior with the get method and an assignment expression (which is available as of Python 3.8):

visits.setdefault('France', set()).add('Arles')  # Short

if (japan := visits.get('Japan')) is None:  # Long
    visits['Japan'] = japan = set()
japan.add('Kyoto')

print(visits)

>>>
{'Mexico': {'Tulum', 'Puerto Vallarta'},
 'Japan': {'Kyoto', 'Hakone'},
 'France': {'Arles'}}
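Both versions work because setdefault returns the object that ends up stored under the key, not a copy, so calling add on the return value mutates the dictionary. The standalone check below shows that behavior for a missing key and for a key that is already present; the {'ignored'} default is only a placeholder chosen for illustration.

```python
visits = {
    'Mexico': {'Tulum', 'Puerto Vallarta'},
    'Japan': {'Hakone'},
}

# Key missing: setdefault stores the default and returns that same object.
france = visits.setdefault('France', set())
france.add('Arles')  # mutates the set stored in visits
assert visits['France'] is france

# Key present: the default argument is ignored and the existing set is returned.
japan = visits.setdefault('Japan', {'ignored'})
assert japan is visits['Japan']
assert 'ignored' not in japan

print(visits['France'])  # {'Arles'}
print(sorted(japan))     # ['Hakone']
```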

What about the situation when you do control creation of the dictionary being accessed? This is generally the case when you’re using a dictionary instance to keep track of the internal state of a class, for example. Here, I wrap the example above in a class with helper methods to access the dynamic inner state stored in a dictionary:

class Visits:
    def __init__(self):
        self.data = {}

    def add(self, country, city):
        city_set = self.data.setdefault(country, set())
        city_set.add(city)

This new class hides the complexity of calling setdefault correctly, and it provides a nicer interface for the programmer:

visits = Visits()
visits.add('Russia', 'Yekaterinburg')
visits.add('Tanzania', 'Zanzibar')
print(visits.data)

>>>
{'Russia': {'Yekaterinburg'}, 'Tanzania': {'Zanzibar'}}

However, the implementation of the Visits.add method still isn't ideal. The setdefault method is still confusingly named, which makes it harder for a new reader of the code to understand immediately what's happening. The implementation is also inefficient: it constructs a new set instance on every call, regardless of whether the given country is already present in the data dictionary.
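The wasted allocation is easy to observe. In this sketch the factory new_city_set is a hypothetical helper that records every set it builds; because Python evaluates the default argument before setdefault runs, a set is created even when the country is already a key.

```python
created = []  # records every set the hypothetical factory builds


def new_city_set():
    """Hypothetical helper: returns an empty set and logs its creation."""
    s = set()
    created.append(s)
    return s


data = {}
for country, city in [('England', 'Bath'), ('England', 'London'), ('Peru', 'Cusco')]:
    # The default argument is evaluated on every iteration, even when
    # country is already a key in data and the new set is thrown away.
    data.setdefault(country, new_city_set()).add(city)

print(len(created))  # 3 sets built, although only 2 are stored
```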

Luckily, the defaultdict class from the collections built-in module simplifies this common use case by automatically storing a default value when a key doesn’t exist. All you have to do is provide a function that will return the default value to use each time a key is missing (an example of Item 38: “Accept Functions Instead of Classes for Simple Interfaces”). Here, I rewrite the Visits class to use defaultdict:

from collections import defaultdict

class Visits:
    def __init__(self):
        self.data = defaultdict(set)

    def add(self, country, city):
        self.data[country].add(city)

visits = Visits()
visits.add('England', 'Bath')
visits.add('England', 'London')
print(visits.data)

>>>
defaultdict(<class 'set'>, {'England': {'London', 'Bath'}})
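The factory passed to defaultdict can be any callable that takes no arguments; the set type is simply the most convenient one here. The sketch below uses a hypothetical plain function, empty_city_set, instead, and also shows a subtlety worth knowing: only subscripting a missing key calls the factory and inserts the key, while get and the in operator leave the dictionary unchanged.

```python
from collections import defaultdict


def empty_city_set():
    """Hypothetical zero-argument factory; defaultdict calls it for missing keys."""
    return set()


data = defaultdict(empty_city_set)

# Subscripting a missing key calls the factory and stores the result.
data['Peru'].add('Cusco')

# get() and the in operator neither call the factory nor insert keys.
print(data.get('Chile'))   # None
print('Chile' in data)     # False

# Subscripting, even just to read, inserts an empty set for the key.
print(len(data['Chile']))  # 0
print(sorted(data))        # ['Chile', 'Peru']
```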

Now, the implementation of add is short and simple. The code can assume that accessing any key in the data dictionary will always result in an existing set instance. Unlike the setdefault version, no superfluous set instances are allocated; that waste could become costly if the add method is called a large number of times.
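Repeating the earlier allocation count with defaultdict makes the difference concrete. This sketch again uses a hypothetical logging factory, new_city_set; defaultdict calls it only when a key is missing, so the number of sets built matches the number of distinct countries.

```python
from collections import defaultdict

created = []  # records every set the hypothetical factory builds


def new_city_set():
    """Hypothetical factory that logs each set it creates."""
    s = set()
    created.append(s)
    return s


data = defaultdict(new_city_set)
for country, city in [('England', 'Bath'), ('England', 'London'), ('Peru', 'Cusco')]:
    data[country].add(city)  # factory runs only when country is missing

print(len(created))  # 2: one set per distinct country
```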

Using defaultdict is much better than using setdefault for this type of situation (see Item 37: "Compose Classes Instead of Nesting Many Levels of Built-in Types" for another example). There are still cases in which defaultdict will fall short of solving your problems, but Python offers even more tools to work around those limitations (see Item 18: "Know How to Construct Key-Dependent Default Values with __missing__," Item 43: "Inherit from collections.abc for Custom Container Types," and the collections.Counter built-in class).
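One case where defaultdict falls short is a default that depends on the missing key, because defaultdict calls its factory with no arguments. Below is a minimal sketch of the __missing__ hook mentioned above; KeyedDefaultDict is a hypothetical class, not part of the standard library. The last lines show that collections.Counter also relies on __missing__, returning 0 for absent keys without storing them.

```python
from collections import Counter


class KeyedDefaultDict(dict):
    """Hypothetical dict subclass whose default value is computed from the key."""

    def __init__(self, factory):
        super().__init__()
        self.factory = factory  # called as factory(key), unlike defaultdict

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ only when the key is absent.
        value = self.factory(key)
        self[key] = value  # store it so later lookups skip the factory
        return value


name_lengths = KeyedDefaultDict(len)
print(name_lengths['Tulum'])   # 5, computed from the key itself
print(dict(name_lengths))      # {'Tulum': 5}

# Counter's missing keys read as 0 and are not inserted.
city_counts = Counter(['Bath', 'London', 'Bath'])
print(city_counts['Bath'])     # 2
print(city_counts['Cusco'])    # 0
print('Cusco' in city_counts)  # False
```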

Things to Remember

✦ If you’re creating a dictionary to manage an arbitrary set of potential keys, then you should prefer using a defaultdict instance from the collections built-in module if it suits your problem.
✦ If a dictionary of arbitrary keys is passed to you, and you don’t control its creation, then you should prefer the get method to access its items. However, it’s worth considering using the setdefault method for the few situations in which it leads to shorter code.