Item 16: Prefer get Over in and KeyError to Handle Missing Dictionary Keys

The three fundamental operations for interacting with dictionaries are accessing, assigning, and deleting keys and their associated values. The contents of dictionaries are dynamic, and thus it’s entirely possible—even likely—that when you try to access or delete a key, it won’t already be present.

For example, say that I’m trying to determine people’s favorite type of bread to devise the menu for a sandwich shop. Here, I define a dictionary of counters with the current votes for each style:

counters = {
    'pumpernickel': 2,
    'sourdough': 1,
}

To increment the counter for a new vote, I need to see if the key exists, insert the key with a default counter value of zero if it’s missing, and then increment the counter’s value. This requires accessing the key two times and assigning it once. Here, I accomplish this task using an if statement with an in expression that returns True when the key is present:

key = 'wheat'
if key in counters:  # First access
    count = counters[key]  # Second access
else:
    count = 0

counters[key] = count + 1  # One assignment

Another way to accomplish the same behavior is by relying on how dictionaries raise a KeyError exception when you try to get the value for a key that doesn’t exist. This approach is more efficient because it requires only one access and one assignment:

try:
    count = counters[key]  # One access
except KeyError:
    count = 0

counters[key] = count + 1  # One assignment

This flow of fetching a key that exists or returning a default value is so common that the dict built-in type provides the get method to accomplish this task. The second parameter to get is the default value to return in the case that the key—the first parameter—isn’t present. This also requires only one access and one assignment, but it’s much shorter than the KeyError example:

count = counters.get(key, 0)  # One access
counters[key] = count + 1  # One assignment

It’s possible to shorten the in expression and KeyError approaches in various ways, but all of these alternatives suffer from requiring code duplication for the assignments, which makes them less readable and worth avoiding:

if key not in counters:
    counters[key] = 0
counters[key] += 1

if key in counters:
    counters[key] += 1
else:
    counters[key] = 1

try:
    counters[key] += 1
except KeyError:
    counters[key] = 1

Thus, for a dictionary with simple types, using the get method is the shortest and clearest option.
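
As a rough sketch of how the get approach scales to many votes, the loop below tallies a batch of ballots. The ballots list is hypothetical sample data I made up for illustration; only the counters dictionary comes from the example above.

```python
# Hypothetical sample data: each entry is one vote for a bread style.
ballots = ['wheat', 'sourdough', 'wheat', 'pumpernickel', 'wheat']

counters = {
    'pumpernickel': 2,
    'sourdough': 1,
}

for bread in ballots:
    # One access (get) and one assignment per vote, whether or not
    # the key is already present.
    counters[bread] = counters.get(bread, 0) + 1

print(counters)
```

The same single line handles both the first vote for a style and every later vote, so there is no branch to duplicate.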

Note

If you’re maintaining dictionaries of counters like this, it’s worth considering the Counter class from the collections built-in module, which provides most of the facilities you are likely to need.
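
For comparison, here is a minimal sketch of the same vote counting with collections.Counter. The extra votes and the 'rye' key are made up for illustration.

```python
from collections import Counter

counters = Counter({'pumpernickel': 2, 'sourdough': 1})

# A missing key reads as 0, so incrementing needs no existence check.
counters['wheat'] += 1

# update() adds one count for each item in an iterable.
counters.update(['sourdough', 'sourdough'])

# Reading a missing key returns 0 without inserting the key.
print(counters['rye'])

# most_common(n) returns the n highest counts as (key, count) pairs.
print(counters.most_common(1))
```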

What if the values of the dictionary are a more complex type, like a list? For example, say that instead of only counting votes, I also want to know who voted for each type of bread. Here, I do this by associating a list of names with each key:

votes = {
    'baguette': ['Bob', 'Alice'],
    'ciabatta': ['Coco', 'Deb'],
}
key = 'brioche'
who = 'Elmer'

if key in votes:  # First access
    names = votes[key]  # Second access
else:
    votes[key] = names = []  # Missing key: one access above, one assignment here

names.append(who)
print(votes)

>>>
{'baguette': ['Bob', 'Alice'],
 'ciabatta': ['Coco', 'Deb'],
 'brioche': ['Elmer']}

Relying on the in expression requires two accesses if the key is present, or one access and one assignment if the key is missing. This example is different from the counters example above because the value for each key can be assigned blindly to the default value of an empty list if the key doesn’t already exist. The triple assignment statement (votes[key] = names = []) populates the key in one line instead of two. Once the default value has been inserted into the dictionary, I don’t need to assign it again because the list is modified by reference in the later call to append.
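
The aliasing that makes this work can be checked directly. In this small sketch (reusing the names from the example), the chained assignment binds both votes[key] and names to one list object, so the append is visible through the dictionary without a second assignment.

```python
votes = {'baguette': ['Bob', 'Alice']}
key = 'brioche'

# Both targets are bound to the same newly created list object.
votes[key] = names = []

names.append('Elmer')

# The dictionary entry and the local name refer to one object.
print(names is votes[key])
print(votes[key])
```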

It’s also possible to rely on the KeyError exception being raised when the dictionary value is a list. This approach requires one key access if the key is present, or one key access and one assignment if it’s missing, which makes it more efficient than the in condition:

try:
    names = votes[key]  # One access
except KeyError:
    votes[key] = names = []  # Missing key: one access above, one assignment here

names.append(who)

Similarly, you can use the get method to fetch a list value when the key is present, or do one fetch and one assignment if the key isn’t present:

names = votes.get(key)
if names is None:
    votes[key] = names = []  # Missing key: one fetch above, one assignment here

names.append(who)

The approach that involves using get to fetch list values can further be shortened by one line if you use an assignment expression (introduced in Python 3.8; see Item 10: “Prevent Repetition with Assignment Expressions”) in the if statement, which improves readability:

if (names := votes.get(key)) is None:
    votes[key] = names = []

names.append(who)

The dict type also provides the setdefault method to help shorten this pattern even further. setdefault tries to fetch the value of a key in the dictionary. If the key isn’t present, the method assigns that key to the default value provided. And then the method returns the value for that key: either the originally present value or the newly inserted default value. Here, I use setdefault to implement the same logic as in the get example above:

names = votes.setdefault(key, [])
names.append(who)

This works as expected, and it is shorter than using get with an assignment expression. However, the readability of this approach isn’t ideal. The method name setdefault doesn’t make its purpose immediately obvious. Why is it set when what it’s doing is getting a value? Why not call it get_or_set? I’m arguing about the color of the bike shed here, but the point is that if you were a new reader of the code and not completely familiar with Python, you might have trouble understanding what this code is trying to accomplish because setdefault isn’t self-explanatory.
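
To make the "get or set" behavior concrete, this short sketch calls setdefault once for a key that exists and once for a key that doesn't. The variable names existing and inserted are mine, chosen only for illustration.

```python
votes = {'baguette': ['Bob', 'Alice']}

# Key present: the stored list is returned and the default is ignored.
existing = votes.setdefault('baguette', [])

# Key missing: the default is stored under the key and then returned.
inserted = votes.setdefault('brioche', [])

print(existing)
print(inserted)
print(votes)
```

In both cases the returned object is the one stored in the dictionary, which is why a following append shows up in votes.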

There’s also one important gotcha: The default value passed to setdefault is assigned directly into the dictionary when the key is missing instead of being copied. Here, I demonstrate the effect of this when the value is a list:

data = {}
key = 'foo'
value = []
data.setdefault(key, value)
print('Before:', data)
value.append('hello')
print('After: ', data)

>>>
Before: {'foo': []}
After:  {'foo': ['hello']}

This means that I need to make sure that I’m always constructing a new default value for each key I access with setdefault. This leads to a significant performance overhead in this example because I have to allocate a list instance for each call. If I reuse an object for the default value (which I might try to do to increase efficiency or readability), I might introduce strange behavior and bugs (see Item 24: “Use None and Docstrings to Specify Dynamic Default Arguments” for another example of this problem).
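
Here is a sketch of the kind of bug that reusing a default object can cause. The shared_default name and the sample voters are hypothetical; the point is only that every missing key ends up bound to the same list.

```python
# Hypothetical sample data: (bread, voter) pairs.
ballots = [('brioche', 'Elmer'), ('rye', 'Fay')]

# Buggy: one list object is reused as the default for every key.
shared_default = []
votes = {}
for key, who in ballots:
    votes.setdefault(key, shared_default).append(who)

# Both keys alias the same list, so each shows every voter.
print(votes)

# Safer: build a fresh list on each call, at the cost of one
# allocation per call even when the key already exists.
fixed = {}
for key, who in ballots:
    fixed.setdefault(key, []).append(who)

print(fixed)
```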

Going back to the earlier example that used counters for dictionary values instead of lists of who voted: Why not also use the setdefault method in that case? Here, I reimplement the same example using this approach:

count = counters.setdefault(key, 0)
counters[key] = count + 1

The problem here is that the call to setdefault is superfluous. You always need to assign the key in the dictionary to a new value after you increment the counter, so the extra assignment done by setdefault is unnecessary. The earlier approach of using get for counter updates requires only one access and one assignment, whereas using setdefault requires one access and two assignments.

There are only a few circumstances in which using setdefault is the shortest way to handle missing dictionary keys, such as when the default values are cheap to construct, mutable, and there’s no potential for raising exceptions (e.g., list instances). In these very specific cases, it may seem worth accepting the confusing method name setdefault instead of having to write more characters and lines to use get. However, often what you really should do in these situations is to use defaultdict instead (see Item 17: “Prefer defaultdict Over setdefault to Handle Missing Items in Internal State”).
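
As a brief preview of that alternative, here is a minimal sketch of the votes example rewritten with collections.defaultdict. It assumes the dictionary is one I create and control myself, which is the situation the next item focuses on.

```python
from collections import defaultdict

# list is the factory: it is called only when a missing key is accessed,
# so no throwaway default is allocated for keys that already exist.
votes = defaultdict(list)

votes['baguette'].extend(['Bob', 'Alice'])
votes['brioche'].append('Elmer')

print(dict(votes))
```

One thing to keep in mind with this sketch: merely reading a missing key with square brackets inserts it, so membership checks should use in rather than indexing.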

Things to Remember

✦ There are four common ways to detect and handle missing keys in dictionaries: using in expressions, KeyError exceptions, the get method, and the setdefault method.
✦ The get method is best for dictionaries that contain basic types like counters, and it is preferable along with assignment expressions when creating dictionary values has a high cost or may raise exceptions.
✦ When the setdefault method of dict seems like the best fit for your problem, you should consider using defaultdict instead.
