Chapter Summary: How Sets Work-Practical Consequences

Download as pdf or txt
Download as pdf or txt
You are on page 1of 2

How Sets Work—Practical Consequences

The set and frozenset types are also implemented with a hash table, except that each
bucket holds only a reference to the element (as if it were a key in a dict, but without
a value to go with it). In fact, before set was added to the language, we often used
dictionaries with dummy values just to perform fast membership tests on the keys.
Everything said in “Practical Consequences of How dict Works” on page 90 about how
the underlying hash table determines the behavior of a dict applies to a set. Without
repeating the previous section, we can summarize it for sets with just a few words:

• Set elements must be hashable objects.


• Sets have a significant memory overhead.
• Membership testing is very efficient.
• Element ordering depends on insertion order.
• Adding elements to a set may change the order of other elements.

Chapter Summary
Dictionaries are a keystone of Python. Beyond the basic dict, the standard library offers
handy, ready-to-use specialized mappings like defaultdict, OrderedDict, ChainMap,
and Counter, all defined in the collections module. The same module also provides
the easy-to-extend UserDict class.
Two powerful methods available in most mappings are setdefault and update. The
setdefault method is used to update items holding mutable values, for example, in a
dict of list values, to avoid redundant searches for the same key. The update method
allows bulk insertion or overwriting of items from any other mapping, from iterables
providing (key, value) pairs and from keyword arguments. Mapping constructors
also use update internally, allowing instances to be initialized from mappings, iterables,
or keyword arguments.
A clever hook in the mapping API is the __missing__ method, which lets you customize
what happens when a key is not found.
The collections.abc module provides the Mapping and MutableMapping abstract base
classes for reference and type checking. The little-known MappingProxyType from the
types module creates immutable mappings. There are also ABCs for Set and Mutable
Set.

Chapter Summary | 93

WOW! eBook
www.wowebook.org
The hash table implementation underlying dict and set is extremely fast. Understand‐
ing its logic explains why items are apparently unordered and may even be reordered
behind our backs. There is a price to pay for all this speed, and the price is in memory.

Further Reading
In The Python Standard Library, 8.3. collections — Container datatypes includes ex‐
amples and practical recipes with several mapping types. The Python source code for
the module Lib/collections/init.py is a great reference for anyone who wants to create a
new mapping type or grok the logic of the existing ones.
Chapter 1 of Python Cookbook, Third edition (O’Reilly) by David Beazley and Brian K.
Jones has 20 handy and insightful recipes with data structures—the majority using dict
in clever ways.
Written by A.M. Kuchling—a Python core contributor and author of many pages of the
official Python docs and how-tos—Chapter 18, “Python’s Dictionary Implementation:
Being All Things to All People, in the book Beautiful Code (O’Reilly) includes a detailed
explanation of the inner workings of the Python dict. Also, there are lots of comments
in the source code of the dictobject.cCPython module. Brandon Craig Rhodes’ pre‐
sentation The Mighty Dictionary is excellent and shows how hash tables work by using
lots of slides with… tables!
The rationale for adding sets to the language is documented in PEP 218 — Adding a
Built-In Set Object Type. When PEP 218 was approved, no special literal syntax was
adopted for sets. The set literals were created for Python 3 and backported to Python
2.7, along with dict and set comprehensions. PEP 274 — Dict Comprehensions is the
birth certificate of dictcomps. I could not find a PEP for setcomps; apparently they were
adopted because they get along well with their siblings—a jolly good reason.

Soapbox
My friend Geraldo Cohen once remarked that Python is “simple and correct.”
The dict type is an example of simplicity and correctness. It’s highly optimized to do
one thing well: retrieve arbitrary keys. It’s fast and robust enough to be used all over the
Python interpreter itself. If you need predictable ordering, use OrderedDict. That is not
a requirement in most uses of mappings, so it makes sense to keep the core implemen‐
tation simple and offer variations in the standard library.
Contrast with PHP, where arrays are described like this in the official PHP Manual:
An array in PHP is actually an ordered map. A map is a type that associates values to
keys. This type is optimized for several different uses; it can be treated as an array, list

94 | Chapter 3: Dictionaries and Sets

WOW! eBook
www.wowebook.org

You might also like