On list and dictionary comprehension.
In a Nutshell
For quick reference, here are a few examples of list and dictionary comprehension. Details, with variations and some explanation, follow further below.
Collect the result of a function call on every element of a list: lowercase all the words in a list.
```
>>> tokens
['The', 'cat', 'sat', 'on', 'the', 'mat.', 'The', 'cow', 'sat', 'in', 'the', 'bow.']
>>> lowercased = [x.lower() for x in tokens]
>>> lowercased
['the', 'cat', 'sat', 'on', 'the', 'mat.', 'the', 'cow', 'sat', 'in', 'the', 'bow.']
```
"Unwrap" a list of lists: flatten a list of per-word character lists into a single list of characters.
```
>>> charInTokenList
[['T', 'h', 'e'], ['c', 'a', 't'], ['s', 'a', 't'], ['o', 'n'], ['t', 'h', 'e'], ['m', 'a', 't', '.'], ['T', 'h', 'e'], ['c', 'o', 'w'], ['s', 'a', 't'], ['i', 'n'], ['t', 'h', 'e'], ['b', 'o', 'w', '.']]
>>> charList = [char for chars in charInTokenList for char in chars]
>>> charList
['T', 'h', 'e', 'c', 'a', 't', 's', 'a', 't', 'o', 'n', 't', 'h', 'e', 'm', 'a', 't', '.', 'T', 'h', 'e', 'c', 'o', 'w', 's', 'a', 't', 'i', 'n', 't', 'h', 'e', 'b', 'o', 'w', '.']
```
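Pair x and F(x): for every word in a list of words, use a dictionary to pair that word with its uppercase version. (Note that the dictionary below does not print in input order. That output comes from an older version of Python; since Python 3.7, dictionaries preserve insertion order, so a current interpreter prints the keys in the order they first appear in tokens.)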
```
>>> tokens
['The', 'cat', 'sat', 'on', 'the', 'mat.', 'The', 'cow', 'sat', 'in', 'the', 'bow.']
>>> uppercaseTokens = {x: x.upper() for x in tokens}
>>> uppercaseTokens
{'the': 'THE', 'mat.': 'MAT.', 'sat': 'SAT', 'on': 'ON', 'cow': 'COW', 'The': 'THE', 'bow.': 'BOW.', 'in': 'IN', 'cat': 'CAT'}
```
Details
Why should a beginning Python programmer learn to use list comprehensions instead of for-loops? Because comprehensions make the code more succinct, and at runtime they are usually faster.
I discuss both of those topics below. But I start with the answer to my latest need for a list comprehension solution, which is devilishly complex (perhaps even too complex). Then I give a quick tutorial on the basic patterns that list comprehensions follow. This may be the best way for beginners to learn to read and write their own list comprehensions; it has helped me. Finally, I get around to discussing the succinctness and runtime advantages of list comprehension.
With Two Lists, an If and an Any
Here's the list-comprehension solution to a programming problem I once had. It took a fair amount of searching and experimenting—I had to cobble together a number of online examples, since I couldn't find a page with a ready-made answer. Maybe other programmers with a similar problem will find this page, thereby reaching their solutions more quickly than I reached mine.
Suppose you have a list of verbs and a list of sentences:
```
>>> verbs = ['asks', 'goes', 'wants', 'tries']
>>> sentences = ["Mary tries her best.",
...              "Mario visits his sister every Saturday.",
...              "Mark always asks his parents for permission."]
```
And suppose you want a list of all the verbs which are not in any of the sentences. For this example, the list you want is ["goes", "wants"].
Here's how you do it in a one-line list comprehension:
```
>>> notFound = [x for x in verbs if not any(x in sent for sent in sentences)]
>>> notFound
['goes', 'wants']
```
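One caveat about this solution: `x in sent` tests whether the verb occurs anywhere in the sentence as a substring, not as a whole word, so "asks" would count as found in a sentence containing "masks". If whole-word matching is what you want, here is a minimal sketch, assuming that splitting on whitespace and stripping common punctuation is good enough tokenization for your data. The names words, tricky and trickyWords are made up for this example. As a side benefit, the set of words is built once instead of rescanning every sentence for every verb.

```python
verbs = ['asks', 'goes', 'wants', 'tries']
sentences = ["Mary tries her best.",
             "Mario visits his sister every Saturday.",
             "Mark always asks his parents for permission."]

# Substring test, as in the one-line comprehension above.
notFound = [x for x in verbs if not any(x in sent for sent in sentences)]

# Word-level test: a set comprehension collects every word of every
# sentence, with surrounding punctuation stripped off.
words = {word.strip('.,;:!?') for sent in sentences for word in sent.split()}
notFoundWords = [x for x in verbs if x not in words]

print(notFound)       # ['goes', 'wants']
print(notFoundWords)  # ['goes', 'wants']

# Where the two tests disagree: 'asks' is a substring of 'masks'.
tricky = ["She masks her feelings."]
print([x for x in verbs if not any(x in s for s in tricky)])
# ['goes', 'wants', 'tries']
trickyWords = {w.strip('.,;:!?') for s in tricky for w in s.split()}
print([x for x in verbs if x not in trickyWords])
# ['asks', 'goes', 'wants', 'tries']
```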
For-Loop Equivalent
Here's the for-loop equivalent of the above list comprehension.
```
>>> notFound = []
>>> for verb in verbs:
...     bFound = False
...     for sent in sentences:
...         if verb in sent:
...             bFound = True
...             break
...     if not bFound:
...         notFound.append(verb)
...
>>> notFound
['goes', 'wants']
```
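The bFound flag can be dropped with Python's for ... else construct: the else clause of a loop runs only when the loop ends without executing break. Here is a sketch of the same loop written that way. It is still longer than the comprehension, but its structure is a little closer to the any( ) version.

```python
verbs = ['asks', 'goes', 'wants', 'tries']
sentences = ["Mary tries her best.",
             "Mario visits his sister every Saturday.",
             "Mark always asks his parents for permission."]

notFound = []
for verb in verbs:
    for sent in sentences:
        if verb in sent:
            break  # found it, so the else clause below is skipped
    else:
        # Runs only when the inner loop finished without a break,
        # that is, when no sentence contained the verb.
        notFound.append(verb)

print(notFound)  # ['goes', 'wants']
```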
Which is better? The answer depends on which you value more, succinctness or readability. But readability is very subjective. The more familiar you are with list comprehension syntax, the easier it will be to read the list comprehension solution. I discuss the issue in detail below. But first I offer an approach to making list comprehensions easier to read.
List Comprehension Patterns
The way to learn to read list comprehensions—as well as to create your own—is to learn to recognize certain patterns.
newList = [func(x) for x in y]
- For every element of collection y, invoke the function func( ) and collect the results into newList.
Here we show two examples on a list of words called tokens:
```
>>> tokens = "The cat sat on the mat. The cow sat in the bow.".split()
>>> tokens
['The', 'cat', 'sat', 'on', 'the', 'mat.', 'The', 'cow', 'sat', 'in', 'the', 'bow.']
# Example 1
>>> lowercased = [x.lower() for x in tokens]
>>> lowercased
['the', 'cat', 'sat', 'on', 'the', 'mat.', 'the', 'cow', 'sat', 'in', 'the', 'bow.']
# Example 2
>>> lengths = [len(x) for x in tokens]
>>> lengths
[3, 3, 3, 2, 3, 4, 3, 3, 3, 2, 3, 4]
```
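If the pattern is hard to read at first, it may help to see it next to the loop it abbreviates. This sketch builds the same list both ways, and also shows the optional if clause (used earlier in the verb example) that decides which elements reach the expression at the front. The names lengthsLoop and endings are made up for illustration.

```python
tokens = "The cat sat on the mat. The cow sat in the bow.".split()

# The comprehension from the examples above ...
lengths = [len(x) for x in tokens]

# ... is shorthand for this loop: call the function on each element
# and append the result to a new list.
lengthsLoop = []
for x in tokens:
    lengthsLoop.append(len(x))

# An optional if clause filters which elements reach the expression.
endings = [x.upper() for x in tokens if x.endswith('.')]

print(lengths == lengthsLoop)  # True
print(endings)                 # ['MAT.', 'BOW.']
```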
newList = [x for y in z for x in y]
- This pattern is used to read a nested list, that is, a list within a list: for example, each letter of all the words in a sentence, or each word of all the documents in a collection of documents.
- The pattern collects all the nested elements into a single list.
Here again we show two examples: (1) flattening a list of per-word character lists into a single list of characters; and (2) getting all the characters in a list of words.
```
# Example 1
>>> charInTokenList
[['T', 'h', 'e'], ['c', 'a', 't'], ['s', 'a', 't'], ['o', 'n'], ['t', 'h', 'e'], ['m', 'a', 't', '.'], ['T', 'h', 'e'], ['c', 'o', 'w'], ['s', 'a', 't'], ['i', 'n'], ['t', 'h', 'e'], ['b', 'o', 'w', '.']]
>>> charList = [char for chars in charInTokenList for char in chars]
>>> charList
['T', 'h', 'e', 'c', 'a', 't', 's', 'a', 't', 'o', 'n', 't', 'h', 'e', 'm', 'a', 't', '.', 'T', 'h', 'e', 'c', 'o', 'w', 's', 'a', 't', 'i', 'n', 't', 'h', 'e', 'b', 'o', 'w', '.']

# Example 2
>>> tokens
['The', 'cat', 'sat', 'on', 'the', 'mat.', 'The', 'cow', 'sat', 'in', 'the', 'bow.']
>>> characters = [char for token in tokens for char in token]
>>> characters
['T', 'h', 'e', 'c', 'a', 't', 's', 'a', 't', 'o', 'n', 't', 'h', 'e', 'm', 'a', 't', '.', 'T', 'h', 'e', 'c', 'o', 'w', 's', 'a', 't', 'i', 'n', 't', 'h', 'e', 'b', 'o', 'w', '.']
```
Notice that in the second example, the inner list of the nested list is implicit rather than explicit. It's not really a list of characters, but Python lets you treat the characters of a string as if they were a list. In other words, in Python a string is iterable.
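Because strings are iterable, list( ) applied to a word gives its characters. That suggests one way charInTokenList, which the examples display but never define, could have been built; this construction is my assumption. The sketch also checks that flattening the explicit nested list, iterating the strings directly, and the standard library's itertools.chain.from_iterable (a swapped-in alternative, not a comprehension) all give the same flat list.

```python
import itertools

tokens = "The cat sat on the mat. The cow sat in the bow.".split()

# A string is iterable, so list() turns a word into a list of its characters.
print(list('cat'))  # ['c', 'a', 't']

# Assumed construction of charInTokenList: one character list per token.
charInTokenList = [list(token) for token in tokens]

# Flattening with the [x for y in z for x in y] pattern ...
flat1 = [char for chars in charInTokenList for char in chars]
# ... iterating the strings directly gives the same characters ...
flat2 = [char for token in tokens for char in token]
# ... and so does itertools.chain.from_iterable.
flat3 = list(itertools.chain.from_iterable(charInTokenList))

print(flat1 == flat2 == flat3)  # True
```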
newList = [[x for x in y] for y in z]
- This pattern is used to read a nested list, and create a nested list as well.
- In other words, for each element in z, it creates a list.
- Compare this pattern to the pattern above, which creates a flat list. Notice two things:
  - The extra inner square brackets, which create the nested lists.
  - The ordering of the variable names. The flat list goes: [x for y in z for x in y]. The nested list goes: [[x for x in y] for y in z].
For example, getting all the characters of each word in a list of words, with one list per word:
```
>>> tokens
['The', 'cat', 'sat', 'on', 'the', 'mat.', 'The', 'cow', 'sat', 'in', 'the', 'bow.']
>>> charsInWord = [[char for char in token] for token in tokens]
>>> charsInWord
[['T', 'h', 'e'], ['c', 'a', 't'], ['s', 'a', 't'], ['o', 'n'], ['t', 'h', 'e'], ['m', 'a', 't', '.'], ['T', 'h', 'e'], ['c', 'o', 'w'], ['s', 'a', 't'], ['i', 'n'], ['t', 'h', 'e'], ['b', 'o', 'w', '.']]
```
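The loop equivalent makes the difference from the flat pattern concrete. Here the outer clause, for token in tokens, is written last but runs as the outer loop, and the inner comprehension becomes a fresh list built on each pass. A sketch, reusing the same tokens; the names charsInWordLoop and inner are made up for illustration.

```python
tokens = "The cat sat on the mat. The cow sat in the bow.".split()

charsInWord = [[char for char in token] for token in tokens]

# Loop equivalent: the inner comprehension becomes a new inner list
# that is built, then appended, once per pass of the outer loop.
charsInWordLoop = []
for token in tokens:
    inner = []
    for char in token:
        inner.append(char)
    charsInWordLoop.append(inner)

print(charsInWord == charsInWordLoop)  # True
print(len(charsInWord))                # 12, one inner list per token
```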
newList = [w for y in z for x in y for w in x]
- This pattern is for a triple-nested collection. For instance, suppose we have a group of documents, and for each document we have a list of all of its words. You would use this pattern to get a list of all the characters of all the words in all the documents.
- To help you remember how to parse this admittedly convoluted syntax, think of it as representing a triple for-loop:
```
>>> newList = []
>>> for y in z:
...     for x in y:
...         for w in x:
...             newList.append(w)
...
```
Notice that the for clauses in the list comprehension appear in the same order as the nested for-loops: for y in z, then for x in y, then for w in x. Only the collected item, w, moves to the front.
Our example for this pattern starts with a collection of documents which consists of a list of words for each document. The list comprehension collects all the characters for every word.
First we create the data.
```
>>> doc1 = "The cat sat on the mat. The cow sat in the bow.".split()
>>> doc1
['The', 'cat', 'sat', 'on', 'the', 'mat.', 'The', 'cow', 'sat', 'in', 'the', 'bow.']
>>> doc2 = "He lay flat on the floor of the forest.".split()
>>> doc2
['He', 'lay', 'flat', 'on', 'the', 'floor', 'of', 'the', 'forest.']
>>> docs = [doc1, doc2]
>>> docs
[['The', 'cat', 'sat', 'on', 'the', 'mat.', 'The', 'cow', 'sat', 'in', 'the', 'bow.'], ['He', 'lay', 'flat', 'on', 'the', 'floor', 'of', 'the', 'forest.']]
```
So doc1 consists of the words of two sentences, and doc2 consists of the words of a single sentence.
Now, finally, we have the data upon which we can perform our list comprehension.
```
>>> characters = [char for doc in docs for token in doc for char in token]
>>> characters
['T', 'h', 'e', 'c', 'a', 't', 's', 'a', 't', 'o', 'n', 't', 'h', 'e', 'm', 'a', 't', '.', 'T', 'h', 'e', 'c', 'o', 'w', 's', 'a', 't', 'i', 'n', 't', 'h', 'e', 'b', 'o', 'w', '.', 'H', 'e', 'l', 'a', 'y', 'f', 'l', 'a', 't', 'o', 'n', 't', 'h', 'e', 'f', 'l', 'o', 'o', 'r', 'o', 'f', 't', 'h', 'e', 'f', 'o', 'r', 'e', 's', 't', '.']
```
Calling a function.
Finally, note that, just as with the first pattern, we can call any valid function on the components of these more complicated list comprehensions. For example, we can call the str class's lower( ) method on the token component.
```
>>> docs = [doc1, doc2]
>>> docs
[['The', 'cat', 'sat', 'on', 'the', 'mat.', 'The', 'cow', 'sat', 'in', 'the', 'bow.'], ['He', 'lay', 'flat', 'on', 'the', 'floor', 'of', 'the', 'forest.']]
>>> characters = [char for doc in docs for token in doc for char in token.lower()]
>>> characters
['t', 'h', 'e', 'c', 'a', 't', 's', 'a', 't', 'o', 'n', 't', 'h', 'e', 'm', 'a', 't', '.', 't', 'h', 'e', 'c', 'o', 'w', 's', 'a', 't', 'i', 'n', 't', 'h', 'e', 'b', 'o', 'w', '.', 'h', 'e', 'l', 'a', 'y', 'f', 'l', 'a', 't', 'o', 'n', 't', 'h', 'e', 'f', 'l', 'o', 'o', 'r', 'o', 'f', 't', 'h', 'e', 'f', 'o', 'r', 'e', 's', 't', '.']
```
Just remember that a function call can appear in the leading expression or in one of the iterables following the in keyword, but not in the target between for and in, which must be something that can be assigned to, such as a plain variable name. Compare the valid list comprehension above to the invalid one below.
```
>>> characters = [char for doc in docs for token.lower() in doc for char in token]
  File "<input>", line 1
SyntaxError: can't assign to function call
```
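To see the rule in action, here is a sketch with a small made-up docs list. Calls in the leading expression, in the iterable after in, and in an if condition all work; and the target after for can even be a tuple of names, because it behaves like the left-hand side of an assignment. The invalid version is passed to compile( ) so that its SyntaxError can be caught instead of stopping the script; the exact error message differs between Python versions, so it is not shown.

```python
docs = [["The", "Cat"], ["A", "Cow"]]

# Allowed: a call in the leading expression ...
a = [token.lower() for doc in docs for token in doc]
# ... in the iterable after "in" ...
b = [char for doc in docs for token in doc for char in token.lower()]
# ... or in an if condition.
c = [token for doc in docs for token in doc if len(token) > 2]
# The target after "for" may be a tuple of names (unpacking).
d = [(i, token) for doc in docs for i, token in enumerate(doc)]

print(a)  # ['the', 'cat', 'a', 'cow']
print(c)  # ['The', 'Cat', 'Cow']
print(d)  # [(0, 'The'), (1, 'Cat'), (0, 'A'), (1, 'Cow')]

# Not allowed: a call as the target after "for".
rejected = False
try:
    compile("[char for doc in docs for token.lower() in doc for char in token]",
            "<example>", "eval")
except SyntaxError as err:
    rejected = True
    print("SyntaxError:", err.msg)
```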
Dictionary Comprehension
Once you've figured out list comprehension, you can easily use the same techniques to create dictionaries.
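Baby steps first. For every word in a list of words, we'll use a dictionary to pair that word with its uppercase version. (Note that in the output below, which comes from an older version of Python, the dictionary does not print in input order; since Python 3.7, dictionaries preserve insertion order.)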
```
>>> tokens
['The', 'cat', 'sat', 'on', 'the', 'mat.', 'The', 'cow', 'sat', 'in', 'the', 'bow.']
>>> uppercaseTokens = {x: x.upper() for x in tokens}
>>> uppercaseTokens['the']
'THE'
>>> uppercaseTokens['cow']
'COW'
# Print out the entire dictionary.
>>> uppercaseTokens
{'the': 'THE', 'mat.': 'MAT.', 'sat': 'SAT', 'on': 'ON', 'cow': 'COW', 'The': 'THE', 'bow.': 'BOW.', 'in': 'IN', 'cat': 'CAT'}
```
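Two properties of dictionary comprehensions are worth checking for yourself. Keys are unique, so a repeated token simply overwrites its earlier entry (harmless here, since the value is the same) while keeping its original position, and a current interpreter lists keys in first-insertion order. A quick sketch, assuming Python 3.7 or later:

```python
tokens = "The cat sat on the mat. The cow sat in the bow.".split()

uppercaseTokens = {x: x.upper() for x in tokens}

# 12 tokens, but only 9 distinct keys: repeated words such as 'sat'
# and 'the' overwrite their earlier entries instead of adding new ones.
print(len(tokens), len(uppercaseTokens))  # 12 9

# Keys keep the order in which they were first inserted.
print(list(uppercaseTokens))
# ['The', 'cat', 'sat', 'on', 'the', 'mat.', 'cow', 'in', 'bow.']
```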
Now, for each key of the dictionary, let's create a pair of its lowercase and uppercase versions.
```
>>> tokenVariations = {x: (x.lower(), x.upper()) for x in tokens}
>>> tokenVariations['The']
('the', 'THE')
# Print out the entire dictionary.
>>> tokenVariations
{'the': ('the', 'THE'), 'mat.': ('mat.', 'MAT.'), 'sat': ('sat', 'SAT'), 'on': ('on', 'ON'), 'cow': ('cow', 'COW'), 'The': ('the', 'THE'), 'bow.': ('bow.', 'BOW.'), 'in': ('in', 'IN'), 'cat': ('cat', 'CAT')}
```
A dictionary of dictionaries, anyone?
```
>>> tokenDict = {x: {'lower': x.lower(), 'upper': x.upper(), 'len': len(x)} for x in tokens}
>>> tokenDict['The']
{'len': 3, 'upper': 'THE', 'lower': 'the'}
# The entire dictionary:
>>> tokenDict
{'the': {'len': 3, 'upper': 'THE', 'lower': 'the'}, 'mat.': {'len': 4, 'upper': 'MAT.', 'lower': 'mat.'}, 'sat': {'len': 3, 'upper': 'SAT', 'lower': 'sat'}, 'on': {'len': 2, 'upper': 'ON', 'lower': 'on'}, 'cow': {'len': 3, 'upper': 'COW', 'lower': 'cow'}, 'The': {'len': 3, 'upper': 'THE', 'lower': 'the'}, 'bow.': {'len': 4, 'upper': 'BOW.', 'lower': 'bow.'}, 'in': {'len': 2, 'upper': 'IN', 'lower': 'in'}, 'cat': {'len': 3, 'upper': 'CAT', 'lower': 'cat'}}
```
In the above dictionaries and lists, do you find the period in some of the token entries annoying?
```
>>> tokenDict['mat']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'mat'
>>> tokenDict['mat.']
{'len': 4, 'upper': 'MAT.', 'lower': 'mat.'}
```
Here we fix that, with an advanced-level dictionary comprehension that itself contains a list comprehension.
```
>>> tokenDict = {x: {'lower': x.lower(), 'upper': x.upper(), 'len': len(x)} for x in [tok.replace('.', '') for tok in tokens]}
>>> tokenDict['mat']
{'len': 3, 'upper': 'MAT', 'lower': 'mat'}
```
The solution begins with the list comprehension [tok.replace('.', '') for tok in tokens], which simply returns a list of the tokens with their periods removed. After that, the dictionary comprehension takes over, working exactly as in the previous example.
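A small variation: if the inner square brackets are swapped for parentheses, the inner part becomes a generator expression, which hands the cleaned tokens to the dictionary comprehension one at a time instead of building the whole intermediate list first. For a list this short the difference does not matter; for large inputs it saves memory. A sketch:

```python
tokens = "The cat sat on the mat. The cow sat in the bow.".split()

# Same result as the version with an inner list comprehension, but the
# inner part is a generator expression (parentheses, not brackets), so
# no intermediate list of cleaned tokens is built.
tokenDict = {x: {'lower': x.lower(), 'upper': x.upper(), 'len': len(x)}
             for x in (tok.replace('.', '') for tok in tokens)}

print(tokenDict['mat'])     # {'lower': 'mat', 'upper': 'MAT', 'len': 3}
print('mat.' in tokenDict)  # False
```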
With a Function that Returns a List
Many Python functions return a list. Where you use this kind of function in a list comprehension controls whether the result is a flat list or a nested list of lists.
Here's our input data:
```
>>> doc1 = "The cat sat on the mat. The cow sat in the bow."
>>> doc2 = "He lay flat on the brown, pine-needled floor of the forest, his chin on his folded arms, and high overhead the wind blew in the tops of the pine trees."
>>> docs = [doc1, doc2]
>>> docs
['The cat sat on the mat. The cow sat in the bow.', 'He lay flat on the brown, pine-needled floor of the forest, his chin on his folded arms, and high overhead the wind blew in the tops of the pine trees.']
```
So we have a list of documents, each of which is a single string: a simple, flat list of strings.
Suppose now that we want a nested list, where at the top level we maintain the list of documents, but for each document we want a list of the words that compose it. To do this we'll use Python's string method split( ), which, called with no arguments, splits a string into words wherever there is whitespace.
```
>>> docTokens = [doc.split() for doc in docs]
>>> docTokens
[['The', 'cat', 'sat', 'on', 'the', 'mat.', 'The', 'cow', 'sat', 'in', 'the', 'bow.'], ['He', 'lay', 'flat', 'on', 'the', 'brown,', 'pine-needled', 'floor', 'of', 'the', 'forest,', 'his', 'chin', 'on', 'his', 'folded', 'arms,', 'and', 'high', 'overhead', 'the', 'wind', 'blew', 'in', 'the', 'tops', 'of', 'the', 'pine', 'trees.']]
>>> len(docTokens)
2
>>> len(docTokens[0])
12
>>> len(docTokens[1])
30
```
On the other hand, maybe you want just a simple list of words. For this, you have to individually retrieve each word that doc.split( ) produces.
```
>>> tokens = [token for doc in docs for token in doc.split()]
>>> tokens
['The', 'cat', 'sat', 'on', 'the', 'mat.', 'The', 'cow', 'sat', 'in', 'the', 'bow.', 'He', 'lay', 'flat', 'on', 'the', 'brown,', 'pine-needled', 'floor', 'of', 'the', 'forest,', 'his', 'chin', 'on', 'his', 'folded', 'arms,', 'and', 'high', 'overhead', 'the', 'wind', 'blew', 'in', 'the', 'tops', 'of', 'the', 'pine', 'trees.']
>>> len(tokens)
42
```
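The nested and flat results hold the same words; only the nesting differs. A short sketch confirming that flattening docTokens with the [x for y in z for x in y] pattern reproduces tokens, using the same two documents:

```python
doc1 = "The cat sat on the mat. The cow sat in the bow."
doc2 = ("He lay flat on the brown, pine-needled floor of the forest, his chin "
        "on his folded arms, and high overhead the wind blew in the tops of "
        "the pine trees.")
docs = [doc1, doc2]

docTokens = [doc.split() for doc in docs]                  # nested
tokens = [token for doc in docs for token in doc.split()]  # flat

# Flattening the nested result gives the flat result.
print([token for doc in docTokens for token in doc] == tokens)  # True
print(len(tokens) == sum(len(doc) for doc in docTokens))        # True
```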
Testing the Return of a Function
Sometimes you want to collect together, not some of the elements of a list, but the output of a function that is called on those elements.
We've already seen this example, where we put the character length of a bunch of words into a list.
```
>>> tokens
['The', 'cat', 'sat', 'on', 'the', 'mat.', 'The', 'cow', 'sat', 'in', 'the', 'bow.']
>>> [len(token) for token in tokens]
[3, 3, 3, 2, 3, 4, 3, 3, 3, 2, 3, 4]
```
But what if you want to test the length before inserting it into the new list? For example, suppose you want the lengths of only those words that are longer than 3 characters. Well, you could do this:
```
# Don't copy this.
>>> [len(token) for token in tokens if len(token) > 3]
[4, 4]
```
That works, but, unfortunately, you're calling len( ) twice for every token that passes the test. Not so efficient, that. With len( ) the time cost is negligible, but with more expensive functions you would definitely want to avoid this syntax.
What you can do instead is create a one-element list with the return of the function call.
```
>>> [val for token in tokens for val in [len(token)] if val > 3]
[4, 4]
```
As you can see, here we're calling len( ) only once, and we nest the call in square brackets, thus creating a one-element list. We set val to that single element, the length of token; then we check that val is greater than 3; then, finally, we insert val into the list returned by the list comprehension.
I have found this technique so valuable that I feel compelled to document where I found it: the question "How to set local variable in list comprehension?" on Stack Overflow, of course.
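Since Python 3.8 there is also dedicated syntax for this: the assignment expression, or "walrus" operator, :=, which binds a name in the middle of an expression. A sketch comparing it with the one-element-list trick, assuming Python 3.8 or later; on older versions the one-element list remains the option.

```python
tokens = "The cat sat on the mat. The cow sat in the bow.".split()

# One-element-list trick: len() runs once per token.
viaList = [val for token in tokens for val in [len(token)] if val > 3]

# Assignment expression: := binds val inside the if clause,
# so len() again runs once per token.
viaWalrus = [val for token in tokens if (val := len(token)) > 3]

print(viaList, viaWalrus)  # [4, 4] [4, 4]
```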
Runtime
Building a list with a list comprehension is generally faster than building it with a for-loop that calls append( ).
A blog post on a site called LeadShift has a nice rule of thumb on this topic:
If you require a list of results almost always use a list comprehension. If no results are required, using a simple loop is simpler to read and faster to run. Never use the builtin map, unless it's more aesthetically appealing for that piece of code and your application does not need the speed improvement.
The blog post goes on to justify this conclusion with Python examples, graphs and a GitHub page.
At least one Stack Overflow page explains why list comprehension is faster.
If you read that page carefully, you'll find the argument that one reason list comprehension is faster is that it runs as C code. On the other hand, one reader calls this "one of the most pervasive, utterly false myths about Python."
Another Stack Overflow page tries to address this question explicitly. There isn't really a consensus of opinion, however, and the answer may be implementation-specific. But in practice, at least in CPython, timings consistently show a list comprehension building a list faster than the equivalent for-loop with append( ).
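The simplest way to settle the question for your own Python version and workload is to measure it. The sketch below uses the standard library's timeit module on a made-up task (doubling 10,000 integers); the numbers vary by machine and version, so no expected output is shown. In CPython, one commonly cited reason for the difference is that the comprehension appends each item with a dedicated bytecode instruction (LIST_APPEND), while the loop must look up and call result.append on every iteration.

```python
import timeit

data = list(range(10_000))  # made-up input for the comparison

def with_loop():
    result = []
    for x in data:
        result.append(x * 2)
    return result

def with_comprehension():
    return [x * 2 for x in data]

# Both functions must build the same list for the comparison to mean anything.
assert with_loop() == with_comprehension()

# number=200 repetitions is an arbitrary choice; raise it for steadier numbers.
loopTime = timeit.timeit(with_loop, number=200)
compTime = timeit.timeit(with_comprehension, number=200)
print(f"for-loop:      {loopTime:.3f} s")
print(f"comprehension: {compTime:.3f} s")
```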
Code Readability
Good Programmers Emulate Hemingway Over Joyce
Compare this:
Sir Tristram, violer d’amores, fr’over the short sea, had passen-core rearrived from North Armorica on this side the scraggy isthmus of Europe Minor to wielderfight his penisolate war:
To this:
He lay flat on the brown, pine-needled floor of the forest, his chin on his folded arms, and high overhead the wind blew in the tops of the pine trees.
I'm a firm believer that programmers should emulate Hemingway rather than mid-to-late Joyce. Mid-to-late Joyce is murky and mysterious. It's beautiful in the way the depths of a swamp are beautiful. But mid-to-late Joyce is not a proper role model for programmers. Hemingway, on the other hand, is. Hemingway is crystal clear. His beauty lies in the way the parts contribute to a greater sum. His is the beauty of software engineering.
Given that this is my programming manifesto—Good Programmers Emulate Hemingway Over Joyce—one might wonder why I am taken with list comprehensions in Python.
In other words, if this is Hemingway:
```
>>> tokens = []
>>> for doc in docs:
...     for sent in doc:
...         for token in sent:
...             tokens.append(token)
...
```
Isn't this Joyce?
```
>>> tokens = [x for z in docs for y in z for x in y]
```
I have three answers to this question: Not really; Yes; and Not in some circumstances.
Not really. The second example is not really Joycean, because it conforms to a common pattern in Python, or at least to what will become a common pattern for beginning Python developers as they grow more familiar with Python code.
For example, consider a first-grade class as the children begin to learn to read. To them, each of these words must be sounded out letter by letter: make, bake, cake. As they become more accustomed to reading, they begin to see the pattern ake with a first-letter variation. In the same way, Python novices at first fail to see the patterns inherent in list comprehensions. But as they become more accustomed to "reading" Python, they'll start to recognize the list comprehension patterns described above:
- [x for x in y]
- [x for y in z for x in y]
- [w for y in z for x in y for w in x]
Yes. The second example is a little Joycean in that it is overly fond of non-descriptive variable names. The programmer who writes [x for z in docs for y in z for x in y] is prizing succinctness over clarity, perhaps even mistakenly believing that succinctness and clarity go hand in hand. So let's fix the near-Joycean line of code with something clearer, even though it may also be a little wordier.
```
>>> tokens = [token for doc in docs for sentence in doc for token in sentence]
```
If my own evolution is to be any guide, I think most programmers will find a little room for compromise here. I believe that you can be a little less explicit with the final variable name, using x instead of something more literal, thus:
```
>>> tokens = [x for doc in docs for sentence in doc for x in sentence]
```
Not in some circumstances. I was under a deadline when I worked out the list comprehension at the top of this post, the one in the section "With Two Lists, an If and an Any". For that reason I nearly decided not to take the time to figure out how to implement my goal with a list comprehension. In other words, I very nearly wrote out the double for-loop given in the "For-Loop Equivalent" section. I did not, however, because I would have had to repeat that for-loop three times in a single code block, once for each list representing a different section of a document... So, while in some circumstances it might be better to implement something with for-loops than with a very complicated, hard-to-read list comprehension, in these particular circumstances I really had no choice. The succinctness of the code block as a whole was worth the slight sacrifice in readability.
Input Data
For the record, here's the input data for this section's code.
```
>>> sentences = ["The cat sat on the mat.".split(), "The cow sat in the bow.".split()]
>>> sentences
[['The', 'cat', 'sat', 'on', 'the', 'mat.'], ['The', 'cow', 'sat', 'in', 'the', 'bow.']]
>>> doc1 = sentences
>>> sentences = ["Sir Tristram, violer d’amores, fr’over the short sea, had passen-core rearrived from North Armorica on this side the scraggy isthmus of Europe Minor to wielderfight his penisolate war:".split()]
>>> sentences
[['Sir', 'Tristram,', 'violer', 'd’amores,', 'fr’over', 'the', 'short', 'sea,', 'had', 'passen-core', 'rearrived', 'from', 'North', 'Armorica', 'on', 'this', 'side', 'the', 'scraggy', 'isthmus', 'of', 'Europe', 'Minor', 'to', 'wielderfight', 'his', 'penisolate', 'war:']]
>>> doc2 = sentences
>>> sentences = ["He lay flat on the brown, pine-needled floor of the forest, his chin on his folded arms, and high overhead the wind blew in the tops of the pine trees.".split()]
>>> sentences
[['He', 'lay', 'flat', 'on', 'the', 'brown,', 'pine-needled', 'floor', 'of', 'the', 'forest,', 'his', 'chin', 'on', 'his', 'folded', 'arms,', 'and', 'high', 'overhead', 'the', 'wind', 'blew', 'in', 'the', 'tops', 'of', 'the', 'pine', 'trees.']]
>>> doc3 = sentences
>>> docs = [doc1, doc2, doc3]
```
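To confirm that the explicit triple loop and the three comprehension spellings in this section all build the same list, here is a sketch on a smaller, made-up docs list with the same shape (documents, then sentences, then tokens). The names terse, explicit, compromise and tokensLoop are invented for the comparison.

```python
# A smaller, made-up docs list with the same shape as the one above:
# documents -> sentences -> tokens.
docs = [
    [["The", "cat", "sat."], ["The", "cow", "sat."]],  # two sentences
    [["He", "lay", "flat."]],                          # one sentence
]

# The explicit triple loop ...
tokensLoop = []
for doc in docs:
    for sent in doc:
        for token in sent:
            tokensLoop.append(token)

# ... and the three comprehension spellings discussed above.
terse = [x for z in docs for y in z for x in y]
explicit = [token for doc in docs for sentence in doc for token in sentence]
compromise = [x for doc in docs for sentence in doc for x in sentence]

print(tokensLoop == terse == explicit == compromise)  # True
print(tokensLoop)
# ['The', 'cat', 'sat.', 'The', 'cow', 'sat.', 'He', 'lay', 'flat.']
```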
References
In addition to the links provided in the "Runtime" section above, Luciano Strika's blog demonstrates some practical uses of list comprehension.
He also has a couple of examples of the next logical step after list comprehension: dictionary comprehension.