Good practices in Python

Posted on 20 Jan 2013

This post is a collection of various facts about Python:

  • common mistakes that I encounter frequently when reading code written by myself or other people.
  • specific features of the Python language that are not very well-known and that I think should be used more.
  • general recommendations regarding Python.

Please note that I do not consider myself a Python expert, so it is possible that the following text contains some inaccurate statements.

Also, due to its very nature, this post is rather unstructured. The table of contents should help you jump directly to the part you are interested in.

List comprehensions

List comprehensions give Python users a very concise and powerful syntax to build a list from another list (or any iterable object). The syntax is the following:

result_list = [expression(item) for item in original_list if condition(item)]

which means that result_list will be a list containing expression(item) (an expression computed from item) for each element item of original_list for which condition(item) (a boolean expression involving item) is True. The boolean condition, which allows you to filter the list, is optional.

For example, to compute the list of squares of elements in a list, instead of:

l = [1, 2, 3]
result = []
for i in l:
    result.append(i*i)

which is particularly inefficient because of the repeated use of the append function, one could use a list comprehension:

l = [1, 2, 3]
result = [i*i for i in l] # result = [1, 4, 9]

In addition to being shorter, the above code is also faster (around 3x improvement) because you build the list in one instruction.

Another example: to compute the list of square roots of all the non-negative elements of a list (computing the square root of a negative element would raise a ValueError):

from math import sqrt

l = [4, -3, 9]
result = [sqrt(i) for i in l if i >= 0] # result = [2.0, 3.0]

The same syntax also exists for dictionaries: these are called dict comprehensions (very original, isn't it?) and are available since Python 2.7. For example, to transform a list of (name, phone number) pairs into a dictionary, for faster lookups:

l = [("Barthes", "+33 6 29 64 91 12"), ("John", "+001 650 472 4243")]
d = {name: phone for (name, phone) in l}
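Once built, the dictionary supports direct lookup by name; a quick illustration:

```python
l = [("Barthes", "+33 6 29 64 91 12"), ("John", "+001 650 472 4243")]
d = {name: phone for (name, phone) in l}

# A single lookup replaces a linear scan of the original list of pairs
phone = d["John"]  # "+001 650 472 4243"
```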

You can get more details about list comprehensions on the dedicated section of the official documentation.

The multiple faces of in

The in keyword has many different meanings and makes Python code much easier to write, yet people often forget to use it.

  • in gives a universal syntax to iterate over iterable objects. For example, to iterate over a list, instead of:

    l = [1, 2, 3]
    for i in range(len(l)):
        print l[i]
    

    you could simply write:

    l = [1, 2, 3]
    for i in l:
        print i
    

    similarly, to iterate over a dictionary, instead of:

    d = { ... }
    for key in d.keys():
        print d[key]
    

    you could write:

    d = { ... }
    for key in d:
        print d[key]
    
  • in also allows you to test whether an element belongs to some structure: a list, a dictionary (or any container), or whether a substring occurs inside a string. For example:

    l = [line for line in open("server.log") if "Connected" in line]
    

    will return the list of lines from the file server.log containing Connected as a substring.
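Note that for dictionaries, in tests membership among the keys, not the values. A small sketch (the phone book entry is made up):

```python
phone_book = {"Barthes": "+33 6 29 64 91 12"}

"Barthes" in phone_book            # True: in looks at the keys
"+33 6 29 64 91 12" in phone_book  # False: values are not searched
"+33 6 29 64 91 12" in phone_book.values()  # True: explicit search among values
```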

Manipulating lists with atomic instructions

More generally, it is advisable to avoid iterating over a list with an explicit for loop when a single instruction will do. for loops are relatively slow in Python, and writing an operation over a list as a single instruction allows Python to optimize the execution of the code internally.

List comprehensions often help in replacing an iteration by a single instruction. Here are a few other functions which can be helpful in this regard:

  • join can be useful to format a list. For example, to print all the words starting with the letter a in a list of words, instead of:

    l = [ ... ]
    result = ""
    for word in l:
        if word[0] == 'a':
            result += word + " "
    print result
    

    you could do:

    l = [ ... ]
    print " ".join([word for word in l if word[0] == 'a'])
    
  • sum, to sum the elements of a list.

  • map, to apply a given function to all elements in a list. For example to reverse all the words in a list:

    l = ["Adam", "Eve"]
    
    def reverse(word):
        return word[::-1]
    
    m = map(reverse, l) # m = ['madA', 'evE']
    
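For instance, sum combines naturally with a list comprehension to aggregate only the elements you are interested in:

```python
l = [1, 2, 3, 4]
total = sum(l)                                   # sum of all elements: 10
even_total = sum([i for i in l if i % 2 == 0])   # sum of the even elements only: 6
```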

Slices are also very useful when it comes to manipulating lists (or sublists) in blocks. Remember that if l is a list (or any sequence), l[begin:end:step] will extract all the elements from index begin (included) to index end (excluded) with a step of step (all three parameters being optional).

If the begin parameter is omitted, it is given 0 as default value. Similarly, the default value of end when unspecified is len(l) (the number of elements in l). A negative value for begin or end is counted from the end of the list. For example, to extract all the elements but the last one:

l = [1, 2, 3]
m = l[:-1] # m = [1, 2]
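A few more slice examples, to illustrate the begin, end and step parameters together:

```python
l = [0, 1, 2, 3, 4, 5]
l[1:4]    # [1, 2, 3]  from index 1 (included) to 4 (excluded)
l[::2]    # [0, 2, 4]  every other element
l[-2:]    # [4, 5]     the last two elements
```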

Using a negative value for the step parameter is useful to walk through an iterable object in reverse order, as in the reverse function defined above which takes the mirror image of a word:

word = "dumbo"
drow = word[::-1] # drow = "obmud"

which compensates for the scandalous lack of a reverse function for strings in Python.

Exceptions

Exceptions provide a powerful tool found in many high-level programming languages which is often under-used. They allow for a less defensive programming style by handling errors as they appear instead of making test beforehand to prevent them from happening.

In Python, every time you try to execute an illegal operation (e.g. accessing an element outside a list's boundaries, dividing by zero, etc.), instead of simply crashing the program, Python raises an exception which can be caught, giving the programmer a last chance to fix the problem before the program ultimately crashes.

The syntax to catch exceptions in Python is the following:

try:
    .... # piece of code potentially raising the exception named Kaboum
except Kaboum:
    .... # piece of code to be executed if the above code raises Kaboum

For example, if a line of code contains a division by a number which could occasionally be equal to zero, instead of systematically checking that the number is non-zero, it is often more efficient to wrap the line in a try ... except ZeroDivisionError: to handle specifically the rare cases when the number is zero. This is the well-known principle: it is easier to ask forgiveness than permission.
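As a minimal sketch of this principle (the average function is made up for the example), computing an average without testing the list length first:

```python
def average(values):
    # EAFP: attempt the division, and handle the rare empty-list case
    try:
        return sum(values) / len(values)
    except ZeroDivisionError:
        return 0
```

With this, average([2, 4]) returns the mean while average([]) falls back to 0, without any preliminary test.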

Another example: when trying to access a missing key in a dictionary, Python raises the KeyError exception. This exception can be caught to initialize the value associated with the missing key. For example, to compute a dictionary of word counts in a text, you will often find:

text = "..."
result = {}
for word in text.split():
    if word in result:
        result[word] += 1
    else:
        result[word] = 1

You could instead use the KeyError exception to your advantage to avoid the systematic if test:

text = "..."
result = {}
for word in text.split():
    try:
        result[word] += 1
    except KeyError:
        result[word] = 1

The difference with the previous code is that, most of the time, this code behaves exactly as if the body of the for loop contained only the instruction result[word] += 1. This can give a noticeable speedup compared to the first version, where a test was performed at each iteration of the loop.

See the dedicated page in the official documentation.

Values equivalent to True or False

If test is a boolean variable (equal to True or False), we know that it is redundant to write:

if test == True:
    ...

instead of:

if test:
    ...

More generally, Python has automatic conversion rules from standard types to booleans. This can be used to shorten the syntax in conditional tests:

  • as in the vast majority of programming languages, any non-zero number is converted to True and zero is converted to False.

  • a string is converted to False if and only if it is empty (the same rule applies to lists, dictionaries and other containers). For example, to test whether a string title is non-empty, you can simply write:

    if title:
        ...
    

    instead of:

    if len(title) > 0:
         ...
    
  • the None value, a constant often used for unset variables, is converted to False. To test whether a variable var is unset, you can write:

    if not var:
        ...
    

    Beware, the above code will not allow you to distinguish the case where var is None from the case where var holds a value which Python converts to False (for example, an empty string or list). Make sure this is really what you want to test; when you specifically mean None, write the explicit test var is None instead.

Generators

Generators provide an easy way to create iterator objects (objects over which you can iterate) and can be created in several ways.

Generator expressions

Generator expressions are exactly like list comprehensions, except that the square brackets are replaced by parentheses. Thus, the following code:

l = [1, 2, 3]
m = (i*i for i in l)
print '\n'.join(str(i) for i in m)

would produce the exact same result had the second line been replaced by:

m = [i*i for i in l]

The difference between the two versions is that when m is defined by a list comprehension, the whole list is computed and stored in memory as soon as the variable m is defined. On the contrary, when m is defined by a generator expression, the elements of m are generated on the fly, when needed: only when iterating over the variable m (as induced by the call to the join function in the above example) are the elements actually computed.

From the speed of execution point of view, both solutions are equivalent: in the end, each element in m will be computed once and only once. From the memory usage point of view however, generators present a clear advantage: because the elements are generated dynamically, one at a time, never more than one element is stored in memory at the same time. In cases when the list is too big to fit into memory, generators could be the solution.
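You can observe this lazy behavior directly with the built-in next function, which advances a generator by one step:

```python
g = (i * i for i in [1, 2, 3])
next(g)   # 1 -- the first element is computed only now
next(g)   # 4
list(g)   # [9] -- only the remaining elements are produced
```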

When using a generator expression as the sole argument of a function, Python allows you to drop one pair of parentheses to make the code more readable. For example, in the following code:

l = [1, 2, 3]
total = sum((i*i for i in l))

the second line can be replaced by:

total = sum(i*i for i in l)

Generator functions

A second way to define a generator is by writing a function using the special keyword yield. When called, this function will return an iterable object whose behavior is the following: on each iteration step, the function is executed until a yield instruction is hit. The value following the yield keyword is returned and can be used during the iteration step. The execution of the function is frozen until the next iteration step.

For example, let us define the following function:

def min_max(filename):
    with open(filename) as f:
        for line in f:
            l = map(int, line.split())
            yield min(l), max(l)

When called, this function will produce an iterable object. When iterating over this object, at each iteration, one line of filename will be read, and the minimum and maximum values of this line will be returned when the yield keyword is reached, freezing the execution of the function until the next iteration.

Hence, the following code:

for (inf, sup) in min_max(filename):
    print (inf + sup)/2.

is exactly equivalent to:

with open(filename) as f:
    for line in f:
        l = map(int, line.split())
        inf, sup = min(l), max(l)
        print (inf + sup)/2.

but allows you to define separately the code which will generate the list of minimum and maximum values, and the code which makes use of the generated elements.

Built-in functions

Finally, some built-in functions in Python return lazy, generator-like objects. This is the case of the xrange function, which behaves exactly like the range function for iteration purposes. The difference is that range computes and stores the whole list of integers, whereas xrange produces the elements on the fly, one at a time. A call to range(1000000000) might cause a memory error on your machine (depending on its memory capacity), but you will be fine using xrange. It is almost always preferable to use xrange instead of range, and in Python 3.x the lazy behavior has become the default: range is lazy and xrange is gone.
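To see how such lazy behavior ties back to generator functions, here is how an xrange-like object could be sketched with yield (my_range is a made-up name):

```python
def my_range(n):
    # produce 0, 1, ..., n-1 one at a time, never building the full list
    i = 0
    while i < n:
        yield i
        i += 1
```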

Read more about generators on the official documentation.

Decorators

Decorators provide a very powerful way to alter the behavior of a function without redefining it. The syntax is the following:

@logging
def f(x):
    return x + 1

In the above example, we say that f has been decorated with logging. logging must be a function taking another function as an argument. The result of this decoration is equivalent to this piece of code:

def f(x):
    return x + 1

f = logging(f)

which means that by decorating f with logging, f now behaves as the composite function logging(f).

A simple decorator

Imagine that we want the logging decorator to log the calls made to the function it decorates, by printing them to the standard output. Such a decorator could be written like this:

def logging(fun):
    def aux(*args, **kwargs):
        print "Calling", fun.__name__
        return fun(*args, **kwargs)
    return aux

Because logging could be used to decorate any function, with an arbitrary number of arguments and keyword arguments, it is necessary to use the generic syntax aux(*args, **kwargs). This syntax collects all the positional arguments passed to aux in a tuple named args and all the keyword arguments in a dictionary named kwargs. Note that the exact same arguments are then passed on to fun, meaning that from the argument-passing perspective, aux and fun behave identically. The only difference is that aux logs the call to the standard output before delegating the computation to fun: this is how we expected the decorator to behave.
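A quick way to see what this syntax captures (show is a made-up name):

```python
def show(*args, **kwargs):
    # args collects the positional arguments in a tuple,
    # kwargs collects the keyword arguments in a dictionary
    return args, kwargs

show(1, 2, key=3)  # ((1, 2), {'key': 3})
```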

To be perfectly rigorous, the previous decorator should have been written like this:

from functools import wraps

def logging(fun):
    @wraps(fun)
    def aux(*args, **kwargs):
        print "Calling", fun.__name__
        return fun(*args, **kwargs)
    return aux

aux is now itself decorated by the wraps decorator provided by the functools module. This decorator does some magic to ensure that aux behaves as closely as possible to fun. Without it, the following code:

@logging
def f(x):
    return x + 1

print f.__name__

would print aux to the standard output, instead of the expected f. The wraps decorator ensures among other things that the __name__ attribute is preserved throughout a decoration.

Let us further assume that you want to extend the logging decorator to not only log the calls, but also keep track of how many times the function has been called.

You could be tempted to write something like:

from functools import wraps

def logging(fun):
    a = 0
    @wraps(fun)
    def aux(*args, **kwargs):
        a = a + 1
        print "{0} has been called {1} times".format(fun.__name__, a)
        return fun(*args, **kwargs)
    return aux

However, if you apply this decorator to some function and then call it, Python will greet you with an angry UnboundLocalError, complaining that the variable a is referenced before assignment. The problem comes from this line:

a = a + 1

Here, Python decides that a is a local variable of aux and forgets about the outer definition. As a consequence, when reaching the a + 1 part, a is not yet defined, causing the error. This is a limitation of Python 2: variables defined in an enclosing function's scope are read-only from a nested function. (Python 3 lifts this restriction with the nonlocal keyword.)
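The limitation can be reproduced outside of any decorator (counter and inc are made-up names):

```python
def counter():
    a = 0
    def inc():
        a = a + 1  # UnboundLocalError: 'a' is treated as a new local variable
        return a
    return inc
```

Calling counter()() raises the error, even though a looks perfectly well defined.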

A standard way to circumvent this limitation is to use a mutable structure for a: a itself cannot be rebound, but the structure it points to can be modified. Using this trick, the previous example can be rewritten as:

from functools import wraps

def logging(fun):
    a = [0]
    @wraps(fun)
    def aux(*args, **kwargs):
        a[0] = a[0] + 1
        print "{0} has been called {1} times".format(fun.__name__, a[0])
        return fun(*args, **kwargs)
    return aux

where a points to a list of length 1 storing the number of calls at its first position.

Another example

A common example which is often used to illustrate decorators in Python is memoization: when a function is computation-heavy but often called using the same arguments, you can save a lot of time by caching past results returned by the function.

This idea can be nicely implemented in Python using a decorator. The decorator stores past results in a dictionary: when the decorated function is called, the decorator performs a lookup in its dictionary to check whether the function has already been called with the same argument. If the dictionary already contains an entry for this argument, the associated value is returned directly.

Here is how you could write such a decorator for a single argument function:

from functools import wraps

def memoize(fun):
    cache = {}
    @wraps(fun)
    def aux(x):
        if x in cache:
            return cache[x]
        else:
            a = fun(x)
            cache[x] = a
            return a
    return aux

Then if f is defined like this:

@memoize
def f(x):
    ... # very long and heavy computation

when calling f twice with the same argument, you will incur the computation cost only during the first call, the second call being almost instantaneous.
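A quick way to convince yourself that the cache works is to count the real invocations (slow_square and calls are made up for this check; memoize is reproduced here so the snippet is self-contained):

```python
from functools import wraps

def memoize(fun):
    cache = {}
    @wraps(fun)
    def aux(x):
        if x in cache:
            return cache[x]
        a = fun(x)
        cache[x] = a
        return a
    return aux

calls = []

@memoize
def slow_square(x):
    calls.append(x)  # record each real execution of the body
    return x * x

slow_square(3)
slow_square(3)  # served from the cache, the body does not run again
```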

Classes with two methods

Let us briefly recall how classes work in Python. A class is defined like this:

class Cipher:

    def __init__(self, key):
        self.key = key

    def decrypt(self, message):
        return (message & self.key)

All the methods of a class take as their first argument the instance on which the method is called. By convention, this first argument is always named self. If a is an instance of Cipher, the instruction a.decrypt(message) is equivalent to Cipher.decrypt(a, message).

The special function __init__ is the class constructor and is called every time an instance of the class is created. Its typical use is to initialize some attributes of the instance. An instance of the Cipher class can be created like this:

d = Cipher(key)

A flaw commonly found in code written by people coming from object-oriented programming languages is to create classes for everything. This often leads to classes containing only two methods, one of them being the __init__ function. This is the case in the class written above as an example. Looking at this example a bit more closely, you can see that it is possible to get rid of the class definition entirely: a decrypt function taking the key as an additional argument is sufficient:

def decrypt(key, message):
    return (message & key)

Some people could object that it still makes sense to use a class in the example above if we plan to extend the Cipher class in the future, for example by adding an encrypt function. In my opinion, it is better to start by writing your code as simply as possible. If you really need to extend it later, you can then restructure it and group several related functions into a class.

PEP 8

When writing about good practices in Python, it is impossible not to mention PEP 8, a set of recommendations on coding style in Python. These recommendations are of course not absolute rules and should be taken as advice. However, I have noticed that following them generally leads to greater code readability. Moreover, as many people who code in Python also follow these recommendations, adopting them reduces the gap between your code and code written by others: this will save you some time when reading code.

Here are a few points extracted from the PEP8:

  • you should follow English typographic rules: no space before a colon, no space before a comma, but a space after, etc.
  • you should put spaces around operators like the equal sign, plus sign, etc.
  • you should try to limit the length of the lines of code to a maximum of 79 characters.
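As a tiny before/after illustration of the spacing rules (f is an arbitrary function):

```python
def f(a, b):
    return a + b

x=f(1 ,2)    # legal, but hard to read and against PEP 8

y = f(1, 2)  # the same call, PEP 8 compliant
```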

More details on the PEP8 page.

