Nidelven IT - All about Python, Zope & Plone - and Open Source!

Here you'll find issues related to our services. Mostly about Python, Zope and Plone, as well as hosting-related issues.

"Keeping IT real"







List comprehensions vs filter and for loop [update 2]

I've been looking more into list comprehensions lately, as part of learning a bit more about Python. I think list comprehensions "look good", but I wondered a bit about why it was necessary to add more variation to the Python syntax (thinking less is more).

One of the arguments for list comprehensions is that they execute faster than, for example, a for loop or a pass through the filter function.

Well, OK. So I sat down and wrote this script, called list_comprehensions.py (it needs this name for the timeit setup import to work properly).

import timeit

def simple_range_lc():
    return [x for x in range(100) if not x%2]

def simple_range_filter():
    return filter(lambda x: not x%2, range(100))

def simple_range_for():
    result = []
    for x in range(100):
        if not x%2:
            result.append(x)
    return result

if __name__ == '__main__':
    runner = timeit.Timer(setup='import list_comprehensions',
        stmt='list_comprehensions.simple_range_lc()')
    print 'list comprehensions', runner.timeit()
    runner = timeit.Timer(setup='import list_comprehensions',
        stmt='list_comprehensions.simple_range_filter()')
    print 'filter', runner.timeit()
    runner = timeit.Timer(setup='import list_comprehensions',
        stmt='list_comprehensions.simple_range_for()')
    print 'for loop', runner.timeit()

Running this script on my laptop with nothing else big running yields these results:

list comprehensions 20.1266300678
filter 31.8105700016
for loop 26.9202849865

and the percentage difference is pretty consistent across runs. This tells me that filter uses roughly 50-60% more time than the list comprehension, and the for loop roughly a third more.

I was surprised that the filter function used that much more time, but that's probably because it makes a function call (the lambda) for every element.

So, some food for thought, and a good reason to start using list comprehensions more.
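For anyone reading this on a modern interpreter: the script above is Python 2, where print is a statement and filter returns a list. A minimal sketch of the same comparison, adapted for Python 3 (where filter is lazy and needs a list() wrapper), might look like this:

```python
import timeit

# The three variants from the script above, adapted for Python 3.
def simple_range_lc():
    return [x for x in range(100) if not x % 2]

def simple_range_filter():
    # filter() returns a lazy iterator in Python 3, so we
    # materialize it to get a comparable list-building cost.
    return list(filter(lambda x: not x % 2, range(100)))

def simple_range_for():
    result = []
    for x in range(100):
        if not x % 2:
            result.append(x)
    return result

if __name__ == '__main__':
    for fn in (simple_range_lc, simple_range_filter, simple_range_for):
        # number=100000 keeps the run short; timeit's default is
        # 1,000,000 iterations, as in the original script.
        print(fn.__name__, timeit.timeit(fn, number=100000))
```

All three build the same list of even numbers; only the looping machinery differs.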

[Later..]

DaveE suggested running a test with generators as well, with this snippet:

def simple_gen():
    def gen():
        for x in range(100):
            if not x%2:
                yield x
    return list(gen())

The timing results now were:

list comprehensions 19.3058991432
filter 31.7537689209
for loop 25.6983029842
generator 25.2884697914

Interestingly, the generator is a wee bit faster than the plain for loop; I'd like to see some suggestions on why this is. :)

[Even later..]

Nathan Wright wrote in and suggested that the lookup of the list's append method could be hoisted out of the loop, and that does speed things up just a little bit; see the simple_range_for2 function below. Stephen Simmons quite correctly pointed out that not all the variants used a function call to test whether (not x%2) was true. This was discussed in the blog post as well: being able to "embed" an expression instead of making a function call is a benefit of list comprehensions (and for loops as well).

See the updated code below, and the results now that all the variants use a function call for the test.

import timeit

def simple_range_lc():
    test = lambda x: not x%2
    return [x for x in range(100) if test(x)]

def simple_range_filter():
    return filter(lambda x: not x%2, range(100))

def simple_range_for():
    result = []
    test = lambda x: not x%2
    for x in range(100):
        if test(x):
            result.append(x)
    return result

def simple_range_gen():
    def gen():
        test = lambda x: not x%2
        for x in range(100):
            if test(x):
                yield x
    return list(gen())

def simple_range_for2():
    result = []
    result_append = result.append
    test = lambda x: not x%2
    for x in range(100):
        if test(x):
            result_append(x)
    return result

if __name__ == '__main__':
    runner = timeit.Timer(setup='import list_comprehensions',
        stmt='list_comprehensions.simple_range_lc()')
    print 'list comprehensions', runner.timeit()
    runner = timeit.Timer(setup='import list_comprehensions',
        stmt='list_comprehensions.simple_range_filter()')
    runner = timeit.Timer(setup='import list_comprehensions',
        stmt='list_comprehensions.simple_range_for()')
    print 'for loop', runner.timeit()
    runner = timeit.Timer(setup='import list_comprehensions',
        stmt='list_comprehensions.simple_range_for2()')
    print 'for loop 2', runner.timeit()
    print 'filter', runner.timeit()
    runner = timeit.Timer(setup='import list_comprehensions',
        stmt='list_comprehensions.simple_range_gen()')
    print 'generator', runner.timeit()

And the timing results this time with all tests being function calls:

list comprehensions 38.9694008827
for loop 42.6255960464
for loop 2 39.0485188961
filter 38.9864330292
generator 45.6232538223

Which shows that things are pretty much even when all the sequence handlers use a function call to test whether to include an item in the new sequence. Skipping the attribute lookup in simple_range_for2 gives a few percent of speedup. But this is interesting: it seems to me that list comprehensions should be used where possible, as they make it possible to "embed" expressions instead of making function calls.



[Permalink] [By morphex] [Python-only or > 0.5 Python related (Atom feed)] [2012 15 Jun 15:59 GMT+2]

Comments

Generators

By: DaveE

How about generators?

def simple_gen():
    def gen():
        for x in range(100):
            if not x%2:
                yield x
    return list(gen())


I don't know how favorably that would compare to the range-for time.

Numpy?

By: faucon

How about

from numpy import arange

def numpy_range():
    return arange(0,100,2)

for me this is about 10 times faster than a list comprehension.

Readability

By: David Avraamides

I use list comprehensions primarily for readability. I'm not sure performance should be a factor here. In most cases, each of the variations you tried are all going to be very fast, and if they are slow, it's going to be because of the calculation within the loop or comprehension, not the loop construct itself.

OK

By: Morten W. Petersen

DaveE, I'll have a look at running a generator-based one as well.

faucon: The point was to illustrate the speed difference when working with sequences, but OK, nice to know. :)

David: Yes, the speed of these sequence handlers may be moot, but I think it strengthens the argument for using list comprehensions. List comprehensions have their limitations; for example, a for loop has an else clause, which runs when the loop hasn't been exited with a break statement. But I'll try to use list comprehensions from now on, for readability and speed. If I'm not mistaken, list comprehensions are faster than the filter approach because the test expression can be evaluated without a function call.
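The for/else clause mentioned above is one thing a comprehension can't express. A minimal sketch of the pattern (the function and data here are made up for illustration):

```python
def first_even(items):
    """Return the first even item, or None if there is none.

    Uses for/else: the else branch runs only when the loop
    finishes without hitting break.
    """
    for item in items:
        if item % 2 == 0:
            found = item
            break
    else:
        # Loop completed without break: no even number seen.
        found = None
    return found

print(first_even([1, 3, 4, 5]))  # 4
print(first_even([1, 3, 5]))     # None
```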

range

By: brentp

if you're going to do the numpy one, then might as well do:

    range(0, 100, 2)
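That is, for this particular sequence the step argument to range makes the filtering step unnecessary altogether. A quick check (written for Python 3, where range is lazy and needs list()):

```python
# The stepped range produces the even numbers directly...
stepped = list(range(0, 100, 2))

# ...matching what the comprehension keeps after filtering.
filtered = [x for x in range(100) if not x % 2]

print(stepped == filtered)  # True
```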

Attribute lookup ahead of time

By: Nathan Wright

The for loop is slower in this example, sure, but I wonder how much of that cost is the result.append attribute lookup that happens on each hit? How about this?

def simple_range_for2():
    result = []
    result_append = result.append
    for x in range(100):
        if not x%2:
            result_append(x)
    return result

I think it'll still be slower because you still have to make the append call on each hit, but it should be a bit closer at least.

Your examples are testing function calls rather than loop speed

By: Stephen Simmons

Your results indicate the number of function calls in each example rather than the intrinsic speed of the looping constructs.

Here are the results I got with ipython's timeit function:

>>> timeit [ x for x in range(100) if not x%2 ]
100000 loops, best of 3: 15.3 us per loop

xrange() makes it slightly quicker:

>>> timeit [ x for x in xrange(100) if not x%2 ]
100000 loops, best of 3: 14.4 us per loop

Now make a filter function so we can see how much the function call affects the loop speed:

>>> f = lambda x: not x%2
>>> timeit [ x for x in xrange(100) if f(x) ]
10000 loops, best of 3: 27.1 us per loop

Hmmm, that halves the speed of the list comprehension.

This should be our baseline for testing list comprehension speed. Now try it with filter():

>>> timeit filter(f, xrange(100) )
100000 loops, best of 3: 19.1 us per loop

So filter is substantially faster than a list comprehension, all other factors being equal. The list comprehension is only faster if the comprehension syntax lets us avoid a function call.
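As an aside, Python 3 absorbed the xrange optimization Stephen uses: range is lazy there, and filter returns an iterator. The two forms he compares can be checked for equivalence directly (a Python 3 sketch, hence the list() around filter):

```python
f = lambda x: not x % 2

# Comprehension paying for one Python-level call to f per element...
lc = [x for x in range(100) if f(x)]

# ...versus filter, which also calls f per element but does its
# looping inside the interpreter's C machinery.
filtered = list(filter(f, range(100)))

print(lc == filtered)  # True: same output, different loop overhead
```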

Attribute lookup and function calls

By: Morten W. Petersen

Nathan, yes the attribute lookup makes a big difference, I'll include your function and the timing results in an update.

Stephen, yes, the function calls could make a big difference, I'm including this in the update.

I'll update with both of these comments shortly. :)

Results messed up

By: Gabriel Genellina

Is the above the actual code executed? I'm afraid you messed up the print statements and the results are not meaningful.

Also, as someone already pointed out, you're basically measuring the cost of function calls. I choose list comprehensions because of readability, not speed. (If you're overly concerned about speed, Python is not the right language for the task.)

Results messed up yes

By: Morten W. Petersen

Gabriel,

right you are. One of the timers was run twice; here are the correct results:

list comprehensions 39.3512871265
for loop 43.52003479
for loop 2 39.9887089729
filter 30.7876300812
generator 45.8817689419

Which shows that filter is the fastest when all the variants use a function to test not x%2.

So this

    runner = timeit.Timer(setup='import list_comprehensions',
        stmt='list_comprehensions.simple_range_filter()')

has to be inserted after this:

    print 'for loop 2', runner.timeit()

Which shows that filter is indeed fastest when all the tests use a function call. It makes me think that it would be nice to be able to pass just an expression to filter and map and see what the times are then.

The thing I'm trying to figure out is how to do Python programming in the most sensible way, what I should do by default. Up until now I never paid much attention to list comprehensions (...) because whatever I did worked and was fast enough. Yes, I don't program in Python because of the speed of Python, but because of the libraries and frameworks that are available, and the ease of reading and writing Python.

I leave you with the last results, where I run Python under nice to ensure that the process gets as much CPU time as possible, and functions are replaced by plain expressions where possible:

sudo nice -n -20 python list_comprehensions.py

list comprehensions 19.2929069996
for loop 25.8317708969
for loop 2 21.9519889355
filter 31.5145151615
generator 25.0801270008

The conclusion is: use list comprehensions where possible, because they "look good" and they are the fastest. If the program is too slow and you find a particular method or segment of code that is the culprit, try using map or filter. For loops and generators have their uses, for example when using break/else in a for loop, or when generating parts of a (huge) sequence.
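The last point, generating parts of a huge sequence, is where generators shine: items are produced lazily, so you can take just the slice you need. A sketch using itertools (a hypothetical example, not from the timings above):

```python
from itertools import count, islice

def evens():
    # An infinite generator of even numbers; nothing is computed
    # until a consumer asks for the next value.
    for x in count():
        if not x % 2:
            yield x

# Take only the first five. A list comprehension over an infinite
# source like count() would never terminate, but islice stops the
# generator after five items.
first_five = list(islice(evens(), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```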

