List comprehensions vs filter and for loop [update 2]
I've been looking more into the use of list comprehensions lately, as I was looking to learn a bit more about Python. I think list comprehensions "look good" but wondered a bit about why it was necessary to add more variation to the Python syntax (thinking less is more).
One of the arguments for list comprehensions is that they execute faster than, for example, a for loop or a pass through the filter function.
Well, OK. So I sat down and wrote this script, called list_comprehensions.py (it needs that exact name, because the timeit setup statement imports the module by name).
import timeit

def simple_range_lc():
    return [x for x in range(100) if not x%2]

def simple_range_filter():
    return filter(lambda x: not x%2, range(100))

def simple_range_for():
    result = []
    for x in range(100):
        if not x%2:
            result.append(x)
    return result

if __name__ == '__main__':
    runner = timeit.Timer(setup='import list_comprehensions',
                          stmt='list_comprehensions.simple_range_lc()')
    print 'list comprehensions', runner.timeit()
    runner = timeit.Timer(setup='import list_comprehensions',
                          stmt='list_comprehensions.simple_range_filter()')
    print 'filter', runner.timeit()
    runner = timeit.Timer(setup='import list_comprehensions',
                          stmt='list_comprehensions.simple_range_for()')
    print 'for loop', runner.timeit()
Running this script on my laptop with nothing else big running yields these results:
list comprehensions 20.1266300678
filter 31.8105700016
for loop 26.9202849865
and the percentage difference is pretty consistent across runs. This tells me that filter takes roughly 60% more time than a list comprehension here, and a for loop about 35% more.
I was surprised that the filter function was that much slower, but that's probably because it makes a function call for every element.
So, some food for thought, and a good reason to start using list comprehensions more.
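For anyone running this on Python 3, note that print is a function there and filter returns a lazy iterator, so the benchmark needs a couple of tweaks to stay fair; a minimal sketch (timings will of course differ from the ones above):

```python
import timeit

def simple_range_lc():
    return [x for x in range(100) if not x % 2]

def simple_range_filter():
    # filter() is lazy in Python 3, so list() is needed to actually do the work
    return list(filter(lambda x: not x % 2, range(100)))

def simple_range_for():
    result = []
    for x in range(100):
        if not x % 2:
            result.append(x)
    return result

if __name__ == '__main__':
    for fn in (simple_range_lc, simple_range_filter, simple_range_for):
        # timeit accepts a callable directly, avoiding the module-import setup
        print(fn.__name__, timeit.timeit(fn, number=10000))
```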
[Later..]
DaveE suggested running a test with generators as well, with this snippet:
def simple_gen():
    def gen():
        for x in range(100):
            if not x%2:
                yield x
    return list(gen())
The timing results now were:
list comprehensions 19.3058991432
filter 31.7537689209
for loop 25.6983029842
generator 25.2884697914
Interestingly, the generator is a wee bit faster than a plain for loop; I'd like to see some suggestions on why that is. :)
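As an aside, the same generator can be written as a generator expression, without the nested function; a sketch (not one of the timed variants above):

```python
def simple_gen_expr():
    # the generator expression does the same job as the nested gen() function
    return list(x for x in range(100) if not x % 2)

print(simple_gen_expr()[:5])  # → [0, 2, 4, 6, 8]
```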
[Even later..]
Nathan Wright wrote in and suggested that the lookup of the list's append method could be hoisted out of the loop, and that does speed things up just a little bit; see the simple_range_for2 function below. Stephen Simmons quite correctly pointed out that not all the variants used a function call to test whether (not x%2) was true. As noted above, being able to embed an expression instead of making a function call is one of the benefits of list comprehensions (and for loops as well).
See the updated code below, and the results now that all the variants use a function call.
import timeit

def simple_range_lc():
    test = lambda x: not x%2
    return [x for x in range(100) if test(x)]

def simple_range_filter():
    return filter(lambda x: not x%2, range(100))

def simple_range_for():
    result = []
    test = lambda x: not x%2
    for x in range(100):
        if test(x):
            result.append(x)
    return result

def simple_range_gen():
    def gen():
        test = lambda x: not x%2
        for x in range(100):
            if test(x):
                yield x
    return list(gen())

def simple_range_for2():
    result = []
    result_append = result.append
    test = lambda x: not x%2
    for x in range(100):
        if test(x):
            result_append(x)
    return result

if __name__ == '__main__':
    runner = timeit.Timer(setup='import list_comprehensions',
                          stmt='list_comprehensions.simple_range_lc()')
    print 'list comprehensions', runner.timeit()
    runner = timeit.Timer(setup='import list_comprehensions',
                          stmt='list_comprehensions.simple_range_filter()')
    runner = timeit.Timer(setup='import list_comprehensions',
                          stmt='list_comprehensions.simple_range_for()')
    print 'for loop', runner.timeit()
    runner = timeit.Timer(setup='import list_comprehensions',
                          stmt='list_comprehensions.simple_range_for2()')
    print 'for loop 2', runner.timeit()
    print 'filter', runner.timeit()
    runner = timeit.Timer(setup='import list_comprehensions',
                          stmt='list_comprehensions.simple_range_gen()')
    print 'generator', runner.timeit()
And the timing results this time with all tests being function calls:
list comprehensions 38.9694008827
for loop 42.6255960464
for loop 2 39.0485188961
filter 38.9864330292
generator 45.6232538223
Which shows that things are pretty much the same when every variant uses a function call to test whether to include the item in the new sequence. Skipping the attribute lookup in simple_range_for2 gives a few percent of speed-up. But this is interesting: it seems to me that list comprehensions should be used where possible, as they make it possible to embed expressions instead of making function calls.
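The "embedded expression" point can be made concrete with the dis module; a Python 3 sketch (lc_inline and lc_call are illustrative names, not functions from the script above). The inline version evaluates x % 2 directly inside the comprehension, while the other emits a call instruction for every element tested:

```python
import dis

def lc_inline():
    return [x for x in range(10) if not x % 2]

def lc_call(test=lambda x: not x % 2):
    return [x for x in range(10) if test(x)]

# Both build the same list; only the bytecode differs.
assert lc_inline() == lc_call() == [0, 2, 4, 6, 8]

# lc_call's disassembly contains an extra function-call
# instruction per element, which is the overhead measured above.
dis.dis(lc_inline)
dis.dis(lc_call)
```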
[Permalink] [By morphex] [Python-only or > 0.5 Python related (Atom feed)] [2012 15 Jun 15:59 GMT+2]
Comments
Generators
By: DaveE

How about generators?

def simple_gen():
    def gen():
        for x in range(100):
            if not x%2:
                yield x
    return list(gen())

I don't know how favorably that would compare to the range-for time.
Numpy?
By: faucon

How about

from numpy import arange

def numpy_range():
    return arange(0, 100, 2)

For me this is about 10 times faster than a list comprehension.
Readability
By: David Avraamides

I use list comprehensions primarily for readability. I'm not sure performance should be a factor here. In most cases, each of the variations you tried is going to be very fast, and if it is slow, it's going to be because of the calculation within the loop or comprehension, not the loop construct itself.
OK
By: Morten W. Petersen

DaveE, I'll have a look at running a generator-based one as well.

faucon: The point was to illustrate the speed difference when working with sequences, but OK, nice to know. :)

David: Yes, the speed of these sequence handlers may be moot, but I think it strengthens the argument for using list comprehensions. List comprehensions have their limitations; for example, a for loop has an else clause which can be used when the loop hasn't been exited with a break statement. But I'll try to use list comprehensions from now on, because of readability and speed. If I'm not mistaken, list comprehensions are faster than the filter approach because the test expression can be evaluated without a function call.
range
By: brentp

If you're going to do the numpy one, then you might as well do:

range(0, 100, 2)
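brentp's suggestion as a runnable sketch (in Python 3, range is lazy, so list() is needed to materialise it):

```python
def range_step():
    # a step of 2 produces the evens directly, with no x % 2 test at all
    return list(range(0, 100, 2))

print(range_step()[:5])  # → [0, 2, 4, 6, 8]
```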
Attribute lookup ahead of time
By: Nathan Wright

The for loop is slower in this example, sure, but I wonder how much of that cost is the result.append attribute lookup that happens on each hit? How about this?

def simple_range_for2():
    result = []
    result_append = result.append
    for x in range(100):
        if not x%2:
            result_append(x)
    return result

I think it'll still be slower because you still have to call the append function each time, but it should be a bit closer at least.
Your examples are testing function calls rather than loop speed
By: Stephen Simmons

Your results indicate the number of function calls in each example rather than the intrinsic speed of the looping constructs. Here are the results I got with ipython's timeit function:

>>> timeit [x for x in range(100) if not x%2]
100000 loops, best of 3: 15.3 us per loop

xrange() makes it slightly quicker:

>>> timeit [x for x in xrange(100) if not x%2]
100000 loops, best of 3: 14.4 us per loop

Now make a filter function so we can see how much the function call affects the loop speed:

>>> f = lambda x: not x%2
>>> timeit [x for x in xrange(100) if f(x)]
10000 loops, best of 3: 27.1 us per loop

Hmmm, that halves the speed of the list comprehension. This should be our baseline for testing list comprehension speed. Now try it with filter():

>>> timeit filter(f, xrange(100))
100000 loops, best of 3: 19.1 us per loop

So filter is substantially faster than a list comprehension, all other factors being equal. The list comprehension is only faster if the comprehension syntax lets us avoid a function call.
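Stephen's experiment translates to Python 3 roughly like this (a sketch; absolute numbers will differ per machine, and xrange is gone since range is already lazy there):

```python
import timeit

f = lambda x: not x % 2

# inline expression vs. calling f() per element vs. filter() with f
inline = timeit.timeit('[x for x in range(100) if not x % 2]', number=20000)
called = timeit.timeit('[x for x in range(100) if f(x)]',
                       globals={'f': f}, number=20000)
filtered = timeit.timeit('list(filter(f, range(100)))',
                         globals={'f': f}, number=20000)

# All three approaches build the same list of evens
assert [x for x in range(100) if f(x)] == list(filter(f, range(100)))
print(inline, called, filtered)
```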
Attribute lookup and function calls
By: Morten W. Petersen

Nathan, yes, the attribute lookup makes a big difference; I'll include your function and the timing results in an update.

Stephen, yes, the function calls could make a big difference; I'm including this in the update.

I'll update with both of these comments shortly. :)
Results messed up
By: Gabriel Genellina

Is the above the actual code executed? I'm afraid you messed up the print statements and the results are not meaningful. Also, as someone already pointed out, you're basically measuring the cost of function calls.

I choose list comprehensions because of readability, not speed. (If you're overly concerned about speed, Python is not the right language for the task.)
Results messed up yes
By: Morten W. Petersen

Gabriel, right you are. One of the timers was run two times; here are the correct results:

list comprehensions 39.3512871265
for loop 43.52003479
for loop 2 39.9887089729
filter 30.7876300812
generator 45.8817689419

To get these, this line

runner = timeit.Timer(setup='import list_comprehensions', stmt='list_comprehensions.simple_range_filter()')

has to be inserted after this:

print 'for loop 2', runner.timeit()

Which shows that filter is indeed the fastest when all the variants use a function to test not x%2. It makes me think it would be nice to be able to pass just an expression to filter and map and see what the times are then.

The thing I'm trying to figure out is how to do Python programming in the most sensible way, what I should do by default. Up until now I never paid much attention to list comprehensions (...) because whatever I did worked and was fast enough. And yes, I don't program in Python because of the speed of Python, but because of the libraries and frameworks that are available, and the ease of using Python (reading and writing).

I leave you with the last results, where I'm running python with nice to ensure that the process gets as much CPU as possible, and functions are replaced by expressions where possible:

sudo nice -n -20 python list_comprehensions.py
list comprehensions 19.2929069996
for loop 25.8317708969
for loop 2 21.9519889355
filter 31.5145151615
generator 25.0801270008

The conclusion is: use list comprehensions where possible, because they "look good" and they are the fastest. If the program is too slow and you find a particular method or segment of code that is the culprit, try map or filter. For loops and generators have their uses, for example the break/else clauses in a for loop, or generating parts of a (huge) sequence.