Using Python Language Features and the Standard Library
Last updated on 2025-03-24 | Edit this page
Overview
Questions
- Why are Python loops slower than specialised functions?
- How can I make my code more readable and faster?
Objectives
- Able to utilise Python language features effectively
- Able to search Python documentation for functionality available in built-in types and in the standard library
- Able to identify when Python code can be rewritten to perform execution in the back-end.
Before we look at data structures, algorithms and third-party libraries, it’s worth reviewing the fundamentals of Python. If you’re familiar with other programming languages, like C or Delphi, you might not know the Pythonic approaches. Whilst you can write Python in a way similar to other languages, it is often more effective to take advantage of Python’s principles and idioms.
Built-in Functions
For example, you might think to sum a list of numbers by using a for
loop, as would be typical in C, as shown in the function
manualSumC()
and manualSumPy()
below.
PYTHON
import random
from timeit import timeit
N = 100000 # Number of elements in the list
# Ensure every list is the same
random.seed(12)
my_data = [random.random() for i in range(N)]
def manualSumC():
n = 0
for i in range(len(my_data)):
n += my_data[i]
return n
def manualSumPy():
n = 0
for evt_count in my_data:
n += evt_count
return n
def builtinSum():
return sum(my_data)
repeats = 1000
print(f"manualSumC: {timeit(manualSumC, globals=globals(), number=repeats):.3f}ms")
print(f"manualSumPy: {timeit(manualSumPy, globals=globals(), number=repeats):.3f}ms")
print(f"builtinSum: {timeit(builtinSum, globals=globals(), number=repeats):.3f}ms")
Even just replacing the iteration over indices (which may be a habit
you’ve picked up if you first learned to program in C) with a more
pythonic iteration over the elements themselves speeds up the code by
about 2x. But even better, by switching to the built-in
sum()
function our code becomes about 8x faster and much
easier to read while doing the exact same operation!
OUTPUT
manualSumC: 1.624ms
manualSumPy: 0.740ms
builtinSum: 0.218ms
This is because built-in functions (i.e. those that are available without importing packages) are typically implemented in the CPython back-end, so their performance benefits from bypassing the Python interpreter.
In particular, those which are passed an iterable
(e.g. lists) are likely to provide the greatest benefits to performance.
The Python documentation provides equivalent Python code for many of
these cases.
-
all()
: boolean and of all items -
any()
: boolean or of all items -
max()
: Return the maximum item -
min()
: Return the minimum item -
sum()
: Return the sum of all items
This is a nice illustration of the principle we discussed earlier: It is often best to tell the interpreter/library at a high level what you want, and let it figure out how to do it.
Example: Searching an element in a list
A simple example of this is performing a linear search on a list.
(Though as we’ll see in the next section, this isn’t the most efficient
approach!) In the following example, we create a list of 2500 integers
in the (inclusive-exclusive) range [0, 5000)
. The goal is
to search for all even numbers within that range.
The function manualSearch()
manually iterates through
the list (ls
) and checks each individual item using Python
code. On the other hand, operatorSearch()
uses the
in
operator to perform each search, which allows CPython to
implement the inner loop in its C back-end.
PYTHON
import random
N = 2500 # Number of elements in list
M = 2 # N*M == Range over which the elements span
def generateInputs():
random.seed(12) # Ensure every list is the same
return [random.randint(0, int(N*M)) for i in range(N)]
def manualSearch():
ls = generateInputs()
ct = 0
for i in range(0, int(N*M), M):
for j in range(0, len(ls)):
if ls[j] == i:
ct += 1
break
def operatorSearch():
ls = generateInputs()
ct = 0
for i in range(0, int(N*M), M):
if i in ls:
ct += 1
repeats = 1000
gen_time = timeit(generateInputs, number=repeats)
print(f"manualSearch: {timeit(manualSearch, number=repeats)-gen_time:.2f}ms")
print(f"operatorSearch: {timeit(operatorSearch, number=repeats)-gen_time:.2f}ms")
This results in the manual Python implementation being 5x slower, doing the exact same operation!
OUTPUT
manualSearch: 152.15ms
operatorSearch: 28.43ms
An easy approach to follow is that if two blocks of code do the same operation, the one that contains less Python is probably faster. This won’t apply if you’re using 3rd party packages written purely in Python though.
Python bytecode
You can use dis
to view the bytecode generated by
Python, the amount of byte-code more strongly correlates with how much
code is being executed by the Python interpreter. However, this still
does not account for whether functions called are implemented using
Python or C.
The pure Python search compiles to 82 lines of byte-code.
PYTHON
import dis
def manualSearch():
ls = generateInputs()
ct = 0
for i in range(0, int(N*M), M):
for j in range(0, len(ls)):
if ls[j] == i:
ct += 1
break
dis.dis(manualSearch)
OUTPUT
11 0 LOAD_GLOBAL 0 (generateInputs)
2 CALL_FUNCTION 0
4 STORE_FAST 0 (ls)
12 6 LOAD_CONST 1 (0)
8 STORE_FAST 1 (ct)
13 10 LOAD_GLOBAL 1 (range)
12 LOAD_CONST 1 (0)
14 LOAD_GLOBAL 2 (int)
16 LOAD_GLOBAL 3 (N)
18 LOAD_GLOBAL 4 (M)
20 BINARY_MULTIPLY
22 CALL_FUNCTION 1
24 LOAD_GLOBAL 4 (M)
26 CALL_FUNCTION 3
28 GET_ITER
>> 30 FOR_ITER 24 (to 80)
32 STORE_FAST 2 (i)
14 34 LOAD_GLOBAL 1 (range)
36 LOAD_CONST 1 (0)
38 LOAD_GLOBAL 5 (len)
40 LOAD_FAST 0 (ls)
42 CALL_FUNCTION 1
44 CALL_FUNCTION 2
46 GET_ITER
>> 48 FOR_ITER 14 (to 78)
50 STORE_FAST 3 (j)
15 52 LOAD_FAST 0 (ls)
54 LOAD_FAST 3 (j)
56 BINARY_SUBSCR
58 LOAD_FAST 2 (i)
60 COMPARE_OP 2 (==)
62 POP_JUMP_IF_FALSE 38 (to 76)
16 64 LOAD_FAST 1 (ct)
66 LOAD_CONST 2 (1)
68 INPLACE_ADD
70 STORE_FAST 1 (ct)
17 72 POP_TOP
74 JUMP_FORWARD 1 (to 78)
15 >> 76 JUMP_ABSOLUTE 24 (to 48)
>> 78 JUMP_ABSOLUTE 15 (to 30)
13 >> 80 LOAD_CONST 0 (None)
82 RETURN_VALUE
Whereas the in
variant only compiles to 54.
PYTHON
import dis
def operatorSearch():
ls = generateInputs()
ct = 0
for i in range(0, int(N*M), M):
if i in ls:
ct += 1
dis.dis(operatorSearch)
OUTPUT
4 0 LOAD_GLOBAL 0 (generateInputs)
2 CALL_FUNCTION 0
4 STORE_FAST 0 (ls)
5 6 LOAD_CONST 1 (0)
8 STORE_FAST 1 (ct)
6 10 LOAD_GLOBAL 1 (range)
12 LOAD_CONST 1 (0)
14 LOAD_GLOBAL 2 (int)
16 LOAD_GLOBAL 3 (N)
18 LOAD_GLOBAL 4 (M)
20 BINARY_MULTIPLY
22 CALL_FUNCTION 1
24 LOAD_GLOBAL 4 (M)
26 CALL_FUNCTION 3
28 GET_ITER
>> 30 FOR_ITER 10 (to 52)
32 STORE_FAST 2 (i)
7 34 LOAD_FAST 2 (i)
36 LOAD_FAST 0 (ls)
38 CONTAINS_OP 0
40 POP_JUMP_IF_FALSE 25 (to 50)
8 42 LOAD_FAST 1 (ct)
44 LOAD_CONST 2 (1)
46 INPLACE_ADD
48 STORE_FAST 1 (ct)
>> 50 JUMP_ABSOLUTE 15 (to 30)
6 >> 52 LOAD_CONST 0 (None)
54 RETURN_VALUE
Example: Parsing data from a text file
In C, since there is no high-level string
datatype,
parsing strings can be fairly arduous work where you repeatedly look for
the index of a separator character in the string and use that index to
split the string up.
Challenge
Let’s say we have read in some data from a text file, each line containing a time bin and a mean energy:
If you’ve a C programming background, you may write the following code to parse the data into a dictionary:
PYTHON
def manualSplit():
data = {}
for line in f:
first_char = line.find("0")
end_time = line.find(" ", first_char, -1)
energy_found = line.find(".", end_time, -1)
begin_energy = line.rfind(" ", end_time, energy_found)
end_energy = line.find(" ", energy_found)
if end_energy == -1:
end_energy = len(line)
time = line[first_char:end_time]
energy = line[begin_energy + 1:end_energy]
data[time] = energy
return data
Can you find a shorter, more easily understandable way to write this in Python?
Python strings have a lot of methods to perform common operations, like removing suffixes, replacing substrings, joining or splitting, stripping whitespaces, and much more. See Python’s string methods documentation for a full list.
PYTHON
def builtinSplit():
data = {}
for line in f:
time, energy = line.split()
data[time] = energy
return data
This code is not just much more readable; it is also more flexible,
since it does not rely on the precise formatting of the input strings.
(For example, the line first_char = line.find("0")
in the
original code assumes that the time bin starts with the digit 0. That
code would likely malfunction if the input file had more than 1000 time
bins.)
The code that’s executed by CPython may use a similar approach as in
manualSplit()
; however, since this is all happening “under
the hood” in C code, it is once again faster.
PYTHON
N = 10_000 # Number of elements in the list
# Ensure every list is the same
random.seed(12)
f = [f" {i:0>6d} {random.random():8.4f} " for i in range(N)]
repeats = 1000
print(f"manualSplit: {timeit(manualSplit, globals=globals(), number=repeats):.3f}ms")
print(f"builtinSplit: {timeit(builtinSplit, globals=globals(), number=repeats):.3f}ms")
OUTPUT
manualSplit: 1.797ms
builtinSplit: 0.796ms
Challenge
If you’ve brought a project you want to work on: Do you have any similar code in there, which is hard to understand because it contains a lot of low-level step-by-step instructions?
(Before you try to rewrite those parts of your code, use a profiler to see whether those parts have a noticeable impact on the overall performance of your project. Remember the Donald Knuth quote!)