Instructor Notes
Introduction to Profiling
Instructor Note
The bottlenecked implementation was naively parsing a 10 MB JSON file to create a list of unique items.
It was repeatedly:
- Checking the length of (C) strings, i.e. iterating until the terminating null character is found; resolved by caching the results.
- Performing a linear search of a list to check for duplicates before inserting; resolved by using an appropriate data structure (dictionary). Allegedly, duplicates were never even present in the JSON.
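If students ask what the second fix looks like in Python, the minimal sketch below contrasts the two duplicate-checking patterns. It is an illustration only, not the original code; the function names and the `items` data are hypothetical. The list version rescans everything seen so far on every insert, while the dictionary version uses a hash lookup that is O(1) on average.

```python
import timeit


def unique_items_linear(items):
    """Slow pattern (hypothetical): linear search of a list before every insert."""
    unique = []
    for item in items:
        if item not in unique:  # scans the whole list each time: O(n) per check
            unique.append(item)
    return unique


def unique_items_dict(items):
    """Faster pattern (hypothetical): dict membership checks are O(1) on average."""
    unique = {}
    for item in items:
        if item not in unique:  # hash lookup instead of a scan
            unique[item] = None
    return list(unique)  # dicts preserve insertion order (Python 3.7+)


# Hypothetical input: 2000 strings containing 500 distinct values
items = [f"item{i % 500}" for i in range(2000)]

for fn in (unique_items_linear, unique_items_dict):
    seconds = timeit.timeit(lambda: fn(items), number=10)
    print(f"{fn.__name__}: {seconds:.4f} s")
```

Both functions return the same items in the same order; only the cost of the membership check differs, and the gap widens as the input grows.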
Why wasn’t this caught by one of the hundreds of developers with access to the source code?
Was more money saved by not investigating performance than by committing time to profiling and fixing the issue?
Function Level Profiling
Instructor Note
It can help to run these examples live using snakeviz. For the worked example you may wish to also show the code (e.g. in split screen).
Demonstrate features such as moving up/down the call-stack by clicking the boxes and changing the depth and cutoff via the dropdown.
Download pre-generated profile reports:
- snakeviz example screenshot: files/schelling_out.prof
- Worked example: files/snakeviz-worked-example/out.prof
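If you prefer to generate a fresh report during the session, one way to produce a `.prof` file that snakeviz can open is with the standard-library `cProfile` module. This is a sketch: `slow_sum_of_squares` and `main` are hypothetical stand-ins for the script being profiled. From the shell, `python -m cProfile -o out.prof my_script.py` should produce an equivalent file for a whole script (`my_script.py` is a placeholder name).

```python
import cProfile
import pstats


def slow_sum_of_squares(n):
    """Hypothetical workload to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return [slow_sum_of_squares(10_000) for _ in range(50)]


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Write the raw profile to disk; view it in a browser with:  snakeviz out.prof
profiler.dump_stats("out.prof")

# Quick text summary of the five most expensive entries by cumulative time
stats = pstats.Stats("out.prof")
stats.sort_stats("cumulative").print_stats(5)
```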
Instructor Note
Demonstrate this!
Instructor Note
Arguments 1-9 passed to travellingsales.py should execute relatively fast (less than a minute). This will be slower via the profiler, and is likely to vary on different hardware. Larger values should be avoided.
Download the set of profiles for arguments 1-10; these can be opened by passing the directory to snakeviz.
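To illustrate why larger arguments should be avoided, and how a directory of profiles like the downloadable set can be produced, the sketch below profiles a hypothetical brute-force route search once per argument and saves each result to `profiles/`. `brute_force_tour` is an assumption that merely mimics factorial growth; it is not the code in travellingsales.py.

```python
import cProfile
import itertools
import math
import os
import random


def brute_force_tour(n_cities, seed=0):
    """Hypothetical stand-in: try every ordering of n random cities (O(n!) work)."""
    rng = random.Random(seed)
    cities = [(rng.random(), rng.random()) for _ in range(n_cities)]
    best = math.inf
    for order in itertools.permutations(range(n_cities)):
        # Length of the open path visiting the cities in this order
        length = sum(math.dist(cities[a], cities[b]) for a, b in zip(order, order[1:]))
        best = min(best, length)
    return best


os.makedirs("profiles", exist_ok=True)
for n in range(1, 7):
    profiler = cProfile.Profile()
    profiler.runcall(brute_force_tour, n)  # profile a single call with argument n
    profiler.dump_stats(os.path.join("profiles", f"out_{n}.prof"))

# Then browse all of them with:  snakeviz profiles
```

Each extra city multiplies the number of orderings, so the runtime (and the profiler overhead on top of it) grows very quickly.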
Instructor Note
The default configuration of the Predator Prey model takes around 10 seconds to run; it may be slower on other hardware.
Download the pre-generated cProfile output; this can be opened with snakeviz to save waiting for the profiler.
Break
Line Level Profiling
Instructor Note
Download the pre-generated line_profiler output; this can be opened to save waiting for the profiler.
Profiling Conclusion
Introduction to Optimisation
Instructor Note
- Fixtures: A test fixture is a common class which multiple tests can inherit from. This class will typically include methods that perform common initialisation and teardown actions around the behaviour to be tested. This reduces repeated code.
- Mocking: If you wish to test a feature which relies on a live or temperamental service, such as making API calls to a website, you can mock that API so that, when the test runs, synthetic responses are produced rather than the real API being used.
- Test skipping: You may have configurations of your software that cause certain tests to be unsupported. Skipping allows conditions to be added to tests, to decide whether they should be executed or skipped.
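If students want to see these three ideas concretely, the sketch below uses only the standard-library `unittest` framework. `fetch_json`, the test classes and the URL are hypothetical. `TempDirFixture` supplies shared set-up/teardown through inheritance, `unittest.mock.patch` replaces `urllib.request.urlopen` so no network request is made, and `skipIf` skips a test when an optional dependency is missing. pytest offers its own fixture, mocking and skipping mechanisms, which are not shown here.

```python
import io
import json
import os
import tempfile
import unittest
import urllib.request
from unittest import mock

try:
    import numpy  # optional dependency for one test
except ImportError:
    numpy = None


def fetch_json(url):
    """Hypothetical function under test: download and parse JSON from a URL."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


class TempDirFixture(unittest.TestCase):
    """Fixture: common set-up/teardown that test classes inherit."""

    def setUp(self):
        # Runs before every test method: create a fresh scratch directory
        self.tmpdir = tempfile.TemporaryDirectory()

    def tearDown(self):
        # Runs after every test method, even if the test failed
        self.tmpdir.cleanup()


class TestFileRoundTrip(TempDirFixture):
    def test_write_then_read(self):
        path = os.path.join(self.tmpdir.name, "data.txt")
        with open(path, "w") as f:
            f.write("hello")
        with open(path) as f:
            self.assertEqual(f.read(), "hello")


class TestFetchJson(unittest.TestCase):
    def test_fetch_json_with_mocked_api(self):
        # Mocking: urlopen returns a synthetic response instead of hitting the web
        with mock.patch("urllib.request.urlopen") as fake_urlopen:
            fake_urlopen.return_value.__enter__.return_value = io.BytesIO(b'{"status": "ok"}')
            self.assertEqual(fetch_json("https://example.com/api"), {"status": "ok"})
            fake_urlopen.assert_called_once_with("https://example.com/api")

    @unittest.skipIf(numpy is None, "NumPy is not installed")
    def test_needs_numpy(self):
        # Test skipping: only executed when the optional dependency is available
        self.assertEqual(int(numpy.array([1, 2, 3]).sum()), 6)

# Run with:  python -m unittest test_example.py   (file name is hypothetical)
```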
Using Python Language Features and the Standard Library
Instructor Note
This episode discusses relatively fundamental features of Python.
For students experienced with writing Python, many of these points may be unnecessary. However, self-taught students—especially if they have previously studied lower-level languages with a less powerful standard library—may have adopted “unpythonic” habits and will particularly benefit from this section.
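A quick way to spot such habits is to compare a C-style loop with its idiomatic equivalent. The sketch below is illustrative and the function names are hypothetical; in CPython, `sum()` is implemented in C, so the idiomatic version typically avoids much of the per-iteration interpreter overhead of the manual loop.

```python
import timeit


def total_unpythonic(values):
    # C-style habit: manual index bookkeeping in a while loop
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total


def total_pythonic(values):
    # Idiomatic: let the built-in do the looping
    return sum(values)


def squares_unpythonic(n):
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_pythonic(n):
    # List comprehension instead of repeated append calls
    return [i * i for i in range(n)]


values = list(range(100_000))
for fn in (total_unpythonic, total_pythonic):
    seconds = timeit.timeit(lambda: fn(values), number=20)
    print(f"{fn.__name__}: {seconds:.3f} s")
```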
Data Structures & Algorithms
Instructor Note
The important information for students to learn within this episode is the patterns demonstrated via the benchmarks.
This episode introduces many complex topics; these are used to ground the performant patterns in an understanding of why they work, to aid memorisation.
Students need not be concerned if they find the data-structure/algorithm internals challenging, as long as they are still able to recognise the demonstrated patterns.
Instructor Note
The large bookcases in the second illustration, with many shelves almost empty, take up a lot more space than the single shelf in the first illustration. Students may interpret this as the dictionary using more memory than a list.
In principle, this is correct. However:
- The actual difference is much less pronounced than in the illustration. (A list requires about 8 bytes to keep track of each item, while a dictionary requires about 30 bytes.)
- In most cases this net size of the list/dictionary itself is negligibly small compared to the size of the objects stored in the list or dictionary (e.g. 41 bytes for an empty string or 112 bytes for an empty NumPy array).
In practice, therefore, this trade-off between memory usage and speed is usually worth it.
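If students want to check this themselves, `sys.getsizeof` reports the size of a container without the objects it refers to. The sketch below uses hypothetical string data; the exact byte counts depend on the Python version and platform, so only the relative sizes should be compared.

```python
import sys

n = 1000
strings = [f"item{i:05d}" for i in range(n)]  # hypothetical payload: 1000 short strings
as_list = list(strings)
as_dict = dict.fromkeys(strings)

list_bytes = sys.getsizeof(as_list)  # container only: one pointer per item
dict_bytes = sys.getsizeof(as_dict)  # container only: the hash table
payload_bytes = sum(sys.getsizeof(s) for s in strings)  # the string objects themselves

print(f"list container: {list_bytes / n:5.1f} bytes per item")
print(f"dict container: {dict_bytes / n:5.1f} bytes per item")
print(f"string objects: {payload_bytes / n:5.1f} bytes per item")
```

The dictionary is noticeably larger than the list, but the strings it stores typically dominate the total memory either way.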
Break
Using Scientific Python Packages (NumPy, Pandas and more)
Instructor Note
A simple analogy:
If you’re baking cookies, the oven (CPU register) is big enough to operate on multiple cookies (numbers) simultaneously. So whether you bake 1 cookie or 10, it’ll take exactly the same amount of time. However, this requires that the cookies are neatly arranged on a baking tray (in a contiguous chunk of memory).
Basic ints/floats in NumPy arrays are arranged like that, so this works great. In contrast, numbers in a Python list are spread across memory in a fairly complex arrangement, so they cannot benefit from this unless you convert them to a NumPy array first.
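To make the "baking tray" concrete, the sketch below inspects a NumPy array's memory layout and times a sum against the equivalent Python list. Whether NumPy's compiled loop actually uses SIMD instructions depends on the NumPy build and the CPU, so treat the timing as indicative rather than a measurement of SIMD alone.

```python
import timeit

import numpy as np

n = 1_000_000
py_list = [float(i) for i in range(n)]   # each float is a separate Python object
np_array = np.arange(n, dtype=np.float64)  # one contiguous buffer of raw 8-byte doubles

print(np_array.flags["C_CONTIGUOUS"])     # True: elements sit side by side in memory
print(np_array.itemsize, np_array.nbytes)  # 8 8000000

t_list = timeit.timeit(lambda: sum(py_list), number=10)
t_numpy = timeit.timeit(lambda: np_array.sum(), number=10)
print(f"list sum: {t_list:.3f} s, NumPy sum: {t_numpy:.3f} s")
```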
Instructor Note
The following code snippet demonstrates how this works for a simplified example.
PYTHON
>>> from shapely import Point, Polygon
>>> import numpy as np
>>> polygon = Polygon([(0,0), (1,0), (1,1), (0,1), (0,0)])
>>> points_array = np.array((Point(0.1, 0.1), Point(0.5, 0.5), Point(2, 2)))
>>> point_names_array = np.array(("P1: Periphery", "P2: Centre", "P3: Outside"))
>>> points_in_polygon_idx = polygon.contains(points_array)
>>> points_in_polygon_idx
array([ True,  True, False])
>>> points_in_polygon = point_names_array[points_in_polygon_idx]
>>> points_in_polygon
array(['P1: Periphery', 'P2: Centre'], dtype='<U13')
>>> points_in_polygon.tolist()
['P1: Periphery', 'P2: Centre']