If you’ve been dabbling in programming for a while you may have heard of “linting your code” which is a process of static code analysis to remove the “fluff” from your code. Just as physically linting your clothes removes unwanted fluff, linting your code removes “fluff” and can help…
This helps reduce the technical debt which impacts the amount of time required for maintenance and further development of a code base. The main focus of this article is the use of linting to ensure consistent coding style, it focuses on Python under Linux but similar tools are available for other operating systems and languages.
What has style got to do with writing code? Trends come and go in fashion but coding styles are meant to be relatively static and not change with the season, although they can and do evolve over time. This is because using a consistent and widely used style when writing code makes it easier for other people, often your future self, to read and understand the code you have written. If code is easier to understand then its easier to modify, update, extend, improve and in general maintain.
A useful insight from Gudio van Rossum, the creator of Python is that “code is read much more often than it is written” and so it should be easy to understand and not obfuscate its intent. Python is quite good for this as it is an expressive language which encourages coders to be explicit when naming variables, functions, classes and so forth so that their purpose and intention is clear, although the same is true of most modern languages. However, going a step further and using consistent styles to format and layout code helps enhance this.
The most widely used Python style is defined in the long established PEP 8: The Style Guide for Python Code. There are a number of tools available that will lint your Python code for you and most integrate with your IDE, whether that is Visual Studio Code, PyCharm or Emacs. Some of the formatting and linting tools available for Python are…
Pylint - checks for errors in Python code, tries to enforce a coding standard and looks for code smells.
YAPF - takes the code and reformats it to the best formatting that conforms to the style guide.
Black - The Uncompromising Code Formatter
Flake8 - Your Tool For Style Guide Enforcement
Prospector - Python Static Analysis
mypy - Optional Static Typing for Python
Here we will work through linting and formatting the simple file below (available as a download here) using PyLint and Black.
import numpy as np
from pathlib import Path
from typing import Union
import csv
def save_random_numbers(size: int, seed: int = 87653546, save_as: Union[str, Path] = "./random_numbers.txt") -> None:
"""Save a list of random numbers (floats) to the given file.
The stated number of random numbers will be saved to the given target file, if the directory structure
doesn't exist it will be created. Output will by default be over-written.
Parameters
----------
size : int
Number of random numbers to generate
seed: int
Seed for random number generation
save_as : Union[str, Path]
Directory/file to save numbers to.
"""
rng = np.random.default_rng()
random_numbers = rng.random(size)
with Path(save_as).open('w') as out:
writer = csv.write(out)
writer.writerows(random_numbers)
We will lint this file using Pylint to find out what errors there are and how its style can be improved to conform with PEP8 guidelines.
First you need to install pylint
, typically in your virtual
environment.
pip install pylint
Pylint can be configured using a ~/.pylintrc
file in your home directory and over time this will grow as you customise your
configuration but for now we will make one simple change from the default which is to increase the accepted line
length. Create the file and save it with the following content.
[FORMAT]
## Maximum number of characters on a single line.
max-line-length=120
Open a terminal and navigate to the location you saved the example file save_random_numbers.py
activate the virtual
environment you installed pylint under if its not already being used and then type the following to run Pylint against
your code…
pylint save_random_numbers.py
You should see output similar to the following…
❱ pylint save_random_numbers.py
************* Module save_random_numbers
save_random_numbers.py:1:0: C0114: Missing module docstring (missing-module-docstring)
save_random_numbers.py:5:66: E0602: Undefined variable 'Union' (undefined-variable)
save_random_numbers.py:5:35: W0613: Unused argument 'seed' (unused-argument)
save_random_numbers.py:2:0: C0411: standard import "from pathlib import Path" should be placed before "import numpy as np" (wrong-import-order)
save_random_numbers.py:3:0: C0411: standard import "import csv" should be placed before "import numpy as np" (wrong-import-order)
-------------------------------------------------------------------
Your code has been rated at 0.00/10
The output tells us which module has been inspected on the first line. Each subsequent line indicates
At the moment we are only looking at one file, but when using PyLint against larger code bases this information is vital
in helping direct you to the location of code that needs changing. At the end PyLint rates your code, ideally you should aim to get a score of 10.0/10
.
The messages are quite informative, taking each in turn we can work through resolving them.
Missing module docstring (missing-module-docstring)
Each Python module should have a docstring as the very first line that describes what it does. In this example it might be considered superflous but its good practice to get in the habit of writing these as it comes in useful when documentation is automatically generated from the docstrings in the code. To fix it we can add a short docstring at the top.
"""Module for saving randomly generated numbers."""
import numpy as np
from pathlib import Path
Undefined variable 'Union' (undefined-variable)
This error arises because the type hint
uses Union
but it hasn’t been imported. It’s from the
typing module so we can import it.
"""Module for saving randomly generated numbers."""
import numpy as np
from pathlib import Path
from typing import Union
Unused argument 'seed' (unused-argument)
This is very useful to be informed about because the seed
argument, according to the docstring, is meant to be used in
the call to the random number generator and ensures we will get the same set of random numbers generated each time we
call the function with that seed, however, as Pylint has informed us we haven’t actually used it within the
save_random_number()
function. We can correct that by adding it when we instantiate the random number generator.
rng = np.random.default_rng(seed=seed)
standard import "from pathlib import Path" should be placed before "import numpy as np" (wrong-import-order)
This message, like the one that follows it, is telling us that the order in which we have imported modules is incorrect,
because the PEP8 guide recommends that core modules, which both csv
and pathlib
are, should be imported before other
modules. We can correct this by changing the order (and because we have added an import from the typing
module which
is also a core module we move that too).
"""Module for saving randomly generated numbers."""
import csv
from pathlib import Path
from typing import Union
import numpy as np
Once corrected your file should look like this…
"""Module for saving randomly generated numbers."""
import csv
from pathlib import Path
from typing import Union
import numpy as np
def save_random_numbers(size: int, seed: int = 87653546, save_as: Union[str, Path] = "./random_numbers.txt") -> None:
"""Save a list of random numbers (floats) to the given file.
The stated number of random numbers will be saved to the given target file, if the directory structure
doesn't exist it will be created. Output will by default be over-written.
Parameters
----------
size : int
Number of random numbers to generate
seed: int
Seed for random number generation
save_as : Union[str, Path]
Directory/file to save numbers to.
"""
rng = np.random.default_rng(seed)
random_numbers = rng.random(size)
with Path(save_as).open('w') as out:
writer = csv.write(out)
writer.writerows(random_numbers)
…and you can now run PyLint against it to see if you’ve improved your score.
❱ pylint save_random_numbers.py
************* Module save_random_numbers
save_random_numbers.py:7:66: E1136: Value 'Union' is unsubscriptable (unsubscriptable-object)
------------------------------------------------------------------
Your code has been rated at 5.00/10 (previous run: 4.00/10, +1.00)
That is an improvement in score (of +1.00
) but we now have another error telling us that E1136: Value 'Union' is
unsubscriptable (unsubscriptable-object)
. You are unlikely to know what all the error codes mean, but there are a
few handy on-line lists all PyLint codes or all PyLint
messages and what they are telling you are worth consulting (The
Little Book of Python Anti-Patterns is also
useful). In this instance PyLint has returned a false-positive because Union
can and should be subscripted here
because it means the argument can be either a string (str
) or a
pathlib Path (Path
). So how do we get around this complaint?
You can disable PyLint from complaining about specific error codes/messages on a per-file basis by adding a line that
disables them. You can use either codes or messages (the bit in the brackets at the end of the line, in this case
unsubscriptable-object
) and it is advisable to use the message form as it is more informative to those who read your
code subequently.
If we add the following line it prevents PyLint from reporting the specific error…
import numpy as np
# pylint: disable=unsubscriptable-object
def save_random_numbers(size: int, seed: int = 87653546, save_as: Union[str, Path] = "./random_numbers.txt") -> None:
…running PyLint against our code again we get a much better score.
❱ pylint save_random_numbers_tidy.py
-------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 5.00/10, +5.00)
The last error we encountered is something that is likely to crop up again if you use Typehints liberally throughout
your Python code (and I would encourage you to do so). Rather than having to remember to disable the error in each
file/module we create we can configure PyLint via its configuration file ~/.pylintrc
to always ignore this error. To
do so add the following…
[MESSAGES CONTROL]
# Disable the message, report, category or checker with the given id(s). You
# can either give multiple identifiers separated by comma (,) or put this
# option multiple times (only on the command line, not in the configuration
# file where it should appear only once).
disable=unsubscriptable-object
For more on configuriong PyLint refer to the documentation and also details of how to integrate with your editor and IDE
Black is The Uncompromising Code Formatter and is very strict about the way in which it formats code. This could be a good or bad thing depending on your point of view, but it does result in highly consistent code when applied to all files. It formats files in place, so be mindful of this if you run it against one of your files it will change it.
Install black
in your virtual environment and make a backup of your save_random_number.py
file that you have just
tidied up with linting.
pip install black
cp save_random_numbers.py tidy_save_random_numbers.py
To run black against your code pass it the input file, it will re-write it and you can then compare it against the backup you just made…
black save_random_numbers.py
❱ diff save_random_numbers.py tidy_save_random_numbers.py
5,8c5
<
< def save_random_numbers(
< size: int, seed: int = 87653546, save_as: Union[str, Path] = "./random_numbers.txt"
< ) -> None:
---
> def save_random_numbers(size: int, seed: int = 87653546, save_as: Union[str, Path] = "./random_numbers.txt") -> None:
27c24
< with Path(save_as).open("w") as out:
---
> with Path(save_as).open('w') as out:
In this instance Black hasn’t changed much but it has reformatted the def save~randomnumbers~(...)
line and moved the
with Path()
line as a consequence.
It is worth linting your code from the outset of a project as not only does it result in a consistent style across
your code base it also avoids the problem that can arise when applying linting retrospectively. If an existing code base
has linting applied then the git blame
, which indicates who the last person
to edit a section was, then resides with the person who applied the linting, rather than the original author of the
code. Its possible though that the person who applied the linting knows very little about the underlying functionality
of the code but they may receive questions about it if they are indicated as the last person to have modified particular
lines.
Fortunately there are a number of ways to automate and integrate linting into your workflow.
When programming it is really useful to use an Integrated Development Environment (IDE) as most allow the integration of linting tools and apply them to your code automatically, whether its using PyLint, YAPF, Black or otherwise. Setup and configuration is beyond the scope of this article but some links are provided to useful resources to get you started.
VSCode supports linting in most languages, and both Python and R are supported along with other languages.
PyCharm supports automated formatting of code, for more information please refer to Reformat and rearrange code | PyCharm.
There are various options available for linting within Emacs, which you use depends on your preferences but LSP mode integrates with YAPF (via yapfify), Flake8 (via flycheck) and Black (via blacken).
If you are using an IDE then if configured correctly your code should be linted automatically for you, but an additional
step that can capture anything that hasn’t been correctly formatted is to use a git hook to
run linting on your code prior to making commits. There is
git-pylint-commit-hook available on PyPi which runs automatically
when you make commits to .py
files.
Including a linting stage in your Continuous Integration (CI) pipeline pays dividends as we all make mistakes and sometimes forget to lint our code before making pushes.
Perhaps not necessary for everyone but worth mentioning the beast that is MegaLinter which will lint code across multiple languages and integrates easily into your pipeline (GitHub Action, CI on GitLab, Jenkins etc.). A useful article on doing so is Limit your technical debt and secure your code base using MegaLinter.
For queries relating to collaborating with the RSE team on projects: rse@sheffield.ac.uk
Information and access to JADE II and Bede.
Join our mailing list so as to be notified when we advertise talks and workshops by subscribing to this Google Group.
Queries regarding free research computing support/guidance should be raised via our Code clinic or directed to the University IT helpdesk.