Linting - What is all the fluff about?

Neil Shephard

19 April 2022 13:00

If you’ve been dabbling in programming for a while you may have heard of “linting your code” which is a process of static code analysis to remove the “fluff” from your code. Just as physically linting your clothes removes unwanted fluff, linting your code removes “fluff” and can help…

Reduce bugs.
Improve performance.
Mitigate against some security flaws.
Improve coding skills.
Enforce a consistent coding style.

This helps reduce the technical debt which impacts the amount of time required for maintenance and further development of a code base. The main focus of this article is the use of linting to ensure a consistent coding style. It focuses on Python under Linux, but similar tools are available for other operating systems and languages.

Style Matters

What has style got to do with writing code? Trends come and go in fashion but coding styles are meant to be relatively static and not change with the season, although they can and do evolve over time. This is because using a consistent and widely used style when writing code makes it easier for other people, often your future self, to read and understand the code you have written. If code is easier to understand then it’s easier to modify, update, extend, improve and in general maintain.

A useful insight from Guido van Rossum, the creator of Python, is that “code is read much more often than it is written” and so it should be easy to understand and not obfuscate its intent. Python is quite good for this as it is an expressive language which encourages coders to be explicit when naming variables, functions, classes and so forth so that their purpose and intention is clear, although the same is true of most modern languages. However, going a step further and using consistent styles to format and lay out code helps enhance this.
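As a small illustration of explicit naming, compare two functions that do exactly the same thing. The function and variable names are invented for this sketch and do not come from any particular project.

```python
# Terse version: correct, but the reader has to work out what it is for.
def f(x, n):
    return [v for v in x if v > n]


# Explicit version: the same logic, with names and type hints that state the intent.
def filter_above_threshold(values: list[float], threshold: float) -> list[float]:
    """Return only those values that are strictly greater than the threshold."""
    return [value for value in values if value > threshold]


print(f([1.0, 5.0, 10.0], 4.0))
print(filter_above_threshold([1.0, 5.0, 10.0], 4.0))
```

Both calls return the same list; only the second tells a reader why.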

Linting in Python

The most widely used Python style is defined in the long established PEP 8: The Style Guide for Python Code. There are a number of tools available that will lint your Python code for you and most integrate with your IDE, whether that is Visual Studio Code, PyCharm or Emacs. Some of the formatting and linting tools available for Python are…

Pylint - checks for errors in Python code, tries to enforce a coding standard and looks for code smells.
YAPF - takes the code and reformats it to the best formatting that conforms to the style guide.
Black - The Uncompromising Code Formatter
Flake8 - Your Tool For Style Guide Enforcement
Prospector - Python Static Analysis
mypy - Optional Static Typing for Python

Here we will work through linting and formatting the simple file below (available as a download here) using Pylint and Black.

import numpy as np
from pathlib import Path
import csv

def save_random_numbers(size: int, seed: int = 87653546, save_as: Union[str, Path] = "./random_numbers.txt") -> None:
    """Save a list of random numbers (floats) to the given file.

    The stated number of random numbers will be saved to the given target file, if the directory structure
    doesn't exist it will be created. Output will by default be over-written.

    Parameters
    ----------
    size : int
        Number of random numbers to generate
    seed: int
        Seed for random number generation
    save_as : Union[str, Path]
        Directory/file to save numbers to.
    """
    rng = np.random.default_rng()
    random_numbers = rng.random(size)

    with Path(save_as).open('w') as out:
        writer = csv.writer(out)
        # writerows() expects one iterable per row, so wrap each number in a list.
        writer.writerows([number] for number in random_numbers)

Linting with Pylint

We will lint this file using Pylint to find out what errors there are and how its style can be improved to conform with the PEP 8 guidelines.

First you need to install pylint, typically in your virtual environment.

pip install pylint

Pylint can be configured using a ~/.pylintrc file in your home directory. Over time this will grow as you customise your configuration, but for now we will make one simple change from the default, which is to increase the accepted line length. Create the file and save it with the following content.

[FORMAT]
## Maximum number of characters on a single line.
max-line-length=120

Open a terminal and navigate to the location where you saved the example file save_random_numbers.py. Activate the virtual environment you installed pylint under, if it’s not already active, and then type the following to run Pylint against your code…

pylint save_random_numbers.py

You should see output similar to the following…

 ❱ pylint save_random_numbers.py
************* Module save_random_numbers
save_random_numbers.py:1:0: C0114: Missing module docstring (missing-module-docstring)
save_random_numbers.py:5:66: E0602: Undefined variable 'Union' (undefined-variable)
save_random_numbers.py:5:35: W0613: Unused argument 'seed' (unused-argument)
save_random_numbers.py:2:0: C0411: standard import "from pathlib import Path" should be placed before "import numpy as np" (wrong-import-order)
save_random_numbers.py:3:0: C0411: standard import "import csv" should be placed before "import numpy as np" (wrong-import-order)

-------------------------------------------------------------------
Your code has been rated at 0.00/10

The output tells us which module has been inspected on the first line. Each subsequent line indicates

The file.
The line the problem has been encountered on.
The column.
A somewhat cryptic error code (for example C0114) followed by a message describing the problem.
The symbolic name of the message, in brackets (for example missing-module-docstring).
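To make that layout concrete, here is a minimal sketch that splits one line of the output shown above into its parts. The pattern is written by hand against the sample output in this article, and the names MESSAGE_PATTERN and parse_pylint_line are hypothetical. It assumes the default text format and paths without colons.

```python
import re

# Pattern for the default Pylint text output shown above:
#   <file>:<line>:<column>: <code>: <message> (<symbol>)
MESSAGE_PATTERN = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<code>[A-Z]\d{4}): (?P<message>.*) \((?P<symbol>[a-z0-9-]+)\)$"
)


def parse_pylint_line(text: str) -> dict:
    """Split one Pylint message line into its named parts."""
    match = MESSAGE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Not a Pylint message line: {text!r}")
    parts = match.groupdict()
    # Line and column are more useful as numbers than as strings.
    parts["line"] = int(parts["line"])
    parts["column"] = int(parts["column"])
    return parts


example = "save_random_numbers.py:5:66: E0602: Undefined variable 'Union' (undefined-variable)"
parsed = parse_pylint_line(example)
print(parsed["code"], parsed["symbol"], parsed["line"], parsed["column"])
```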

At the moment we are only looking at one file, but when using Pylint against larger code bases this information is vital in directing you to the code that needs changing. At the end Pylint rates your code; ideally you should aim for a score of 10.00/10.

The messages are quite informative, and we can work through resolving each in turn.

`Missing module docstring (missing-module-docstring)`

Each Python module should have a docstring as its very first statement that describes what the module does. In this example it might be considered superfluous, but it’s good practice to get in the habit of writing these as they come in useful when documentation is automatically generated from the docstrings in the code. To fix it we can add a short docstring at the top.

"""Module for saving randomly generated numbers."""
import numpy as np
from pathlib import Path
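The rule itself can be illustrated with the standard library. The sketch below uses ast.get_docstring() to test whether a piece of source code starts with a module docstring. It is only an illustration of the rule, not a description of how Pylint implements its check, and has_module_docstring is a hypothetical helper name.

```python
import ast


def has_module_docstring(source: str) -> bool:
    """Return True if the module source starts with a docstring."""
    return ast.get_docstring(ast.parse(source)) is not None


without_docstring = "import csv\n"
with_docstring = '"""Module for saving randomly generated numbers."""\nimport csv\n'

print(has_module_docstring(without_docstring))
print(has_module_docstring(with_docstring))
```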

`Undefined variable 'Union' (undefined-variable)`

This error arises because the type hint uses Union but it hasn’t been imported. It’s from the typing module so we can import it.

"""Module for saving randomly generated numbers."""
import numpy as np
from pathlib import Path
from typing import Union
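Pylint reports this without running the code, but the problem is real at runtime too. The sketch below uses a made-up function called describe and no import of Union. It fails with a NameError as soon as the annotation is evaluated, which on many Python versions happens when the function is defined.

```python
source = '''
def describe(value: Union[str, int]) -> str:
    return str(value)

# Touch the annotations so they are evaluated on every Python version.
describe.__annotations__
'''

try:
    # Run the snippet in an empty namespace, so Union has not been imported.
    exec(source, {})
    outcome = "no error"
except NameError as error:
    outcome = f"NameError: {error}"

print(outcome)
```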

`Unused argument 'seed' (unused-argument)`

This is very useful to know. According to the docstring, the seed argument is meant to be used in the call to the random number generator, ensuring that we get the same set of random numbers each time we call the function with that seed. However, as Pylint has informed us, we haven’t actually used it within the save_random_numbers() function. We can correct that by passing it when we instantiate the random number generator.

rng = np.random.default_rng(seed=seed)
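A quick way to see why the seed matters is to create two generators with the same seed and compare what they produce. This is a minimal sketch, assuming NumPy is installed. The sample size of 5 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence...
first = np.random.default_rng(seed=87653546).random(5)
second = np.random.default_rng(seed=87653546).random(5)
print(np.array_equal(first, second))

# ...whereas unseeded generators draw fresh entropy from the operating system,
# so their output will almost certainly differ from one another.
unseeded_a = np.random.default_rng().random(5)
unseeded_b = np.random.default_rng().random(5)
print(np.array_equal(unseeded_a, unseeded_b))
```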

`standard import "from pathlib import Path" should be placed before "import numpy as np" (wrong-import-order)`

This message, like the one that follows it, is telling us that the order in which we have imported modules is incorrect. PEP 8 recommends that standard library modules, which both csv and pathlib are, should be imported before third-party modules such as numpy. We can correct this by changing the order (and because we have added an import from the typing module, which is also part of the standard library, we move that too).

"""Module for saving randomly generated numbers."""
import csv
from pathlib import Path
from typing import Union

import numpy as np

Once corrected your file should look like this…

"""Module for saving randomly generated numbers."""
import csv
from pathlib import Path
from typing import Union
import numpy as np

def save_random_numbers(size: int, seed: int = 87653546, save_as: Union[str, Path] = "./random_numbers.txt") -> None:
    """Save a list of random numbers (floats) to the given file.

    The stated number of random numbers will be saved to the given target file, if the directory structure
    doesn't exist it will be created. Output will by default be over-written.

    Parameters
    ----------
    size : int
        Number of random numbers to generate
    seed: int
        Seed for random number generation
    save_as : Union[str, Path]
        Directory/file to save numbers to.
    """
    rng = np.random.default_rng(seed)
    random_numbers = rng.random(size)

    with Path(save_as).open('w') as out:
        writer = csv.writer(out)
        # writerows() expects one iterable per row, so wrap each number in a list.
        writer.writerows([number] for number in random_numbers)

…and you can now run Pylint against it to see if you’ve improved your score.

 ❱ pylint save_random_numbers.py
************* Module save_random_numbers
save_random_numbers.py:7:66: E1136: Value 'Union' is unsubscriptable (unsubscriptable-object)

------------------------------------------------------------------
Your code has been rated at 5.00/10 (previous run: 4.00/10, +1.00)

That is an improvement in score (of +1.00) but we now have another error telling us that E1136: Value 'Union' is unsubscriptable (unsubscriptable-object). You are unlikely to know what all the error codes mean, but there are a few handy on-line lists of all Pylint codes and all Pylint messages, which explain what they are telling you and are worth consulting (The Little Book of Python Anti-Patterns is also useful). In this instance Pylint has returned a false positive: Union can and should be subscripted here, because it means the argument can be either a string (str) or a pathlib Path (Path). So how do we get around this complaint?
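Before answering that, it is easy to confirm at runtime that subscripting Union is legitimate. In this short sketch the alias PathLike is a hypothetical name chosen for illustration.

```python
from pathlib import Path
from typing import Union, get_args

# Subscripting Union is valid at runtime and yields the permitted types.
PathLike = Union[str, Path]
print(get_args(PathLike))

# Path() accepts either form, which is why the function can take both.
print(Path("./random_numbers.txt") == Path(Path("./random_numbers.txt")))
```

The code is fine, so what remains is to silence the report.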

You can stop Pylint from complaining about specific error codes/messages on a per-file basis by adding a comment that disables them. You can use either codes or symbolic message names (the bit in the brackets at the end of the line, in this case unsubscriptable-object), and it is advisable to use the name as it is more informative to those who read your code subsequently.

If we add the following line it prevents Pylint from reporting the specific error…

import numpy as np

# pylint: disable=unsubscriptable-object

def save_random_numbers(size: int, seed: int = 87653546, save_as: Union[str, Path] = "./random_numbers.txt") -> None:

…running Pylint against our code again we get a much better score.

 ❱ pylint save_random_numbers_tidy.py

-------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 5.00/10, +5.00)

Configuring Pylint

The last error we encountered is something that is likely to crop up again if you use type hints liberally throughout your Python code (and I would encourage you to do so). Rather than having to remember to disable the error in each file/module we create, we can configure Pylint via its configuration file ~/.pylintrc to always ignore this error. To do so add the following…

[MESSAGES CONTROL]
# Disable the message, report, category or checker with the given id(s). You
# can either give multiple identifiers separated by comma (,) or put this
# option multiple times (only on the command line, not in the configuration
# file where it should appear only once).
disable=unsubscriptable-object

For more on configuring Pylint refer to the documentation, which also has details of how to integrate it with your editor and IDE.

Automated Formatting with Black

Black is The Uncompromising Code Formatter and is very strict about the way in which it formats code. This could be a good or bad thing depending on your point of view, but it does result in highly consistent code when applied to all files. It formats files in place, so be mindful that if you run it against one of your files it will change that file.

Install black in your virtual environment and make a backup of the save_random_numbers.py file that you have just tidied up with linting.

pip install black
cp save_random_numbers.py tidy_save_random_numbers.py

To run black against your code pass it the input file. It will rewrite the file and you can then compare it against the backup you just made…

black save_random_numbers.py
❱ diff save_random_numbers.py tidy_save_random_numbers.py
5,8c5
<
< def save_random_numbers(
<     size: int, seed: int = 87653546, save_as: Union[str, Path] = "./random_numbers.txt"
< ) -> None:
---
> def save_random_numbers(size: int, seed: int = 87653546, save_as: Union[str, Path] = "./random_numbers.txt") -> None:
27c24
<     with Path(save_as).open("w") as out:
---
>     with Path(save_as).open('w') as out:

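In this instance Black hasn’t changed much, but it has reformatted the def save_random_numbers(...) line, splitting it over several lines, and replaced the single quotes in the with Path(save_as).open('w') line with double quotes. That line has also moved down as a consequence of the earlier change.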
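If you would rather see what Black intends to do before it touches a file, it can be asked to report instead of rewrite. The sketch below wraps Black's --check and --diff options in two small helpers. The helper names are hypothetical, and it assumes black is installed in the environment of the running interpreter.

```python
import subprocess
import sys


def black_preview_command(path: str) -> list[str]:
    """Build a command that reports what Black would change without rewriting the file."""
    # --check makes Black exit non-zero if changes are needed;
    # --diff prints the changes instead of writing them back.
    return [sys.executable, "-m", "black", "--check", "--diff", path]


def preview_black(path: str) -> bool:
    """Return True if the file is already formatted; print the diff otherwise."""
    result = subprocess.run(
        black_preview_command(path), capture_output=True, text=True, check=False
    )
    print(result.stdout)
    return result.returncode == 0


print(black_preview_command("save_random_numbers.py")[2:])
```

Calling preview_black("save_random_numbers.py") would then show the proposed diff while leaving the file itself untouched.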

When to Lint

It is worth linting your code from the outset of a project. Not only does it result in a consistent style across your code base, it also avoids a problem that can arise when applying linting retrospectively. If linting is applied to an existing code base then git blame, which indicates who last edited each line, points to the person who applied the linting rather than the original author of the code. It’s possible that the person who applied the linting knows very little about the underlying functionality of the code, yet they may receive questions about it because they are shown as the last person to have modified particular lines.

Fortunately there are a number of ways to automate and integrate linting into your workflow.

Automating Linting

IDE Integration

When programming it is really useful to use an Integrated Development Environment (IDE) as most allow the integration of linting tools and apply them to your code automatically, whether it’s using Pylint, YAPF, Black or otherwise. Setup and configuration is beyond the scope of this article but some links are provided to useful resources to get you started.

VSCode

VSCode supports linting in most languages, including Python and R.

PyCharm

PyCharm supports automated formatting of code, for more information please refer to Reformat and rearrange code | PyCharm.

Emacs

There are various options available for linting within Emacs. Which you use depends on your preferences, but LSP mode integrates with YAPF (via yapfify), Flake8 (via flycheck) and Black (via blacken).

Git Integration

If you are using an IDE that is configured correctly, your code should be linted automatically for you. An additional step that can capture anything that hasn’t been correctly formatted is to use a git hook to run linting on your code prior to making commits. The git-pylint-commit-hook package, available on PyPI, runs automatically when you commit changes to .py files.
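To show what such a hook does under the bonnet, here is a hand-rolled sketch of a pre-commit hook written in Python. It is not the git-pylint-commit-hook package. The function names are hypothetical, and it assumes git and pylint are both available.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit hook: run Pylint on the staged Python files."""
import subprocess
import sys


def python_files(paths: list[str]) -> list[str]:
    """Keep only the paths that end in .py."""
    return [path for path in paths if path.endswith(".py")]


def staged_paths() -> list[str]:
    """List files that are added, copied or modified in the index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def main() -> int:
    """Return Pylint's exit status so that a non-zero value blocks the commit."""
    targets = python_files(staged_paths())
    if not targets:
        return 0
    return subprocess.run([sys.executable, "-m", "pylint", *targets], check=False).returncode


# When saved as an executable .git/hooks/pre-commit, finish the script with:
# sys.exit(main())
print(python_files(["save_random_numbers.py", "README.md"]))
```

Git aborts the commit whenever the hook exits with a non-zero status, which is what Pylint does when it has messages to report.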

Continuous Integration

Including a linting stage in your Continuous Integration (CI) pipeline pays dividends, as we all make mistakes and sometimes forget to lint our code before pushing.
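One simple form a linting stage can take is a gate on the overall score. The sketch below reads the score from the summary line Pylint prints, as seen earlier in this article, and compares it with a minimum. The threshold of 9.0 and the function names are assumptions made for illustration.

```python
import re

# Matches the summary line Pylint prints, e.g. "Your code has been rated at 5.00/10".
SCORE_PATTERN = re.compile(r"rated at (-?\d+(?:\.\d+)?)/10")


def extract_score(pylint_output: str) -> float:
    """Pull the overall score out of Pylint's text report."""
    match = SCORE_PATTERN.search(pylint_output)
    if match is None:
        raise ValueError("No Pylint score found in the output")
    return float(match.group(1))


def passes(pylint_output: str, minimum: float = 9.0) -> bool:
    """Decide whether a CI linting stage should succeed."""
    return extract_score(pylint_output) >= minimum


report = "Your code has been rated at 5.00/10 (previous run: 4.00/10, +1.00)"
print(extract_score(report), passes(report))
```

Recent Pylint releases also provide a --fail-under option that applies the same kind of threshold directly, which is simpler when it is available.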

MegaLinter

Perhaps not necessary for everyone, but worth mentioning, is the beast that is MegaLinter, which will lint code across multiple languages and integrates easily into your pipeline (GitHub Actions, CI on GitLab, Jenkins etc.). A useful article on doing so is Limit your technical debt and secure your code base using MegaLinter.