Pre-commit is a powerful tool for executing a range of hooks prior to making commits to your Git history. This is useful because it means you can automatically run a range of linting tools on your code across an array of languages to ensure your code is up-to-scratch before you make the commit.
Photo by Neil Shephard.
Pre-commit is written in Python but that isn’t a limitation as it will lint YAML, JSON, C, JavaScript, Go, Rust, TOML, Terraform, Jupyter Notebooks, and so on. The list of supported hooks is vast.
For those unfamiliar with version control, and Git in particular, this will likely all sound alien. If you are new to the world of version control and Git I can highly recommend the Git & Github through GitKraken Client - From Zero to Hero! course offered by the Research Software Engineering team at the University of Sheffield and developed by alumna Anna Krystalli.
In computing, a “hook” refers to something that is run prior to or in response to a requested action. In the context of the current discussion we are talking about hooks that relate to actions undertaken in Git version control, specifically those that are run before a “commit” is made.
When you have initialised a directory to be under Git version control the settings and configuration are stored in the .git/ sub-directory. There is the .git/config file for the repository’s configuration but also the .git/hooks/ directory, which is populated with a host of *.sample files whose names indicate the different hooks you might want to run. It’s worth spending a little time reading through these if you haven’t done so yet as they provide useful examples of how various hooks work.
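If you would rather see at a glance which sample hooks Git has provided, a few lines of Python will list them. This is only a minimal sketch: the function name list_sample_hooks is my own invention and it assumes the default .git/hooks/ location (a repository that sets core.hooksPath would keep its hooks elsewhere).

```python
from pathlib import Path


def list_sample_hooks(repo_root: str = ".") -> list[str]:
    """Return the names of the sample hooks found in <repo_root>/.git/hooks/."""
    hooks_dir = Path(repo_root) / ".git" / "hooks"
    if not hooks_dir.is_dir():
        # Not a Git repository, or the hooks directory has been removed
        return []
    # Drop the ".sample" suffix so "pre-commit.sample" is reported as "pre-commit"
    return sorted(path.stem for path in hooks_dir.glob("*.sample"))


# List the sample hooks of the repository in the current directory (if any)
print(list_sample_hooks("."))
```

Each name returned corresponds to a point in the Git workflow at which a script can be run; removing the .sample suffix from one of those files is what makes Git execute it.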
Typically when writing code you should lint it to ensure it conforms to agreed style guides and to remove any “code smells” that may be lingering (code that violates design principles). It won’t guarantee that your code is perfect but it’s a good starting point for improving it. People who write a lot of code have good habits of doing these checks manually prior to making commits. Experienced coders will have configured their Integrated Development Environment (IDE) to apply many such “hooks” on saving a file they have been working on.
At regular points in your workflow you save your work and check it into Git by making a commit, and that is where pre-commit comes into play because it will run all the hooks it has been configured to run against the files you are including in your commit. If any of the hooks fail then your commit is not made. In some cases pre-commit will automatically correct the errors (e.g. removing trailing white-space; applying black formatting if configured) but in others you have to correct them yourself before a commit can be successfully made.
Initially this can be jarring, but it saves you, and more importantly those who you are asking to review your code, time and effort. Your code meets the required style and is a little bit cleaner before being sent out for review. In the long term, linting your code is beneficial (see Linting - What is all the fluff about?).
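The behaviour described above, where every hook runs against the staged files and a single failure blocks the commit, can be modelled in a few lines of Python. This is a toy model for illustration and not how pre-commit is implemented: the names Hook, fix_trailing_whitespace, check_no_conflict_markers and run_hooks are all my own, and real hooks are separate programs rather than Python functions.

```python
from pathlib import Path
from typing import Callable

# A hook is modelled as a function that takes file names and returns an exit code
Hook = Callable[[list[str]], int]


def fix_trailing_whitespace(filenames: list[str]) -> int:
    """Strip trailing spaces and tabs in place; return 1 if any file changed."""
    exit_code = 0
    for filename in filenames:
        path = Path(filename)
        original = path.read_text()
        fixed = "\n".join(line.rstrip(" \t") for line in original.split("\n"))
        if fixed != original:
            path.write_text(fixed)
            exit_code = 1  # a "fixer" still fails so the change can be reviewed
    return exit_code


def check_no_conflict_markers(filenames: list[str]) -> int:
    """Report leftover merge-conflict markers; these have to be fixed by hand."""
    exit_code = 0
    for filename in filenames:
        lines = Path(filename).read_text().split("\n")
        for number, line in enumerate(lines, start=1):
            if line.startswith("<<<<<<< "):
                print(f"{filename}:{number}: merge conflict marker")
                exit_code = 1
    return exit_code


def run_hooks(hooks: dict[str, Hook], staged_files: list[str]) -> bool:
    """Run every hook on the staged files; the commit goes ahead only if all pass."""
    results = {name: hook(staged_files) for name, hook in hooks.items()}
    for name, code in results.items():
        print(f"{name}: {'Passed' if code == 0 else 'Failed'}")
    return all(code == 0 for code in results.values())
```

Running run_hooks twice on a file with trailing white-space shows the pattern you will meet in practice: the first run fails because the fixer changed the file, and the second run passes because there is nothing left to fix.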
Pre-commit is written in Python and so you will need Python installed on your system in order to use it. Aside from that there is little else extra that is required to be manually installed as pre-commit installs virtual environments specific for each enabled hook.
Most systems provide pre-commit in their package management system but typically you should install pre-commit within your virtual environment or under your user account.
pip install pre-commit
conda install -c conda-forge pre-commit
If you are working on a Python project then you should include pre-commit as a requirement, either in requirements-dev.txt or under the dev section of [options.extras_require] in your setup.cfg as shown below.
[options.extras_require]
dev =
  pre-commit
  pytest
  pytest-cov
Configuration of pre-commit is via a file in the root of your Git version controlled directory called .pre-commit-config.yaml. This file should be included in your Git repository. You can create a blank file or pre-commit can generate a sample configuration for you.
# Empty configuration
touch .pre-commit-config.yaml
# Auto-generate basic configuration (sample-config prints to standard output, so redirect it to the file)
pre-commit sample-config > .pre-commit-config.yaml
git add .pre-commit-config.yaml
Each hook is associated with a repository (repo) and a version (rev) within it. Many are available from the https://github.com/pre-commit/pre-commit-hooks repository. The default set of pre-commit hooks might look like the following.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0 # Use the ref you want to point at
    hooks:
      - id: trailing-whitespace
        types: [file, text]
      - id: check-docstring-first
      - id: check-case-conflict
      - id: end-of-file-fixer
        types: [python]
      - id: requirements-txt-fixer
      - id: mixed-line-ending
        types: [python]
        args: [--fix=no]
      - id: debug-statements
      - id: fix-byte-order-marker
      - id: check-yaml
Some hooks are available from dedicated repositories, for example the following runs Black, Flake8 and Pylint on your code and should follow under the above (with the same level of indenting to be valid YAML).
  - repo: https://github.com/psf/black
    rev: 22.6.0
    hooks:
      - id: black
        types: [python]
  - repo: https://github.com/pycqa/flake8.git
    rev: 3.9.2
    hooks:
      - id: flake8
        additional_dependencies: [flake8-print]
        types: [python]
  - repo: https://github.com/pycqa/pylint
    rev: v2.15.3
    hooks:
      - id: pylint
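The types entries in the configurations above restrict a hook to files carrying every one of the listed tags, so types: [file, text] selects any text file while types: [python] selects only Python files. The sketch below illustrates that filtering under a loud assumption: the EXTENSION_TAGS table is a hypothetical, heavily reduced stand-in of my own, whereas pre-commit itself relies on the identify library, which looks at far more than the file extension.

```python
from pathlib import Path

# Hypothetical, heavily reduced tag table used purely for illustration
EXTENSION_TAGS = {
    ".py": {"file", "text", "python"},
    ".yaml": {"file", "text", "yaml"},
    ".txt": {"file", "text", "plain-text"},
    ".png": {"file", "binary", "image", "png"},
}


def tags_for(filename: str) -> set[str]:
    """Look up the tags for a file name; unknown extensions are just a "file"."""
    return EXTENSION_TAGS.get(Path(filename).suffix, {"file"})


def select_files(filenames: list[str], types: list[str]) -> list[str]:
    """Keep the files whose tags include every entry of `types`."""
    wanted = set(types)
    return [name for name in filenames if wanted <= tags_for(name)]


staged = ["analysis.py", "config.yaml", "figure.png"]
print(select_files(staged, ["python"]))
print(select_files(staged, ["file", "text"]))
```

A hook that has no types entry falls back on whatever defaults its author chose, which is why not every hook in the configuration needs one.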
An extensive list of supported hooks is available. It lists the repository from which the hook is derived along with its name.
You can also define new hooks and configure them under - repo: local.
  - repo: local
    hooks:
      - id: <id>
        name: <descriptive name>
        language: python
        entry:
        types: [python]
For some examples of locally defined hooks see the Pandas .pre-commit-config.yaml.
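To give a flavour of what a locally defined hook might run, here is a minimal sketch of a Python script that could serve as the entry of a local hook. It relies on two things pre-commit does for any hook: the names of the files to check are passed as command-line arguments, and a non-zero exit code marks the hook as failed. Everything else is an assumption for illustration, including the marker being searched for and the function names find_markers and main.

```python
import sys
from pathlib import Path

# Hypothetical marker that we do not want to end up in a commit
MARKER = "FIXME"


def find_markers(filename: str) -> list[int]:
    """Return the 1-based numbers of the lines in `filename` containing MARKER."""
    text = Path(filename).read_text(encoding="utf-8", errors="replace")
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if MARKER in line
    ]


def main(argv: list[str]) -> int:
    """Check every file named in `argv`; return 1 if the marker is found anywhere."""
    exit_code = 0
    for filename in argv:
        for number in find_markers(filename):
            print(f"{filename}:{number}: found {MARKER}")
            exit_code = 1
    return exit_code


if __name__ == "__main__":
    # File names arrive as command-line arguments; only a failure needs an
    # explicit exit code because a script that finishes normally exits with 0
    status = main(sys.argv[1:])
    if status:
        sys.exit(status)
```

If the script were saved as, say, scripts/no_fixme.py (a hypothetical path), the entry of the local hook would be the command that runs it, for example python scripts/no_fixme.py.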
Before pre-commit will run you need to install it within your repository. This puts the file .git/hooks/pre-commit in place, a script that Git runs before each commit and that in turn runs the hooks you have configured. To install this you should have your .pre-commit-config.yaml in place and then run the following.
pre-commit install
Once installed and configured there really isn’t much more to say about using pre-commit: just make commits, and before a commit can succeed pre-commit must run with all the hooks you have configured passing. By default pre-commit only runs on files that are staged and ready to be committed; if you have unstaged changes these will be stashed prior to running the pre-commit hook and restored afterwards. Should you wish to run the hooks manually without making a commit then, after activating a virtual environment if you are using one, simply run the following (otherwise they run whenever you make a git commit).
pre-commit run
If any of the configured hooks fail then the commit will not be made. Some hooks such as black may reformat files in place; you can then stage those changes with git add and commit again, at which point the hook should pass. It’s important to pay close attention to the output.
If you want to run a specific hook you simply add the <id> after run.
pre-commit run <id>
Or if you want to force running against all files (except untracked ones) you can do so.
pre-commit run --all-files # Across all files/hooks
And these two options can be combined to run a specific hook against all files.
pre-commit run <id> --all-files
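If you find yourself calling these variations from a script, say a task runner or a test helper, the combinations can be assembled in one place. The following is a small sketch that assumes pre-commit is installed and on your PATH; the function names build_run_command and run_pre_commit are my own and not part of pre-commit.

```python
import subprocess
from typing import Optional


def build_run_command(hook_id: Optional[str] = None, all_files: bool = False) -> list[str]:
    """Assemble the "pre-commit run" command line for the requested options."""
    command = ["pre-commit", "run"]
    if hook_id is not None:
        command.append(hook_id)  # restrict the run to a single hook
    if all_files:
        command.append("--all-files")  # check the whole repository, not just staged files
    return command


def run_pre_commit(hook_id: Optional[str] = None, all_files: bool = False) -> bool:
    """Run pre-commit and report whether every hook passed (exit code 0)."""
    completed = subprocess.run(build_run_command(hook_id, all_files), check=False)
    return completed.returncode == 0


# Show the command that would be run for a single hook across all files
print(" ".join(build_run_command("black", all_files=True)))
```

Keeping the construction of the command separate from running it means the logic can be checked without pre-commit being installed.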
You may find that you wish to switch branches to work on another feature or fix a bug but that your current work doesn’t pass pre-commit and you don’t wish to sort that out immediately. The solution to this is to use git stash to temporarily save your current uncommitted work and restore the working directory and index to their previous state. You are then free to switch branches and work on another feature or fix a bug, commit and push those changes and then switch back.
Imagine you are working on branch a but are asked to fix a bug on branch b. You go to commit your work but find that a does not pass pre-commit, yet you wish to work on b anyway. Starting on branch a you stash your changes, switch branches, make and commit your changes to branch b, then switch back to a and unstash your work there.
git stash
git checkout b
... # Work on branch b
git add <changed_files_on_branch_b>
git commit -m "Fixing bug on branch b"
git push
git checkout a
git stash apply
You can update hooks locally by running pre-commit autoupdate. This will update your .pre-commit-config.yaml with the latest version of the repositories you have configured, and these versions will be used both locally and in CI/CD if you use it as described below. However this will not update any packages that are part of the - repo: local hooks that you may have implemented and it is your responsibility to handle these.
Ideally contributors will have set up their system to work with pre-commit and be running such checks prior to making pushes. It is however useful to enable running pre-commit as part of your Continuous Integration/Continuous Delivery (CI/CD) pipeline. This can be done with both GitLab and GitHub, and similar methods are available for many continuous integration systems.
GitHub Actions workflows reside in the .github/workflows/ directory of your project. A simple pre-commit action is available on the Marketplace at pre-commit/action. Copy this template to .github/workflows/pre-commit.yml and include it in your Git repository.
git add .github/workflows/pre-commit.yml
git commit -m "Adding pre-commit GitHub Action" && git push
If you use GitLab the following article describes how to configure a CI job to run as part of your repository.