The Research Software Engineering team at Sheffield has worked on projects involving a variety of methods and technologies:
agile · Ansible · cloud · collectd · containers · continuous integration · Docker · documentation · Gaussian processes · GitHub · Grafana · Grid Engine · HPC · InfluxDB · machine learning · MATLAB · Nagios · Puppet · R · REDCap · scrum · servers · SGE · Singularity · Slurm · software testing · static analysis · testing · training · version control · XNAT · databases · Python
Some projects we have worked on (not a comprehensive list):
AirQo makes use of “low cost” air quality sensors to increase the coverage of air quality monitoring in Kampala, Uganda. It applies machine learning methods to the data to better inform decision making. Its mission is to provide this technology to other cities in sub-Saharan Africa.
AirQo has a strong team of academics and software engineers, mostly based at Makerere University in Kampala, but with some at the University of Sheffield and elsewhere. The University of Sheffield RSE team helps with software quality as cutting-edge machine learning methods developed by researchers are implemented within live “production” web software, hosted in the cloud. This is achieved through engagement with AirQo’s Scrum (agile) project management approach.
AirQo shares its code via its GitHub organisation. In addition to live air quality data, blog posts describe some recent milestones.
It is hoped that the project outputs will include improved decisions based on newly available air quality data and information derived from machine learning. Ultimately this will improve air quality and health in Kampala and other African cities.
Statistical and machine learning approaches are able to make excellent predictions of an output based on several pieces of input data. However, the extent to which each of the inputs contributes to causing the output is generally unclear. Causal inference is a method that seeks to quantify causal relationships between inputs and outputs, often by using (or with reference to) a “causal graph” informed by an expert in the data and the subject being analysed. This project, CITCoM, will democratise access to causal inference, bringing this powerful technique to a broader range of researchers in academia, government and the public sector.
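The gap between prediction and causation can be illustrated with a small simulation. The following is a minimal sketch, not CITCoM's method: it assumes a hypothetical causal graph in which a confounder Z drives both the input X and the output Y, and in which the true causal effect of X on Y is 1.0. All names and coefficients are invented for illustration.

```python
import random


def simulate(n=5000, seed=0):
    """Draw samples from the assumed graph: Z -> X, Z -> Y and X -> Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)                      # confounder
        x = z + rng.gauss(0, 1)                  # input, partly driven by Z
        y = 1.0 * x + 2.0 * z + rng.gauss(0, 1)  # true effect of X on Y is 1.0
        rows.append((x, z, y))
    return rows


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


def naive_effect(rows):
    """Slope from regressing Y on X alone, ignoring the causal graph."""
    x, _, y = zip(*rows)
    return cov(x, y) / cov(x, x)


def adjusted_effect(rows):
    """Coefficient of X when regressing Y on both X and Z.

    Adjusting for Z is justified by the assumed graph, in which Z is the
    only common cause of X and Y.
    """
    x, z, y = zip(*rows)
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


rows = simulate()
naive = naive_effect(rows)
adjusted = adjusted_effect(rows)
print(f"naive slope: {naive:.2f}, adjusted for Z: {adjusted:.2f}")
```

Under these assumptions the naive slope predicts Y well but roughly doubles the effect of X, because it absorbs the influence of Z. The adjusted estimate recovers a value close to the true effect, and it is the graph that tells the analyst which variable to adjust for.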
This project is in its early stages, and has no outputs at present. It is anticipated that RSE will help with version control of code, good practice in Python development, software testing and deployment (including on the DAFNI platform).
Understanding causality in predictive models is key to understanding which of the inputs are leading to changes in outputs. This is essential to anyone wishing to enact a policy to change the output in future. In the private sector, this might mean understanding what features of a web page (inputs) make it more likely for a customer to buy a product (output). In the public sector, it might mean understanding which public health interventions lead to behavioral change promoting better health.
The University operates several high-performance computing (HPC) clusters, available for use by all research students and staff, and facilitates access to specialist, multi-institution HPC clusters.
The RSE team has collaborated with IT Services on improving and maintaining these computing facilities over several years. Some outputs of this work include:
This partnership has also resulted in greater visibility of certain HPC user needs through the RSE team being involved in a number of projects across the University.
Multiple output Gaussian processes are useful where several different measurements are made at different points in a parameter space, but not all measurements are made at each point. For example:
A Gaussian process model can be created that allows prediction, with a measure of uncertainty, of any of the measurements at any point in the space. Furthermore, inference of underlying functions driving the measurements is also possible.
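As a rough illustration of how measurements can inform one another, the sketch below implements a two-output Gaussian process in NumPy using an intrinsic coregionalisation kernel. This is one simple multiple output construction and not necessarily the model used in this project. The data, the coregionalisation matrix `B`, the noise level and the lengthscale are all assumptions chosen for the example.

```python
import numpy as np


def rbf(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def icm_kernel(x1, out1, x2, out2, B, lengthscale=1.0):
    """Covariance between (input, output index) pairs: B[i, j] * k(x, x')."""
    return B[out1[:, None], out2[None, :]] * rbf(x1, x2, lengthscale)


def predict(x_obs, out_obs, y_obs, x_new, out_new, B, noise=1e-2):
    """Posterior mean and variance of the latent function at the new points."""
    K = icm_kernel(x_obs, out_obs, x_obs, out_obs, B) + noise * np.eye(len(x_obs))
    K_s = icm_kernel(x_new, out_new, x_obs, out_obs, B)
    K_ss = icm_kernel(x_new, out_new, x_new, out_new, B)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s.T)
    return K_s @ alpha, np.diag(K_ss) - np.sum(v ** 2, axis=0)


# Output 0 is measured at six points; output 1 only at the first three.
x0 = np.arange(6.0)
x1 = np.arange(3.0)
x_obs = np.concatenate([x0, x1])
out_obs = np.concatenate([np.zeros(6, dtype=int), np.ones(3, dtype=int)])
y_obs = np.concatenate([np.sin(x0), 0.9 * np.sin(x1)])

# Predict output 1 at x = 4, where only output 0 was measured.
x_new, out_new = np.array([4.0]), np.array([1])

B_shared = np.array([[1.0, 0.9], [0.9, 1.0]])  # outputs assumed correlated
B_indep = np.eye(2)                            # outputs assumed unrelated

mean_shared, var_shared = predict(x_obs, out_obs, y_obs, x_new, out_new, B_shared)
mean_indep, var_indep = predict(x_obs, out_obs, y_obs, x_new, out_new, B_indep)
print("correlated outputs: ", mean_shared[0], var_shared[0])
print("independent outputs:", mean_indep[0], var_indep[0])
```

When the outputs are modelled as correlated, the measurement of output 0 at x = 4 reduces the predictive variance for output 1 at that point. With independent outputs the prediction there falls back towards the prior.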
GPy is a popular framework for Gaussian processes written in Python. However, code to execute advanced multiple output Gaussian processes was only available in MATLAB, which is less widely used for Gaussian processes and less well suited as a development platform for this work.
The RSE involvement was in documenting aspects of GPy and converting MATLAB code into reliable Python in the GPy framework. We used tests that compare the output of the Python code to a MATLAB baseline as a means of driving development. The architectural documentation contributes to the sustainability of GPy, making it easier for new developers to add functionality and fix bugs.
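Baseline-driven testing of this kind can be sketched as follows. This is a minimal illustration and not the project's actual test suite: `rbf_kernel` stands in for a ported function, and `MATLAB_BASELINE` holds hypothetical values assumed to have been exported from the original MATLAB code at ten significant figures.

```python
import math

# Hypothetical baseline: input -> output, assumed exported from the MATLAB code.
MATLAB_BASELINE = {
    0.0: 1.0,
    1.0: 0.6065306597,
    2.0: 0.1353352832,
}


def rbf_kernel(distance, lengthscale=1.0):
    """Python port under test (illustrative): squared-exponential kernel."""
    return math.exp(-0.5 * (distance / lengthscale) ** 2)


def mismatches(func, baseline, rel_tol=1e-8):
    """Return (input, actual, expected) for every value outside the tolerance."""
    return [
        (x, func(x), expected)
        for x, expected in baseline.items()
        if not math.isclose(func(x), expected, rel_tol=rel_tol)
    ]


def test_rbf_kernel_matches_baseline():
    """Fails, listing the offending inputs, if the port drifts from the baseline."""
    assert mismatches(rbf_kernel, MATLAB_BASELINE) == []
```

A test runner such as pytest would collect `test_rbf_kernel_matches_baseline` automatically. The tolerance needs to reflect the precision at which the baseline was exported, since the two languages will rarely agree to the last bit.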
Broader access to multiple output Gaussian process modelling is of potential benefit to a range of fields and activities. Examples include pollution modelling, robotics, gene regulation and financial services.
The Polar Thematic Exploitation Portal (Polar TEP) is a platform that allows research software to be uploaded and run, accessing a reservoir of satellite imaging data without having to download enormous image files. It enables software to be run interactively by end users using a web graphical user interface, and chained together into workflows.
Software for tracking the sea ice edge and icebergs had previously been written using MATLAB. In order to be deployed on Polar TEP, this needed to be converted into Python and made to run in a Docker container.
Tracking sea ice is of global concern in the context of climate change research, but also of immediate use in shipping and coastal economic activity in the polar regions.
Following work on developing COVID-19 models as part of the Royal Society Rapid Assistance in Modelling the Pandemic (RAMP) initiative, a need was identified to be able to track the provenance of epidemiological model outputs in order to build trust from politicians, academics, the media and members of the public. This entails building on the Scottish Covid Response Consortium’s (SCRC) data pipeline, which comprises a database and APIs to allow epidemiological modelling software to:
RSE involvement contributes to consortium strategy and to leadership of the software API development work, carried out with an Agile mindset. RSE will also be involved with developing the Python data pipeline API.
The output of this project will enable epidemiologists to manage data better and deliver improved advice based on more traceable software and data versioning. This will extend beyond the current COVID-19 epidemic into future human and animal disease outbreaks.
Early in the COVID-19 epidemic the Royal Society began the Rapid Assistance in Modelling the Pandemic (RAMP) initiative to bring together epidemiological modelling and supporting expertise to provide advice to government. This led to the formation of the Scottish Covid Response Consortium (SCRC) comprising members from over 30 organisations in academia and the private sector, with skills spanning epidemiology, software engineering, data management, policy / media engagement and visualisation. SCRC has produced multiple high quality pieces of epidemiological modelling software in a variety of programming languages, and a data pipeline system for rapid, reproducible outputs.
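The idea of traceable outputs can be illustrated with a short sketch. This is not the SCRC data pipeline API: the function names and record layout are hypothetical, and the sketch assumes provenance is captured by hashing each input and output and storing the hashes alongside the version of the model code.

```python
import hashlib
import json


def sha256_of(data: bytes) -> str:
    """Content hash used to identify an exact version of a file."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(inputs: dict, code_version: str, output: bytes) -> dict:
    """Describe a model run by the hashes of its inputs, its code and its output."""
    return {
        "code_version": code_version,
        "inputs": {name: sha256_of(content) for name, content in sorted(inputs.items())},
        "output": sha256_of(output),
    }


def verify(record: dict, inputs: dict, code_version: str, output: bytes) -> bool:
    """Check that an output was produced from exactly the recorded inputs and code."""
    return record == provenance_record(inputs, code_version, output)


# Hypothetical run: file names, contents and the version string are made up.
inputs = {"cases.csv": b"day,cases\n1,10\n2,14\n", "params.json": b'{"r0": 2.5}'}
output = b"day,predicted\n3,19\n"
record = provenance_record(inputs, "model-v1.2.0", output)
print(json.dumps(record, indent=2))
```

Given such a record, anyone holding the same files can confirm that a published output corresponds to a specific set of inputs and a specific code version, and any change to either is detectable.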
The University of Sheffield contributed an RSE to provide software engineering leadership on the Simple Network Sim epidemiological modelling software. This entailed collaborating with the epidemiological modelling lead and co-ordinating the efforts of software engineers and data scientists (largely volunteered by the Man Group) using an Agile project management approach. The work was carried out against the background of media coverage of government policy informed by imperfect research software, so a key part of what we did was to help define and ensure software quality. This resulted in the development of an epidemiological modelling software checklist.
Whilst of limited direct input to government policy, SCRC has set a new standard for open epidemiological modelling, which will, by example, drive future policy-informing research to be more open and reproducible. This gives organisations the technical and scientific foundation to behave in a trustworthy way when using evidence from epidemiological models.
For queries relating to collaborating with the RSE team on projects: rse@sheffield.ac.uk
To contact the RSE team about seminars, training or JADE: rse-team-group@sheffield.ac.uk
Subscribe to this Google Group to join our mailing list and be notified when we advertise talks and workshops.
Queries regarding free research computing support/guidance should be raised via our Code clinic or directed to University central IT support.