The Research Software Engineering team at Sheffield has worked on projects involving a variety of methods and technologies:
Some projects we have worked on (not a comprehensive list):
AirQo makes use of “low-cost” air quality sensors to increase the coverage of air quality monitoring in Kampala, Uganda. It applies machine learning methods to the resulting data to better inform decision making. Its mission is to provide this technology to other cities in sub-Saharan Africa.
AirQo has a strong team of academics and software engineers, mostly based at Makerere University in Kampala, but with some at the University of Sheffield and elsewhere. University of Sheffield RSE involvement is helping to ensure software quality when cutting-edge machine learning methods developed by researchers are implemented within live “production” web software hosted in the cloud. This is achieved through engagement with AirQo’s Scrum (agile) project management approach.
The hoped-for project outputs include improved decisions based on newly available air quality data and on information derived from machine learning. Ultimately, this will improve air quality and health in Kampala and other African cities.
The NIHR-funded Sheffield Biomedical Research Centre (BRC) is a research partnership between the University of Sheffield and Sheffield Teaching Hospitals (STH) NHS Foundation Trust, dedicated to improving the treatment and care of people living with chronic neurological disorders.
It is an umbrella project for a number of clinical studies and brings together:
RSE time has been costed for the duration of the BRC project (at 50% FTE), and the main study we are involved with is MoStrAct. MoStrAct aims to explore whether data from movement sensors such as gait monitors can provide biomarkers for changes in neurological conditions, with additional sources of information, such as standard clinical tests and MRI imaging data, providing independent evidence of those changes. Reliable gait-based biomarkers have the potential to make the monitoring of chronic neurological disorders cheaper and more continuous.
MoStrAct has been very much a collaboration, with the parties listed above all contributing to the experimental protocol and ethics application along with the subsequent workflows/pipelines. The RSE team worked with the STH Scientific Computing team to advise on information governance and to set up systems for
Other RSE contributions to the project include:
In general, many contributions have been around data storage, management and governance, and around making analysis of that data more robust and reproducible. This focus results in part from the high cost (in terms of time and money) of each data point in this study.
Statistical and machine learning approaches are able to make excellent predictions of an output based on several pieces of input data. However, the extent to which each of the inputs contributes to causing the output is generally unclear. Causal inference is a method that seeks to quantify causal relationships between inputs and outputs, often by using (or with reference to) a “causal graph” informed by an expert in the data and the subject being analysed. This project, CITCoM, will democratise access to causal inference, bringing this powerful technique to a broader range of researchers in academia, government and the public sector.
This project is in its early stages and has no outputs at present. It is anticipated that RSE will help with version control of code, good practice in Python development, software testing and deployment (including on the DAFNI platform).
Understanding causality in predictive models is key to knowing which inputs are driving changes in outputs. This is essential for anyone wishing to enact a policy to change the output in future. In the private sector, this might mean understanding which features of a web page (inputs) make a customer more likely to buy a product (output). In the public sector, it might mean understanding which public health interventions lead to behavioural change promoting better health.
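To illustrate why a causal graph matters, the sketch below applies backdoor adjustment, one standard causal inference technique, to the web-page example. It is not CITCoM's method, and the counts are invented. It assumes a hypothetical graph in which a confounder Z (say, whether the customer is a returning customer) influences both the page feature X and the purchase Y, so the effect of X is estimated as P(Y=1 | do(X=x)) = Σ_z P(Y=1 | X=x, Z=z) P(Z=z) rather than by the plain conditional probability.

```python
from collections import defaultdict

# Invented counts keyed by (z, x, y) for illustration only:
#   z = confounder (e.g. returning customer), x = page feature shown,
#   y = purchase. Assumed causal graph: Z -> X, Z -> Y, X -> Y.
COUNTS = {
    (0, 1, 1): 2, (0, 1, 0): 18,
    (0, 0, 1): 4, (0, 0, 0): 76,
    (1, 1, 1): 40, (1, 1, 0): 40,
    (1, 0, 1): 8, (1, 0, 0): 12,
}


def naive_effect(counts, x):
    """P(Y=1 | X=x): plain conditional probability, ignoring the confounder."""
    total = sum(n for (_, xx, _), n in counts.items() if xx == x)
    positive = sum(n for (_, xx, y), n in counts.items() if xx == x and y == 1)
    return positive / total


def adjusted_effect(counts, x):
    """P(Y=1 | do(X=x)) by backdoor adjustment over Z:
    sum over z of P(Y=1 | X=x, Z=z) * P(Z=z)."""
    n_total = sum(counts.values())
    z_totals = defaultdict(int)
    for (z, _, _), n in counts.items():
        z_totals[z] += n
    result = 0.0
    for z, n_z in z_totals.items():
        n_xz = counts.get((z, x, 0), 0) + counts.get((z, x, 1), 0)
        p_y_given_xz = counts.get((z, x, 1), 0) / n_xz
        result += p_y_given_xz * (n_z / n_total)
    return result
```

With these invented counts, the naive difference naive_effect(COUNTS, 1) - naive_effect(COUNTS, 0) is about 0.30, whereas the adjusted difference is about 0.075: most of the apparent effect comes from returning customers being both more likely to see the feature and more likely to buy.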
During the initial wave of the COVID-19 pandemic in 2020, First Draft worked with researchers in the GATE team to analyse the availability and demand of fact checking in various areas of misinformation relating to COVID-19.
The RSE team worked with GATE RSEs to build a dashboard providing data visualisation and analysis for this research.
The dashboard calculates and visualises the geographic spread of fact-checking availability versus demand across different topics of misinformation, and provides interactive components showing indicators of information demand and information supply.
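How the dashboard derives its indicators is not detailed here. As a hypothetical sketch only: if demand were measured as counts of misinformation-related queries and supply as counts of published fact-checks, each tagged with a region and topic, a simple supply-to-demand ratio would highlight where fact-checking lags behind demand. The event format and the region and topic labels are assumptions.

```python
from collections import Counter


def supply_demand_ratio(demand_events, supply_events):
    """Fact-checks published per unit of demand, for each (region, topic).

    Both arguments are iterables of (region, topic) tuples (hypothetical format).
    Pairs with supply but no recorded demand are ignored.
    """
    demand = Counter(demand_events)
    supply = Counter(supply_events)
    return {key: supply[key] / n_demand for key, n_demand in demand.items()}


def largest_gaps(ratios, k=3):
    """The k (region, topic) pairs with the lowest supply/demand ratio."""
    return sorted(ratios, key=lambda key: ratios[key])[:k]
```

For example, four "vaccines" queries but only two fact-checks in one region gives a ratio of 0.5, while a topic with queries but no fact-checks scores 0.0 and is reported first by largest_gaps.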
The visualisations and analysis made available by this dashboard helped to inform researchers at First Draft on the supply and demand for media fact checking relating to the pandemic and a report on their findings was written up here.
The RSE team has worked with researchers on a number of projects in the Department of Computer Science to enhance the research impact of work identified for REF impact case studies.
Some of these have included:
GATEcloud usage statistics
Few research software projects measure their own usage, but data demonstrating impact can be essential to gaining future funding for software. GATE is a software framework for text analytics, developed by the Natural Language Processing group. A large number of analytic pipelines are available via API calls to the widely-used GATEcloud service.
The RSE team worked with GATE to build a simple tool that regularly analyses the usage of these services and makes up-to-date statistics available internally for use in funding applications and reports.
A neural network-based tool for quality estimation in machine translation. The RSE team worked with researchers to refactor a proof-of-principle codebase that had demonstrated excellent results in using deep learning to estimate the quality of translations. The project's impact was enhanced by turning the codebase into a Python package, developing command-line and Python interfaces, reducing the size of the codebase, and adding tests and documentation.
The University operates several high-performance computing (HPC) clusters available for use by all research students and staff, and facilitates access to specialist, multi-institution HPC clusters.
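The tool's internals and GATEcloud's log format are not described here. The sketch below simply assumes a hypothetical CSV access log with timestamp, pipeline and HTTP status columns, and counts successful calls per pipeline per month, the kind of figure that can be quoted in a funding application. The pipeline names in the sample log are invented.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical access log; the real service's log format may differ.
SAMPLE_LOG = """timestamp,pipeline,status
2021-03-01T10:00:00,ner-en,200
2021-03-15T11:30:00,ner-en,200
2021-03-20T09:00:00,sentiment-en,500
2021-04-02T08:45:00,sentiment-en,200
"""


def monthly_usage(log_text):
    """Count successful API calls per (month, pipeline) from CSV log text."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(log_text)):
        if row["status"] != "200":
            continue  # skip failed or rejected requests
        month = datetime.fromisoformat(row["timestamp"]).strftime("%Y-%m")
        counts[(month, row["pipeline"])] += 1
    return counts
```

Running monthly_usage(SAMPLE_LOG) counts two successful "ner-en" calls in March 2021 and one "sentiment-en" call in April; the failed March request is excluded.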
The RSE team has collaborated with IT Services on improving and maintaining these computing facilities over several years. Some outputs of this work include:
This partnership has also resulted in greater visibility of certain HPC user needs through the RSE team being involved in a number of projects across the University.
Multiple output Gaussian processes are useful where several different measurements are made at different points in a parameter space, but not all measurements are made at each point. For example:
A Gaussian process model can be created that allows prediction, with a measure of uncertainty, of any of the measurements at any point in the space. Furthermore, inference of underlying functions driving the measurements is also possible.
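One common way to build such a model is the intrinsic coregionalisation model (ICM), in which the covariance between output i at x and output j at x' is B[i][j]·k(x, x'), where k is an ordinary kernel and B is a positive semi-definite matrix describing how strongly the outputs are correlated. The pure-Python sketch below is illustrative only: it is not GPy's API or the advanced models ported in this project, and the squared-exponential kernel, noise level and lengthscale are assumptions.

```python
import math


def rbf(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def icm_kernel(a, b, B, lengthscale=1.0):
    """ICM covariance between points a = (x, i) and b = (x', j)."""
    (x1, i), (x2, j) = a, b
    return B[i][j] * rbf(x1, x2, lengthscale)


def solve(A, rhs):
    """Solve A v = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def predict(train_inputs, train_y, test_input, B, noise=1e-2, lengthscale=1.0):
    """Posterior mean and variance of one output at one point.

    train_inputs: list of (x, output_index) pairs; different outputs may be
    observed at different x. test_input: the (x, output_index) to predict.
    """
    K = [[icm_kernel(a, b, B, lengthscale) + (noise if p == q else 0.0)
          for q, b in enumerate(train_inputs)]
         for p, a in enumerate(train_inputs)]
    k_star = [icm_kernel(test_input, a, B, lengthscale) for a in train_inputs]
    alpha = solve(K, train_y)
    mean = sum(k * w for k, w in zip(k_star, alpha))
    v = solve(K, k_star)
    prior_var = icm_kernel(test_input, test_input, B, lengthscale)
    var = prior_var - sum(k * w for k, w in zip(k_star, v))
    return mean, var
```

For example, observing only output 0 at x = 0 and predicting output 1 at the same point gives a nonzero mean with reduced variance when B couples the outputs (B = [[1.0, 0.9], [0.9, 1.0]]), but a mean of zero and the full prior variance when B is the identity.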
GPy is a popular Python framework for Gaussian processes. However, code for advanced multiple output Gaussian processes was only available in MATLAB, which is less widely used for Gaussian processes and less well suited as a development platform for this work.
The RSE involvement was in documenting aspects of GPy and converting MATLAB code into reliable Python within the GPy framework. We used tests comparing the Python code's output with a MATLAB baseline to drive development. The architectural documentation contributes to the sustainability of GPy, making it easier for new developers to add functionality and fix bugs.
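A baseline comparison test of this kind can be sketched as follows, assuming reference values have been exported from the original MATLAB code. Here a hypothetical baseline (as printed by MATLAB's `format long`) is embedded directly, and the function under test is a simple covariance matrix rather than actual GPy code. The tolerances are assumptions: ported numerical code rarely matches bit for bit, so the test checks closeness rather than equality.

```python
import math

# Hypothetical baseline, as if exported from the original MATLAB code with
# `format long` (15 significant digits). In practice it would be read from a file.
MATLAB_BASELINE = [
    [1.0, 0.606530659712633],
    [0.606530659712633, 1.0],
]


def rbf_covariance(xs, lengthscale=1.0):
    """Python port under test: squared-exponential covariance matrix."""
    return [[math.exp(-0.5 * ((a - b) / lengthscale) ** 2) for b in xs] for a in xs]


def mismatches(actual, expected, rel_tol=1e-10, abs_tol=1e-12):
    """Return (row, col, actual, expected) for every entry outside tolerance."""
    if len(actual) != len(expected):
        raise ValueError(f"{len(actual)} rows, expected {len(expected)}")
    bad = []
    for i, (row_a, row_e) in enumerate(zip(actual, expected)):
        if len(row_a) != len(row_e):
            raise ValueError(f"row {i} has length {len(row_a)}, expected {len(row_e)}")
        for j, (a, e) in enumerate(zip(row_a, row_e)):
            if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol):
                bad.append((i, j, a, e))
    return bad


def test_rbf_matches_matlab():
    assert mismatches(rbf_covariance([0.0, 1.0]), MATLAB_BASELINE) == []
```

Returning the list of mismatching entries, rather than a single boolean, makes a failing test point directly at the values that diverge from the baseline.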
Broader access to multiple output Gaussian process modelling is of potential benefit to a range of fields and activities. Examples include pollution modelling, robotics, gene regulation and financial services.
The Polar Thematic Exploitation Portal (Polar TEP) is a platform that allows research software to be uploaded and run, accessing a reservoir of satellite imaging data without having to download enormous image files. It enables software to be run interactively by end users using a web graphical user interface, and chained together into workflows.
Software for tracking the sea ice edge and icebergs had previously been written in MATLAB. In order to be deployed on Polar TEP, it needed to be converted into Python and made to run in a Docker container.
Tracking sea ice is of global concern in the context of climate change research, but also of immediate use in shipping and coastal economic activity in the polar regions.
PRIMAGE (PRedictive In-silico Multiscan Analytics to support cancer personalised diaGnosis and prognosis, Empowered by imaging biomarkers) is an EU Horizon 2020-funded collaboration between 16 partners to develop an open cloud-based platform to support decision making in the clinical management of two paediatric cancers: Neuroblastoma (NB), the most frequent solid cancer of early childhood, and Diffuse Intrinsic Pontine Glioma (DIPG), the leading cause of brain tumour-related death in children.
The Sheffield RSE team has been working closely with Insigneo’s team on the project to develop a highly scalable, CUDA GPU-parallel, cell-level model of a Neuroblastoma tumour. A parallel development approach has been used: a research associate (RA) with an understanding of the biological processes has developed a model in their language of choice (Python), and each incremental change to that model has then been transferred to a separate CUDA implementation, built with the agent-based modelling framework FLAMEGPU, and validated for behaviour consistent with the original implementation.
This separation of concerns has allowed the modeller to focus on the correctness of their model rather than on performance, whilst the RSE-developed CUDA model has made it practical to model tumours more than 1,000 times larger, giving much faster access to model calibration and parameter sweep results. Furthermore, the Sheffield RSE team has handled much of the discussion with the project’s international members with regard to integrating the model into the overarching platform.
RSE involvement in this project has also supported development of FLAMEGPU2, which aims to provide a more accessible (Python and C++) interface to highly scalable CUDA GPU-parallel modelling of complex systems.
PyKale is a (PyTorch-based) Python package aimed at standardising machine learning workflows for graphs, images and videos, with a unified pipeline-based API to accelerate cross-disciplinary research. The package also makes machine learning easier for end users to access. An important target audience is clinicians and clinical researchers who have abundant data to analyse but lack the time and expertise to apply machine learning approaches. The work is supported by the Wellcome Trust.
A key RSE involvement in the project to date was the development of a documented strategy to add automated testing to the package. We also advise on the use of version control and collaboration via GitHub, as well as static analysis, release mechanisms and packaging.
Machine learning has the potential to improve diagnosis, prognosis and selection of treatment, improving health and quality of life for patients. A key example is the analysis of medical imaging data to identify unusual features that may be of clinical interest. Lack of expertise, and governance barriers to transferring data out of clinical organisations, can reduce the adoption of machine learning. This easily accessible package will bring machine learning software closer to clinicians, for improved clinical outcomes powered by AI.
Following work on developing COVID-19 models as part of the Royal Society Rapid Assistance in Modelling the Pandemic (RAMP) initiative, a need was identified to track the provenance of epidemiological model outputs in order to build trust among politicians, academics, the media and members of the public. This entails building on the Scottish Covid Response Consortium’s (SCRC) data pipeline, which comprises a database and APIs to allow epidemiological modelling software to:
RSE involvement contributes to consortium strategy and to leading the software API development work with an Agile mindset. RSE will also be involved in developing the Python data pipeline API.
The output of this project will enable epidemiologists to manage data better and deliver improved advice, based on more traceable software and data versioning. This will extend beyond the current COVID-19 pandemic into future human and animal disease outbreaks.
COM4521 is a fourth-year MEng and MSc Computer Science module that teaches students how to write high-performance parallel code, with a specific emphasis on GPU programming with NVIDIA CUDA GPUs. A key aspect of the module is teaching students to understand the implications of program code for the underlying hardware, so that the code can be optimised.
The module was developed in 2015 by Paul Richmond, director of the team, and is facilitated annually by Paul with assistance from others in the team who also have GPU programming (including CUDA) expertise. For the 2018-2019 academic year, Mozhgan Kabiri Chimeh, an alumnus of the team, facilitated the module as part of paternity cover for Paul.
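The data pipeline's own API is not described here, so the following is only a generic sketch of the idea behind provenance tracking: hash every input, record the code version and parameters, and hash the output, so that a published result can be traced back to exactly what produced it. All function names and record fields are hypothetical and are not part of the SCRC pipeline.

```python
import hashlib
import json


def sha256_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(inputs, code_version, parameters, output):
    """Link an output to what produced it (hypothetical record layout).

    inputs: dict of name -> raw bytes; code_version: e.g. a git commit hash;
    parameters: JSON-serialisable dict; output: raw bytes of the result.
    """
    return {
        "inputs": {name: sha256_bytes(data) for name, data in sorted(inputs.items())},
        "code_version": code_version,
        "parameters": parameters,
        "output": sha256_bytes(output),
    }


def record_id(record):
    """Stable identifier: hash of the canonical (sorted-key) JSON form."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return sha256_bytes(canonical.encode("utf-8"))
```

Because record_id hashes a canonical JSON form, the same inputs, code version and parameters always give the same identifier, while changing any one of them (for instance a model parameter) gives a different one.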
The Rumour Veracity project provides a web-based platform for automatically identifying rumours in social media and assessing their veracity. The project features deep learning models for classifying the veracity of a social media post and the stance of replies to it. The project has resulted in the publication of the paper Journalist-in-the-Loop: Continuous Learning as a Service for Rumour Analysis.
RSES is involved in developing the entire service stack, including the web GUI and the server back-end for deploying the deep learning models. The service uses Vue.js for the user-facing web interface and Django for the server-side application. We are currently working on a general framework for automatically retraining the deep learning models on data collected through the website.
A development project contributing to the development and optimisation of the Chaste cell modelling software. In particular, this included adding a new feature allowing ODE systems to be simulated for each cell edge, using OpenMP to take advantage of multiple CPU cores, and creating a tool for scheduling parameter sweeps on the HPC cluster. The RSES team worked to integrate the development workflow within the established and well-tested code ecosystem.
Basisflow is a computational chemistry project headed by Dr. Grant Hill that aims to apply machine learning approaches to the generation of basis sets, the mathematical functions used to represent atomic and molecular orbitals. RSES is providing support and consultancy on possible ML approaches and on methods to make full use of available HPC systems.
JADE II is a GPU-based, deep-learning-focused Tier-2 HPC system associated with 19 academic partners, including Sheffield. It is the successor to the JADE system, which is due to be decommissioned in 2021.
RSES is a local institutional contact and is involved in promoting the system, managing user access and on-boarding, and providing ongoing technical support and training.
To train users and promote the use of the system, RSES has created an Introduction to Deep Learning course, a one-day mix of theoretical lectures and practical labs in Python and R.
Early in the COVID-19 pandemic, the Royal Society began the Rapid Assistance in Modelling the Pandemic (RAMP) initiative to bring together epidemiological modelling and supporting expertise to provide advice to government. This led to the formation of the Scottish Covid Response Consortium (SCRC), comprising members from over 30 organisations in academia and the private sector, with skills spanning epidemiology, software engineering, data management, policy and media engagement, and visualisation. SCRC has produced multiple high-quality pieces of epidemiological modelling software in a variety of programming languages, and a data pipeline system for rapid, reproducible outputs.
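The Chaste sweep-scheduling tool itself is not described here. As a sketch of the general approach only: a parameter sweep can be expanded into a list of combinations, and each task in a scheduler job array then uses its index to pick one combination (converting to a zero-based index if the scheduler counts from 1). The executable name, parameter names and flag style below are hypothetical placeholders, not Chaste's real command-line interface.

```python
import itertools


def parameter_grid(space):
    """Expand {name: [values]} into a list of {name: value} dicts.

    Names are sorted so that task indices are reproducible between runs.
    """
    names = sorted(space)
    combos = itertools.product(*(space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def task_command(task_id, grid, executable="./run_simulation"):
    """Command line for one zero-based array task (hypothetical executable and flags)."""
    params = grid[task_id]
    flags = " ".join(f"--{name}={value}" for name, value in params.items())
    return f"{executable} {flags}"


# Hypothetical parameter space for illustration only.
SPACE = {"division_rate": [0.1, 0.2], "edge_ode_dt": [0.01, 0.005]}
GRID = parameter_grid(SPACE)
```

Here GRID holds four combinations, so a four-task job array covers the whole sweep; task_command(1, GRID) would run the simulation with division_rate=0.1 and edge_ode_dt=0.005.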
The University of Sheffield contributed an RSE to provide software engineering leadership on the Simple Network Sim epidemiological modelling software. This entailed collaborating with the epidemiological modelling lead and co-ordinating the efforts of software engineers and data scientists (largely volunteered by the Man Group) using an Agile project management approach. The work was carried out against the background of media coverage of government policy informed by imperfect research software, so a key part of what we did was to help define and ensure software quality. This resulted in the development of an epidemiological modelling software checklist.
Although its direct input to government policy was limited, SCRC has set a new standard for open epidemiological modelling, which will, by example, drive future policy-informing research to be more open and reproducible. This gives organisations the technical and scientific foundation to behave in a trustworthy way when using evidence from epidemiological models.
RateSetter is an RSSB-funded project that aims to create software for predicting and optimising passenger flow at the Platform Train Interface (PTI). The project resulted in the publication of the paper RateSetter: roadmap for faster, safer, and better platform train interface design and operation using evolutionary optimisation.
RSE was involved in creating the FLAME GPU-based pedestrian model that simulates passengers boarding and alighting from the train, which is used as a basis for predicting the effects of changes in train layout during the optimisation process.
For queries relating to collaborating with the RSE team on projects: firstname.lastname@example.org
Join our mailing list to be notified when we advertise talks and workshops by subscribing to this Google Group.