Seminar Series

The RSE group organises seminars and promotes seminars from other groups which are of interest to the community. If you have an interesting seminar then announce it on the mailing-list. Below are a number of seminars which are scheduled or have taken place.

We aim to hold lunchtime seminars in the last week of every month, inviting speakers to talk about a wide range of topics that involves practical issues relating to research software.

If you would like to recommend a speaker or would like to give a talk then please contact us. We have a budget to support inviting speakers.

Note: The GPU computing seminar series has now been merged with the RSE seminar series. Talks on GPU computing will be advertised through both mailing lists.


Wednesday 15 December 2021 — 9:30-11:30, Online

Various, Research IT

Talk: "Research-IT Forum: Analysing Human Language: Mining Textual Data and Natural Language Processing"

This event is being run by Research IT

The Research-IT forum is a regular event providing updates on developments relating to research computing covering research software, computational science, data science, qualitative and quantitative analysis and the management of research outputs. It is a forum where researchers can showcase their work in an environment conducive to creative discussion.

This event will explore data mining and natural language processing at the University. Our six speakers, from various departments, will present their specific research and the topics range from analysing social media speech to corpus analysis.

To register, please go to:

For more information about the event, please visit:

Thursday 2 December 2021 — 14:00-15:00, Online

Jens Krinke, University College London

Talk: "Toxic Code on Stack Overflow"

This event is being run by the Testing Research Group

Software developers use Stack Overflow to interact and exchange code snippets and research uses Stack Overflow to harvest code snippets for use with recommendation systems. However, code on Stack Overflow may have quality issues, such as security or license problems. In this talk I will present our work on how users of Stack Overflow perceive such issues, how we studied the problem of outdated code and potential license violations, and how code is non-compliant with coding style. The prevalence of such toxic code may affect mining or machine learning tasks using Stack Overflow as a code base.

Google meet link:

Friday 21 May 2021 — 12:30-13:50, Online

Will Furnass, Joe Heffer, Gemma Ives

Talk: "The software and people behind academic research"

This event was part of the Sheffield Digital festival.

The use of data and computational analysis in research is increasing and it is critical in getting new insights. But how does the code that drives this research get written? What is different about writing software in a research environment? What are the challenges around data management and infrastructure? And how can visualisation technologies make the outputs of research of greater value and interest to other researchers and the public?

Recording of the talks

Will Furnass, Research Software Engineer: Turning ideas into code

Much of modern research is facilitated by code that researchers have had a hand in writing. In this first talk we explore the processes through which research ideas are encapsulated as software, which often involves interactive data exploration, then how this software evolves (or not) over time. We also discuss the computational demands of research and common technologies, before concluding with a look at challenges relating to the development of coding skills in academia, the drive for reproducible research, and how funding and incentives shape the research software lifecycle.


Joe Heffer, Research Data Engineer: Supporting Coronavirus Genomics

We provided software engineering expertise for researchers from The University of Sheffield who are participating in the COVID-19 Genomics UK (COG-UK) Consortium which provides large-scale, rapid genomic sequencing to guide the public health response to the pandemic.


Gemma Ives, Research Data Scientist: Communicating Research Through Visualisation

More and more often researchers are openly sharing their research data and their findings, encouraging engagement and scrutiny from politicians and the public. This final talk will discuss how and why data visualisation is becoming an essential skill in academic research.


Thursday 28 January 2021 — 10:00-11:30, Online

Paul Richmond, others to be confirmed, N8

Talk: "Bede Introductory Event"

This introductory event hosted by the N8 CIR offers those who are new to HPC the opportunity to learn more about Bede’s hardware and software, as well as the support that is available to users, regardless of their experience of HPC.

Bede is the N8’s newest high performance computing (HPC) platform. Since its formal launch in December 2020 researchers have been using the system across a range of academic domains to accelerate their research.

This event will feature information regarding the Bede platform, support available for it and case studies highlighting how it is already being used beneficially in research. One of the case studies will be presented by our very own Dr Paul Richmond, discussing the upcoming FLAME GPU 2 software and its use on Bede.

Tuesday 19 January 2021 — 14:00-15:00, Online

Anna Krystalli, White Rose PGR, Open Research Workshop

Talk: "Putting the R into Reproducible Research"

R and its ecosystem of packages offers a wide variety of statistical and graphical techniques and is increasing in popularity as the tool of choice for data analysis in academia.

In addition to its powerful analytical features, the R ecosystem provides a large number of tools and conventions to help support more open, robust and reproducible research. This includes tools for managing research projects, building robust analysis workflows, documenting data and code, testing code and disseminating and sharing analyses.

In this talk we’ll take a whistle-stop tour of the breadth of available tools, demonstrating the ways R and the Rstudio integrated development environment can be used to underpin more open reproducible research and facilitate best practice


Monday 18 January 2021 — 12:00-13:00, Online

Anna Krystalli, The University of Sheffield, Open Research Conversation: Reproducibility in Practice

Talk: "ReproHacks: Practicing reproducibility makes better"

Join us for a discussion on the importance of reproducibility and the challenges faced when attempting to reproduce others’ research.

Event Description

Reproducibility, recreating the results others have found in their research, presents many challenges for researchers, with several studies suggesting that much published cannot be reproduced. The University of Sheffield has an active reproducibility network, is a member of the UK Reproducibility Network and has created a new post of Research Practice Lead. This event will include presentations and discussions on reproducibility and the practical lessons learnt from running ReproHacks. Speakers will include:

  • Tom Stafford, Department of Psychology and the University’s Research Practice Lead
  • Anna Krystalli, Research Software Engineer

This event is part of a series of Open Research Conversations facilitated by The University of Sheffield Library, which include talks from researchers and other experts to share their experiences, promote best practices and discuss the challenges they are facing. Have a look at our other events and sign up here.

Talk by Anna Krystalli - ReproHacks: Practicing reproducibility makes better

In this talk, I introduce ReproHacks, one day reproducbility hackathons, where participants attempt to reproduce research from associated published code and data. I discuss the importance of providing space to practice reproducibility, some of the lessons learnt from running ReproHack events so far and some thoughts on moving forward.

Materials (slides) associated with the event can be found at

Thursday 17 September 2020 — 15:40-16:00, Online

Twin Karmakharm

Talk: "Singularity for HPC"

Twin Karmakharm gave a talk on Singularity for HPC at the HPC Champions event. The talk gives an introduction to the Singularity container technology and experiences learned from its uses at the University of Sheffield.

Link to slides.

Thursday 17 September 2020 — 13:50-14:00, Online

Twin Karmakharm

Talk: "Update on JADE and JADE 2 Tier 2 GPU-eqipped HPC systems"

Twin Karmakharm gave short update on the state of JADE and JADE 2 Tier 2 GPU-equipped HPC systems at the HPC Champions event.

Link to slides.

Monday 6 July 2020 — 14:00-16:00, Online

Will Furnass

Talk: "Deep Learning at TUOS: workflows, tooling and platforms"

The Faculty of Engineering at TUOS and the AMRC are to hold a two-hour workshop on data science to help better identify relevant opportunities and collaborators for future AMRC projects.

At this workshop the RSE team will provide a brief overview of:

  • What the team can offer in general
  • Deep learning (DL) projects the team has contributed to
  • Computational platforms that would be suitable for DL (specifically HPC systems containing GPUs)
  • How to collaborate with us.

Link to slides.

Tuesday 16 June 2020 — 11:30-12:10, Online

Bob Turner and Will Furnass

Talk: "Reproducible research through software"

Bob Turner and Will Furnass from the RSE team talked about methods and tools for reproducible computational workflows at one of ScHARR’s Statistics Academic Group’s regular seminars, covering:

  • The importance and purpose of software to/in research;
  • A definition of ‘good’ software;
  • The value of making data and code available inc. copyright and licensing;
  • Managing source code using version control;
  • Testing software for correctness and robustness;
  • How the RSE team can help and how researchers can collaborate with us.

Thursday 11 June 2020 — 15:00-16:00, Online


Talk: "N8 CIR Meet-Ups – N8 CIR RSE Highlights"

A set of four short talks from research software engineers working within the N8 Research Partnership. Full details of the speakers are yet to be confirmed. Full details will be published here closer to the event:

Thursday 4 June 2020 — 15:00-16:00, Online

Mark Turner, N8

Talk: "N8 CIR Meet-Ups – RSE Career Progression"

In this session Mark Turner (University of Newcastle) will explain some of the work done with Human Resources departments to set roles and responsibilities for each job grade from new RSEs to group leaders. There will also be feedback from senior RSEs who have sat on interview panels.

Find out more here:

Thursday 28 May 2020 — 15:00-16:00, Online


Talk: "N8 CIR Meet-Ups – N8 CIR RSE Highlights"

A set of four short talks from research software engineers working within the N8 Research Partnership. Full details of the speakers are yet to be confirmed.

Full details will be published here closer to the event:

Thursday 21 May 2020 — 15:00-16:00, Online

Alan Real and Paul Richmond, N8

Talk: "Introducing the N8 Bede GPU HPC cluster"

Following a successful EPSRC application, the N8 Centre for Computationally Intensive Research (N8 CIR) will soon host a new national Tier-2 HPC service, called Bede.

Bede is a novel architecture supercomputer based upon IBM Power 9 CPUs and NVIDIA Volta GPUs. This architecture supports memory coherence between the GPU and CPU and a hierarchy of interconnects to allow effective distributed GPU use. This architecture will extend the size of problems that can tackled beyond that of other GPU-accelerated architectures, increasing data sizes for accelerated simulation and analysis codes, and reducing the ‘time to science’ for a range of ‘hard’ problems.

The purpose of this facility is to enable new science across many different disciplines. The machine will be hosted by Durham University.

In this session Alan Real (Durham University) will introduce the machine and Paul Richmond (RSE team, University of Sheffield) will introduce some of the GPU work he has been doing and highlight the potential of GPUs for computational research.

This event is only open to those working or studying at one of the N8 Universities.


Tuesday 21 April 2020 — 15:00-17:00, Online @ 'Research reproducibility and good practice in statistical software', RSS Leeds / Bradford Local Group Seminar

Anna Krystalli, University of Sheffield

Talk: "Putting the R into Reproducible Research: Director's Cut"



14:55-15:00 Chair of the Leeds/Bradford Group


15:00-15:45 Michael Croucher, Product Manager Cloud & Machine Learning, Numerical Algorithm Group

Is your statistical software correct?

Performing modern statistical research is almost impossible without the aid of software. Many of us use statistical software to one degree or another – from small analyses in Microsoft Excel through to R or Python Scripts on our laptops right up to large machine learning pipelines in the cloud.

How can we be sure that our computational results are correct when we employ these techniques? In this talk, I will discuss what can go wrong and strategies we can employ to increase the probability that our computational research gives the right answer.

15:45-16:30 Anna Krystalli, Research Software Engineer, University of Sheffield

Putting the R into Reproducible Research

R and its ecosystem of packages offers a wide variety of statistical and graphical techniques and is increasing in popularity as the tool of choice for data analysis in academia.In addition to its powerful analytical features, the R ecosystem provides a large number of tools and conventions to help support more open, robust and reproducible research. This includes tools for managing research projects, building robust analysis workflows, documenting data and code, testing code and disseminating and sharing analyses.In this talk we’ll take a whistle-stop tour of the breadth of available tools, demonstrating the ways R and the Rstudio integrated development environment can be used to underpin more open reproducible research and facilitate best practice

16:30 - 16:45: Wrap-up

The committee and our speakers will stay online for a few more minutes to wrap up any discussions

Tuesday 24 March 2020 — 12:00-13:00, COM-G12-Main Lewin

Alberto Biancardi , University of Sheffield

Talk: "[POSTPONED] Object Oriented Programming in MATLAB?"

UPDATE: this has been postponed and will be rescheduled shortly.

Abstract: In the context of MATLAB, Object-oriented programming (OOP) offers a convenient way of creating new capabilities that are easily customised to specific needs. This talk will describe the basics of how OOP is handled in MATLAB, some selected cases where OOP is used within MATLAB itself, and then some examples, mostly from my personal experience in image computing, to illustrate how turning to OOP can help achieve two core outcomes:

  • streamline the implementation of new features and
  • simplify the way they are accessed by other users.

Bio: Alberto Biancardi is an Image Computing Scientist in the POLARIS group, Department of Infection, Immunity & Cardiovascular Disease, The University of Sheffield. His focus is on the analysis of images of the lungs, both structural and functional, reflecting the group’s expertise in the use of hyperpolarised noble gases in magnetic resonance imaging.

Tuesday 25 February 2020 — 12:00-13:00, COM-G12-Main Lewin

Martin Callaghan, University of Leeds

Talk: "From laptop to HPC to cloud: How to containerise your workflow to make it portable and reproducible"


In this talk I’ll discuss a number of use-cases for containerising your research codes and applications using both the Docker and Singularity ecosystems.

I’ll demonstrate the NVIDIA HPC Container Maker language (write your container definitions once to create both Docker and Singularity definitions) and how to combine Git, Github, Dockerhub and Singularity Hub to create a continuous test and development pipeline for your containers which you can then go on to use on your Desktop, in the Cloud or on HPC.


I’m Research Computing Manager and lead the Research Computing team at the University of Leeds.

Our team provides Research Infrastructures (including Cloud and High Performance Computing), Programming and Software Development consultancy across the University’s diverse research community.

My role is part Research Software Engineer, part trainer, part consultant and part manager, including running Research Computing’s comprehensive training programme for research staff and students.

My personal research interests are in text analytics, particularly using neural networks to summarise text at scale.

Link to slides.

Tuesday 28 January 2020 — 12:00-13:00, COM-G12-Main Lewin

David Jones, University of Sheffield

Talk: "CANCELLED - What is Garbage, and Why are we still Collecting it after 50 years?"

Please note: this event has been CANCELLED.

Abstract: Garbage Collection is a way of automatically reclaiming and reusing program memory that is no longer needed. It was invented in 1959 by John McCarthy as part of the implementation of the programming language LISP. A language with garbage collection means the programmer never has to worry about free’ing memory and frees their intellect to work on more interesting problems.

Once the province of languages that were regarded as esoteric and slow, garbage collection is now in every dominant language in use today (Java, R, Python, Ruby, Go, Lua, Lisp).

I give an overview of garbage collection algorithms and discuss some of the trade-offs that affect space/time performance, ease of implementation, and portability. We’ll have a brief splash around inside the implementation of the programming language Lua (PUC-Rio, Brazil).

Bio: David is a Research Software Engineer in the University of Sheffield’s RSE group.

David graduated from the University of Cambridge with a degree in mathematics and a Post-Graduate Diploma in Computer Science, and has since taken a variety of mostly systems programming roles in industry before recently being employed in The Academy.

David has expertise in C, Python, Go, Lua, embedded microcontrollers, programming language runtimes, Software Engineering Management, /bin/awk, and the PNG image format.

Tuesday 29 October 2019 — 12:00-13:00, COM-G12-Main Lewin

Chris Bording, IBM

Talk: "Virtualisation for Scientific Computing"

Abstract: Basic prime on using Virtual Compute Resources to do Science!

Based on my experiences working at a number of universities and in industry supporting researchers and helping them to migrate their computational work to HPC resources, there is a significant gap in the knowledge of a majority of researcher in how to do this migration from their laptop/desktop to HPC. What I have identified is that having access to virtual compute resource can help close this knowledge gap and improve the basic compute skills of researchers and greatly reduce the effort when taking that final step of migrating work to HPC resources.

There are a number of open source tools available that enable people to use virtual machines and we will talk about Virtualbox and Vagrant. Will cover how to setup the environments and integrate the virtual systems into your work. The basics of DEVOPS so you can work in a more collaborative and sustainable environment.

Bio: Chris Bording, is a Research Staff Member- Senior IT Architect Research User Support Environment with IBM Research in Daresbury. Chris’ areas of interest are in supporting scientific computing with High Performance Computing resources and developing virtual resources to support an array of scientific research projects. Leads several open source tools that are focused on make HPC and Cloud resources easier to support and use. He is also certified Software Carpentry trainer and regularly gives training at the Hartree Centre.

Link to slides.

Tuesday 24 September 2019 — 12:00-13:00, Ada Love Lace room

Rich FitzJohn, Imperial College London

Talk: "Fast and high-level solutions of ordinary differential equations with odin"

Abstract: Researchers often have to make trade-offs between expressiveness and performance when implementing their models. For sufficiently specific problems, these trade-offs can be removed by interfaces that reduce the gap between the way they would describe their problem and optimal ways for a computer to solve it.Ordinary Differential Equations (ODEs) turn up in many areas of infectious disease modelling and represent a trade-off of this form, especially in high-level languages such as R. We have developed a new R package “odin” ( to reduce this trade-off by creating a domain specific language (DSL) hosted in R that compiles a subset of R to C in order to efficiently express and solve ODEs.

I will present applications of “odin” both in a research context for implementing epidemiological models with 10’s of thousands of equations and in teaching contexts where we are using the introspection built into the DSL to automatically generate interactive web-based interfaces. I will also discuss how our RSE group works to support a large group of epidemiological modellers, how we use agile processes to manage our development, and to briefly highlight other tools from our group that help with general problems.

Bio: Rich has been a research software engineer in the “RESIDE” group ( at the Department of Infectious Disease Epidemiology and MRC Centre for Global Infectious Disease Analysis for the last 4 years. His focuses are infrastructure and tools that generalise problems common to research groups across the department. He is interested in reproducible research and in helping researchers get more science done per line of code that they write. Before moving to work full-time as an RSE, his research career involved modelling coexistence in tropical forests, diversification over macro-evolutionary timescales and the potential for gene flow from genetically-modified crops.

Tuesday 27 August 2019 — 12:00-13:00, COM-G12-Main Lewin

Ian Sudbery, University of Sheffield

Talk: "Going with the flow: Using workflow managers to co-ordinate multistep analysis pipelines across multiple compute nodes in a reproducible manner. "

Abstract: High performance compute facilities were originally envisaged to run single, monolithic programs that could run across multiple compute nodes. However many scientific analyses consist of a “pipeline” of smaller, interdependent tasks, with each step consisting of transforming, combining or summarizing data in some fashion or another. While each job will generally use only a small amount of resources (e.g. a single core, or a small number of cores on a single machine), a complete analysis will consist of running many of these smaller jobs for different input files(or subsets of input files) in an “embarrassingly parallel” manner. Orchestrating such an analysis and tracking the dependencies across multiple analysis steps manually can be a confusing, time consuming and error prone-task. Software tools that track dependencies in a pipeline are not new, with the oldest and most well known example being make, but a new generation of tools, designed specially with data analysis pipelines in mind has appeared in the last few years. Such systems allow the coordination of the many steps of an analysis into a reproducible/reusable recipe or script, handle submission of jobs that can be run in parallel to compute clusters (or even cloud appliances) and restarting half-finished analyses and log metadata about pipeline runs. They can be used to generate production grade, reusable systems, or as computational lab book that eases record keeping and distribution of prototype analyses across clusters.

I will review the common systems, the advantages of using them, and give a demonstration of developing a simple pipeline using one example tool: the CGAT-core/ruffus system.

Bio: Ian Sudbery is a Lecturer in Bioinformatics. He began his career as a wet-lab scientist working in Genomics and System Biology (in Cambridge and Boston, MA respectively), before laying down his pipettes and taking up the keyboard and code. A graduate of the MRC CGAT program to retrain lab scientists in computation, he joined the University of Sheffield in 2014 where he heads a group of 6 PhD students and postdocs studying Gene Regulation and its break down in disease, mainly through the analysis of high-throughput DNA and RNA sequencing data.

Link to slides.

Tuesday 30 July 2019 — 12:00-13:00, COM-G12-Main Lewin

Phil Tooley, Numerical Algorithms Group (NAG)

Talk: "HPC the easier way: Tools and techniques for managing software and making the most of your High Performance Computing resources"

Abstract: Modern HPC infrastructure such as ShARC at the University of Sheffield allows researchers across a broad range of fields access to resources for solving computationally complex problems. However, accessing and making best use of these resources is not always straightforward. Common problems faced by researchers wishing to use HPCs include difficulty in building the software packages that they need to use, as well as locating the documentation and information to enable them to make most efficient use of the available resources.

In this seminar I will describe some common problems faced by HPC users and demonstrate some of the available tools and techniques to overcome them. In particular I will look at how to use information gathered from previous jobs to choose the optimum parameters to submit a job with, optimising for minimum queuing time and maximum efficiency. I will also tackle the problem of installing software and dependencies, highlighting how package managers such as Conda and S-pack can be used to manage the installation of software. Using package managers can vastly reduce the complexity and time needed to manage research software as well as increasing reliability for the user.

Bio: Phil Tooley was a member of the RSE team here in Sheffield before leaving to take on the role of HPC Application analyst at the Numerical Algorithms group. He is a former theoretical and computational plasma physicist with particular interest in mathematical modelling, code optimisation and parallelism. He is an experienced developer of “traditional” parallel HPC codes using MPI and OpenMP in C, C++ and Fortran, but also champions the use of the Numpy/Scipy stack for scientific computing with python. This includes the use of accelerator technologies including Numba and Cython to write custom python code which is speed competitive with traditional compiled languages, possible in conjunction with parallel frameworks such as Dask.

Link to slides.

Thursday 27 June 2019 — 11:00-14:00, Richard Roberts building, University of Sheffield

Will Furnass, University of Sheffield

Talk: "'But it worked on my computer!' - curating research computing environments (lightning talk at University of Sheffield Reproducibility Showcase)"

On 27th June 2019 the university will host its first Reproducibility Showcase. The aim is to showcase current good practices at Sheffield and identify the challenges that make reproducing our research difficult.

This event will interest everyone who has wondered about the reliability of research they read in journals, thought about the best way to do their own research, or who is interested in innovations which are helping make research more reliable and open.

It includes a keynote from Professor Marcus Munafò, University of Bristol and UK Reproducibility Network lead: “Scientific Ecosystems and Research Reproducibility”. Also, Will Furnass from the RSE team will be giving a short talk on the challenges of curating research computing environments to make workflows (more) reproducible. The agenda:

  • 11:00: “Scientific Ecosystems and Research Reproducibility”. Keynote by Professor Marcus Munafo, University of Bristol and UK Reproducibility Network lead
  • 12:00 - 12.30: Lightning talks from attendees
  • 12.30 - 14:00: Lunch and Reproducibility Showcase

Registration (free)

Wednesday 26 June 2019 — 13:00-14:00, COM-G12-Main Lewin

Hannah Sewell, Sky Betting and Gaming

Talk: "CANCELLED: Transitioning into the tech Industry from a PhD"

Unfortunately we’ve had to cancel this event as the speaker is no longer available.

Abstract: Writing code in your PhD and in industry are wildly different. From the processes surrounding getting your code out into live to the way you work in a team, industry has its own set of rules to learn in order to become an effective software engineer. Having gone from working on code that maybe my PhD supervisor would flick through to helping to build a login used by 1500 customers a second the past 8 months have been enormous learning curve. In this talk I will discuss some of the differences I have experienced between writing code in your PhD and writing code in a large company. I’ll discuss the best practices we carry out as Login and Recovery Squad at Sky Bet, what its like to work in industry and what I wish I’d done with my code during my PhD in plant genetics.

Bio: Dr Hannah Sewell is a software engineer at Sky Betting and Gaming in the Login and Recovery Squad. She carried out her PhD in Quantitative Genetics in Animal and Plant Science at The University of Sheffield, finishing in 2018. She first learned to code through Code First: Girls HTML and CSS course and now codes predominantly in Javascript doing full stack web development.

Tuesday 28 May 2019 — 13:00-14:00, COM-G12-Main Lewin

Luke Mason, Application Performance Engineering Group Leader at Hartree Center

Talk: "Software Outlook"

Abstract: Software Outlook is a project funded by the EPSRC as part of the Computational Science Centre for Research Communities. It aims to support the UK’s Collaborative Computational Projects (CCPs) and High-End Computing (HEC) Consortia in the development and optimisation of their world-leading software. These activities include

  • Evaluations of new programming languages, libraries and frameworks
  • Investigations into timely and cost-effective exploitation of current and near-future systems
  • Demonstrations into how specific software technologies can be applied to existing applications

The challenges faced by CCP’s and HEC consortia in modernising their codes are fed directly into Software Outlook’s remit. During this talk, we will discuss these challenges and the subsequent work being done by Software Outlook.

Bio: Luke leads the High Performance Software Engineering group at the Hartree Centre. The group specialises in code scalability and performance on HPC (high performance computing) systems as well as porting and optimisation for emerging and novel architectures. He is experienced in working alongside both industrial and academic scientists to produce accurate and efficient code across a range of disciplines and architectures. His group is involved in development for the Met Office and the EPSRC funded “Big Hypotheses” project, to name just a few of its current activities.

Link to slides.

Wednesday 24 April 2019 — 12:00-13:00, COM-G12-Main Lewin

Tania Allard, Microsoft

Talk: "Machine learning at scale and in research: how different are they and what can they learn from each other?"

Abstract: The last decade has seen the largest growth in the areas of theoretical and applied machine learning, deep learning, and artificial intelligence of all history. This is in part due to the widespread availability of better, more powerful and cheaper computational resources than ever before. Another factor leading to this growth is the embedding of research machine learning groups in larger corporations. As a consequence, we have seen the emergence of ‘data powered’ services and applications in almost any industry, from retail and fintech to highly regulated areas such as healthcare and security.

Although the practice of embedding research scientists in startups and companies providing machine learning services has become more common there is still the belief that production and research and development ML are fundamentally different. When the truth is, they are both different sides of the same coin. But there is yet another question to ask: how different is industrial to academic research (in machine learning and data science) and what practices can and should be cross-pollinated from one environment to the other?

In this talk, I will cover the infrastructure and software practices which make it possible to serve ML services to the wider population at scale. I will then compare the industrial and academic machine learning research practices and provide insight into industry practices that could be adapted by researchers to improve their outputs, workflows, foster reproducible and transparent science and work collaboratively to solve global challenges.

By the end of the talk, the attendees will be better informed about the technologies that could improve and boost machine learning and data science research and practices.

Bio: Tania is a Developer Advocate with vast experience in academic research and industrial environments. Her main areas of expertise are within data-intensive applications, scientific computing, and machine learning. One of her main areas of expertise is the improvement of processes, reproducibility, and transparency in research, data science and artificial intelligence. Over the last few years, she has trained hundreds of people on scientific computing reproducible workflows and ML models testing, monitoring and scaling and delivered talks on the topic worldwide.

Link to slides.

Monday 8 April 2019 — 14:55-15:15, Firth Hall

Phil Tooley, University of Sheffield

Talk: "Here there be dragons: The HPC code development journey"

Abstract: Developing high performance codes, especially parallel codes can be a complex and time consuming process. In this talk I will describe the general process of taking a serial code or algorithm and developing a high performance parallel version using modern techniques and libraries, using my own projects as examples. I will discuss some of the available high level frameworks available in several languages and their advantages and drawbacks, and compare this to more traditional low level development using MPI or CUDA. Following this I will describe methods for testing, benchmarking and profiling of HPC codes using open source tools and free services. Finally I will discuss strategies for ensuring that software is sustainably developed to ensure that it is straightforwardly portable to other hardware and easily accessible to other users.

Link to slides.

Wednesday 27 March 2019 — 13:00-14:00, COM-G12-Main Lewin

Dr Christopher Woods, University of Bristol

Talk: "Running Serverless HPC Workloads on Top of Kubernetes and Jupyter Notebooks"

Abstract: The cloud holds the promise of a new way to perform digital science - interactive, elastically scaling, open data, open compute, and sharing reproducible workflows to collaboratively solve global grand challenge problems. The Research Software Engineering group at Bristol work with most of the major public cloud companies (Amazon, Google, IBM, Microsoft and Oracle) on a range of projects creating everything from elastically scaling slurm clusters, through workflows for Cryo-EM image refinement, to national platforms for tracking UK greenhouse gas emissions. Through this, we’ve recognised that comparing the cloud to on-premise HPC is like comparing a fixed-line telephone to a smartphone. Just as an iPhone is more than just a mobile telephone, so the cloud is more than just an “on-demand cluster”.

To this end, via all of our projects, we have been gradually building Acquire. Acquire provides multi-cloud identity and access management to cloud-based storage and compute services. Acquire is designed to make it easy for HPC jobs to be run interactively within Jupyter notebooks. Jupyter notebooks, deployed on top of kubernetes (k8s), are finding rapid adoption in universities and industry. While k8s can spawn new pods for each notebook session, launching high performance computing (HPC) jobs during dynamic workflows, and then managing access to the resulting output data is complicated. Acquire builds on top of the Fn serverless framework ( to deploy individual simulations as Fn functions that are called dynamically from workflows run within Jupyter notebooks. A notebook running on a lightweight k8s cluster can burst HPC workloads via Fn serverless calls to a dynamically provisioned cluster running on a bare metal or VM-based HPC/GPU cloud. Using Fn, we are constructing a distributed identity, access, and accounting layer around dynamically scaling compute resources and globally distributed object stores. This adds security and accountability, thereby making it easy for end users to manage complex multi-cloud workflows. Researchers can control costs by translating billing into units of “simulation” rather than “core hours”, and will be able to publish and share the results via access-controlled DOIs. Altogether, Acquire will help us realise the potential of the cloud as a truly planetary supercomputer. Put more succinctly, Acquire is helping us build the Netflix of simulation.

Bio: Christopher is an EPSRC Research Software Engineering (RSE) Fellow, managing the RSE Group in the Advanced Computing Research Centre at the University of Bristol. Christopher’s started his research career as a computational chemist, developing new methods and software for biomolecular simulation (, This software is now sold and used in the pharmaceutical industry ( Christopher’s aim is to improve the quality and sustainability of research software by raising awareness of the importance of software engineering skills, and advocating the development of sustainable funding pathways and careers for people who develop research software. Christopher is joint-chair of the Research Software Engineering Association ( He provides software engineering training to researchers across the UK (, and regularly provides advice to universities on how to set up and manage successful RSE teams.

From a start in computational chemistry, Christopher’s research now covers software engineering in everything from monitoring greenhouse gases, via manufacturing airplane components, to resolving Cryo-EM images and managing complex biomolecular simulation and data analysis workflows. Most of these projects now require developing and adapting software for the cloud. As such, the Bristol RSE group has a growing international reputation for being at the forefront of cloud software engineering research. Christopher contributes to long-term UK cloud strategy via close working relationships with engineers at many of the public cloud companies, and membership of the EPSRC eInfrastructure Strategic Advisory Team and UKRI eInfrastructure expert group.

Link to slides.

Wednesday 27 February 2019 — 11:00-12:00, COM-G12-Main Lewin

Will Furnass, David Jones, University of Sheffield

Talk: "A short talk on AWK and a slightly longer talk on conda"

The AWK Programming Language

Heralded by some[1] as “235 times faster than hadoop”, AWK is a programmable text processing tool that has been packaged with Unix, since 1979-ish, and has been in all major Unix standards since. AWK builds on the grep and sed tools, borrowing some of their syntax and style in a tool that is almost as concise but fully programmable. AWK is good at cutting through text-based datasets at blazing speed. David will introduce you to some of its ways.

Slides and notes:


conda: reproducible research software environments on your machine, HPC and the cloud”

conda is a tool for creating isolated environments of software packages. Over the last few years it has greatly simplified the challenge of providing the software dependencies required for many research computing workflows and it allows certain workflows to be easily reproduced on different hardware and operating systems (e.g. in the cloud or on HPC systems). In this talk Will covers: what problems conda tries to solve, the basics of using conda, alternative tools that address similar problems and issues that may arise when using conda.

Slides and source for slides.

Wednesday 30 January 2019 — 15:30-16:30, Lecture theatre 1, Hicks building

Dr Christopher Woods, University of Bristol

Talk: "How to design and engineer good code for research."

Abstract: The code you write to analyse your data and conduct your work is a valuable research tool. Understanding how to write good research code and software is very important. A broad range of skills are needed to ensure that software is sustainable, and that it fulfils research software’s unique requirements of being trustably correct, flexible, and performance portable. Research Software Engineering is a discipline that embraces these skills. Research software engineers (RSEs) employ tools such as documentation, version control and unit testing to ensure software is trustably correct. They use modular, component-based design to ensure that software is flexible and performance portable. Finally, an emerging career path for research software engineers in both academia and industry is supporting teams who now enhance the sustainability, quality and capability of research software, and provide training in research software engineering best practice to the next generation of researchers.

Tuesday 29 January 2019 — 12:00-13:00, COM-G12-Main Lewin

Dr Spiridon Siouris, University of Sheffield

Talk: "Taking up development of existing research codes. A discussion on common issues and effective ways of dealing with them"

Abstract: Research codes in Engineering are written with the aim of generating results as quick as possible so that interesting physical phenomena can be analysed. However, time constraints, lack of formal training in Computer Science/Software development, and pressure for results leads to software that can be problematic. In this talk we will discuss about:

  • The issues faced with research software and the need for maintainable and reliable code
  • The steps taken for quickly getting familiarised and productive in code development
  • The tools used, such as for documentation, debugging, profiling
  • Holistic approach on code optimisation
  • Good programming practices, along with references for further reading The aspects of software development discussed here will be helpful to researchers joining projects that require further development of existing software, and also researchers that are tasked to create new software.

Bio: Dr. Spiridon Siouris is a Research Fellow in the Low Carbon Combustion Group in Mechanical Engineering at The University of Sheffield. His research is focused on modeling chemically reacting flows in gas turbine fuel and lubrication systems using CFD and lower order modelling codes. Currently he is working on further developing a CFD code in which wall boundaries can deform according to deposition growth layers due to thermal oxidative degradation in jet fuels (FINCAP project with Rolls Royce). For over a decade he has been involved in numerous software projects for aerospace applications, and has worked alongside students, PhD researchers, and professional engineers developing modeling software. He has been exposed to a wide range in quality of coding, and his enthusiasm for computers, programming and engineering, drive him to educate people towards developing high quality software.

Link to slides.

Thursday 24 January 2019 — 14:00-15:00, Lecture Theatre 5, Hick Building

Phil Tooley and Will Furnass (RSE team), University of Sheffield

Talk: "Make it right, then make it go fast enough"

Abstract: The theory seminar group in the Department of Physics invited RSE Sheffield to give a talk on code optimisation. We took this opportunity to stress the importance of ensuring, using tests, that code is correct before and after optimisation and the value of profiling methods and tools to ensure that optimisation efforts are precisely targetted. We finished the talk by focussing on the things that most greatly contribute to software being sustainble: version control, test frameworks and automated testing and good documentation.

Bio: Phil and Will are Research Software Engineers in the University of Sheffield’s RSE group.

Link to slides.

Tuesday 30 October 2018 — 12:00-13:00, COM-G12-Main Lewin

Andrew Turner, EPCC

Talk: "Open source UK HPC benchmarking"

Abstract: The recent investments in national HPC services in the UK (for example the national Tier2 HPC services and DiRAC 2.5x) and the offerings from public cloud providers have led to a wider range of advanced computing architectures available to UK-based researchers. We have undertaken a comparative benchmarking exercise across these different services to help improve our understanding of the performance characteristics of these platforms and help researchers choose the best services for different stages of their research workflows. In this seminar we will present results comparing the performance of different architectures for traditional HPC applications (e.g. CFD, periodic electronic structure) and synthetic benchmarks (for assessing I/O and interconnect performance limits). We will also describe how we have used an open research model where all the results and analysis methodologies are publicly available at all times. We will comment on the ease (or not) of compiling applications on the different platforms in a performant way, assess the strengths and weakness of architectures for different workloads, and demonstrate the benefits of working in an open way

Benchmarking repository:

Bio: Andy works for EPCC at the University of Edinburgh and has a particular focus on helping users get the best from the national HPC services around the UK. Much of his recent work has focussed on achieving this through an open benchmarking exercise comparing performance across different HPC systems and different applications and by developing HPC Carpentry training to provide researchers with the tools they need to use advanced computing in their research.

Link to slides.

Wednesday 12 September 2018 — 15:30-16:30, Hicks building, Lecture Theatre 7

Rob Baxter, EPCC

Talk: "Don’t Panic! Demystifying Big Data, Data Science and all that."

Abstract: The world’s digital data are still doubling in number every year or so, and research is a big culprit. Whether you measure it by volume, velocity or variety, research data is getting Big. All is not lost, though. It is challenging, and it does take a slightly different approach, but dealing with oodles of research data is not as terrifying as it seems. In this talk I aim to give researchers some practical ideas on working with “big” research data, illustrated with examples of how we manage stuff like this at EPCC.


Wednesday 27 June 2018 — 15:30-16:30, Hicks building, Lecture Theatre 7

Jos Martin, Mathworks

Talk: "Software engineering in practise"

Abstract: Jos will discuss how MathWorks develop and extend MATLAB and its associated tools from an engineering perspective. Focusing on the opportunities – and challenges – in ensuring MATLAB continues to be the tool of choice in the world of multicore CPUs, in grids and clouds, on mobile platforms, and on GPUs. He discusses the processes involved in developing and testing MATLAB to drive usability features, development environment enhancements, and overall reliability.

Bio: Jos Martin is the senior engineering manager for parallel computing products at MathWorks. He has responsibility for all parts of Parallel Computing Toolbox, including the use of GPU and big data types in other areas. In addition, he is also responsible for MATLAB Drive, MATLAB Connector, and products that help link the desktop to the cloud. Before moving to a development role within MathWorks, he worked as a consultant in the U.K., writing large-scale MATLAB applications, particularly in the finance and automotive sectors. Prior to joining MathWorks in 2000, he held a Royal Society Post-Doctoral Fellowship at the University of Otago, New Zealand. His area of research was Experimental Bose-Einstein Condensation (BEC), a branch of low-temperature atomic physics.

Tuesday 19 June 2018 — 12:00-13:00, COM-G12-Main Lewin

Dr Patricio Ortiz, University of Sheffield

Talk: "Tackling the learning curve of scientific programming"

Abstract: Programming is part of the curriculum of students of computer science, and it will be complemented with other related subjects to make them knowledgeable on the subject. The situation of a science or engineering student is the opposite; typically they have one course to learn one language, and that language is usually not the one they will first face in real-life situations. This situation has occurred for decades, and it is likely not going to change, but there is a real need to better prepare science and engineering students to face the very steep learning curve of having to start programming as part of an ongoing project or their thesis.

Universities like ours offer excellent facilities like the HPCs supplied by IT Services, yet the reality is that many students and young researchers may have never used a Unix based system, let alone a parallel system.

The book I wrote, “first steps in scientific programmings” aims at facilitating the passage through the learning curve by providing tips based on years of experience and my interaction with students and brilliant young researchers who did not have the opportunity to learn anywhere else the challenges which programming in a scientific environment involve.

I will briefly describe the points which I think are more important to emphasise, points which I’ve confirmed as important by interacting with other experienced researchers at the U. of Sheffield, who are trying to provide support for the people starting in this field.

Link for the book:

A supportive link:

Bio: Patricio Ortiz holds a PhD. in astronomy from the University of Toronto, Canada. He has a keen interest in programming as a mean to create tools to help his research when no tools were available. He has taught at the graduate and undergraduate levels in subjects about astronomy, instrumentation, and applied programming. Throughout his career, Patricio has interacted with students at any level as well as post-graduates, helping him identify the most critical subjects needed by young scientists in the physical sciences and usually not covered by current literature. He has worked on projects involving automated super-nova detection systems; detection of fast moving solar system bodies, including Near-Earth objects and he was involved in the Gaia project (European Space Agency) for nearly ten years. Patricio also developed an ontology system used since its conception by the astronomical community to identify equivalent quantities. He also worked on an Earth Observation project, which gave him the opportunity to work extensively with high-performance computers, leading to his development of an automated task submission system which significantly decreases the execution time of data reduction of extended missions.

Patricio now works as a Research Software Engineer at the Department of Automatic Control and Systems Engineering at the University of Sheffield. He uses C, Fortran, Python, Java and Perl as his main toolkits, and as a pragmatic person, he uses the language which suits a problem best. Amongst his interests are: scientific data visualisation as a discovery tool, photography and (human) languages.

Link to slides.

Monday 30 April 2018 — 15:00-16:00, Broad Lane Lecture Theater 11 (BROAD-LT11)

Peter Heywood, University of Sheffield

Talk: "Accelerating Road Network Simulations using GPUs"

Abstract: Road network Simulations are a vital tool used in the planning and management of transport network infrastructure. Large-scale simulations are computationally expensive tasks, taking many-hours for a single simulation to complete using multi-core CPU architectures, hindering the use of large-scale simulations.

Using Graphics Processing Units (GPUs) we demonstrate that both Macroscopic (top-down) simulations and Microscopic (bottom-up) simulations can be considerably accelerated using many-core architectures and novel fine-grained data-parallel algorithms.

A multi-GPU version of the SATURN macroscopic road network simulation package has been developed, demonstrating assignment performance improvements of over 11x compared to a multi-socket CPU. A FLAME GPU based microscopic simulation demonstrates performance improvements of up to 65x using a single GPU compared to the commercial multi-core microsimulation tool Aimsun.

Bio: Peter Heywood is a Research Software Engineer and PhD candidate at the University of Sheffield. His research is focussed on using Graphics Processing Units (GPUs) to improve the performance of complex systems simulations; including transport network simulation and biological cellular simulations. Peter has presented his work at multiple international conferences and recently publisheded his work in the Simulation Modelling Practice and Theory Journal.

Monday 16 April 2018 — 15:30-16:30, Hicks building, Lecture Theatre 6

David Hubber, Researcher at Ludwig-Maximilians-Universität Munich

Talk: "Structuring code efficiently"

Abstract: David will discuss how to structure code efficiently. He will also discuss code module design, decoupling strategies and test-driven development.

Bio: David Hubber is an astrophysicist by training that has worked extensively on writing software such as GANDALF and SEREN to simulate star forming regions. He has been a researcher at the university of Cardiff, the university of Sheffield, and is currently a Postdoc researcher Ludwig-Maximilians-Universitat Munich.

Tuesday 28 November 2017 — 11:00-12:00, Lecture Theatre G02, Firth Court

Dr Stephen McGough, Newcastle University

Talk: "PARALLEM: massively Parallel Landscape Evolution Modelling"

Abstract: Landscape Evolution Modelling (LEM) is used to predict how landscapes evolve over the millennia due to weathering and erosion. LEM’s normally operate on a regular grid of cells each representing the height of a point within a landscape. The process can be broken up into the following stages applied to each cell: determination of the flow direction of water out of that cell, summation of the volume of water flowing through the cell and computation of the erosion / deposition in the cell. This process is repeated at regular intervals - either an annual time-step (due to computation complexity) or once for each major storm-event - for periods of around one million years.

Within each year/rainstorm there is great potential for speedup when executing on a GPGPU - flow direction for most cells can be performed independently, flow accumulation can easily be performed in parallel. However, due to certain ‘real world’ landscape features this potential can easily be lost. Landscape features such as plateaus - where all cells in a region have the same height - or sinks - (sets of) cells which have no lower neighbour, meaning water cannot escape - prevent optimal speedup being achieved. Likewise, computing the volume of water passing through a cell is inherently sequential - the sum of water through a cell at the end of a river depends on knowing the volume of water passing through all points upstream.

CUDA algorithms have been developed for computing the LEM processes: Flow direction can be computed independently for each cell; A parallel breadth-first algorithm can be used for routing water over a plateau to an outlet cell; A parallel technique for ‘filling’ sinks (making them into lakes) which performs much of its work through parallel pointer jumping is used. Flow accumulation can be performed using a developed ‘correct’ algorithm where each cell which has no incorrect flows entering it can be computed and marked correct. The process of erosion / deposition can then be performed in a similar manner to flow accumulation with the eroded material being passed around.

In this talk we will present parallel techniques for overcoming these problems, demonstrating between two and three orders of magnitude speedup over the best-in- class LEM software for landscapes between 0.1 and 56 million cells - far larger than the traditional 5 thousand cell simulations which have previously been performed. This has led to a need for a re-evaluation of the models used within the LEM community. Errors in LEM simulation results, which have been used over the last 30+ years, have been attributed to the very small simulation sizes. However, with our 46+ million cell simulations - a realistic scale - we are now able to determine that these errors are not due to scale but rather due to the equations themselves.

We will present our approach moving forwards to overcome these limitations and present initial results of this work.

Bio: Dr Stephen McGough is a Senior Lecturer in the School of Computing at Newcastle University, UK. He obtained his PhD in the area of Parallel simulation and has worked for many years in the areas of parallel computing and simulation. Holding posts at Imperial College London, UCL, Newcastle University and Durham University. This has led to over fifty publications in the area of parallel computing including receiving the NVIDIA best paper award at HiPC 2012. His research focuses on the use of novel computing technologies, such as GPGPU, to solve real-world challenges.

Link to slides.

Saturday 9 September 2017 — 13:00-14:00, Workroom 3 (205), The Diamond

Dr Thomas Nowotny, University of Sussex

Talk: "Opportunities and challenges for spiking neural networks on GPUs"

Abstract: In the past 6 years, we have developed the GeNN (GPU enhanced neuronal networks) framework for GPU accelerated spiking neuronal network simulations. In essence, GeNN is based on a simple design of code generation that allows a large extent of flexibility for computational models while at the same time taking care of some of the GPU specific optimisation work in the background. In this talk I will present the main features of GeNN, its design principles and show benchmarks. I will then discuss limitations, both specific to GeNN and to numerical simulation work on GPUs more generally and present some further work, including the SpineML-GeNN and Brian2GeNN interfaces. GeNN is developed within the Green Brain and Brains on Board projects and is available under GPL v2 at Github

Bio: Thomas Nowotny received his PhD in theoretical physics in 2001 from the University of Leipzig and worked for five years at the University of California, San Diego. In 2007 he joined the University of Sussex, where he is now a Professor of Informatics and the Director for Research and Knowledge Exchange at the School of Engineering and Informatics. His research interests include olfaction in animals and machines, GPU accelerated scientific computing, hybrid brain-computer systems and bio-inspired machine learning.

Link to slides.

Tuesday 8 August 2017 — 13:00-14:00, Workroom 3 (205), The Diamond

Twin Karmakharm, Research Software Engineer, University of Sheffield

Talk: "Containers for High Performance Computing"

Abstract: Containerization is a lightweight virtualisation technology where users can package workflows, software, libraries and data for running on various machines with minimal loss of performance. The technology can be used a way for scientists to deploy custom software on the HPC cluster and easily share reproducible software along with their data. The talk explores the concept of containers and existing container technologies suitable for the HPC. Particular focus is given for Singularity which has recently been introduced on the new Sheffield Advance Research Computer (ShARC) cluster.

Tuesday 30 May 2017 — 13:00-14:00, Workroom 2 (G05), The Diamond

Ania Brown, Research Software Engineer, Oxford e-Research Centre

Talk: "Towards achieving GPU-native adaptive mesh refinement"

Abstract: Modern simulations model increasingly complex multiscale systems, and the need to capture details at multiple length scales can lead to large memory requirements. Adaptive mesh refinement (AMR) is a method for reducing memory cost by varying the accuracy in each region to match the physical characteristics of the simulation, at the cost of increased data structure complexity. This complexity is a particular problem on the GPU architecture, which is most naturally suited to regular data sets. I will describe some of the optimisation and software challenges that need to be considered when implementing AMR on GPUs, based on my experience working on a GPU-native framework for stencil calculations on a tree-based adaptively refined mesh as part of my Master degree. Topics covered will include achieving coalesced access with the AMR data structure, memory defragmentation after grid changes and load balancing using space-filling curves.

Bio: Ania is a research software engineer at the Oxford e-Research Centre. Her research interests are a combination of performance optimisation for large scale scientific simulation and software development methodology to improve the quality of such codes. She received her Master degree from the Tokyo Institute of Technology in 2015.

Link to slides.

Thursday 4 August 2016 — 13:00-15:00,

Paul Richmond, Research Software Engineer, Sheffield

Talk: "Performance Optimisation for GPU computing in CUDA"

Paul Richmond gave a two hour lecture on Performance Optimisation for GPU computing in CUDA. The lecture is part of COM4521 on GPU computing but any CUDA developers who were keen on optimising their code were invited to attend.

Monday 25 April 2016 — 13:00-15:00,

Mike Croucher, Research Software Engineer, Sheffield

Talk: "Project Jupyter and SageMathCloud"

Mike Croucher gave a presentation at the University of Sheffield Chemical Engineering teaching away day on the use of `Project Jupyter and SageMathCloud for teaching computational subjects.

Contact Us

For queries relating to collaborating with the RSE team on projects:

Information and access to JADE II and Bede.

Join our mailing list so as to be notified when we advertise talks and workshops by subscribing to this Google Group.

Queries regarding free research computing support/guidance should be raised via our Code clinic or directed to the University IT helpdesk.