First GPU Computing Seminar - Towards achieving GPU-native adaptive mesh refinement


We've kicked off our first GPU Computing group seminar this year with a talk by Ania Brown from Oxford e-Research Centre titled "Towards achieving GPU-native adaptive mesh refinement" on 30th of May 2017. Adaptive mesh refinement (AMR) is a method for reducing memory cost by varying the accuracy in each region to match the physical characteristics of the simulation, at the cost of increased data structure complexity. Ania described the optimisation and software challenges that need to be considered when implementing AMR on GPUs, based on her experience working on a GPU-native framework for stencil calculations on a tree-based adaptively refined mesh as part of her Master degree.

There's no offical GPU Computing talks in June but we highly recommend the upcoming talk "From Democratic Consensus to Cannibalistic Hordes: The Principles of Collective Animal Behaviour" by Prof. Iain Couzin on the 29th of June 2017.

Links and More Information

For presentation slides and more information on both of these talks, visit the GPU Computing seminars page.

Research Software Engineer in High Performance Computing

A job opportunity within the RSE Sheffield group is available under the job title of "Research Software Engineer in High Performance Computing (HPC) enabled Multi-Scale Modelling".

The purpose of the Research Software Engineer post is to enhance The University’s capability and expertise in developing Research Software. This role will be based in the newly formed Research Software Engineering group, which aims to improve all aspects of research software including reproducibility, usability, efficiency and correctness.

The primary function of this role is to support the EC funded CompBioMed project. A user-driven Centre of Excellence in Computational Biomedicine, to nurture and promote the uptake and exploitation of high performance computing within the biomedical modelling community.

The post is fixed-term with an end date of 30 September 2019. The deadline for applications is 19th June 2017.

Links and More Information posting

University of Sheffield Application Link

Sheffield Code First:girls


Anyone working on any STEM area knows, as a fact, that we are facing a digital and technical skills gap. In the governement's digital strategy report last year it was highlighted that we would need an extra 745,000 workers with digital skills by 2017, as 90% of jobs require digital skills to some degree. On top of this, many technical areas suffer a diversity deficit (cultural and gender based). With the UK being among the European countries with the smaller number of female professionals in STEM areas.

Being this a rather complex problem, many people and organisations work hard to provide a solution to this issue. Some approaches adopted by such individuals and organisations are:

  • Encourage the hiring of highly skilled immigrants
  • Provide wider support for underrepresented minorities
  • Leverage inclusive and encouraging environments for those who demonstrate an interest in STEM areas
  • Support and train those willing to make a career change/or follow non-traditonal career paths
  • Approach the new generations and provide them with useful skills that would help them make an informed career choice

... and the list goes on.

I do believe, however, that the most fruitful approach is to work with the upcoming generations and provide them with useful technical and personal skills early on. This would not only make them better qualified for their future but would enable them to make informed decisions with regards to their professional future.

Code First: girls is a multi-award organization that aims to tackle the gender imbalance in three ways: training women, building a strong and supportive community, and helping companies to train, recruit, and retain their female force.

Belonging to a minority within STEM has lead me to take an active role as an equality and diversity ambassador, which eventually lead me to volunteer as a Python instructor for the Code First courses.

Over the course of 8 weeks we teach and guide groups of around 30 women with various levels of coding experience in CSS/HTML, Python, or Ruby. These courses are a mixture of in-person classes and self-learning, at the same time the ladies involved work in teams of 2-4 people to build a project of their own interest.


The idea behind these workshops is rather simple: train people and provide them with practical use of the skills they are learning. Having as a final objective to develop a fully deployed RESTful app. But the whole CF:girls thing goes way beyond that. Over those 8 weeks the girls form a strong, motivating, and supportive community, in which they can acquire new skills, meet like-minded people, learn from other women working in STEM areas, and even attend external women in tech events!


I find rather interesting the mixture of apps and projects pursued, as well as the high quality of the presented final products. But beyond that, I find this to be an excellent opportunity to give back to the amazing community that has adopted and welcome me as a professional in a STEM area. Thus I can say for sure I will be getting involved in more Code First events/workshops.

Coffee and Cakes Event

RSE Sheffield is hosting another coffee and cakes event on May 31st at 14:00 in the Ada Lovelace room on 1st floor of the Computer Science Department (Regents Court East). Attendance is free, but you need to register via this link.

Take the opportunity to come and have an informal chat about research software.

This event is a community event for anyone, not just computer science or members of the RSE team. If you work on software development are an RSE or simply want to talk about some aspect of software or software in teaching then come along.

Building Linux GPU Code with NSIGHT in Windows

Why would you possibly want to build and execute CUDA GPU applications within NSight Eclipse for Linux within Microsoft Windows? Well if you use windows as your main OS there are plenty of reasons but the most obvious is that you may be developing cross platform code and want to build and test it without dual booting. If you are thinking about virtual machines then forget about it. Most (except some very expensive enterprise options) do not have the ability to access a GPU device (e.g GPU pass-through) from within a virtual machine.

The purpose of this post is to describe how to install the necessary tools to permit local GPU development inside the Linux NSight IDE from within Windows. The advantages of which are not only cross platform development but also the ability to locally develop in powerful Linux IDE with remote execution and graphical debugging. This is particularly helpful if you want to execute or debug your code on a HPC system (like Sheffield's ShARC system) from Windows. The post focuses on the use of the new Windows 10 Linux subsystem, however you could use the approach to install CUDA tools on a lightweight Linux virtual machine. The concept is the same either way. i.e. build and debug locally execute remotely.

Configuring the Linux Windows Subsystem for CUDA compilation

The Windows 10 subsystem for Linux is available in the anniversary update. You can install it from the "Turn on or off windows features" dialogue. It is listed under "Windows subsystem for Linux (beta)". This alone is not enough to build our GPU applications as we will need to install CUDA. A normal CUDA install will require a local GPU and the installation of a CUDA compatible graphics driver. Fire up the Windows Bash Shell (or a Linux virtual machine). You can then use the following commands to install the CUDA toolkit without installing a graphics driver. This will install the core NVIDIA CUDA compiler (nvcc) and NSight. You can update the CUDA_REPO_PKG variable to install a different CUDA version.

sudo apt-get update
sudo dpkg -i $CUDA_REPO_PKG
sudo apt-get update
sudo apt-get install -y --no-install-recommends cuda-core-8-0 cuda-cudart-dev-8-0 nsight

You can now create a symbolic link to a generic CUDA install. This will permit the addition and fast swapping of different CUDA versions.

sudo ln -s /usr/local/cuda-8.0 /usr/local/cuda 
export PATH=$PATH:/usr/local/cuda/bin

Note: if you want the CUDA bin location to be persistently on the PATH (after you reboot the Bash shell) then you will need to add the export PATH line to your .bashrc profile. Test that the install was successful by running nvcc.

nvcc --version

This should give you some information on the nvcc version. e.g.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

The CUDA toolkit is now installed so you can build (but not execute) CUDA GPU programs using the Linux bash shell.

Graphical editing with Nsight IDE

To be able to run a graphical NSight IDE from within the Windows subsystem for Linux you will need to be running a X server within Windows. You can install the free XMing application for this purpose. If you would rather use a Linux virtual machine then you can avoid this step as the virtual machine will most likely have an X server included. The advantage of the Windows subsystem approach is that it is very lightweight. From within your Bash terminal you will need to set the following environment variable.

export DISPLAY=:0

The display variable is an environment variable passed to graphical applications. In this case, the value of :0 it tells the application to use the first display on the local system (our XMing server in this case). If you want to make this environment variable change permanent then you should add it to your .bashrc profile. You can now run the NSight application from Bash.


Glorious isn't it. Within NSight we can create a new CUDA project which will compile using the local CUDA install. In order to remotely execute and debug you can use the "C++ Remote application" run configuration. This will require SSH access to a suitable Linux machine with a GPU and CUDA installed.

Future blog posts will cover how remote execution and debugging can be achieved on the University of Sheffield ShARC system. ShARC has a typical of job based HPC system which encourages job submission rather than execution of code on worker nodes via SSH logins.

Summary of Bash Profile Changes

I added the following to my .bashrc profile (located in the home directory) to ensure that NSight could be launched straight after starting the Bash shell in Windows.

# add cuda bin dir to path
export PATH=$PATH:/usr/local/cuda/bin

# export the display environment variable
export DISPLAY=:0

Spark and Scala on Sheffield HPC systems

As part of our support for a Large scale machine learning MSc course in Computer Science, the Sheffield RSE group put together a tutorial for how to use Spark and Scala on Sheffields HPC systems. We are sharing with the rest of the community in case its useful to you

Its for people whove never used a HPC system before. By the time theyve finished, they are able to submit their own Spark jobs to the HPC cluster. If anyone is interested in us re-running this as a workshop (it takes around 2 hours) let us know.

Some notes on our current implementation of Spark on HPC:-

  • We are currently restricted to jobs that run on one node. This is because Sheffields HPC clusters are not traditional Hadoop/Spark clusters and so some level of integration is required between Sun Grid Engine and Spark. We've only managed to get as far as implementing this across single nodes at the moment.

  • One way weve fudged this is to make sure that we provide our students with access to nodes with a LOT of memory 768 GB per node in fact, 12 times as much as you get on a normal node on ShARC or Iceberg. We are experimenting with allowing others access to our kit via a contribution based model. See for details.

Job validation with Grid Engine: false negatives

In a previous post, I noted that if you're not sure if a Sun Grid Engine (SGE) job can ever run on an HPC cluster you can perform 'dry-run' job validation: by passing -w v as arguments to qrsh/qrshx/qsh/qalter you can ask the SGE scheduler software if your job could ever run if the cluster were entirely empty of other jobs.

For example:

    qsub -pe smp 2 -l rmem=10000G -w v myjob.sge

would most likely tell you that your job could not be run in any of the cluster's job queues (due to the size of the resource request).

But beware: as mentioned in my earlier post this his job validation mechanism sometimes results in false negatives i.e. you are told that a job cannot run even though though in reality it can.
This is something that the HPC sysadmin team at the University of Leeds alerted us to.

Here's an example of a false positive (using our ShARC cluster.
If you ask for a single-core interactive session with access to four GPUs then dry-run validation fails:

    [te1st@sharc-login1 ~]$ qrsh -l gpu=4 -w v
    verification: no suitable queues

yet (without validation) the resource request can be satisfied:

    [te1st@sharc-login1 ~]$ qrsh -l gpu=4 
    [te1st@sharc-node100 ~]$   # works!

The reason for this appears to be that the validation is performed without running any Job Submission Verifier (JSV) scripts. These scripts are run (typically on the SGE master machine) on every submitted job to centrally modify or reject job requests post-submission.

On ShARC the main JSV script changes a job's Project from a generic one to gpu if x > 0 GPUs have been requested using -l gpu=x. The job can then be assigned to (GPU-equipped) nodes associated with that project. So, if the JSV is not run before job validation (using -w v) then validation of jobs that request GPUs will fail as no nodes (more accurately queue instances) will be found that can satisfy the resource request given the (default) project of jobs.

The workaround here is to explicitly request a Project (using e.g. -P gpu) when trying to validate a job using -w v i.e. partly duplicate the logic in the (bypassed) JSV, but this requires that you know have read and understood the JSV.
This is something that users may not want to do and adds complexity, when the whole point of investigating job validation in the first place was to find a simple way by which users could check if if their jobs could run on a given SGE cluster.

In summary, SGE's job validation mechanism is not a fool-proof option for users as it does not take into consideration changes made to a job by Job Submission Verifier scripts post-submission.

Introduction to Modern Fortran

In February, the Research Software Engineering group hosted an ‘Introduction to Modern Fortran Course’ taught by EPSRC Research Software Engineering Fellow, Ian Bush. The course material is available at

During the day, Ian recommended a bunch of books (below)

We’ve been working with the University library and I’m happy to announce that all of these are now available to borrow. Search for them using the University catalogue

Determining MPI placement on the HPC clusters

Say you request a 16 slot MPI job on ShARC with 3GB per-process using a submission script like the one below:

#Tell the scheduler that maximum runtime is 1 hour
#$ -l h_rt=1:00:00
#Request 16 slots
#$ -pe mpi 16
#Request 3 Gigabytes per slot
#$ -l rmem=3G

#Load gcc 4.9.4 and OpenMPI 2.0.1
module load dev/gcc/4.9.4
module load mpi/openmpi/2.0.1/gcc-4.9.4

mpirun  ./MPI_hello_world

The scheduler is free to decide where on the system your 16 slots get placed. You may have all 16 slots running on one node, one slot per node for 16 nodes or anything in between. The exact placement of your jobs may affect runtime.

We can find out where the scheculer placed your MPI processes using the $PE_HOSTFILE environment variable. When your job starts running, this points to a file that contains placement information. We make use of it in a submission script as follows

#Tell the scheduler that maximum runtime is 1 hour
#$ -l h_rt=1:00:00
#Request 16 slots
#$ -pe mpi 16
#Request 3 Gigabytes per slot
#$ -l rmem=3G

#Load gcc 4.9.4 and OpenMPI 2.0.1
module load dev/gcc/4.9.4
module load mpi/openmpi/2.0.1/gcc-4.9.4

#Put placement information into node_info.txt
cat $PE_HOSTFILE  > node_info.txt

mpirun  ./MPI_hello_world

You'll now get a file called node_info.txt that contains information about which nodes your MPI slots were placed. For example 1 UNDEFINED 1 UNDEFINED 1 UNDEFINED 1 UNDEFINED 1 UNDEFINED 2 UNDEFINED 2 UNDEFINED 3 UNDEFINED 4 UNDEFINED

In the above example, 4 slots were placed on node059, 3 slots on node 50, 2 slots on nodes 080 and 090 and one slot on the other listed nodes.

Job validation with Grid Engine

(Edit: caveats are listed in a more recent post)

Computer cluster job scheduling software is fantastic at managing resources and permitting many jobs to run efficiently and simultaneously.
However, schedulers aren't always great at giving end-users feedback when things go wrong.

For example, on our ShARC cluster, which runs the (Son of) Grid Engine (SGE) scheduler, if you request a longer run-time than is permitted by any of the cluster's job queue configurations then your job will sit there queueing indefinitely until you or someone else deletes it.
For example, let's use qsub to submit a job where we ask for 1000 hours of run time and 4 GiB of RAM:

[will@mysofa ~]$ ssh sharc
[te1st@sharc-login1 ~]$ qsub -l h_rt=1000:00:00 -l rmem=4G -m bea -M -N longtask myjobscript.sge

Your job 236268 ("STDIN") has been submitted
[te1st@sharc-login1 ~]$ qstat -u $USER
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
 217834 0.00000 longtask   te1st        qw    03/20/2017 10:48:39                                    1        

Job 217834 will now sit queuing forever.
Not only will you not be told why, you won't be given any notification that the job will not run.

In situations like this it can be useful to ask the scheduler to validate a job.
One way of doing this is to run 'qalter -w v <myjobid>' after job submission if say you think that a job has now been queueing for longer than previously-submitted jobs of a similar nature:

[te1st@sharc-login1 ~]$ qalter -w v 217834
Job 217834 (-l h_rt=3600000) cannot run in queue "flybrain.q" because of cluster queue
Job 217834 (-l h_rt=3600000) cannot run in queue "gpu.q" because of cluster queue
Job 217834 (-l h_rt=3600000) cannot run in queue "gen2reg.q" because of cluster queue
Job 217834 (-l h_rt=3600000) cannot run in queue "rse.q" because of cluster queue
Job 217834 (-l h_rt=3600000) cannot run in queue "gpu-vis.q" because of cluster queue
Job 217834 (-l h_rt=3600000) cannot run in queue "insigneo-polaris.q" because of cluster queue
Job 217834 (-l h_rt=3600000) cannot run in queue "interactive.q" because of cluster queue
Job 217834 (-l h_rt=3600000) cannot run in queue "shortint.q" because of cluster queue
Job 217834 (-l h_rt=3600000) cannot run in queue "all.q" because of cluster queue
Job 217834 (-l h_rt=3600000) cannot run in queue "evolgen.q" because of cluster queue
Job 217834 (-l h_rt=3600000) cannot run in queue "rse-training.q" because of cluster queue
Job 217834 (-l h_rt=3600000) cannot run in queue "cstest.q" because of cluster queue
verification: no suitable queues

What this 'qalter -w v <myjobid>' command does is check to see whether the job could run in any of the job queues on the cluster if the cluster were free of other jobs.

The last line of output is key: our job will never be run given the current cluster configuration.
Looking above that, we can see that it cannot run in any of the general-purpose job queues (such as all.q) and there is specific mention of our 1000 hour (3600000s) run-time resource request.
We can therefore deduce that our run-time resource request wasn't satisfiable.

Modifying a resource request post-submission

Once we know that our job can't run we could then delete our job...

[te1st@sharc-login1 ~]$ qdel 217834 
te1st has deleted job 217834 

...then consult the cluster's documentation to discover the maximum possible run-time and resubmit using more sensible resource requests.

Alternatively we can use qalter to modify the resource requests associated with a queueing job:

qalter -l h_rt=96:00:00 -l rmem=4G 217834 

Important: using qalter in this fashion will change all resource requests for the job so here we need to re-specify the rmem request.

Job validation at submission time

You can also perform the same type of job validation at job submission time using -w v e.g.

qsub -w v -l 1000:00:00 -l rmem=4G myjobscript.sge

This won't actually submit your job; it just performs validation.

Why is validation not performed by default?

You may ask why such validation is not enabled by default for all jobs; one reason for this is that it is believed it would place undue burden on the scheduler.

Another is that sometimes a validation attempt results in a false negative that can be difficult to automatically identify (edit: see this more recent post for details).

Other types of resources

If you repeat the experiment outlined above but instead of requesting 1000 hours of runtime you ask for 100 GPUs, 9999GB of RAM or 10000 cores you'll observe the same behaviour: jobs that make requests unsatisfiable under the current cluster configuration can be submitted but will never run.

Again, job validation can help here but depending on the type of resource the validation error messages can be more or less cryptic.
For example, if you try to validate a 100000-'slot' (core) MPI job using -w v you get the following:

qsub -pe mpi 100000 -w v somejob.sge
Job 311838 cannot run in PE "mpi" because it only offers 0 slots

This is rather misleading but the mention of 'slots' should prompt you to check the number of cores you've requested is sensible.

'Poke' validation: consider the current cluster load

Another type of validation is poke validation, which checks if a job could be run under the current cluster load i.e. with many of the cluster's resources already in use.
See man qsub and search for -w for more information on the different types of validation.