Mozsprint 2017 at the University of Sheffield


The 1st-2nd of June 2017 saw the Mozilla Global Sprint circle the globe once again. It's Mozilla's flagship two-day community event, bringing together people from all over the world to celebrate the power of open collaboration by working on a huge diversity of community-led projects, from developing open source software and building open tools to writing curriculum, planning events, and more. So here are a few of my own thoughts and reflections on this year's happenings.

Lead up to the sprint

Open Leadership Training mentorship

I joined my first Mozilla Global Sprint last year as the culmination of the Science Lab’s inaugural Working Open Workshop and mentorship program. I worked on my very own open science project, rmacroRDM, which I'd spent the lead up to the Sprint preparing for. This year, however, it was a different experience for a number of reasons.

Firstly, the roles had been reversed and, from mentee, I was now a seasoned open leadership mentor. In fact, I had enjoyed the Open Leadership training program so much that I’d volunteered to mentor on the following two rounds, the first culminating at MozFest 2016 and this latest round at the Global Sprint 2017. Apart from staying connected to the vibrant network of movers and makers that is Mozilla, I also found I got a lot out of mentoring myself, from improving skills in understanding different people’s styles and support requirements to being introduced to new ideas, tools and technologies by interesting people from all over the world! Overall I find mentorship a positive-sum activity for all parties involved.

So the lead up this year involved mentoring two projects while they prepared to launch at the Global Sprint. The Open Leadership Training program involves mentees working through the OLT materials over 13 weeks while developing the resources required to open their projects up, ready to receive contributions. On a practical level, the program teaches approaches to help clearly define and promote the project and the use of GitHub as a tool to openly host work on the web, plan, manage, track, discuss and collaborate. But the program delves deeper into the very essence of building open, supportive and welcoming communities in which people with an interest in a tool/cause/idea can contribute what they can, learn and develop themselves and feel valued and welcome members of a community.

Weekly contacts with the program alternated between whole cohort Vidyo call check-ins and more focused one-on-one Skype calls between mentors and mentees. This round I co-mentored with the wonderful Chris Ritzo from New America’s Open Technology Institute and we took on two extremely exciting projects, Aletheia and Teach-R.



Mentee projects:


Headed up by Kade Morton (@cypath), a super sharp, super visionary, super motivated, self-described crypto nerd from Brisbane, Australia, Aletheia doesn't pull any punches when describing its reason for being:


In response they're building a decentralised and distributed database as a publishing platform for scientific research, leveraging two key pieces of technology, IPFS and blockchain. Many of the technical details are frankly over my head but I nonetheless learned a lot from Kade’s meticulous preparation and drive. Read more about the ideas behind the project here.




What can I say about Marcos Vital, professor of Quantitative Ecology at Federal University of Alagoas (UFAL), Brazil and fellow #rstats aficionado, apart from that he is also a huge inspiration! An effortless community builder, he runs a very successful local study group and has built a popular and engaged online community through his lab Facebook page promoting science communication.

The topic of his project Teach-R is close to my heart, aiming to collate and develop training materials to TEACH people to TEACH R. Read more about it here.



Hosting a Sheffield site.

Secondly, this year I helped host a site here at the University of Sheffield, and seeing as the sprint coincided with my first day as a Research Software Engineer for Sheffield RSE, we decided to take the event under our wing. With space secured and swag and coffee funds supplied by the Science Lab, the local site was ready for action!



The Sprint!

Sprint at the University of Sheffield.

There was a good buzz of activity throughout the sprint at the site, with a few core participants while others came and went as they could. At the very least, roaming participants managed to soak up some of the atmosphere and pick up some git and GitHub skills... a success in my books!

Stuart Mumford (@StuartMumford) led project SunPy, a Python-based open-source solar data analysis environment, and attracted a number of local contributors, including a new PhD student although, as is often the case, much of the first morning seemed to be spent battling Python installation on his laptop! Worth it for picking up a local contributor that will hopefully remain engaged throughout his studies though, and the team managed to push on with bug fixes and documentation development.

Jez Cope (@jezcope), our University's Research Data Manager, was contributing to Library Carpentry, one of the biggest and most popular projects at this year's Sprint, and also brought super tasty banana bread. He's also blogged about his experiences here.

As for me, while of course tempted by the many R, open science and reproducibility projects on offer, in the end I chose to work on something unrelated to what I'm lucky to do for work and focus on a project I'm interested in personally. So I teamed up with Tyler Kolody (@TyTheSciGuy) on his timely project EchoBurst. The project aims to address our growing, social-media-facilitated retreat into echo chambers, which is resulting in increasingly polarised public discourse and an unwillingness to engage with views we disagree with. The idea is to attempt to burst through such bubbles by developing a browser extension with the potential to distinguish toxic content, more likely to shut down discussion, from more constructive content that might be able to bridge different perspectives.

Admittedly the project is very ambitious with a long way to go, many stages and various techniques/technologies to incorporate including natural language processing, building the browser plugin and even considering psychological and behavioural aspects in designing how to present information that might oppose a user's view without triggering the natural shut-down response.

There was plenty of really interesting brainstorming discussion but the biggest initial challenge, and where the project could use the most help, is in collecting training data. The main approach is for contributors to help collect URLs of blogs on polarising topics from which to scrape content. But during the sprint we also added the option for contributors to add relevant YouTube videos to collaborative playlists. We also started working on simple R functions to help scrape and clean the caption content.


Sprint across the globe

What a productive event this year's sprint was! While details of the level of activity have been covered and Storified elsewhere and the final project demos can be found here and here, I just wanted to highlight some basic stats:

Global #mozsprint involved:
  • 65 sites (+ virtual participants)
  • 20 countries
  • 108 projects
During the 50 hour #mozsprint, we saw:
  • 302 pull requests closed
  • 320 pull requests opened
  • 2223 comments & issues
  • 824 commits pushed

BOOM!

(access the full data on github activity here)


Mentee progress

I was really happy to see both our mentees get great responses, pick up new contributors and make good progress on their projects.

  • Marcos expertly moderated a very active gitter channel for Teach-R and attracted a number of excellent and very engaged new contributors, adding a number of new lessons in both English and Portuguese!

  • Kade also got great engagement for Aletheia, including onboarding science communicator Lisa Matthias (@l_matthia), who's already blogged about their plans to take the project forward by applying to present it at this year's Open Science Fair. Importantly, he also managed to attract the JavaScript developer they've been desperately looking for. Success! You can read more about Kade's experiences of the sprint here.

They both made us very proud indeed!



Highlights

But the most important feature of the sprint for me every year is the global camaraderie and atmosphere of celebration. Handing off from one timezone to the other and checking in within our own to hear from leads about their project needs and progress, hanging out with participants from far and wide on Vidyo and through streams of constant messaging on gitter, catching up with friends across the network...



...and cake...sooooooooo much cake!!

disclaimer: this cake was sadly not at the Sheffield site. It definitely has inspired me to put a lot more effort into this aspect of the sprint next year though!


Final thoughts

The end of the sprint is always a bit sad but the projects live on, hopefully with a new lease of life. So if, by reading this, you're inspired to contribute, check out the full list of projects for something that might appeal. There's a huge diversity of topics, tasks and skills required to choose from and fun new people to meet!

So does the network, so if you’ve got an exciting idea of your own that you think would make a good open source project, make sure to check out @MozOpenLeaders and look out for the next mentorship round.

As for the impact on Sheffield RSE, well there was one point where we managed to get the full team and loose collaborators working in one room (we’re normally spread out across the university). It felt great to work together from the same space so we decided to make a point of routinely booking one of the many excellent co-working spaces the University of Sheffield has on offer and establish regular work-together days!

So thanks for the inspiration and excellent times Mozilla! Till the next time!

(i.e. MozFest 2017!)



Sounds:

Apart from the coffee and good vibes, the day was also fuelled by sounds. Here's a couple of the mixes that kept the Sheffield site going!

Grooves no. 1:


Grooves no. 2:




tmux: remote terminal management and multiplexing

Today we have a guide to 'terminal multiplexing' including suggestions on how to use it on computer clusters such as ShARC and Iceberg.


Have you ever?

  • Started a process (such as a compilation or application install) over SSH only to realise that it's taking far longer than you expected and you need to shut down your laptop to go to a meeting, which you know will therefore kill both the SSH connection and your process?
  • Been in a cafe with flakey wifi and had a remote process hang or possibly die due to an unstable SSH connection?
  • Accidentally closed a window with an SSH session running in it and really regretted it?
  • Wanted to be able to switch between multiple terminal sessions on a remote machine without having to establish an SSH connection per session?
  • Wanted to be able to have multiple terminals visible at once so you can say edit source code in one terminal whilst keeping compilation errors visible in another?
  • Wanted a nicer way to copy and paste between remote terminal sessions?

If the answer to any of these is "yes" then terminal multiplexing may help!

Making remote Linux/Unix machines easier to administer/use!

First, we need to delve a little deeper into some of the problems we are trying to solve.

Why do my remote processes die when my SSH connection dies/hangs?

(Skip over this section if you want!)

Every process (bar the systemd process or init process with a process ID of 1) has a parent process. If a process is sent a signal telling it to cleanly terminate (or 'hang up') then typically its child processes will be told to do the same.

When you SSH to a remote machine, the SSH service on that machine creates a shell for you within which you can run commands.

To illustrate, here I logged into a server and used the pstree program to view the tree of child-parent relationships between processes. Notice in the excerpt shown below that the SSH service (sshd) has spawned a (bash) shell process for my SSH session, which in turn has spawned my pstree process:

[will@acai ~]$ ssh sharc
...
[will@sharc-login1 ~]$ pstree -a
systemd --switched-root --system --deserialize 21
...
  ├─sshd -D
  │   └─sshd
  │       └─sshd
  │           └─bash
  │               └─pstree -a
...

So if the SSH service decides that your connection has timed out then it will send a signal to the bash process telling it to hang up, and if that bash process were to die then any child processes started by that bash process would also die.

If the remote servers you work with are primarily High-Performance Computing (HPC) clusters running scheduling software such as Grid Engine then you have a simple, robust way of ensuring that the success of your processes doesn't depend on the reliability of your connection to the clusters: submit your work to the scheduler as batch jobs. There are many other benefits to submitting batch jobs over using interactive sessions when using such clusters but we won't go into those here.

However, what do you do when there is no HPC-style scheduling software available?

  • You could run batch jobs using much simpler schedulers such as at for one-off tasks or cron or systemd Timers for periodic tasks.
  • You could prefix your command with nohup (no hang up) to ensure it continues running if the parent process tells it to hang up.

Neither of these allow you to easily return to interactive sessions though. For that we need terminal multiplexers.
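
If you'd like to see the effect of nohup for yourself, here is a minimal sketch rather than a recommended workflow. It uses sleep 30 as a hypothetical stand-in for a long-running job and sends the hang-up signal by hand with kill, which only approximates what happens when an SSH session dies.

```shell
#!/bin/sh
# Sketch: a command started under nohup ignores the hang-up signal (SIGHUP).
# 'sleep 30' is a hypothetical stand-in for a real long-running job.

nohup sleep 30 > /dev/null 2>&1 &
job_pid=$!

# Give nohup a moment to start up before signalling it.
sleep 1

# Simulate the hang-up that a dying SSH session would deliver.
kill -HUP "$job_pid"
sleep 1

# 'kill -0' sends no signal; it only checks that the process still exists.
if kill -0 "$job_pid" 2>/dev/null; then
    survived=yes
else
    survived=no
fi
echo "survived SIGHUP: $survived"

# Tidy up the stand-in job.
kill "$job_pid"
```

Running the same steps without the nohup prefix should show the job being terminated by the hang-up signal instead.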

A brief guide to the tmux Terminal Multiplexer

Detaching and reattaching to sessions

Terminal Multiplexer programs like GNU Screen and tmux solve this problem by:

  1. Starting up a server process on-demand, which then spawns a shell. The server process is configured not to respond when being told to hang up so will persist if it is started over an SSH connection that subsequently hangs/dies.
  2. Starting up a client process that allows you to connect to that server and interact with the shell session it has started
  3. Using key-bindings to stop the client process and detach from the server process.
  4. Using command-line arguments to allow a client process to (re)connect to an existing server process

Demo 1

Here we look at demonstrating the above using tmux. I recommend tmux over GNU Screen as the documentation is clearer and it makes fewer references to legacy infrastructure. Plus, it is easier to google for it! However, it may use more memory (true for older versions).

Let's create and attach to a new tmux session, start a long-running command in it then detach and reattach to the session:

Used keys:

<prefix> d: detach

where <prefix> is Control and b by default. Here <prefix> d means press Control and b then release that key combination before pressing d.

In this case we started tmux on the local machine. tmux is much more useful though when you start it on a remote machine after connecting via ssh.
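
The same create/detach/reattach cycle can also be driven from the command line, which is handy in scripts. The following is a rough sketch only: the session name is made up for illustration, sleep 600 stands in for a real long-running command, and the whole thing is skipped if tmux isn't installed.

```shell
#!/bin/sh
# Sketch: a tmux session keeps running with no client attached.
session="demo-$$"   # hypothetical session name, made unique with this shell's PID

if command -v tmux >/dev/null 2>&1; then
    # Start the tmux server (if needed) plus a detached session running a command.
    tmux new-session -d -s "$session" 'sleep 600'

    # No client is attached, yet the session exists.
    if tmux has-session -t "$session" 2>/dev/null; then
        session_state=running
    else
        session_state=missing
    fi
    tmux list-sessions

    # From an interactive terminal you could now reattach with
    #   tmux attach-session -t "$session"
    # and detach again with <prefix> d.

    # Clean up once the session is no longer needed.
    tmux kill-session -t "$session"
else
    session_state=tmux-not-installed
fi
echo "session state: $session_state"
```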

Windows (like tabs)

What else can we do with terminal multiplexers? Well, as the name implies, they can be used to view and control multiple virtual consoles from one session.

A given tmux session can have multiple windows, each of which can contain multiple panes, each of which is a virtual console!

Demo 2

Here's a demonstration of creating, renaming, switching and deleting tmux windows:

Used keys:

<prefix> ,: rename a window
<prefix> c: create a new window
<prefix> n: switch to next window
<prefix> p: switch to previous window
<prefix> x: delete current window (actually deletes the current pane in the window but will also delete the window if it contains only one pane)
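
Each of these key bindings has a command-line equivalent. As a hedged sketch (the session and window names below are invented for the example, and the script does nothing if tmux is missing), the window operations above look roughly like this when scripted:

```shell
#!/bin/sh
# Sketch: create, rename, switch between and delete tmux windows from a script.
session="windows-$$"   # hypothetical session name

if command -v tmux >/dev/null 2>&1; then
    tmux new-session -d -s "$session" -n editor        # first window, named at creation
    tmux new-window -t "${session}:" -n build          # like <prefix> c
    tmux rename-window -t "${session}:build" compile   # like <prefix> ,
    tmux next-window -t "$session"                     # like <prefix> n
    tmux previous-window -t "$session"                 # like <prefix> p

    # Collect the window names on one line for display.
    window_names=$(tmux list-windows -t "$session" -F '#{window_name}' | tr '\n' ' ')

    tmux kill-window -t "${session}:compile"           # delete a whole window
    tmux kill-session -t "$session"                    # tidy up
else
    window_names=tmux-not-installed
fi
echo "windows: $window_names"
```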

Dividing up Windows into Panes

Now let's look at creating, switching and deleting panes within a window:

Used keys:

<prefix> %: split the active pane vertically (into left and right panes)
<prefix> ": split the active pane horizontally (into top and bottom panes)
<prefix> Up or Down or Left or Right:
  switch to pane in that direction
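
For completeness, here is a small sketch of the same pane operations run as commands. The session name is hypothetical and the script is skipped when tmux isn't available. Note that tmux's own flag naming can be confusing: split-window -h puts the new pane beside the current one, matching <prefix> %.

```shell
#!/bin/sh
# Sketch: split a window into panes and move between them from a script.
session="panes-$$"   # hypothetical session name

if command -v tmux >/dev/null 2>&1; then
    tmux new-session -d -s "$session"
    tmux split-window -h -t "${session}:"   # like <prefix> %: left and right panes
    tmux split-window -v -t "${session}:"   # like <prefix> ": top and bottom panes
    tmux select-pane -L -t "${session}:"    # like <prefix> Left

    # Count the panes in the session's current window.
    pane_count=$(tmux list-panes -t "${session}:" | wc -l | tr -d ' ')

    tmux kill-session -t "$session"         # tidy up
else
    pane_count=tmux-not-installed
fi
echo "panes: $pane_count"
```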

Scrolling backwards

You can scroll back up through the terminal history of the current pane/window using:

<prefix> Page Up:
  scroll back through terminal history

Copying and pasting

If you have multiple panes side-by-side and attempt to copy text using the mouse, you'll copy lines of characters that span all panes, which is almost certainly not going to be what you want. Instead you can

<prefix> z: toggle the maximisation of the current pane

then copy the text you want.

Alternatively, if you want to copy and paste between tmux panes/windows you can

<prefix> [: enter copy mode

move the cursor using the arrow keys to where you want to start copying then

space: (in copy mode) mark start of section to copy

move the cursor to the end of the section you want to copy then

enter: (in copy mode) mark end of section to copy and exit copy mode

You can then move to another pane/window and press

<prefix> ]: paste copied text

I find this mechanism very useful.

And there's more

Things not covered in detail here include:

Using tmux on HPC clusters

Terminal Multiplexers can be useful if doing interactive work on a HPC cluster such as the University of Sheffield clusters ShARC and Iceberg (assuming that you don't need a GUI).

On ShARC and Iceberg you can:

  1. Start a tmux or GNU Screen session on a login node;
  2. Start an interactive job using qrshx or qrsh;
  3. Disconnect and reconnect from the tmux/Screen session (either deliberately or due to an issue with the SSH connection to the cluster);
  4. Create additional windows/panes on the login node for editing files, watching log files, starting additional interactive jobs etc.

Starting tmux on worker nodes is also useful if you want to have multiple windows/panes on a worker node but less useful if you want to disconnect/reconnect from/to a session: if you run qrsh a second time you cannot guarantee that you will be given an interactive job on the node you started the tmux session from.

However, note that you can have nested tmux sessions (with <prefix><prefix> <key> used to send tmux commands to the 'inner' tmux session).

Warning: many clusters have multiple login nodes for redundancy, with only one being the default active login node at any given time. If the active login node requires maintenance then logged-in users may be booted off and long-running processes may be terminated (before the system administrator makes a 'standby' login node the currently active one). Under such circumstances your tmux/Screen session may be killed.

Being a good HPC citizen

Your interactive job (on a cluster worker node) will be terminated by the cluster's Grid Engine job scheduler after a fixed amount of time (the default is 8 hours) but your tmux/Screen session was started on a login node so is outside the control of the cluster and will keep running indefinitely unless you kill it.

Each tmux/Screen session requires memory on the login node (which is used by all users) so to be a good HPC citizen you should:

  • Kill your tmux/Screen session when no longer needed (tmux/Screen will exit when you close all windows)
  • Only start as many tmux/Screen sessions on the login node as you need (ideally 1)
  • Exit your interactive Grid Engine job (on a worker node) if no longer needed as then others can make use of the resources you had been using on this node.

Tip: with tmux you can ensure that you either reconnect to an existing session (with a given name) if it already exists or create a new session using:

tmux new-session -A -s mysession

This should help avoid accidentally creating more than one tmux session.
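
When you have finished with a session, tidying it up can be scripted too. The helper below is just a sketch with a made-up function name: it asks tmux whether a session matching the given name exists and kills it if so.

```shell
#!/bin/sh
# tidy_tmux_session NAME: kill the tmux session NAME if one exists.
# (Hypothetical helper for illustration; not part of tmux itself.)
tidy_tmux_session() {
    # has-session succeeds only if a session matching the name exists.
    if tmux has-session -t "$1" 2>/dev/null; then
        tmux kill-session -t "$1"
        echo "killed session $1"
    else
        echo "no session named $1"
    fi
}
```

For example, tidy_tmux_session mysession would remove the session created by the tip above, and tmux list-sessions shows what is still running on the login node.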


NB the recordings of terminal sessions shown were created using ttyrec and ttygif then converted to .webm videos using ffmpeg.

Software Carpentry and Data Carpentry at the University of Sheffield!

The University of Sheffield is now a Software Carpentry Partner Organisation, allowing the Research Software Engineering and Library teams to start organising Software Carpentry and Data Carpentry workshops. These are designed to help researchers develop the programming, automation and data management skills needed to support their research. Workshop dates are to be announced shortly.


Edit:

Our first Software Carpentry workshop is scheduled for 16th and 17th August!


Software Carpentry and Data Carpentry logos

Addressing the training needs of researchers with regards to programming

As more researchers realise they can produce better quality research more quickly if they have some coding and data management skills under their belts, universities will need to ensure that training in these areas is accessible to those that need it.

Academic institutions will most likely already have courses for teaching highly-specialist subjects (such as how to use the local HPC cluster) but for the more generic aspects of research software development and data management there are several obvious choices:

  • Develop and deliver bespoke materials;
  • Buy in to commercial training packages;
  • Point researchers towards free online resources;

However, there is also a fourth option: team up with the Software Carpentry (SC) and Data Carpentry (DC) not-for-profit organisations to deliver on-site, interactive workshops based on open-source materials that have been refined by a large community of SC and DC instructors.

Software whatywhaty?

Software Carpentry has developed discipline-agnostic workshop material on:

Data Carpentry lessons look at data management and processing within the context of a specific domain (such as ecology or genomics), focussing on areas such as:

  • the command line;
  • data cleaning and filtering using OpenRefine;
  • data processing and visualisation with Python or R;
  • cloud computing
  • GIS

What form do the workshops take?

The Software and Data Carpentry organisations ask that accredited instructors delivering 'branded' workshops adopt a fairly progressive teaching style:

  • Workshops typically last two days and include four lessons (e.g. the Unix shell, Python, version control and databases).
  • There's lots of live coding: the instructor and students gather together in a room with laptops and a projector and all present go through a number of examples interactively. Students use their own laptops to ensure that they're able to continue where they left off at the end of a workshop. Instructors can and do make mistakes when doing live coding; students can then learn from these mistakes and may grow in confidence on learning that pros make mistakes too.
  • Instructors try to elicit responses from students and use quizzes to gauge comprehension and keep students focussed.
  • Software Carpentry has a code of conduct and tries to ensure that all lessons delivered under its banner are as inclusive as possible.

What's happening at the University?

The University is now a Software Carpentry Partner Organisation so can run many workshops per year using the Software Carpentry and Data Carpentry branding. We could run workshops without the branding but Software and Data Carpentry are now familiar names to researchers (and potentially employers) and by working closely with those two organisations we become part of a global network of instructors with which we can share ideas and materials.

The RSE team and Library collectively now have five accredited Software and Data Carpentry instructors: Mike Croucher received training some time ago and in March Tania Allard and I from the RSE team plus Jez Cope and Beth Hellen from the Library's Research Services Unit participated in instructor training in Oxford.

Software Carpentry Instructor Training session

The four of us spent two days learning about the SC/DC teaching style and what makes for an effective instructor, and got to practise several aspects of workshop development and delivery. I must thank the instructors on the training course (Mateusz Kuzak and Steve Crouch) plus Reproducible Research Oxford for hosting and organising the event.

We are now planning our first Software Carpentry and Data Carpentry workshops. These are to be held later in the summer.

Keep an eye on this blog, the RSE-group@sheffield.ac.uk mailing list and @RSE_Sheffield for dates!

Coffee and Cakes Event

The RSE Sheffield team would like to thank everyone for attending the second Coffee and Cakes Event that was held last Wednesday (31/05/2017). The event provided a great opportunity to hear from researchers all around the University about the software engineering challenges faced within their projects. We hope to use the insights gained from the event to help improve your research workflow in the future.

To get updates on future RSE events, please join our RSE Google Discussion Group.

First GPU Computing Seminar - Towards achieving GPU-native adaptive mesh refinement


We've kicked off our first GPU Computing group seminar this year with a talk by Ania Brown from Oxford e-Research Centre titled "Towards achieving GPU-native adaptive mesh refinement" on 30th of May 2017. Adaptive mesh refinement (AMR) is a method for reducing memory cost by varying the accuracy in each region to match the physical characteristics of the simulation, at the cost of increased data structure complexity. Ania described the optimisation and software challenges that need to be considered when implementing AMR on GPUs, based on her experience working on a GPU-native framework for stencil calculations on a tree-based adaptively refined mesh as part of her Master's degree.

There are no official GPU Computing talks in June but we highly recommend the upcoming talk "From Democratic Consensus to Cannibalistic Hordes: The Principles of Collective Animal Behaviour" by Prof. Iain Couzin on the 29th of June 2017.

Links and More Information

For presentation slides and more information on both of these talks, visit the GPU Computing seminars page.

Research Software Engineer in High Performance Computing

A job opportunity within the RSE Sheffield group is available under the job title of "Research Software Engineer in High Performance Computing (HPC) enabled Multi-Scale Modelling".

The purpose of the Research Software Engineer post is to enhance The University’s capability and expertise in developing Research Software. This role will be based in the newly formed Research Software Engineering group, which aims to improve all aspects of research software including reproducibility, usability, efficiency and correctness.

The primary function of this role is to support the EC-funded CompBioMed project, a user-driven Centre of Excellence in Computational Biomedicine that aims to nurture and promote the uptake and exploitation of high performance computing within the biomedical modelling community.

The post is fixed-term with an end date of 30 September 2019. The deadline for applications is 19th June 2017.

Links and More Information

Jobs.ac.uk posting

University of Sheffield Application Link

Sheffield Code First:girls


Anyone working in any STEM area knows, as a fact, that we are facing a digital and technical skills gap. In the government's digital strategy report last year it was highlighted that we would need an extra 745,000 workers with digital skills by 2017, as 90% of jobs require digital skills to some degree. On top of this, many technical areas suffer a diversity deficit (cultural and gender based), with the UK being among the European countries with the smallest number of female professionals in STEM areas.

This being a rather complex problem, many people and organisations work hard to provide a solution to this issue. Some approaches adopted by such individuals and organisations are:

  • Encourage the hiring of highly skilled immigrants
  • Provide wider support for underrepresented minorities
  • Leverage inclusive and encouraging environments for those who demonstrate an interest in STEM areas
  • Support and train those willing to make a career change or follow non-traditional career paths
  • Approach the new generations and provide them with useful skills that would help them make an informed career choice

... and the list goes on.

I do believe, however, that the most fruitful approach is to work with the upcoming generations and provide them with useful technical and personal skills early on. This would not only make them better qualified for their future but would enable them to make informed decisions with regards to their professional future.

Code First: girls is a multi-award-winning organization that aims to tackle the gender imbalance in three ways: training women, building a strong and supportive community, and helping companies to train, recruit, and retain their female workforce.

Belonging to a minority within STEM has led me to take an active role as an equality and diversity ambassador, which eventually led me to volunteer as a Python instructor for the Code First courses.

Over the course of 8 weeks we teach and guide groups of around 30 women with various levels of coding experience in CSS/HTML, Python, or Ruby. These courses are a mixture of in-person classes and self-learning. At the same time, the ladies involved work in teams of 2-4 people to build a project of their own interest.


The idea behind these workshops is rather simple: train people and provide them with practical use of the skills they are learning, with the final objective of developing a fully deployed RESTful app. But the whole CF:girls thing goes way beyond that. Over those 8 weeks the girls form a strong, motivating, and supportive community, in which they can acquire new skills, meet like-minded people, learn from other women working in STEM areas, and even attend external women in tech events!


I find the mixture of apps and projects pursued rather interesting, as well as the high quality of the final products presented. But beyond that, I find this to be an excellent opportunity to give back to the amazing community that has adopted and welcomed me as a professional in a STEM area. Thus I can say for sure I will be getting involved in more Code First events/workshops.

Coffee and Cakes Event

RSE Sheffield is hosting another coffee and cakes event on May 31st at 14:00 in the Ada Lovelace room on 1st floor of the Computer Science Department (Regents Court East). Attendance is free, but you need to register via this link.

Take the opportunity to come and have an informal chat about research software.

This event is a community event for anyone, not just computer scientists or members of the RSE team. If you work on software development, are an RSE, or simply want to talk about some aspect of software or software in teaching then come along.

Building Linux GPU Code with NSIGHT in Windows

Why would you possibly want to build and execute CUDA GPU applications within NSight Eclipse for Linux within Microsoft Windows? Well, if you use Windows as your main OS there are plenty of reasons but the most obvious is that you may be developing cross platform code and want to build and test it without dual booting. If you are thinking about virtual machines then forget about it. Most (except some very expensive enterprise options) do not have the ability to access a GPU device (e.g. via GPU pass-through) from within a virtual machine.

The purpose of this post is to describe how to install the necessary tools to permit local GPU development inside the Linux NSight IDE from within Windows. The advantages of this are not only cross platform development but also the ability to develop locally in a powerful Linux IDE with remote execution and graphical debugging. This is particularly helpful if you want to execute or debug your code on a HPC system (like Sheffield's ShARC system) from Windows. The post focuses on the use of the new Windows 10 Linux subsystem, however you could use the approach to install CUDA tools on a lightweight Linux virtual machine. The concept is the same either way, i.e. build and debug locally, execute remotely.

Configuring the Linux Windows Subsystem for CUDA compilation

The Windows 10 subsystem for Linux is available in the Anniversary Update. You can install it from the "Turn Windows features on or off" dialogue, where it is listed under "Windows Subsystem for Linux (Beta)". This alone is not enough to build our GPU applications as we will also need to install CUDA. A normal CUDA install will require a local GPU and the installation of a CUDA compatible graphics driver. Fire up the Windows Bash shell (or a Linux virtual machine). You can then use the following commands to install the CUDA toolkit without installing a graphics driver. This will install the core NVIDIA CUDA compiler (nvcc) and NSight. You can update the CUDA_REPO_PKG variable to install a different CUDA version.

sudo apt-get update
CUDA_REPO_PKG=cuda-repo-ubuntu1404_8.0.44-1_amd64.deb
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/$CUDA_REPO_PKG
sudo dpkg -i $CUDA_REPO_PKG
sudo apt-get update
sudo apt-get install -y --no-install-recommends cuda-core-8-0 cuda-cudart-dev-8-0 nsight

You can now create a generic /usr/local/cuda symbolic link that points at the versioned CUDA install. This will permit the addition and fast swapping of different CUDA versions.

sudo ln -s /usr/local/cuda-8.0 /usr/local/cuda 
export PATH=$PATH:/usr/local/cuda/bin
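
To illustrate the "fast swapping" idea, here is a minimal sketch of a helper that repoints the generic link at a different versioned directory. The function name switch_cuda and its optional prefix argument are my own illustrative choices, not part of CUDA, and it assumes each toolkit lives in a cuda-<version> directory as above.

```shell
# Repoint the generic "cuda" symlink at one versioned toolkit directory.
# switch_cuda and its arguments are illustrative names, not part of CUDA.
switch_cuda() {
    version="$1"                  # e.g. 8.0
    prefix="${2:-/usr/local}"     # where the cuda-<version> directories live
    target="$prefix/cuda-$version"

    if [ ! -d "$target" ]; then
        echo "No toolkit found at $target" >&2
        return 1
    fi

    # -s symbolic, -f replace an existing link, -n do not follow the old link
    ln -sfn "$target" "$prefix/cuda"
}
```

Because the PATH entry refers to /usr/local/cuda/bin rather than a versioned directory, nothing else needs to change after a switch. Under /usr/local the ln call needs root, so run the function from a root shell.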

Note: if you want the CUDA bin location to be persistently on the PATH (after you restart the Bash shell) then you will need to add the export PATH line to your .bashrc profile. Test that the install was successful by running nvcc.

nvcc --version

This should give you some information on the nvcc version. e.g.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

The CUDA toolkit is now installed so you can build (but not execute) CUDA GPU programs using the Linux bash shell.
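
As a quick check of the build-only setup, the following sketch writes a trivial CUDA source file and compiles it if nvcc is on the PATH. The file name, the write_hello_cu helper and the kernel are illustrative. Note that nvcc also relies on a host C++ compiler such as g++ being installed, and that the resulting binary will only run on a machine with a CUDA capable GPU.

```shell
# Write a minimal CUDA source file; the file name and kernel are illustrative.
write_hello_cu() {
    cat > "$1" <<'EOF'
#include <cstdio>

// Trivial kernel: each launched thread prints one line.
__global__ void hello() {
    printf("Hello from the GPU\n");
}

int main() {
    hello<<<1, 1>>>();          // launch one block of one thread
    cudaDeviceSynchronize();    // wait for the kernel before exiting
    return 0;
}
EOF
}

src="${TMPDIR:-/tmp}/hello.cu"
write_hello_cu "$src"

# Compile only if nvcc is available; running the result needs a CUDA GPU.
if command -v nvcc >/dev/null 2>&1; then
    nvcc -o "${src%.cu}" "$src" && echo "built ${src%.cu}" || echo "compile failed"
else
    echo "nvcc not found on PATH; skipping compile"
fi
```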

Graphical editing with Nsight IDE

To be able to run the graphical NSight IDE from within the Windows subsystem for Linux you will need to be running an X server within Windows. You can install the free Xming application for this purpose. If you would rather use a Linux virtual machine then you can skip this step as the virtual machine will most likely have an X server included. The advantage of the Windows subsystem approach is that it is very lightweight. From within your Bash terminal you will need to set the following environment variable.

export DISPLAY=:0

DISPLAY is an environment variable read by graphical applications. In this case, the value :0 tells the application to use the first display on the local system (our Xming server). If you want to make this environment variable change permanent then you should add it to your .bashrc profile. You can now run the NSight application from Bash.

nsight
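
If you forget to set the variable, graphical programs tend to fail with a fairly cryptic error. A small guard like the one below can make the cause obvious. This is only a sketch: launch_gui is an illustrative helper name, and it checks that DISPLAY is set, not that the X server is actually running.

```shell
# Refuse to launch a graphical program when DISPLAY is unset or empty.
# launch_gui is an illustrative helper name.
launch_gui() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; start the X server and run: export DISPLAY=:0" >&2
        return 1
    fi
    # Run the requested program with its arguments unchanged.
    "$@"
}
```

With this defined, NSight would be started with launch_gui nsight.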

Glorious, isn't it? Within NSight we can create a new CUDA project which will compile using the local CUDA install. In order to remotely execute and debug you can use the "C++ Remote application" run configuration. This will require SSH access to a suitable Linux machine with a GPU and CUDA installed.
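
Before setting up the run configuration it can be worth confirming that the remote machine really has a visible GPU and the CUDA compiler. The sketch below assumes you already have working SSH access. check_remote_cuda and the SSH_CMD override are illustrative names, and nvcc may not be on the PATH of a non-interactive SSH session on every system.

```shell
# Check that a remote machine exposes a GPU and the CUDA compiler.
# The host argument is whatever you would normally pass to ssh.
# SSH_CMD (illustrative) lets the ssh client be swapped out, e.g. for testing.
check_remote_cuda() {
    host="$1"
    # nvidia-smi -L lists the GPUs the driver can see;
    # nvcc --version confirms that the toolkit is installed.
    ${SSH_CMD:-ssh} "$host" 'nvidia-smi -L && nvcc --version'
}
```

A call would look like check_remote_cuda user@gpu-host, where user@gpu-host is a placeholder for your own login.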

Future blog posts will cover how remote execution and debugging can be achieved on the University of Sheffield ShARC system. ShARC is a typical job based HPC system which encourages job submission rather than execution of code on worker nodes via SSH logins.

Summary of Bash Profile Changes

I added the following to my .bashrc profile (located in the home directory) to ensure that NSight could be launched straight after starting the Bash shell in Windows.

# add cuda bin dir to path
export PATH=$PATH:/usr/local/cuda/bin

# export the display environment variable
export DISPLAY=:0
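
If you would rather script these changes than edit the file by hand, a sketch like the following appends each line only when it is not already there, so it is safe to run more than once. append_once is an illustrative helper name.

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is an illustrative helper name.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -x whole-line match, -F fixed string (no regex), -q quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example use (single quotes keep $PATH unexpanded until .bashrc is read):
#   append_once 'export PATH=$PATH:/usr/local/cuda/bin' "$HOME/.bashrc"
#   append_once 'export DISPLAY=:0' "$HOME/.bashrc"
```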

Spark and Scala on Sheffield HPC systems

As part of our support for a Large scale machine learning MSc course in Computer Science, the Sheffield RSE group put together a tutorial on how to use Spark and Scala on Sheffield's HPC systems. We are sharing it with the rest of the community in case it's useful to you: https://github.com/mikecroucher/Intro_to_HPC/blob/gh-pages/README.md

It's for people who've never used a HPC system before. By the time they've finished, they are able to submit their own Spark jobs to the HPC cluster. If anyone is interested in us re-running this as a workshop (it takes around 2 hours) let us know.

Some notes on our current implementation of Spark on HPC:-

  • We are currently restricted to jobs that run on one node. This is because Sheffield's HPC clusters are not traditional Hadoop/Spark clusters and so some level of integration is required between Sun Grid Engine and Spark. We've only managed to get as far as implementing this across single nodes at the moment.

  • One way we've fudged this is to make sure that we provide our students with access to nodes with a LOT of memory: 768 GB per node in fact, 12 times as much as you get on a normal node on ShARC or Iceberg. We are experimenting with allowing others access to our kit via a contribution based model. See http://rse.shef.ac.uk/resources/hpc/premium-hpc/ for details.
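
To give a rough idea of what a single node Spark job can look like under Sun Grid Engine, here is a sketch that prints a job script running Spark in local mode. It is not taken from the tutorial. make_spark_job, the class and the jar are hypothetical, and the parallel environment name (smp) and memory resource (rmem) are site specific assumptions that you should check against your own cluster's documentation.

```shell
# Print a Sun Grid Engine job script that runs Spark in local mode on one node.
# make_spark_job, the "smp" parallel environment, the "rmem" resource and the
# class/jar arguments are illustrative assumptions, not fixed names.
make_spark_job() {
    cores="$1"; mem="$2"; main_class="$3"; jar="$4"
    cat <<EOF
#!/bin/bash
#\$ -pe smp $cores
#\$ -l rmem=$mem
# NSLOTS is set by Grid Engine to the number of slots granted to the job
spark-submit --master "local[\$NSLOTS]" --class $main_class $jar
EOF
}
```

You could then write the script out with make_spark_job 4 16G MyApp myapp.jar > job.sh and submit it with qsub job.sh. Using local[N] keeps the driver and all executor threads inside one node, which matches the single node restriction described above.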