This blog post provides a copy of my recent talk at the Royal Microscopical Society’s Virtual Data Analysis in Atomic Force Microscopy Meeting: How can we make AFM data analysis more open and reproducible?
I’ve also put together what I hope will be some useful links:
Two challenges PhD students encounter are:
To help address the above for one particular case, today I spoke with a new cohort of PhD students (from the Speech and Language Technologies Centre for Doctoral Training) to explain what high-performance computing (HPC) is and why they might care. The hope is that they will now be able to include HPC in their training plans once they recognise problems that HPC is well suited to.
This is going to be my personal perspective on open source, and hopefully an encouragement for you to get involved with Hacktoberfest, a month-long event encouraging participation in open source, which the RSE Team at the University of Sheffield are supporting. I hope to introduce some of the legal, technical, economic and academic aspects of open source, providing context for experts and a gateway for the newly interested. Note: I have no legal training; consult someone who does if you need to! But since this is my perspective, we’re going back to when I was first exposed to open source, around the millennium…
I formally led a software development team for the first time as part of the Scottish Covid Response Consortium’s (SCRC) contribution to the Royal Society’s Rapid Assistance in Modelling the Pandemic (RAMP) initiative. The software is Simple Network Sim and was created by Jess Enright, the overall and academic lead for the project and a lecturer at the University of Glasgow. I felt a bit out of my depth to start with, but the team were fantastic (and mostly way better at Python than me), and I hope my leadership helped them to do their best work. I can’t think of another time I’ve learned so much in three months.
This post is about management and leadership on the project.
I’m personally a big proponent of using git rebase. Whilst it can be used in place of git merge to combine branches, it operates differently. This can easily lead to people misunderstanding how it works, which can subsequently lead to problems.
The classic problem is that after performing git rebase, you attempt to git push and your changes are rejected because your local HEAD is behind the remote HEAD. Helpfully, Git suggests you should use git pull first, to include the remote changes, before you git push. However, this is not what you want to do. The rebase has intentionally changed history, which is what causes this disagreement between the local and remote copies of the branch; performing git pull at this stage will (redundantly) merge back in the commits that were just rebased.
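The situation above can be reproduced end to end in a throwaway repository. The following is a minimal sketch rather than the method from the full post: it assumes a bash shell with git installed, uses a local bare repository (remote.git) as a stand-in for a hosted remote such as GitHub, and the branch names, file names and commit identity are all hypothetical. After the rebase, the plain push is rejected; instead of pulling, the rewritten branch is pushed with git push --force-with-lease, which overwrites the remote branch only if it still points where your last fetch saw it.

```shell
#!/usr/bin/env bash
# Minimal sketch: recover from a rejected push after a rebase without git pull.
# Assumes bash and git are installed; all names below are hypothetical.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# A local bare repository stands in for the remote (e.g. GitHub).
git init --quiet --bare "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/local" 2>/dev/null
cd "$work/local"
git symbolic-ref HEAD refs/heads/main   # name the first branch "main"

echo base > file.txt
git add file.txt
git commit --quiet -m "Base commit"
git push --quiet origin main

# Create and push a feature branch.
git checkout --quiet -b feature
echo feature > feature.txt
git add feature.txt
git commit --quiet -m "Feature work"
git push --quiet origin feature

# Meanwhile, main moves on.
git checkout --quiet main
echo update >> file.txt
git commit --quiet -am "Update on main"
git push --quiet origin main

# Rebase feature onto the new main, rewriting the feature commit.
git checkout --quiet feature
git rebase --quiet main

# A plain push is rejected: the remote still has the pre-rebase commit.
if git push --quiet origin feature 2>/dev/null; then
  echo "unexpected: plain push succeeded" >&2
  exit 1
fi
echo "plain push rejected"

# Rather than git pull (which would merge the old commit back in), replace the
# remote branch; --force-with-lease refuses if it moved since our last fetch.
git push --quiet --force-with-lease origin feature
echo "force-with-lease push succeeded"
```

Afterwards the feature branch, locally and on the remote, should be a single commit on top of main, with no merge of the pre-rebase commit. --force-with-lease is generally preferred over a bare --force because it will not silently discard commits that someone else pushed to the branch in the meantime.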
Including High-Performance Computing clusters
It appears that recent cyberattacks on various European High-Performance Computing (HPC) clusters were in part facilitated by bad actors acquiring the ‘SSH keys’ of researchers with accounts on multiple clusters, then using these keys to hop between clusters. Secure SHell (SSH), as the name implies, can be a very secure way of starting a remote shell (command-line session) on remote Linux machines (e.g. HPC clusters), and the underlying protocol is also useful for copying files to and from remote machines (via the SCP, SFTP and rsync tools). However, several poor practices can limit the security of remote access and file transfers. Given the recent attacks, it makes sense for staff and students who access remote Linux machines to consider these, even if the remote machines are not managed by the University of Sheffield: poorly managed keys or passwords could allow others to impersonate you, which could result in further cyberattacks, data theft or loss, and reputational damage to you and the University.
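As a minimal sketch of one widely recommended practice (not necessarily one of those covered in the full post), the following assumes OpenSSH's ssh-keygen is installed and generates an Ed25519 key protected by a passphrase, so that a stolen copy of the private key file cannot be used on its own. The file name, comment and passphrase are placeholders. The -N option is used only so the demo runs unattended; in real use, leave it out so that ssh-keygen prompts for the passphrase rather than it ending up in your shell history.

```shell
#!/usr/bin/env bash
# Minimal sketch: create a passphrase-protected Ed25519 key and confirm that
# the private key cannot be read without the passphrase.
# Assumes OpenSSH's ssh-keygen is on the PATH; names below are placeholders.
set -euo pipefail

keydir=$(mktemp -d)
trap 'rm -rf "$keydir"' EXIT
key="$keydir/id_ed25519_example"   # hypothetical file name

# -t ed25519: modern key type
# -a 100: more KDF rounds, slowing brute-force attacks on the passphrase
# -N: passphrase given non-interactively (demo only; omit it in real use)
ssh-keygen -q -t ed25519 -a 100 -N "example passphrase" -C "example-key" -f "$key"

# -y derives the public key from the private key, which requires decrypting it;
# with an empty passphrase (-P "") this should fail for a protected key.
if ssh-keygen -y -P "" -f "$key" >/dev/null 2>&1; then
  echo "WARNING: private key is readable without a passphrase" >&2
  exit 1
fi
echo "private key is passphrase-protected"
```

The same ssh-keygen -y check can be pointed at an existing private key to see whether it is protected: if it prints a public key without asking for a passphrase, it is not.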
Following the success of our 2019 event, the University of Sheffield and NVIDIA are pleased to announce that we will be hosting a 2020 GPU Hackathon as part of the NVIDIA international GPU Hackathon Series.
This event will take place between 27 – 31 July 2020, most likely as an online event unless government restrictions around COVID-19 are significantly altered.
As software engineers, we were lucky to have a running start at shifting to remote working practices as part of our institutional response to COVID-19. We have very little physical infrastructure in the team and can do most of what we need through the internet, including keeping in touch with each other through video calls, email and text chat. We can rely on our I.T. Services to provide a virtual private network (VPN) and virtual machines for hosted applications. However, there is one area where I feel like we’ve made our own luck: collaborating on developing software. A big part of our ethos is to identify good practices, put them into action and share them with others, so I’m going to talk a bit here about how we use git and GitHub not just to track changes and contributions to code, but to manage projects. As it turns out, this is actually pretty easy if you have a little time to invest. The difficulty, for many of us, is finding that time, but that’s kind of another story!
Academics and RSEs have been very busy over the last few weeks coming up with creative solutions to move teaching and training online. My own undergraduate/postgraduate GPU module, Parallel Computing with GPUs, has posed a significant problem. The course was designed to be run within the University of Sheffield’s High Specification teaching lab, which is equipped with CUDA-enabled GPUs. Moving this course online clearly requires a mechanism for students to access GPUs without a significant change to their current working practice (e.g. Visual Studio development in Windows). Ideally this could be done using the university’s high-performance computing facilities, but the provision of GPUs is currently insufficient to support 100 students (although new GPUs are on the way). The obvious solution is to move this to the cloud; however, there are a number of challenges to solve. These are the topic of this blog post, which also serves as a reference for when I forget all of this in six months’ time.
Note: This post specifically targets AWS, as that is what I have used for the InstanceHub website, which is part of the solution to Problem 3.
In response to the need for increased remote working as a result of the current Covid-19 situation, we’re going to be doing our Code Clinics remotely using Google Hangouts for now.
Code Clinics remain a great way to get help with writing and maintaining code, and with reduced physical access to labs, perhaps now is a good time to focus on this aspect of research. We can offer advice on working with and executing code remotely.
For queries relating to collaborating with the RSE team on projects: email@example.com
Join our mailing list by subscribing to this Google Group to be notified when we advertise talks and workshops.