RSE Computing Seminar and Coffee & Cake event 19th June 2018 at 12:00

Both events will be held at COM-G12-Main Lewin, Computer Science Department (ground floor) on the 19th of June, starting from 12:00.

12:00 - RSE Seminar: Tackling the learning curve of scientific programming

By: Dr. Patricio Ortiz

Talk Abstract:

Programming is part of the computer science curriculum and is complemented by related subjects that make students knowledgeable in the field. The situation of a science or engineering student is the opposite: typically they have one course to learn one language, and that language is usually not the one they will first face in real-life situations. This has been the case for decades and is unlikely to change, but there is a real need to better prepare science and engineering students for the very steep learning curve of having to start programming as part of an ongoing project or their thesis. Universities like ours offer excellent facilities such as the HPC clusters supplied by CICS, yet the reality is that many students and young researchers may never have used a Unix-based system, let alone a parallel system.

The book I wrote, "First Steps in Scientific Programming", aims to ease the passage through this learning curve by providing tips based on years of experience and on my interactions with students and brilliant young researchers who did not have the opportunity to learn anywhere else about the challenges that programming in a scientific environment involves.

I will briefly describe the points I think are most important to emphasise, points whose importance I have confirmed by interacting with other experienced researchers at the University of Sheffield who are trying to support people starting out in this field.

Link for the book:

https://sites.google.com/view/fsscientificprogramming/home

A supportive link:

https://sites.google.com/a/sheffield.ac.uk/rcg/my-blog/research-computing-notes/firststepsinscientificprogramming

Please register using Eventbrite.

13:00 - Coffee and Cake event

The Coffee and Cake event is open to everyone and offers a great opportunity to further discuss the topics raised by our speaker. In addition, if you have any particular research software issues, or would like a general discussion about research software or software in teaching, please come along for an informal chat with the RSE team.

Book: First Steps in Scientific Programming

I have just published this book in electronic and print formats (iBooks and Amazon). This work aims to provide science and engineering students and post-docs with a series of concepts found in real-life scientific projects, including programming concepts, code testing, internal representation, rounding errors, tricks of the trade, and advice on best practice for storing data for long-term use, as well as a broad introduction to the Unix environment they will encounter when coding for HPC and cloud computing. I do not focus on any individual language but on the elements common to all of them. I cover the value of designing code carefully, planning for future use and scalability, either using flowcharts or just "generic code". I give an overview of the tools available to tackle different problems. Topics as diverse as working with existing code and working with time are covered. This book is intended to ease the learning curve for those starting out, not for the seasoned scientific programmer. Click here for a more detailed description, including a table of contents.

Patricio F. Ortiz is an RSE based in the Department of Automatic Control and Systems Engineering at The University of Sheffield, mostly involved with the design and implementation of the architecture for the Urban Flows Observatory project. Its objective is to monitor a number of variables in an urban environment, from weather variables and air pollution to the behaviour of construction materials. These data can potentially be correlated with data on human activities and human health, and the project could involve several disciplines in the long run. The Urban Flows Observatory is financed mostly by EPSRC and UKCRIC. Sheffield is one of a handful of cities involved in this nationwide effort.

SSI Fellowship success for Sheffield

The Software Sustainability Institute (SSI) is a cross-council funded group that supports the research software community in the UK. It has championed the role of the Research Software Engineer and has led national and international initiatives in the field.

One of the most popular activities undertaken by the SSI is its fellowship programme. This competitive process provides an annual cohort of fellows with £3,000 each to spend over fifteen months on a project of their choice. Competition for these fellowships is fierce! Just like larger fellowships, applicants must get through a peer-reviewed application process that includes written proposals and selection days.

I am extremely happy to report that Sheffield has won not just one but three SSI Fellowships this year. The only institution to match us was UCL, home of one of the first RSE groups in the country. Here's a brief statement from each Sheffield fellow explaining how they plan to use their funds:

Tania Allard

Nowadays, the majority of research relies on software to some degree. However, in many cases there is little focus on developing scientific software using best development practices, for a number of reasons: the lack of adequate mentoring and training, little understanding of the requirements of the scientific code, and software being undervalued or not considered a primary research output. This has changed over time with the emergence of RSEs (Research Software Engineers), just like myself. But certainly not every university or institute has an RSE team, nor is every discipline represented in the current RSE community. I plan to use this fellowship to develop an RSE winter school covering not only technical skills but also some of the craftsmanship and soft skills needed when developing a significant amount of scientific code. This winter school will also help to diversify the RSE pool by focusing on underrepresented groups within the community (e.g. gender, age, scientific disciplines, universities without RSEs) while disseminating best software practices among a number of disciplines.



Becky Arnold

I'm planning to use the fellowship funds to bring external speakers in to talk to the astrophysics group, with the goal of improving the style, efficiency and sustainability of our coding. As physicists, as I imagine is true in many fields, we are largely taught to code to get the things we need done as quickly as possible, with little regard for the quality of the code itself. We are taught how to code, but not how to code well. I want to give us the opportunity to improve at that. I also hope to change the way we think about coding: from a disposable stepping stone used to further research as quickly as possible to a fundamental part of the science itself.



Adam Tomkins

I am part of the Fruit Fly Brain Observatory project, which aims to open up neurological data to the community in an accessible way. Part of the issue with open data sharing is the vast number of custom storage and format solutions used by different labs. With this fellowship, I will be holding training events for both biologists and computational modellers on how to use the latest open data standards, demonstrating how using open software can add instant value to data through a larger community of tools and platforms.


RSE at Sheffield

When we set up the Sheffield RSE group, one of our aims was to help cultivate an environment at Sheffield where research software was valued. We do this by providing training events, writing grants with academics, consulting with researchers to improve software, improving the HPC environment and anything else we can think of. Of course, correlation does not imply causation but we like to believe that we helped our new SSI Fellows in some way (the SSI agrees) and we are very happy to bask in their reflected glory.

Code in the Academy: Rebecca Senior

In this interview series, the RSE team talk to University of Sheffield students about the role of coding in their research.

In the first of the series, we speak to Rebecca Senior.

She's in the final year of her PhD in the Department of Animal and Plant Sciences, studying the interactions between land-use and climate change in tropical rainforests and what this means for biodiversity conservation. She's currently looking into how deforestation affects forest connectivity, so lots of spatial analyses. She mainly codes in R but also dabbles in a bit of Python.


How did you first get into coding?

What motivated you to learn? How and what did you start learning?

When I was an undergrad we did our statistics in R commander, which is a GUI (Graphical User Interface) for R. A supervisor wisely told me that there’d come a point when I couldn’t do what I needed using R commander alone, so I spent the summer before my final year grappling with R and cursing it profusely until I was somewhat competent.

What are your favourite coding tools?

I’m pretty in love with the R + RStudio + tidyverse combo. RStudio is an integrated development environment (IDE) for R, which basically makes coding look far less hideous, and which allows you to write, save and run code in a more efficient way. The tidyverse is “an opinionated collection of R packages designed for data science”. The various packages make data management/analysis/presentation much more intuitive for many people.

How do you think coding has helped you in your work?

For one thing it got me an internship at UNEP-WCMC (UN Environment World Conservation Monitoring Centre) after I finished my undergrad, and it subsequently helped me to get this PhD position! More fundamentally, coding has sped things up, enhanced the reproducibility of my work and my ability to collaborate with others, and has helped me tackle complex problems that I couldn’t have done manually.

Tell us about your favourite coding achievement.

I wrote a teeny R package to estimate the time of sunrise and sunset based on date and location. It’s actually a really simple implementation of solar calculations developed by NOAA for MS Excel, but it was instrumental to one of my thesis chapters and it was my first experience of making an R package. Check it out here: https://github.com/rasenior/SolarCalc!

How do you think these skills can be better supported in academia?

Since all researchers (students and supervisors alike) are judged primarily on their publication output, encouraging students to publish software would be an obvious place to start. In my field, the journal Methods in Ecology and Evolution is a very popular option for people seeking to publish R packages.

That said, not all coding results in something publishable and, in any case, the sharing of software via peer-reviewed publications is not always a good measure of its usefulness. I think students should be encouraged to share their coding achievements with peers, and more broadly via online platforms such as GitHub and Gist. Software has made its way into academic impact reporting, so perhaps coding should also be more valued within progress reports and theses?

It would be great to see the teaching of coding broaden beyond statistics, especially within the life sciences. There is so much more to coding than conducting t-tests! With continuing advances in technology we have to grapple with much bigger and more varied datasets, analyse them in sometimes very complex ways, and present the methods and results in a clear and succinct format, all the while maintaining reproducibility as much as possible. That’s a whole heap of coding skills that are very infrequently taught!

How do you see coding fitting in with your future career?

Whether I stay in academia or not, I will continue coding. I hope that my coding skills will help me secure a research position post-PhD. I’m not sure yet exactly where my research will take me, but I hope it involves developing R packages and making pretty figures in ggplot.

Any coding advice for new PhD students?

Don’t be afraid to set aside time for learning something new. Learning takes time – accept that and incorporate it into your work schedule. You’re still a student and some of the skills you learn may open doors you didn’t even know were there.

CodeFirstGirls meets Hacktoberfest

It is that time of the year again! The autumn-winter courses for CodeFirst: Girls are in full swing in Sheffield, Manchester and many other locations all over the UK.

As the lead instructor of the Python courses, part of my 'job' is to make sure that everything runs smoothly and that the gals make the most of the course. Since the courses run over only 8 weeks and we have loads of ground to cover, I decided to improve the way the instructors communicate and plan the course, as well as how we deliver its contents.

Implementation

These are some of the approaches we are currently using in our courses:

  • GitHub: I use Git and GitHub all the time, for all my projects and tasks. So it only made sense to make it a central point of contact, as well as the main place to keep all the additional material. It has worked wonders: all of the organisational stuff is there, we make sure that the additional materials we develop are peer-reviewed, and it makes all of our lives easier.
  • GitKraken: ok ok, I know many people would prefer teaching Git using the command line, but I have used both command-line and GUI approaches and I think you first need to know your audience to understand which approach to use. In this case GitKraken was my weapon of choice... powerful, intuitive, easily integrated with GitHub, BitBucket and GitLab, and did I say beautiful? Yes, which makes it suitable for visual learners.
  • Learning by doing: I am a firm believer in learning by doing. It is sometimes the best way to get to grips with things. How do you make the git-add-commit-push workflow a natural habit? Exactly: by doing it over and over again. So we make sure that every session includes bits and pieces where the gals have to write pieces of code, push them to their repos, collaborate with others and/or create pull requests (see the sketch after this list).
  • Feedback on the fly: as a Software/Data Carpentry instructor, one of the things I love the most is the use of post-its. That way you and the helpers know straight away who is struggling (red post-it) and who is not (green post-it), so the learners get help instantly and the main instructor gets visual cues on how fast or slow to proceed. At the end of the day the learners write on the post-its something they liked or learned, and something that could be improved or that they struggled with. I decided to give this a go at CFG, and it has helped us a lot so far.
  • Active engagement: one of the key things that makes initiatives such as CodeFirst: Girls work is not the fact that we teach them how to code; online courses do that. The whole thing is an excellent community-building activity: the gals meet like-minded people, are exposed to role models, and feel empowered to continue their careers in tech. That is the beauty of what we do. So it is only fair that we engage with them. We have our own #hashtag (go now and look for #ShefCodeFirst on Twitter), we have guest speakers, Slack channels, our very own course website (obvs on GitHub Pages), and we try to open their eyes to the wider tech and open source community. Also, we have many Octocat stickers to give away!!!
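For reference, the habit from the 'learning by doing' point above boils down to three commands, repeated until they become muscle memory. A minimal sketch, with the file, branch and commit message purely illustrative:

$ git add exercise.py                       # stage the new or modified file
$ git commit -m "Complete week 3 exercise"  # record the change locally
$ git push origin master                    # publish it to the repo on GitHub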


Hacktoberfest

I mentioned before that we try to keep the gals actively engaged throughout the course, as well as to integrate them into the wider community. And what better way to do this than getting the gals involved in Hacktoberfest!!! We were a bit tight on time, but I thought it was worth trying to get some of the girls involved in something like this.

By doing so the girls would get the following benefits:

  • Learn how to contribute to open source projects
  • Integrate into the open source community
  • Get extra coding practice
  • Get extra git practice (4 Pull Requests were needed to complete this)
  • If completed, they would get a special edition t-shirt (whoop whoop)

That meant extra work for me: finding specific tasks and projects for them to contribute to, a pull-request-merging bonanza, and preparing extra gifs and guides on how to complete the tasks. But it was totally worth it!!! I was more than delighted to see all the PRs coming into our own repo, as well as getting all the notifications from the girls getting involved in Hacktoberfest.

I know not everyone got involved, as many have PhDs, Master's degrees, dissertations, and a life to look after. But I am massively proud of them all. So many of our gals had never used Git or GitHub before, and now they are collaborating like pros.

Talk about motivation :) And if you want to keep up to date with the end-of-course projects they will be presenting in 5 weeks' time, keep an eye on Twitter!

Numpy plus the Intel MKL for fast Fourier transforms

One of the nice things about using (Ana)conda's default builds of Numpy, Scipy and Scikit-Learn is that they are compiled against and depend on Intel's Math Kernel Library (MKL), which provides multi-threaded implementations of linear algebra functions satisfying the BLAS and LAPACK APIs.
This Numpy build is therefore able to distribute the work involved in, say, a matrix multiplication between CPU cores without having to spawn multiple processes and migrate/duplicate data between the address spaces of those processes. We can see that multiple CPU cores are being used by comparing the 'real' (wall-clock) time of a simple matrix multiplication example to the 'user' time, which is the amount of time our code spent running across all CPU cores.

$ conda create -n default_env python=3.6 numpy
$ source activate default_env
$ time python -c 'import numpy as np; x = np.random.random((5000, 5000)); x @ x'                                       

real    0m2.505s
user    0m6.640s
sys     0m0.067s

As you can see, the 'real' time is much less than the 'user' time, which would only be possible if multiple cores were used.

The MKL also provides Fast Fourier Transform (FFT) functions, but be warned that by default the Anaconda build of Numpy is not able to use multiple threads for FFT functions!

$ time python -c 'import numpy as np; shape = 1 * [50000000]; x = np.random.choice(a=1, size=shape); np.fft.ifftshift(np.fft.fftn(np.fft.fftshift(x)))' 

real    0m4.821s
user    0m4.419s
sys     0m0.400s

$ source deactivate

However, the Intel Python Distribution's build of Numpy does use multiple threads for FFTs.
The IPD is a separate set of conda packages in which things like numpy, scipy and scikit-learn have been optimised to make better use of Intel libraries such as the MKL and the Data Analytics Acceleration Library (DAAL).

$ conda create -n ipd_env -c intel python=3.6 numpy
$ source activate ipd_env
$ time python -c 'import numpy as np; shape = 1 * [50000000]; x = np.random.choice(a=1, size=shape); np.fft.ifftshift(np.fft.fftn(np.fft.fftshift(x)))'

real    0m1.305s
user    0m3.545s
sys     0m0.899s

$ source deactivate

Note that the wall-clock time for the IPD example is less than the total time spent executing user code on all CPU cores i.e. multiple cores are being used.

ReproHacking at Opencon London 2017 Doathon

Building on the success of last year's #Reprohack2016 at the Berlin OpenCon satellite event, I rejoined the team of organisers (Laura Wheeler, Jon Tennant and Tony Ross-Hellauer) and teamed up with Peter Kraker to develop the hackday for this year's OpenCon London 2017.

To allow better reflection on this year's theme, “Open for what?”, we expanded the format to two tracks, opening up the scope for both projects and participants. One track retained the ReproHack format from last year; the other, broader track offered the leads of any type of open science project the opportunity to submit it for work. Projects were not constrained to coding, and you didn't have to code to take part in the session: anyone with an interest in contributing creatively to open science, in whatever capacity, was welcome.

On the day, after a round of introductions and sharing our motivations for attending, we reviewed the submissions and knuckled down.


ReproHacking

The original ReproHack was inspired by Owen Petchey's Reproducible Research in Ecology, Evolution, Behaviour, and Environmental Studies course, in which students attempt to reproduce the analysis and figures of a paper from the raw data, so we wanted to attempt the same. That course takes a few months over a number of sessions, though, so given our time constraints we focused on reproducing papers that had also published the code behind them. This year we had a whole day, which gave us more time to dig deeper into the materials, moving beyond evaluating them for reproducibility to how understandable, and even how reusable, they were. While fewer than last year, we still had some excellent submissions to choose from.

I'm pleased to report that two of the three papers attempted were successfully reproduced!


***

I was particularly impressed with the paper Andrew Ajube tackled:

The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles, Piwowar et al.

Under very minimal supervision, and never having used R or RStudio before, he managed to reproduce an analysis in an rmarkdown vignette. I think this speaks volumes to the power of the tools and approaches we have freely available to us, to the value of following best practice (well described in this epic Jenny Bryan blog post on project-oriented workflows), to the effort the producers went to, and, of course, to genuine engagement by Andrew. It can work and it is rewarding!


***

I worked with Marios Andreou on a very well curated paper, submitted by Ben Marwick.

The archaeology, chronology and stratigraphy of Madjedbebe (Malakunanja II): a site in northern Australia with early occupation, Clarkson et al.

The paper offered two options for reproducing the work: a completely self-contained archive version in a Docker container, which Marios spun up and reproduced the analysis in, in no time, and installing the analysis as a package, which I opted for. It did require a bit of manual dependency management, but this was documented in the analysis repository README on GitHub. This meant that all the functionality developed for the analysis was available for me to explore. Presenting the analysis in a vignette also made its inner workings much more penetrable and allowed us to edit it interactively to get a better feel for the data and how the analysis functions worked. Ultimately, not only could we reproduce the science in the paper (open for transparency), we could also satisfy ourselves as to what the code was doing (open for robustness) and reuse the code (open for reuse). The only step further would be to make the functionality more generalisable.

At the end of the session we collected some feedback about the experience, reflecting on reproducibility, the tools and approaches used, documentation and reusability. Here's the feedback we had for Ben's work.

As calls for openness mature, it's good to push beyond "why open?" to, indeed, "open how? open when? open for what?". Different approaches to how a study is made "reproducible" have implications for what that openness can achieve downstream. It's probably a good time to start clarifying the remit of different approaches.


Do-athoning

On the do-athon side there were a couple of cool projects. Peter and Ali Smith worked on Visual data discovery in Open Knowledge Maps, adapting their knowledge mapping framework, Head Start, to the specific requirements of data discovery.

Tony, Lisa Mattias and Jon continued work on the open-to-anyone, hyper-collaborative drafting of the "Foundations for Open Science Strategy Development" document, started at the OpenCon 2017 Do-athon.


Agile hacking

One thing I love about hacks is that you never know quite what skills you're going to get in the room. In our case, we got scrum master Sven Ihnken offering to help us navigate the day. We've actually been working agile for the past few months with the nascent shef dataviz team, and I find it a productive way to work. So an agile hack seemed a worthy experiment. I personally thought it worked really well. It was nice to split the day into shorter sprints and review progress around the room halfway through. And Sven did a great job "buzzing" around the room, keeping us focused and engaged and, ultimately, getting all our tasks from doing to done!

***


At the end of the day, we shared what we'd worked on and settled in for the main OpenCon London evening event. As the event talks moved through the more traditional remits of OpenCon, from public engagement to open access to literature and data, it reiterated to me that each strand of openness is yet another way to invite people into science. For me, inviting people all the way in, into your code and data, your entire workflow, is the bravest and most rewarding of all!

A successful 2nd RSE conference

RSE Sheffield in the 2nd RSE conference


The second RSE conference took place on the 7th and 8th of September 2017 at the Museum of Science and Industry (MOSI). There were over 200 attendees, 40 talks, 15 workshops and 3 keynote talks, one of which was given by our very own head honcho Mike Croucher (slides here), plus geeky chats galore.

RSE team members Mozhgan and Tania were involved in the organising committee as talks co-chairs and diversity chair (disclosure: they had nothing to do with Mike's keynote). Also, all of the RSE Sheffield team members made it to the conference, which seems to be a first given the diverse commitments and project involvements of us all.

Once again, the event was a huge success thanks to the efforts of the committee and volunteers, as well as the amazing RSE community, who made this an engaging and welcoming event.

Conference highlights

With so many parallel sessions, workshops and chats happening at the same time, it is quite complicated to keep track of every single thing going on. And it seems rather unlikely that this will change, as it was evident that the RSE community has outgrown the current conference size. So we decided to highlight our favourite moments of the event:

  • The talk on 'Imposter syndrome' by Vijay Sharma: who in the scientific community has not experienced this? Exactly! So, when given the chance, everyone jumped into this talk full of relatable stories and handy tips on how to get over it.

  • Another talk that gathered loads of interest was that of Toby Hodges from EMBL on community building. This came as no surprise (at least to me), as RSEs often act as community builders, or as a bridge between collaborating communities, as opposed to just being focused on developing software and pushing it into production.

  • During the first day, the RSEs had the chance to have a go at interacting with the Microsoft HoloLens. There was a considerable queue for this and, unfortunately, we were not among the chosen ones to play with it. Maybe in the future.

  • My hands-on workshop on 'Jupyter notebooks for reproducible research': I was ecstatic to learn that the community found this workshop interesting; it had to be run twice!!!

  • Also, I'd like to casually throw in here that I have been elected as a committee member of the UK RSE association, so expect to read more about this on this blog.

For obvious reasons I missed most of the workshops, but Kenji Takeda's workshop on 'Learn how to become an AI Super-RSE' was another favourite of the delegates, as it was run twice too!

Our workshop on Jupyter notebooks for reproducible research

Being an RSE means that I serve as an advocate of sustainable software development. Also, as I have discussed here before, I am greatly concerned about reproducibility and replicability in science, which, I might add, is not an easy task to embark on. Thankfully, there are loads of tools and practices that we can adopt as part of our workflows to ensure that the code we develop follows the best practices possible and, as a consequence, can support science accordingly.

Naturally, as members of the community come up with more refined and powerful tools in the realm of scientific computing, we (the users and other developers) adopt some of those tools, which means we often end up modifying our workflows.

Such is the case with Jupyter notebooks. They brought to life a whole new era of literate programming, in which scientists, students, data scientists and aficionados can share their scripts in a human-readable format. More importantly, they turn scripts into a compelling scientific narrative in which functions and loops are followed by their graphical outputs, or allow the user to interact via widgets. This ability to openly share whole analysis pipelines is, for sure, a step in the right direction.

However, adopting tools like this brings not only a number of advantages but also a number of challenges and integration issues with previously developed tools. For example, traditional version control tools (including diff and merge tools) do not play nicely with notebooks. Also, notebooks have to be tested like any other piece of code.

During the workshop, I introduced two tools, nbdime and nbval, which were developed as part of the European-funded project OpenDreamKit. These tools bring much-needed version control and validation capabilities to Jupyter notebooks, addressing some of the issues mentioned above.

So, to cover these tools as well as how you would integrate them into your workflow, I divided the workshop into three parts: diffing and merging of notebooks, notebook validation, and a brief 101 on reproducibility practices.

Notebooks diffing and merging

During the first part of the workshop, the attendees shared their experiences of using traditional version control tools with Jupyter notebooks... unsurprisingly, everyone had had terrible experiences.

Then everyone had some hands-on time learning how to use nbdime for diffing and merging, both from the command line and from its rich HTML-rendered views (completely offline). As we progressed through the tutorial I could see some happy faces around the room, and they all agreed that this was much needed.
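If you want to try nbdime on your own notebooks, the basic commands we covered look like the following minimal sketch (the notebook file names are placeholders):

$ pip install nbdime
$ nbdiff analysis_v1.ipynb analysis_v2.ipynb       # content-aware diff in the terminal
$ nbdiff-web analysis_v1.ipynb analysis_v2.ipynb   # rich rendered diff in the browser
$ nbmerge base.ipynb local.ipynb remote.ipynb      # three-way merge of notebooks
$ nbdime config-git --enable                       # tell git to use nbdime for .ipynb files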

Need more convincing? Just earlier this week this tweet showed up in my feed:

Notebooks validation

The second part of the workshop focused on the validation of notebooks. Here I would like to ask first: how many of you have found an amazing notebook somewhere on the web, only to clone it and find out that it just does not work: dependencies are broken, functions are deprecated, and you can't tell whether the results are reproducible?

I can tell you, we have all been there. And in such cases nbval is your best friend. It is a py.test plugin that checks whether executing the stored inputs of a .ipynb file reproduces its stored outputs, while also ensuring that the notebook runs without errors.

This led to an incredible discussion about its place among conventional testing approaches. Certainly, it does not replace unit testing or integration testing, but it can be seen as a form of regression testing for notebooks. Want to make sure that your awesome documentation, written as Jupyter notebooks, still works in a few months' time? Why not use CI and nbval?
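As a minimal sketch of how that might look, whether run locally or as a step in a CI job (the notebook name is a placeholder):

$ pip install nbval
$ py.test --nbval documentation.ipynb       # check stored outputs match a fresh execution
$ py.test --nbval-lax documentation.ipynb   # only check that the cells run without errors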

Wrapping up

The workshop closed with a 101 on working towards reproducible scientific computing. We shared some of our approaches for reproducible workflows and encouraged the delegates to share theirs. We covered topics such as valuing your digital assets, licensing, automation, version control and continuous integration, among others.

The perfect close to a great RSE conference!


Just a few more things

Let me highlight that all the materials for the workshop can be found at https://github.com/trallard/JNB_reproducible, and that everything is completely self-contained in the form of a Docker container.

If you missed out on the conference and would like to see the videos and slides of the various talks do not forget to visit the RSE conference website.


Iceberg vs ShARC


TL;DR: Around 100 of Iceberg's nodes are ancient and weaker than a decent laptop. You may get better performance by switching to ShARC. You'll get even better performance by investing in the RSE project on ShARC.

Benchmarking different nodes on our HPC systems

I have been benchmarking various nodes on Iceberg and ShARC using matrix-matrix multiplication. This operation is highly parallel and heavily optimised these days, and it is also a vital operation in many scientific workflows.

The benchmark units are gigaflops (billions of floating-point operations per second), and higher is better. Here are the results for a maximum matrix size of 10000 by 10000, sorted from worst to best:

According to the Iceberg cluster specs, over half of Iceberg is made up of the old 'Westmere' nodes. According to these benchmarks, these are almost 4 times slower than a standard node on ShARC.
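If you'd like to estimate a gigaflops figure for a node yourself, a quick sketch in the style of the timing examples from our Numpy/MKL post is to time a matrix multiplication and divide the roughly 2n³ floating-point operations it performs by the elapsed time (this is illustrative, not necessarily the exact benchmark code used for the results above):

$ python -c '
import time
import numpy as np
n = 10000
x = np.random.random((n, n))
start = time.time()
x @ x
elapsed = time.time() - start
# an n-by-n matrix multiplication performs ~2*n**3 floating-point operations
print("GFlops: %.1f" % (2 * n**3 / elapsed / 1e9))
'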

The RSE project - the fastest nodes available

We in the RSE group have co-invested with our collaborators in additional hardware on ShARC to form a 'premium queue'. This hardware includes large-memory nodes (768 gigabytes per node, 12 times the amount normally available), advanced GPUs (a DGX-1 server) and 'dense-core' nodes with 32 CPU cores each.

These 32-core nodes are capable of over 800 gigaflops and so are 6.7 times faster than the old Iceberg nodes. Furthermore, since they are only available to contributors, the queues will be shorter too!

Details of how to participate in the RSE-queue experiment on ShARC can be found on our website.

What if ShARC is slower than Iceberg?

These benchmarks give reproducible evidence that ShARC can be significantly faster than Iceberg when well-optimised code is used. We have heard some unconfirmed reports that code run on ShARC can be slower than code run on Iceberg. If this is the case for you, please get in touch with us and give details.

Sheffield R Users group celebrates Hacktoberfest


We'll be honest here and say that our Sheffield R Users group Hacktoberfest celebrations started as a last-minute stroke of inspiration. Nearing our standard first-Tuesday-of-the-month meetup, our speaker lineup for October was thin. At the same time, I'd spent the last month mentoring as part of the Mozilla Open Leadership programme again, which was gearing up to have projects participate in Hacktoberfest, a global, month-long celebration of open source organised by Digital Ocean and designed to get people engaged with projects hosted openly on GitHub. For those unfamiliar with the platform, GitHub is one of many code repositories where open projects live, allowing anyone to copy, modify and even contribute back to open source projects, many of which depend on such volunteer contributions. Just as it takes a village to raise a child, so it takes a small village to build, maintain, develop and support the users of a successful open source project, where even small non-technical contributions, for example to documentation, can be a huge help to maintainers (see the blog post on this by Yihui Xie, of knitr fame).

So what better way to entice folks to get involved than the promise of stickers and a free t-shirt on completion of the Hacktoberfest challenge! And the challenge? Simple: make four contributions (pull requests) to any open source project on GitHub between the 1st and 31st of October. The contributions can be anything: fixing bugs, creating new features, or updating and writing documentation. Game on!


Many project owners had labelled specific issues, and we noticed there were many rOpenSci projects in need of some #rstats help.

Given that doing is the best way to learn, and that working on problems outside our daily routines can be a great distraction, we thought it'd be a great idea to skip the standard talk meetup format for October and instead opt for some hands-on Hacktoberfest action! It would also give any of our R users who were curious, but had no previous experience with GitHub and open source, the chance to learn more through practice, in a friendly space where they could get help with any questions or uncertainties. Working the details through on Twitter (as you do!), an exciting plan emerged... not only would we extend to holding weekly sessions throughout the whole month, we would end with a special Halloween celebratory session!


Kick off meetup - briefing session

At the kick-off meetup, fellow Sheffield R Users Group co-organisers Tamora James (@soaypim) and Mathew Hall (@mathew_hall) introduced participants to the general ideas and principles of open source, discussed contributing to open projects, introduced GitHub and walked through scanning issues (seeing what needs doing in a particular project), forking repositories (making a copy of the materials associated with a project) and making pull requests (sending contributions... yes, it was all Greek to me in the beginning too... and I'm Greek!); a sketch of that workflow follows below. Given the short time we had to prepare for the session, the materials provided by Digital Ocean on their Hacktoberfest event kit page were an invaluable resource, and we can easily recommend them as a great introduction to contributing to open source. Of the 8 folks who made it to the session, 3 went on to contribute pull requests over the month.
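For anyone who missed the session, the fork-and-pull-request workflow boils down to something like this minimal sketch (the repository, branch and file names are purely illustrative):

$ git clone https://github.com/YOUR-USERNAME/some-project.git   # clone your fork of the project
$ cd some-project
$ git checkout -b fix-readme-typo                               # create a branch for your change
$ git add README.md                                             # stage your edit
$ git commit -m "Fix typo in README"
$ git push origin fix-readme-typo                               # push the branch to your fork

Then open a pull request on GitHub from your branch to the upstream project.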


The sessions

Admittedly, when you work at a computer all day, voluntarily spending another 3 hours at your screen is probably not everyone's top choice. But I personally found the opportunity to carve out some time to explore the huge variety of projects and the diverse ways in which to get involved engaging, and in some ways quite relaxing. The great collaborative spaces available for booking at the University of Sheffield, the informal setting and hanging out with friends made the sessions something I actually looked forward to. And the "no pressure" aspect of voluntary contribution meant I was free to play around, follow my own curiosity and explore things I was interested in but don't necessarily get time to work with during my normal working day. Indeed, some participants came along to make use of the company and learn some new things not necessarily related to the Hacktoberfest challenge. So collaborative, no-pressure spaces can be really useful for sharing knowledge.


Halloween R-stravaganza

Finally it was time for the closing event, our Halloween Hacktoberfest special! Excitement had been building since the day before, when Tamora and I spent the evening carving our too-cute-to-be-scary Octocat :heart: spooky R pumpkin!


We also got some candy donations from Sheffield RSE and a special guest, Raniere Silva (@rgaiacs), who came all the way from Manchester to join us (although technically it had been his idea after all). The stage was set for a fun finale!


Success!

While we all got our t-shirts, I was really impressed with Tamora and Raniere's contributions, and their approach served as the biggest take-away for me. They both focused on a problem or feature that would improve a tool they already used or were interested in. They got feedback on their suggestion before they even began, by opening an issue on GitHub and discussing their idea with the project's owners. That meant their efforts were well focused and much more likely to be accepted.




My t-shirt, in the end, was mainly earned by helping with typos. For Hacktoberfest, the size of your contribution doesn't matter as long as you send a contribution. And finding typos is actually non-trivial and time-consuming, thanks to our brain's auto-correct feature. Sadly, the coding pieces I worked on over the sessions have not yet made the cut to be submitted as functional pull requests (there'll be a personal blog post about my experience during Hacktoberfest coming soon instead). Mostly, however, I loved the experience and am already looking forward to organising it next year!

Thinking ahead, 3 things I'd do differently in 2018 would be:

  • Reach out to more organisations: There's a great variety of clubs and meetups at the University and more widely in Sheffield that could be interested in joining forces for a Hacktoberfest event. This would give us R users an opportunity to interact with users of other tools and potentially even tackle issues requiring mixed skills as teams.
  • Start planning earlier! This would give us an opportunity to advertise better leading up to the kick-off session and allow us to co-ordinate with other groups.

  • Run a git & GitHub clinic before the first hack session: this would give folks who have not used GitHub before the opportunity to get some experience and confidence before turning up to a hack session.

So long, #Hacktoberfest! See you in 2018!



Pumpkin Carving session and Halloween special powered by: