ReproHacking at OpenCon London 2017 Do-athon

Building on the success of last year’s #Reprohack2016 for the Berlin OpenCon satellite event, I rejoined the team of organisers (Laura Wheeler, Jon Tennant and Tony Ross-Hellauer) and teamed up with Peter Kraker to develop the hackday for this year’s OpenCon London 2017.

To allow better reflection on this year's theme, “Open for what?”, we expanded the format to two tracks, opening up the scope for both projects and participants. One track retained the ReproHack format from last year. The other, broader track gave leads of any type of open science project the opportunity to submit it for participants to work on. Projects were not constrained to coding and you didn't have to code to take part in the session: anyone with an interest in contributing creatively to open science, in whichever capacity, was welcome.

On the day, after a round of introductions and sharing our motivations for attending, we reviewed the submissions and knuckled down.


ReproHacking

The original ReproHack was inspired by Owen Petchey’s Reproducible Research in Ecology, Evolution, Behaviour, and Environmental Studies course, where students attempt to reproduce the analysis and figures of a paper from the raw data, so we wanted to attempt the same. The course runs over a few months and a number of sessions though, so, given our time constraints, we focused on reproducing papers that have also published the code behind them. This year we had a whole day, which gave us more time to dig deeper into the materials, moving beyond evaluating them for reproducibility to assessing how understandable, and even how reusable, they were. Although there were fewer submissions this year, we still had some excellent ones to choose from.

I'm pleased to report that two of the three papers attempted were successfully reproduced!


***

I was particularly impressed with the paper Andrew Ajube tackled:

The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles, Piwowar et al.

Under very minimal supervision and never having used R or RStudio before, he managed to reproduce an analysis in an rmarkdown vignette. I think this speaks volumes about the power of the tools and approaches we have freely available to us, about following best practice (well described in this epic Jenny Bryan blog post on project-oriented workflows), about the effort the producers went to and, of course, about genuine engagement by Andrew. It can work and it is rewarding!


***

I worked with Marios Andreou on a very well curated paper, submitted by Ben Marwick.

The archaeology, chronology and stratigraphy of Madjedbebe (Malakunanja II): a site in northern Australia with early occupation, Clarkson et al.

The paper offered two options to reproduce the work. The first was a completely self-contained archive version in a Docker container, which Marios spun up and reproduced the analysis in, in no time. I opted for the second option, installing the analysis as a package. It did require a bit of manual dependency management, but this was documented in the analysis repository README on GitHub. This meant that all the functionality developed for the analysis was available for me to explore. Presenting the analysis in a vignette also made its inner workings much more penetrable and allowed us to edit it interactively to get a better feel for the data and how the analysis functions worked. Ultimately, not only could we reproduce the science in the paper (open for transparency), we could also satisfy ourselves with what the code was doing (open for robustness) and reuse the code (open for reuse). The only step further would be to make the functionality more generalisable.

At the end of the session we collected some feedback about the experience, reflecting on reproducibility, the tools and approaches used, documentation and reusability. Here's the feedback we had for Ben's work.

As calls for openness mature, it's good to push beyond "why open?" to "open how?", "open when?" and indeed "open for what?". Different approaches to making a study "reproducible" have implications for what the openness can achieve downstream. It's probably a good time to start clarifying the remit of different approaches.


Do-athoning

On the do-athon side there were a couple of cool projects. Peter and Ali Smith worked on Visual data discovery in Open Knowledge Maps, adapting their knowledge mapping framework, Head Start, to the specific requirements of data discovery.

Tony, Lisa Mattias and Jon continued work on the hyper-collaborative drafting, open to anyone, of the "Foundations for Open Science Strategy Development" document, started at the OpenCon 2017 Do-athon.


Agile hacking

One thing I love about hacks is that you never know quite what skills you’re gonna get in the room. In our case, we got scrum master Sven Ihnken offering to help us navigate the day. We’ve actually been agile working for the past few months with the nascent shef dataviz team and I find it a productive way to work. So an agile hack seemed a worthy experiment. I personally thought it worked really well. It was nice to split up the day into shorter sprints and review progress around the room halfway through. And Sven did do a great job “buzzing” around the room, keeping us focused and engaged and ultimately, getting all our tasks from doing to done!

***


At the end of the day, we shared what we'd worked on, and settled in for the main OpenCon London evening event. As the event talks went through the more traditional remits of OpenCon, from public engagement to open access to literature and data, it just reiterated to me that each strand of openness is yet another way to invite people into science. For me, inviting people all the way in, into your code and data, your entire workflow, is the bravest and most rewarding of all!