Demonstrating Importance and Value in Research Software

David Wilby
13 October 2022 10:00

As research software engineers (RSEs) or researchers who develop software & code, at some point we will need to provide evidence that our software actually has some positive effect on the world. Whether that's for career progression, the REF, your own peace of mind or any of a whole host of other reasons, collecting evidence of use takes many forms and too often comes as an afterthought.

This International RSE Day (Thursday 13th October 2022), let's have a look at how we can plan to collect some evidence to demonstrate the value of our work and hopefully mitigate some potential stumbling blocks in our career progression.

When it comes to 'traditional' research outputs (i.e. peer-reviewed publications), the dreaded journal impact factor [1] has come to be the de facto standard measure of the quality of a piece of research, however flawed we know it to be [2]. Along with citation counts [3], these measures are a big part of what's used to determine how worthy the author of some research is of promotion, funding, recognition and so on. But with software, we don't implicitly have the *ahem* "luxury" of impact factors and citation counts, so finding other ways of showing that our work has merit is something of a necessity. How do we do that?

The dangers of metricization

I want to preface this blog by pointing out that simplifying a complex topic such as research quality down to a set of easily digestible metrics is a terrible idea. Publishing itself demonstrates how dangerous this is, with researchers vying to publish in the top journals. Retraction rates actually scale with impact factors [4,5] - an indication of how poorly these metrics can represent research quality. Of course, much research published in the 'top' journals is of high importance (though quality may vary) and the authors should be recognised accordingly, but simple metrics can do a disservice to researchers when seeking funding, promotion and so on.

James Hetherington, now Director of UCL's Advanced Computing Research Centre, discussed these dangers in a talk a few years ago.

What I'm discussing here is not a reduction of research (software) down to a set of simple metrics, but the gathering of evidence to help you demonstrate to decision makers that your work is of good quality. And perhaps that's a matter of presentation. More on this later.

Importance, quality & impact

The concepts discussed in this article apply similarly to what one might term research "importance", "quality" and "impact": the influence a research output has on academia, the standard to which it is produced, and the effect it has on the world outside academia, respectively. These terms may be used differently in different settings, but the distinction is often drawn in more or less the same way.

We often conflate importance and quality in academic output. For instance, if a paper has a lot of citations or is published in a high impact factor journal, it is deemed to be good. In this case the research may be important, but these facts don't speak to its quality. Academic progression applications may ask for demonstration of high quality research output, but in general they are likely to value high impact factors and citations more than the standards to which research is done [6]. Here, by these definitions, I'll mainly be considering how to demonstrate importance and impact.

For software, the evidence you use may have some similarities, but it's important to make the distinction between academia and the rest of the world. Some kinds of evidence are fairly straightforward (e.g. citations are academic and software sales are likely to be impact), but for package downloads or numbers of users it may be difficult or impossible to differentiate between the two. Depending on your approach to gathering statistics, you may be able to tell the difference between academic and non-academic uses.

For impact: remember to think even further outside of your bubble, as you may find some useful evidence that you wouldn't immediately expect. Do some hunting via your search engine of choice, Google Scholar or similar every now and then and you may find an article, report, derivative work, YouTube tutorial or something else entirely that will help you out.

Why do we need evidence at all?

In an ideal world, assessors would have the time to assess your software on its own particular merits: code quality, extensibility, robustness and so on. But frankly, in all likelihood your software won’t be evaluated at all, let alone by anyone with the expertise to understand the important characteristics you’ve worked so hard on. So what we’re trying to achieve here is to make it easier for others to understand the standard of the work we produce.

Note that software-specific journals such as the Journal of Open Source Software do review software and act as a demonstration of quality.

Using numbers of downloads, values of grants supported, numbers of users, volumes of data processed or other relevant metrics alongside our claims of research excellence can crystallize the point you're making much more immediately. Think about these two statements:

I was the lead developer on the project producing an all singing, all dancing python package published to PyPI.

I was the lead developer on the project (GBP1.2M) producing a python package published to PyPI (20k downloads a week).

Whilst overly simplistic (and perhaps slightly misrepresentative in the case of download frequency), the latter gives a reader a much quicker and clearer idea of how great your software is. Fortunately for us, we have much more leeway at present in evidencing software than we do with publications, so we can choose the evidence that best supports our particular project.

Actually, the above example still concentrates on the metrics themselves rather than on the claims they support. We could avoid this by using our evidence simply to back up our claims, perhaps:

I was lead developer of software developed to enable research on a GBP1.2M project, supporting 2 PhD students and 2 PDRAs to publish 3 papers. The software also supports a community of researchers in the discipline and is publicly available as open source software with 20k downloads a week.

Types of evidence to collect…

Some useful evidence is there for the taking without a developer needing to collect it: values of grants supported, publications (!), other research roles that your work enables, etc. Some other evidence has to be gathered more deliberately, whether via a technical mechanism, or through collecting testimonials and other feedback from users and PIs.

  • Qualitative evidence - Easily overlooked but also easily achieved, qualitative evidence such as testimonials from researchers or PIs, or feedback from users, can make a strong case for the usefulness and quality of software. You could even include some utility to gather this in your code, if suitable. Ask the researchers on the project to send you a paragraph or sentence explaining the importance of the software; let them know how valuable this will be and that 'traditional' metrics aren't so useful for research software, and I'm sure they'll be glad to help.

  • Users, Downloads, Data - Numbers, numbers, numbers. Easier said than done, but get these if you can: write them to logs or store them in a database, whatever works for your software. Having numbers to illustrate the scale of use of your software will make any claims of research excellence or impact significantly more salient.

  • Money, Money, Money - Money talks. So whether it's the value of a grant or something more commercial, such as sales of the software itself or of a product it enables, monetary values can be very strong evidence.

  • Publications & Citations - Depending on whether your software is used exclusively by researchers in your immediate network or more widely, the ease of keeping track of publications will vary. All you can really do here is make it easy for people to cite your repository and/or paper and keep track of the citations. Of course you could always ask users to let you know when they publish work that uses your software, but if it’s very popular, you may not want hundreds of emails a year about this!

One approach is to encourage citation of your software in publications, which will clearly show its value. The strategy here is to lower the barrier for others to cite your work. If there's a paper associated with the software, encourage its citation in your documentation and provide pro forma citation text that can be used in publications. Additionally or alternatively, for citation of the software itself, provide a citation file (CITATION.cff) with your code (info at citation-file-format.github.io), which can help to smooth the citation process and provide citations in various formats.

  • Last Resorts - If your project is on GitHub, stars on the repository and numbers of clones are collected by default, but I wouldn't rely on these. Stars are a really unreliable measure of quality, and I imagine most academics are unlikely to use them anyway. Currently GitHub only shows the last two weeks of clone statistics for a repository and, as far as I can work out, there is no way of retrieving a longer history, at least for free accounts (one workaround is sketched below). It's also not clear whether continuous integration runs count towards this, which may mean the number of repo clones is over-represented (arguably not a bad thing for this purpose though..).
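If you do want a record of clone counts going back further than that two-week window, one pragmatic workaround is to fetch the figures regularly and store them yourself. The Python sketch below is a minimal example of that idea, assuming you have a personal access token with push access to the repository in a GITHUB_TOKEN environment variable; the owner and repository names are placeholders. It uses GitHub's repository traffic API and merges the daily figures into a local JSON log, so running it weekly (e.g. via cron or a scheduled CI job) builds up a longer history.

```python
"""Minimal sketch: periodically save GitHub clone statistics before they expire.

Assumptions (not from the original post): a personal access token with push
access is available in the GITHUB_TOKEN environment variable, and the
owner/repository names below are placeholders to replace with your own.
"""
import json
import os
from pathlib import Path
from urllib.request import Request, urlopen

OWNER, REPO = "your-org", "your-repo"  # hypothetical, replace with your project
LOG_FILE = Path("clone_stats.json")


def fetch_clone_stats() -> dict:
    """Fetch the last ~14 days of clone counts from the GitHub traffic API."""
    url = f"https://api.github.com/repos/{OWNER}/{REPO}/traffic/clones"
    request = Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    })
    with urlopen(request) as response:
        return json.load(response)


def update_log() -> None:
    """Merge newly fetched daily counts into the local log, keyed by timestamp."""
    log = json.loads(LOG_FILE.read_text()) if LOG_FILE.exists() else {}
    for day in fetch_clone_stats().get("clones", []):
        log[day["timestamp"]] = {"count": day["count"], "uniques": day["uniques"]}
    LOG_FILE.write_text(json.dumps(log, indent=2, sort_keys=True))


if __name__ == "__main__":
    update_log()
```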

…and how to get it

A lot of these types of evidence you'll just have to keep track of and gather manually, but for usage statistics you can use something more sophisticated.

  • Third party usage tracking - If your software is a web app, say, you could employ a tool like Google Analytics or Plausible to gather and aggregate usage data.

  • Roll-your-own usage tracking - If you'd rather keep your usage tracking under your own control, you could consider including a utility in your code that sends data back to a web server at some useful points such as installations or downloads. You should of course inform users of this and allow them to opt in or out, but in an open source project, for instance, you can give users the peace of mind that you aren't collecting any nefarious data (a minimal sketch follows the examples below).

Some examples of projects using this approach that I was recently made aware of are Sire and BioSimSpace, both open source research software. Sire collects and reports some basic information such as operating system, processor and software version, which are described publicly and visualised here.
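As a rough illustration of the roll-your-own approach (not taken from Sire or BioSimSpace, whose implementations will differ), an opt-in ping in Python might look something like the sketch below. The endpoint (TELEMETRY_URL) and the opt-in environment variable (MYPKG_TELEMETRY) are hypothetical names you would replace with your own; the payload mirrors the kind of basic, non-identifying information mentioned above.

```python
"""Minimal sketch of an opt-in usage ping.

The endpoint and the opt-in environment variable are hypothetical placeholders,
not taken from the original post or from any existing project.
"""
import json
import os
import platform
from urllib.request import Request, urlopen

TELEMETRY_URL = "https://telemetry.example.org/v1/ping"  # hypothetical endpoint


def send_usage_ping(package_version: str) -> None:
    """Send a small, anonymous payload, but only if the user has opted in."""
    if os.environ.get("MYPKG_TELEMETRY", "").lower() != "on":
        return  # opt-in only: do nothing unless the user explicitly enables it
    payload = {
        "package_version": package_version,
        "python_version": platform.python_version(),
        "os": platform.system(),
        "machine": platform.machine(),
    }
    request = Request(
        TELEMETRY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        urlopen(request, timeout=2)
    except OSError:
        pass  # never let telemetry break the user's work


if __name__ == "__main__":
    send_usage_ping("1.2.3")
```

The key design choices here are that nothing is sent unless the user explicitly opts in, the payload contains nothing identifying, and any network failure is silently ignored so that telemetry can never break the software itself.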

Considerations for different types of software

Depending on what your software does and how, the approach for collecting evidence may vary wildly, so you’ll have to tailor your approach.

Open Source

If your work is open source (and many funders now stipulate this) there are some elements to evidencing quality that can be easier.

For one, you can simply demonstrate that your code actually exists - which is a good start! For this reason, it can be a good idea to discuss open-sourcing with the project PI and explain the manifold benefits of open source software, where appropriate. Most PIs will be receptive and understand that being able to show the quality of your work is vital for career progression. It's harder to evidence your work if it has no publicly visible element whatsoever.

Depending on the project, being open source might enable much wider use outside of your immediate network of contacts, increase the impact within the academic community and potentially increase citations.

Closed Source

Personally, I would advise against keeping research software closed source unless absolutely necessary, but there are lots of valid reasons to do so. In this case it's absolutely vital to collect some evidence, whether that's numbers of users or whatever else is relevant to the project. A strong testimonial from the project PI or other stakeholders will go a long way.

Packages

If your project is a package that's available via a public package repository like PyPI or CRAN, there are likely services available to provide stats for package downloads, for example PyPI Stats or META CRAN. In addition, it may be that merely making a package available via a major package repository is good evidence of quality in and of itself, since they typically enforce a certain standard and require proficiency to have published there.
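For PyPI packages, for instance, download counts can also be pulled programmatically. The sketch below assumes that the pypistats.org JSON API exposes recent downloads at /api/packages/<package>/recent (worth checking against its current documentation before relying on it); META CRAN provides similar data for R packages.

```python
"""Minimal sketch: grab recent download counts for a package on PyPI.

Assumption (not from the original post): the pypistats.org JSON API serves
recent download counts at /api/packages/<package>/recent.
"""
import json
from urllib.request import Request, urlopen


def recent_downloads(package: str) -> dict:
    """Return the last-day/week/month download counts reported by pypistats.org."""
    url = f"https://pypistats.org/api/packages/{package}/recent"
    request = Request(url, headers={"User-Agent": "evidence-gathering-script"})
    with urlopen(request) as response:
        return json.load(response)["data"]


if __name__ == "__main__":
    # Prints something like {'last_day': ..., 'last_week': ..., 'last_month': ...}
    print(recent_downloads("numpy"))
```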

Commercial software

Commercial software almost intrinsically has evidence metrics associated with it. If the software is the product itself, you might already be collecting numbers of customers and fiscal measures that can be used as evidence; or, if the software is used inside a commercial organisation you have some involvement with (a spin-out, for example), then numbers of sales, customers etc. might already be available for you to use.

It’s possible that just the simple fact of having customers at all could count as some pretty strong evidence for quality research software!

Software with small user bases

Of course, not everything is a whizzy package or web app with a public face; perhaps the essential software you've been working on is specific to a single project or research group. In this case your software is every bit as important, but it can be harder to show evidence of this since it is likely to have a very small user base.

The approach is much the same here too, but qualitative evidence may prove to be very important indeed. Ask the PI for a very, very strong testimonial and collect grant values, numbers of students, publications etc. as well.

On the other hand, consider early on in the project whether it can be designed in such a way that it can be used by a more general user base and/or be open source. This can require a bit more effort, but in the grand scheme of things you could end up with a much larger group of users or even a community of user-developers, generating a far greater impact.

Finally

Some final words:

Try to collect evidence proactively - It's easier said than done, but try to plan ahead for a project how you might collect some evidence. It's not much fun, but it will pay off in the future when you don't have to desperately scrabble for it - take it from me!

Disclaimer - This certainly isn’t a comprehensive guide, so if you think something is missing you can get in touch with me at d.wilby [at] sheffield.ac.uk and I’ll update the post accordingly.

Acknowledgements: Many thanks to colleagues Bob Turner, Neil Shephard and Robert Chisholm for their useful input.

  1. Originally conceived as a metric to help librarians select journals to subscribe to, the impact factor is a measure of a journal’s ratio of citations to papers published. https://en.wikipedia.org/wiki/Impact_factor 

  2. See the Declaration on Research Assessment (DORA) for more on the movement to improve the way researchers and scholarly work are evaluated. 

  3. Flawed for many reasons as well, not least because of the massive disparity in citation rates across disciplines. 

  4. F.C. Fang, A. Casadevall (2011) Retracted Science and the Retraction Index. Infection and Immunity https://doi.org/10.1128/IAI.05661-11 

  5. Why high-profile journals have more retractions. Nature (2014). https://doi.org/10.1038/nature.2014.15951 

  6. Here in the UK, the government’s Research Excellence Framework (REF) has a major effect on the way research is assessed. In short, the REF asks universities to submit their best pieces of research output and demonstrations of research impact, ultimately determining how much research funding universities receive from the government. This has a downstream effect on how academics are assessed for career progression. 
