R is a statistical programming language and one of the most popular languages for data analysis, statistics and plotting in academia and industry. Learning a new language can be daunting, particularly if you have no experience of scripting and are used to Graphical User Interfaces (GUIs) where you point and click to perform your statistical analysis.
Fear not though: there are a lot of resources and very friendly, enthusiastic and helpful R users out there who can help you on your journey learning R. This post details some of them, and I’d welcome additions.
Most of these resources are links to websites that are free and openly available. Where books are linked they are very often freely available online, but you can usually also buy a hard copy, which you may want to consider doing to support the authors if you find the resource useful.
R has a number of bodies, organisations and companies associated with it.
R is software and will need installing on your computer. Because it is Free and Open Source Software (FOSS) you can download and install it for free. You will have to install it before you can use it, and the instructions …
Integrated Development Environments (IDEs) are software applications that help you write code faster and more consistently through features such as syntax highlighting, automatic bracket and quote pairing, automatic indentation and tools for common tasks such as version controlling files or rendering documents.
The most popular IDE for R is RStudio Desktop, which has excellent support for R and RMarkdown/Quarto as well as basic Git support. If you are new to version control with Git you may want to consider using GitKraken, which provides an intuitive point-and-click interface for version controlling your files and working with GitHub/GitLab.
My personal preference is to use Emacs with the package Emacs Speaks Statistics (ESS). This is a robust solution (ESS has been around for decades) and you get the convenience of using Emacs and its many packages, such as the amazing Magit for carrying out all Git-related tasks. It has a steeper learning curve than RStudio but in my opinion is completely worth the effort.
If you’re using R the chances are you want to perform some sort of statistical analysis on your data. This often involves cleaning the data you have received and writing code to summarise, tabulate and plot it, often in a literate manner (which means reports are open and can be reproduced easily). If you read nothing else to get you started using R for this work, read R for Data Science by Hadley Wickham and Garrett Grolemund. This is an excellent book that is available for free online.
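To give a flavour of the summarising and tabulating described above, here is a minimal sketch using dplyr (one of the Tidyverse packages covered in R for Data Science) on R’s built-in mtcars data set. It assumes dplyr is installed and R 4.1 or later (for the native `|>` pipe); the variable name `mpg_by_cyl` is just for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Count the cars and average their fuel economy for each number of
# cylinders in R's built-in mtcars data set
mpg_by_cyl <- mtcars |>
  group_by(cyl) |>
  summarise(n = n(), mean_mpg = mean(mpg), .groups = "drop") |>
  arrange(cyl)

print(mpg_by_cyl)
```

Each step in the pipe does one thing (group, summarise, sort), which is the style R for Data Science teaches.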
R has its own flavour of Markdown for writing literate documents, and a comprehensive resource covering all aspects of it is R Markdown: The Definitive Guide by Yihui Xie, J.J. Allaire and Garrett Grolemund. Writing your work in R Markdown is a form of literate programming, which means your report can be updated automatically if the underlying data changes. Documents can be output to HTML, PDF, LibreOffice, Microsoft Office and many other formats. The underlying source can be version controlled using Git so that it is documented, backed up (e.g. on GitHub or GitLab) and easy to collaborate on with colleagues.
More recently Posit (née RStudio) have developed Quarto, the next iteration of RMarkdown. It supports more document types (e.g. blogs and RevealJS slides), has excellent documentation and has a growing number of extensions. If you are just starting out I would recommend using Quarto over RMarkdown.
You will hear a lot about the Tidyverse which is an opinionated collection of R packages designed for data science. They are well worth learning as they make writing code considerably easier than with the base R packages. You won’t need all of the packages immediately but key ones to learn are
If you have large datasets, consider dtplyr, which uses the data.table package in the background while letting you write dplyr code. data.table is considerably faster than dplyr for many operations, and this is particularly noticeable with large datasets.
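As a rough sketch of how dtplyr works, assuming the dtplyr and dplyr packages are installed: `lazy_dt()` wraps a data frame so that dplyr verbs are recorded rather than run, `show_query()` prints the data.table code that would be executed, and converting with `as_tibble()` actually performs the computation. The built-in mtcars data stands in here for a genuinely large data set.

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Wrap the data frame as a lazy data.table; the dplyr verbs below are
# translated to data.table code rather than executed immediately
query <- lazy_dt(mtcars) |>
  filter(hp > 100) |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg))

# Print the data.table code that dtplyr generated
show_query(query)

# Converting the result forces the computation
result <- as_tibble(query)
print(result)
```

You write ordinary dplyr code, but the heavy lifting is done by data.table.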
There is a wealth of resources out there for learning and using R for different topics. The following are those I’m aware of; if there is an omission please open an issue on my blog.
There are some excellent resources for learning Bayesian analysis with R. Perhaps the most comprehensive and in-depth is Statistical Rethinking by Richard McElreath. He runs regular free courses teaching the material in the book (Statistical Rethinking 2023) and the book content has been translated to other R frameworks and Python. Another very good book is Bayes Rules! An Introduction to Applied Bayesian Modeling. These are both covered in the Bayesian Statistics - Syllabus course by Andrew Heiss.
R has excellent support for producing graphs, figures and data visualisations. There are the base graphics, which have been around since the beginning, but more recently the ggplot2 framework, introduced by Hadley Wickham and implementing Leland Wilkinson’s Grammar of Graphics, has become very popular.
[Data visualization with R and ggplot2 | the R Graph Gallery](https://r-graph-gallery.com/ggplot2-package.html)
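To illustrate the Grammar of Graphics approach that ggplot2 takes, here is a minimal sketch, assuming ggplot2 is installed and again using the built-in mtcars data. A plot is assembled from data, aesthetic mappings (which variable goes on which visual channel) and geometric layers, added together with `+`.

```r
library(ggplot2)

# Data + aesthetic mappings: weight on x, fuel economy on y,
# number of cylinders mapped to colour
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  # Geometric layer: draw one point per car
  geom_point() +
  # Human-readable labels (mtcars records weight in 1000 lbs)
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders"
  )

print(p)
```

Because each component is a separate layer, you can swap `geom_point()` for another geom or add further layers without rewriting the rest of the plot.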
It is good practice to version control your code and literate documents as you develop them. This can be achieved using the version control system Git. Get yourself an account on GitHub and/or GitLab and settle down to read Jenny Bryan’s excellent Happy Git and GitHub for the useR.
There is a lot more to R than just statistical analysis, and one day you may want to investigate other areas in greater detail. The links below are to more advanced topics, such as writing and maintaining packages, or specific tasks such as text mining.
It is good practice to version control the code you write: it provides an electronic paper trail of how your code has evolved over time and lets you keep track not just of the code itself but of why it was written or changed.
These days the most popular version control system is Git, and projects are often hosted/backed up on popular “forges” such as GitHub or GitLab. Sign up with an academic email address (@<institute>.ac.uk or @<institute>.edu) and you will have a few extra benefits.
Learning Git is a whole, vast topic in and of itself, but to get started with R and Git see the recommendation above. If you are a student or researcher at The University of Sheffield you may want to consider taking the Research Software Engineering (RSE) Team’s popular Git, GitHub and GitKraken: Zero to Hero course, which runs regularly throughout the year. Sign up to their mailing list and you’ll be notified when the course runs. Alternatively, email them to find out when the next course is scheduled to run.
The Comprehensive R Archive Network (CRAN) is the primary place to look for R packages. It also contains a number of subject specific Task Views which are pages that summarise the packages and resources associated with a particular topic. There are also links to the official manuals, FAQs and user contributed documentation.
The R Journal is the peer-reviewed, open-access scientific journal published by the R Foundation. It includes articles on packages, reviews and proposals, comparisons and benchmarking, applications of existing techniques and special issue articles to accompany conferences or particular topics.
Cheatsheets come in handy as a reference to packages and commands. A central repository of cheatsheets is maintained by Posit.
The R community is incredibly supportive, welcoming and helpful. There are over 600 User Groups around the world where R users meet up and share their experience and knowledge and support each other. Sheffield has its own SheffieldR User Group.
R-Ladies is a worldwide organisation whose mission is to promote gender diversity in the R Community. Groups around the world have their own meetups and activities.
There is also the R4DS Online Learning Community, which helps you work through the R for Data Science book. They have an active Slack channel for coordinating the courses and run Tidy Tuesday, a weekly podcast and community activity which is a great way of learning new tasks in R.
The NHS R Community is focused on applications of R in the NHS Research community. They have blogs, a Slack channel and conferences.
The R Bloggers site aggregates blogs from people who write about R and is a brilliant resource. A few highlights are noted as well, but R Bloggers is probably the best resource. If you want to subscribe to these, most have RSS feeds.
Find posts and resources on Mastodon by searching for the #rstats hashtag (you will find it is also still widely used on the site from which many users migrated to Mastodon). Over time you will find many enthusiastic users and developers who share their knowledge and experience freely.