Open COVID-19 research

On 24 March 2020, I organised the first online ReproducibiliTea University of Bristol meeting. We had 12 attendees: 6 from the University of Bristol and 6 from other institutions who saw the event advertised on my Twitter account and the ReproducibiliTea Online calendar.

Amazingly, we started on time with no technical issues and after a short introduction we got stuck into the main topic of debate: “What can open science do for COVID-19?”.

There was concern that more shared data, code and materials may mean more misinformation: Twitter is exploding with dodgy, over-simplified analyses of open COVID-19 data, posted by people with no apparent research experience or credentials. More regression plots and visualisations of death counts, case counts and comparisons by country appear every day.

Some of us questioned whether openness is really the problem. Many people do bad science with less transparent research practices, and open research practices allow us to identify problems and call them out. If people misuse open research materials, we will know, because the materials are open.

The issue of misinformation may also stem from how people use, or abuse, open research. Why people access COVID-19 data and what questions they try to answer with it may determine whether their answers harm or help others. For example, some people who post regression plots on Twitter may be chasing likes, followers and clicks, rather than trying to inform effective public policy.

Ultimately, there is a “ton of garbage”, but amongst it are well-designed treasures that are only possible on a large scale with shared data. Finding a way to sift through the rubbish and evaluate the helpful work could be very valuable.

We knew of many examples of free, publicly available COVID-19 data, one of the most famous being the dataset underlying the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University. We were less aware of COVID-19 simulation studies with available code. This may be concerning, since some simulation studies have had major impacts on public policy during the pandemic, such as the “COVID-19 reports” from Imperial College London. It is particularly important to share the simulation code of these influential papers; otherwise we will struggle to replicate the methods precisely and evaluate the models’ validity. For example, the code underlying a particularly influential “COVID-19 report” is not currently available. The authors are now trying to share their code but are finding it difficult.
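To illustrate just how open this data is: the dataset behind the Johns Hopkins dashboard is published as plain CSV files on GitHub, so anyone can load it in a few lines. The sketch below uses Python with pandas; the file path reflects the CSSEGISandData/COVID-19 repository layout as of early 2020 and may have changed since.

```python
# A minimal sketch of loading the open Johns Hopkins (JHU CSSE) COVID-19 data.
# Assumes the CSSEGISandData/COVID-19 repository layout as of early 2020.
import pandas as pd

URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
    "time_series_covid19_confirmed_global.csv"
)

# Each row is a location; the columns after the coordinates hold
# cumulative confirmed-case counts, one column per day.
confirmed = pd.read_csv(URL)

# Collapse per-province rows into one cumulative series per country.
by_country = confirmed.groupby("Country/Region").sum(numeric_only=True)

print(by_country.loc["United Kingdom"].tail())
```

That the whole exercise fits in a dozen lines is exactly why both the useful analyses and the dodgy Twitter plots are so easy to produce.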

Overall, we all agreed that sharing accessible and reusable code, data and materials benefits COVID-19 research, but we differed on who should be allowed to access these materials and how easily.

Student COVID-19 projects

We then moved on to a new discussion: “Should we get students to analyse COVID-19 data as part of a student assignment?”.

People thought this idea was exciting. Unlike many student projects, the research question would be incredibly relevant, offering students the chance to apply their skills to a real-world problem; this would probably make the project very engaging. The availability of open COVID-19 datasets also makes such a project quite feasible, although some datasets lack metadata.
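To give a flavour of how feasible such an assignment could be, here is a sketch of the kind of analysis a student might run: deriving daily new cases from the cumulative counts in the open JHU CSSE dataset. The repository path is assumed from its early-2020 layout, and the choice of country is purely illustrative.

```python
# A toy, student-style analysis: daily new cases from the open JHU CSSE data.
# The repository path is assumed from its early-2020 layout.
import pandas as pd
import matplotlib.pyplot as plt

URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
    "time_series_covid19_confirmed_global.csv"
)

confirmed = pd.read_csv(URL)

# Keep one country, drop the metadata columns, and sum the per-province
# rows into a single cumulative series indexed by date.
italy = (
    confirmed[confirmed["Country/Region"] == "Italy"]
    .drop(columns=["Province/State", "Country/Region", "Lat", "Long"])
    .sum()
)

# Daily new cases are the day-to-day differences of the cumulative counts.
new_cases = italy.diff().dropna()

new_cases.plot(title="Daily new confirmed cases, Italy (JHU CSSE data)")
plt.xlabel("Date")
plt.ylabel("New cases")
plt.show()
```

Even a toy exercise like this raises the interpretive questions discussed above: reporting delays, changing test capacity and missing metadata all shape what the plot appears to show.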

On the other hand, the topic’s relevance may be a problem. It is an anxious time for many, and allowing students to think about other things may benefit their wellbeing. Lecturers could offer an alternative assignment for students who do not want to work with COVID-19 data, but this may be tricky to manage. The potential impact of students’ work has positive and negative aspects: if students share their work, they could contribute to the public’s understanding of COVID-19, but they could also spread more misinformation. However, learning to think about the impact your work may have and what it means to disseminate research is an important lesson in itself. Sharing their work more widely may also encourage students to double-check their work and reflect on its ethical aspects and the uncertainties of the (meta)data. Lecturers could also try to minimise the risk of students spreading misinformation by limiting how informative or controversial their interpretations could be, perhaps by controlling how much data students get.

Overall, the greater the potential impact of a project, the greater both its potential benefits and risks. Lecturers would need to weigh the risks of a COVID-19 project, such as students spreading misinformation or overloading already anxious students, against the benefits, such as a highly relevant research question.

Preregistration thumbnail. Photo credit: preregistered_small_color.png by the Open Science Collaboration, licensed under CC BY 4.0.

Preregistering COVID-19 research

I then put a slightly different question to the group: “Should preregistration be used for COVID-19 research, and if so, how?”.

This question slightly divided us. For some, preregistration is what you make it: you control how much time and detail you put into your preregistration, so it can be fast. Even though we may be doing a lot of exploratory work in the context of this worldwide pandemic, it always makes sense to write down what you plan to do, regardless of how long it takes.

Two attendees working in the computer and data sciences disagreed. Preregistration may be beneficial, but in computer science it is hard both to do and to discuss. One researcher had wanted to preregister their studies but found it difficult to do. The second researcher saw two different worlds within computer science: one building methods, the other applying them. Preregistration felt particularly difficult for people who work on theoretical methods, who try many different things, throw away lots of what doesn’t work, and don’t remember what they threw away. This researcher thought preregistration is more suitable for applied computer scientists; for example, machine learning and language processing could gain something from preregistration.

Registered Reports

One attendee asked if they should submit a Registered Report now, given that the timescale for collecting data is so uncertain. In response, someone argued that COVID-19 shouldn’t hold you back from submitting a Registered Report: the pandemic will likely affect everyone conducting similar studies, and journals will hopefully understand the situation and be flexible. Now may also be a great time to think more about exploratory open research initiatives, for example the Exploratory Reports format currently offered by the International Review of Social Psychology and Cortex.

Final thoughts

Overall, we had a lively, relaxed discussion focusing on the roles that sharing research, Registered Reports, student projects and preregistration can play in COVID-19 research. I encourage others to hold similarly relaxed, virtual discussions, and we’ve pooled our experience at ReproducibiliTea to offer some guidelines on hosting them.

Katie Drax is a PhD student at the University of Bristol studying meta-research. She founded and organises the ReproducibiliTea University of Bristol branch.