Tuesday 12 August 2014

Alternative metrics in the future UK Research Excellence Framework

In the current UK REF (REF 2014), due to report in December 2014, individual subject areas (units of assessment) had the option to request citation counts for submitted articles, together with field and time normalisation information. Most did not, and there was no option to ask for any alternative metrics.

Here are recommendations for uses of alternative metrics, including altmetrics, in national research evaluation exercises, such as the UK Research Excellence Framework (REF). Please leave comments if you see problems with these recommendations.

  1. Alternative metrics should not be provided routinely for all articles. Alternative metrics currently seem highly susceptible to spam and to add little information for typical articles. There are so many different altmetrics that routinely collecting them for every article does not seem worthwhile. In fact, routinely collecting alternative metrics for evaluation purposes is likely to be damaging, because it will push academics and research support offices towards wasting their time trying to attract tweets and the like to their work. Of course, a certain amount of self-publicity is a good thing and should not be discouraged, but if it is measured then it is likely to get out of hand.
  2. Units of assessment should be given the option to provide alternative metrics in special cases. Research may have impacts that are not obvious from reading it or even from citation counts. For example, a research group may maintain a website that is popular with schools, host key software with substantial uptake, or produce books that are on reading lists around the world. To give a specific case, the point of many blogs is to attract a wide audience, but how else can you prove that a blog is widely read than by reporting how many readers or visitors it has? You can immediately tell that you are reading something special when you get to Stephen Curry's blog, but its real value comes from the number of other people who have come to the same conclusion. Researchers should have the opportunity to present data to support a claim of non-standard impact. For units of assessment that do not allow the routine use of citation counts, I think that citation counts should be allowed in special cases (and I did this in my own case). For all units of assessment, I think that alternative metrics should be allowed in special cases. They will be particularly valuable for social impact case studies but can also be useful for demonstrating the educational impacts of research.
  3. Assessors should be cautioned not to interpret alternative metrics at face value but to take them as pointers to the potential impact of the research. There are two important problems with interpreting alternative metrics. First, in most cases it is impossible to effectively normalise them for field and discipline, so it is hard to be sure whether any particular number is good or not (a small sketch of what such normalisation involves follows this list). Second, this is exacerbated because an alternative metric could partly reflect "important" impact and partly reflect irrelevant impact, such as amusement. For example, an article with a funny title could be tweeted for its humour or for the value of its findings. In practice, this means that assessors should use alternative metrics to reach a starting position on the impact of the research but should make their own final judgement, taking into account the limitations of the metrics.
  4. Units of assessment submitting any alternative metrics should complete a declaration of honour stating that they have not attempted to game the metrics and declaring any unintentional manipulation of them. Unintentional manipulation might include librarians teaching students how to tweet articles using examples of the university's REF publications. The declaration of honour should make clear that the alternative metrics will be made fully public and that future researchers in computer science are likely to develop algorithms to detect manipulation of REF metrics, so it would be highly embarrassing for anyone to have submitted manipulated data, even unintentionally. This process is likely to be highly effective because individuals are likely to gain access to the raw data behind services like Twitter and Mendeley, and hence discover, for example, the IP addresses of tweeters. They could also detect abnormal patterns in how a metric accumulates over time (see the second sketch below), so even highly sophisticated manipulation strategies would have a chance of being detected. This declaration of honour introduces a non-trivial degree of risk for the submitting unit of assessment and should act as a deterrent to using metrics in all except the most important cases.
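
For readers unfamiliar with what field and time normalisation involves (recommendation 3), here is a minimal sketch in Python of the standard idea: divide an article's citation count by the average count for articles in the same field and publication year. The field names and counts are invented and a real exercise would use a much larger reference set; the point is only that no comparable baseline exists for most alternative metrics.

```python
# Illustrative sketch (not part of any REF methodology): field- and
# time-normalised citation scores, computed by dividing each article's
# citation count by the average for articles in the same field and year.
# All data below are invented.
from collections import defaultdict

articles = [
    # (article id, field, year, citation count) -- hypothetical data
    ("a1", "Chemistry", 2011, 40),
    ("a2", "Chemistry", 2011, 10),
    ("a3", "Sociology", 2012, 6),
    ("a4", "Sociology", 2012, 2),
]

# Average citations per (field, year) cell: the normalisation baseline.
totals = defaultdict(lambda: [0, 0])  # (field, year) -> [sum, count]
for _, field, year, cites in articles:
    totals[(field, year)][0] += cites
    totals[(field, year)][1] += 1

for art_id, field, year, cites in articles:
    total, n = totals[(field, year)]
    baseline = total / n
    # A score of 1.0 means "average for this field and year".
    score = cites / baseline if baseline else 0.0
    print(f"{art_id}: {cites} citations, normalised score {score:.2f}")
```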

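Recommendation 4 mentions detecting abnormal patterns in how a metric accumulates over time. As a purely hypothetical illustration, not a method used in any evaluation, the sketch below flags a daily tweet-count series in which a single day accounts for most of the total, a crude sign of a coordinated burst; the threshold and the data are invented.

```python
# Hypothetical illustration: flag a metric whose accumulation is dominated
# by a single-day burst, a crude indicator of possible manipulation.
# The threshold (50% of all events in one day) and the data are invented.

def looks_bursty(daily_counts, burst_share=0.5):
    """Return True if any single day contributes more than burst_share
    of the total count accumulated over the whole period."""
    total = sum(daily_counts)
    if total == 0:
        return False
    return max(daily_counts) / total > burst_share

organic = [1, 3, 2, 4, 2, 3, 1, 2]      # steady accumulation
suspicious = [0, 1, 0, 45, 1, 0, 2, 1]  # one huge spike

print(looks_bursty(organic))     # False
print(looks_bursty(suspicious))  # True
```
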
Introduction

The purpose of this blog is to share ideas about uses of alternative metrics for evaluation. It is partly a response to David Colquhoun's blogging against the use of altmetrics. I believe that alternative metrics can be useful in some research evaluation contexts and that it is worth having a blog covering those contexts.

As part of the Statistical Cybermetrics Research Group, I have been using alternative metrics for research evaluations since 2007, and this seems like a good time to make recommendations for specific applications. Previous evaluations have been for a large UK organisation promoting innovation (Nesta), the EU, the UNDP and individual university departments. All the evaluations so far have had the common factor that the organisations evaluated produce knowledge, but not primarily traditional academic knowledge in the form of journal articles, and need evidence about the wider impact of their outputs. For these, we have used a range of web-based metrics to give evidence of general impact. We always include extensive discussions of the limitations of the metrics used and also recommend content analysis in parallel with the metrics so that the numerical values can be interpreted more accurately.