Tuesday, 12 August 2014

Alternative metrics in the future UK Research Excellence Framework

In the previous UK REF, due to be reported in December 2014, individual subject areas (units of assessment) had the option to request citation counts for submitted articles, together with field and time normalisation information. Most did not, and there was no option to ask for any alternative metrics.

Here are recommendations for uses of alternative metrics, including altmetrics, in national research evaluation exercises, such as the UK Research Excellence Framework (REF). Please leave comments if you see problems with these recommendations.

  1. Alternative metrics should not be provided routinely for all articles. Alternative metrics currently seem to be highly susceptible to spam and to give little added information for typical articles. There are so many different altmetrics that it does not seem worth routinely collecting these data for all articles. In fact, routinely collecting alternative metrics for evaluation purposes is likely to be damaging, because it will push academics and research support offices towards wasting their time trying to attract tweets etc. to their work. Of course, a certain amount of self-publicity is a good thing and should not be discouraged, but if it is measured then it is likely to get out of hand.
  2. Units of assessment should be given the option to provide alternative metrics in special cases. Research may have impacts that are not obvious from reading it or even from citation counts. For example, a research group may maintain a website that is popular with schools, host key software with substantial uptake or produce books that are on reading lists around the world. To give a specific case, the point of many blogs is to attract a wide audience, but how else can you prove that a blog is widely read than by reporting how many readers or visitors it has? You can immediately tell that you are reading something special when you get to Stephen Curry's blog, but its real value comes from the number of other people who have come to the same conclusion. Researchers should have the opportunity to present data to support their claim of having a non-standard impact. For units of assessment that do not allow the routine use of citation counts, I think that citation counts should be allowed in special cases (and I did this in my own case). For all units of assessment, I think that alternative metrics should be allowed in special cases. I think that they will be particularly valuable for social impact case studies but can also be useful to demonstrate educational impacts of research.
  3. Assessors should be cautioned not to interpret alternative metrics at face value but to take them as pointers to the potential impact of the research. There are two important problems with interpreting alternative metrics. First, in most cases it is impossible to normalise them effectively for field and discipline, so it is hard to be sure whether any particular number is good or not. Second, this is exacerbated because an alternative metric could partly reflect "important" impact and partly reflect irrelevant impact, such as fun. For example, an article with a funny title could be tweeted for amusement or for the value of its findings. In practice, this means that assessors should use alternative metrics to reach a starting position about the impact of the research but should make their own final judgement, taking into account the limitations of alternative metrics.
  4. Units of assessment submitting any alternative metrics should complete a declaration of honour stating that they have not attempted to game the metrics and declaring any unintentional manipulation of them. Unintentional manipulation might include librarians teaching students how to tweet articles using examples of the university's REF publications. The declaration of honour should also note that the alternative metrics will be made fully public and that future researchers in computer science are likely to develop algorithms to detect manipulation of REF metrics, so it will be highly embarrassing for anyone who has submitted manipulated data, even unintentionally. This process is likely to be effective because researchers will probably gain access to the raw data behind services like Twitter and Mendeley, and could therefore discover, for example, the IP addresses of tweeters, and could also detect abnormal patterns in the accumulation of a metric over time, so that even highly sophisticated manipulation strategies would stand a chance of being detected. This declaration of honour gives a non-trivial degree of risk to the submitting unit of assessment and should act as a deterrent to using metrics in all except the most important cases.
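To illustrate the kind of detection described in point 4, one very simple check is to flag days on which a metric accumulates abnormally fast relative to its own recent history. The sketch below is purely illustrative: the function name, the data and the threshold are all invented, and real detection systems would be far more sophisticated.

```python
# Illustrative sketch only: a naive burst detector for a metric's daily
# accumulation, of the kind a manipulation-detection algorithm might use.
# All data and the threshold are hypothetical, not from any real service.
from statistics import mean, stdev

def burst_days(daily_counts, window=7, threshold=3.0):
    """Flag days whose count is more than `threshold` standard
    deviations above the mean of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_counts)):
        past = daily_counts[i - window:i]
        mu, sigma = mean(past), stdev(past)
        if sigma > 0 and daily_counts[i] > mu + threshold * sigma:
            flagged.append(i)
    return flagged

# An article tweeted a handful of times a day, then 40 times on day 10:
counts = [2, 3, 1, 2, 2, 3, 2, 1, 2, 3, 40, 2]
print(burst_days(counts))  # [10]
```

A real system would of course combine many such signals (account age, IP clustering, text similarity of tweets), but even this toy check shows why crude manipulation is risky.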


  1. I may have put my comment on the wrong post, so I'll repeat it here.

    I'm sorry that you didn't respond at all to the criticisms of altmetrics that I and Andrew Plested raised at http://www.dcscience.net/?p=6369 and in the BMJ.

    You say "I believe that alternative metrics can be useful in some research evaluation contexts", but you adduce no evidence that altmetrics measure quality. The examples that we gave show that, on the contrary, altmetrics are more likely to measure triviality.

    Blogs are great fun. Mine has had 3.7 million views and I hope it has helped to explain some scientific ideas. But it has nothing whatsoever to do with research. I couldn't have afforded to spend time on it when I was doing full time research. It is a fun game for the semi-retired and should not count one iota towards my scientific reputation. That depends on papers like http://www.onemol.org.uk/Colquhoun%20&%20Hawkes-1982-ocr.pdf
    I don't think that one would get far on social media.

  2. Thanks very much for your comments. I didn't reply to your criticisms of altmetrics because I agree with most of them. The purpose of this blog is not to argue that altmetrics are perfect - they clearly are not - but to argue for specific ways in which they can be useful.

    To give some context to this, there are many different ways of making valuable contributions to research. Researchers can work in the physical, medical, social or formal sciences, or could be engineers or work in the arts and humanities. I think that it is easy for an individual to get a misleading impression of what research is by looking at their own situation, which may be very different from that of others. Researchers may also make valuable contributions to research in non-traditional ways, for example by helping with education, commercial or other applications, communicating science to the public or informing policy decisions. I think that in some cases blogs, websites, data archives, films, computer programs and other non-standard outputs can be valuable contributions to science, and that unless we find ways to recognise them, researchers will be encouraged to stick to traditional scientific publishing.

    The UK government wants to recognise research that has an impact by requesting impact case studies as part of REF2014. This is explicitly non-academic impact; I think that it could include highly successful blogs, and it is natural to present altmetrics as part of a case for the success of a blog. This is because impact implies not just the internal quality of something but an audience (and an effect on that audience, though that is hard to get evidence for).

    I would not claim that altmetrics measure research quality. There is evidence that they can be indicators of the impact of research and I think that they can be valuable for helping to make the case for the impact of research.

    Finally, there is statistical evidence here that altmetrics are not always random or irrelevant but can associate with citations, and there is statistical evidence here that citations are not always meaningless but can associate with peer-review judgements in some disciplines.

  3. Thanks. I'm glad that you agree that altmetrics have nothing to do with quality. Since the idea of the REF was to judge excellence in research, this must be held against them.

    At the end you seem to contradict yourself by saying that altmetrics do correlate with citations in your PLOS One paper. But you also say in that paper "it is not possible to speculate about the degree of accuracy for citation estimates made with altmetrics from the data set used here".
    In any case, citations take decades to accumulate (if it's a good paper, anyway). It's far too early to say.

    The meaning of "impact" varied quite a lot as HEFCE struggled to wriggle out of the sillier definitions, but I doubt whether altmetrics measure it well by any definition. The fact that my blog has had 3.7 million hits isn't included in any of the commercial altmetrics, as far as I know. But I do get credit for reposting other people's TV programmes on YouTube! I don't even get altmetric brownie points for things that I write for newspapers or for TV appearances, as far as I know. The commercial products don't even measure well what they purport to measure.

    Your idea of having an honour code is interesting, but I fear that it wouldn't work at all. Codes of practice are almost universally ignored when there is money at stake. It would amount to yet more box-ticking.

    I have nothing against the idea that you should get some credit for things you do that aren't in formal papers (though how much is questionable if the aim is to measure research excellence). It's perfectly easy to list these things yourself. You don't need to pay companies to provide a half-baked version of them.

  4. Thanks very much again for your comments - again I agree with most of them. Perhaps on the correlation/quality issue I can make a general point about the difference between a measure and an indicator. A measure of the quality of research should be something that is pretty foolproof and robust, whereas an indicator of research quality can be flawed and clearly wrong in some cases but may still have uses. I think that citations and alternative metrics are not measures of research quality but can be indicators of it in some contexts. For example, whilst some brilliant articles have few citations and some terrible articles are highly cited, in general more highly cited articles tend to contain research that scientists would rate more highly - as long as the citation counts are normalised for field and time. This is what http://arxiv.org/abs/0912.2601 shows in some cases. This doesn't mean that anyone can claim that article X has Y citations and is therefore brilliant/hopeless, because citations do not *measure* quality, even if they correlate with peer judgements of it in some cases. It does mean that looking at citation counts can be useful as, for example, (a) a starting point for identifying good or bad research - especially if averaged over large collections of articles - or (b) a second opinion about what is good research (peer review judgements are often wrong). I think that both of these uses are OK.
    Altmetrics are less useful indicators than citations because their correlations with peer judgements of research are probably much weaker (I don't think anyone has checked this directly yet, only indirectly via correlations against citations) and they are easy to manipulate. Their only advantage is that they accumulate before citations do. So I can't see how systematically reporting altmetrics for individual articles can help in the REF at the moment, but I think that they could help mainly as impact indicators for some impact case studies, where they can be used to underpin a case for an activity having an impact.
    I think that your figure of 3.7 million hits for your blog is a great example of a useful altmetric - it shows that your blog has a big audience in the most convincing way possible, as far as I can tell, and suggests that it has a big impact. I would be happy to see claims like this in future REF impact statements (although yours doesn't count, of course, because it doesn't fit the REF impact scope). I don't think that a statistic must come from any specific data provider to be used for this. Any reasonable source would be OK.
    The idea of the honour code is to increase the embarrassment factor for people that try to manipulate the scores. I think that computer scientists will be very good at identifying manipulation and there is a research area called adversarial information retrieval that already does this for spam. So it is likely that any significant attempt to manipulate altmetrics will be caught, named and shamed.
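    The field-and-time normalisation mentioned above can be illustrated with a toy calculation: divide an article's citation count by the average count for articles in the same field and publication year, so that 1.0 means exactly average. All figures in this sketch are invented for illustration; real scientometric databases use much larger and more carefully delineated reference sets.

```python
# Illustrative sketch of field-and-time normalisation: an article's
# citation count divided by the world average for its field and year.
# All figures below are invented for illustration.

# Hypothetical average citations per article, by (field, year):
field_year_average = {
    ("physics", 2010): 20.0,
    ("history", 2010): 2.0,
}

def normalised_citation_score(citations, field, year):
    """Return citations relative to the field/year average (1.0 = average)."""
    return citations / field_year_average[(field, year)]

# 10 citations is below average in physics but well above it in history:
print(normalised_citation_score(10, "physics", 2010))  # 0.5
print(normalised_citation_score(10, "history", 2010))  # 5.0
```

    The point of the toy example is just that a raw count of 10 means opposite things in the two fields, which is why unnormalised counts (and, a fortiori, unnormalised altmetrics) cannot be read at face value.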
