Peter Grabitz, Yuri Lazebnik, Joshua Nicholson, Sean Rife
Review posted on 11th September 2017
The idea of classifying hypotheses as supported or refuted by subsequent work, as a means of identifying "strongly supported" or "strongly refuted" claims, is an interesting one. I would like to see further discussion of how this could be applied.
Namely, it seems the R-factor is something that should be applied to a specific scientific claim, as opposed to a whole research article. Being able to quickly identify the evidence that supports (green), refutes (red) or relates unclearly (yellow) to a claim, directly from the claim in said literature, could aid comprehension (not to mention discoverability) of the surrounding literature, and highlight claims that are well supported or that lack independent replications. Do the authors feel that one paper is sufficiently related to one central claim for the R-factor to be applied to the paper as a whole? Alternatively, I would argue that judging the "veracity" of the component evidence presented within an article could be more informative.
Further, limiting these data to the "cited by" literature from that paper could skew the perspective, depending on which article you are viewing the claim in - to understand the overall "veracity" of a claim, it seems the reader would need to navigate back to the first mention of that claim in order to find the longest chain of evidence. Instead, I would be interested to explore the feasibility of a claim-centric (as opposed to paper-centric) count, and to understand whether this is already achieved by existing practices (such as meta-analyses of the literature). Perhaps an alternative approach would be to ensure that meta-analyses that include an article are more clearly visible from that article (e.g. highlighted in a "cited-by" section); an extension of that would be to link the more recent work to the specific assertions it relates to in the current article.
I would also be interested in whether the authors have any thoughts on the reporting bias towards positive results (it may be hard to judge replicability if failed replications remain in desk drawers), as well as on more nuanced evaluations of related evidence: is some evidence stronger than other evidence? Is it feasible to define a scientific claim, or is it dependent on context, species or other factors?
Finally, I would be concerned about applying such a metric to individual researchers. A discussion of the unintended consequences of such a metric would be useful.
Competing interests: None identified.
Full disclosure: I am Innovation Officer at eLife. My academic background is not in meta-research of reproducibility.