Giant study finds untrustworthy trials pollute gold-standard medical reviews
Two-year collaboration aims to create tools to help counter the tide of flawed research
A huge collaboration has confirmed growing concerns that fake or flawed research is polluting medical systematic reviews, which summarize evidence from multiple clinical trials and shape treatment guidelines worldwide. The study is part of an effort to address the problem by creating a short checklist that will help researchers to spot untrustworthy trials. Combined with automated integrity tools, this could help those conducting systematic reviews to filter out flawed work – in medicine and beyond.
In the study, which has taken two years and was posted on 26 November to the medRxiv preprint server1, a team of more than 60 researchers trawled through 50 systematic reviews published under the aegis of Cochrane, an organization renowned for its gold-standard reviews of medical evidence.
After applying a barrage of checks, the authors — many of whom are themselves editors or authors of Cochrane reviews — reported that they had “some concerns” about 25% of the clinical trials in the reviews, and “serious concerns” about 6% of them.
The study can’t provide an overall estimate of problematic trials in Cochrane reviews because the sample selected — for the purpose of trialling integrity checks — wasn’t random or representative, says co-author Lisa Bero, a senior research-integrity editor at Cochrane.
Still, “we definitely picked up some dodgy trials”, says Jack Wilkinson, a health and biostatistics researcher at the University of Manchester, UK, who led the project, titled INSPECT-SR. He adds that the proportion found in the study might be an overestimate, because some of the checks turned out to be subjective or difficult to implement.
A protocol for ‘trustworthiness’
The results echo previous concerns about rising numbers of problematic studies corrupting systematic reviews in medicine and other research fields2,3,4,5, probably owing to paper mills that produce fake science.
Recognizing this problem, Cochrane introduced guidance three years ago that researchers should try to spot untrustworthy trials and exclude them from reviews. But although scientists have used a variety of protocols to do this, there is no universally agreed tool to help identify an untrustworthy study, says Bero, who is also a bioethicist at the University of Colorado Anschutz Medical Campus in Aurora.
“Frankly, these tools haven’t been tested at all,” she says, adding that researchers are unlikely to use methods that are difficult, lengthy or unclear. One integrity checklist aimed at journal editors proposed more than 50 questions, for instance, which some scientists say is too many.
Testing red flags
The aim of the INSPECT-SR study was to test 72 potential integrity checks that might help to identify untrustworthy work, garnered from a previous wide-ranging consultation. They range from specific statistical checks on a trial’s data and methods, to details of funding and grants, the date a trial was registered, and its authors’ publication records.
The study found that some checks are too cumbersome or infeasible to apply in practice. The list has therefore been pared down to a shortlist of 21 questions in four areas: a study’s post-publication record (such as retractions or expressions of concern); its methods, governance and transparency (such as study registration, ethical approval and how participants were recruited); whether it has plagiarized text or manipulated figures; and detailed ways to check for discrepancies in the data and results. These might be whittled down further before the shortlist is published.
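The fourth of these areas covers arithmetic and statistical consistency checks on a trial’s reported results. As an illustration only (the INSPECT-SR checklist is a set of questions for human reviewers, not software), here is a minimal Python sketch of one well-known consistency check of this kind, the GRIM test, which asks whether a reported mean could have been produced by a given number of integer-valued observations, such as counts or Likert scores.

```python
# Minimal sketch of the GRIM consistency check (illustrative only, not the
# INSPECT-SR tool): the sum of n integer-valued observations must itself be
# an integer, so a reported mean is only possible if some integer total,
# divided by n, reproduces it at the reported precision.

def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """Return True if `reported_mean` could arise from n integer observations."""
    target = round(reported_mean, decimals)
    approx_sum = reported_mean * n
    # Test the integer totals nearest the implied sum; a window of +/-1 covers
    # rounding slack for typical sample sizes reported to two decimal places.
    for candidate_sum in range(int(approx_sum) - 1, int(approx_sum) + 2):
        if round(candidate_sum / n, decimals) == target:
            return True
    return False

# Example: with 28 participants scoring on an integer scale, a reported mean
# of 5.19 is arithmetically impossible, whereas 5.18 is achievable (145 / 28).
print(grim_consistent(5.19, 28))  # False -> worth following up
print(grim_consistent(5.18, 28))  # True  -> possible
```

A mean that fails such a check is not proof of misconduct, but it is the kind of discrepancy that the checklist asks reviewers to follow up on.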
Wilkinson’s team is working on a similar checklist that journal editors might apply to papers, and a third checklist for statistical tests that could be used if a reviewer can access the individual participant data in a trial, rather than just summary results. Although many researchers argue that it should be mandatory to provide such data for reviewers, medical journals rarely require it.
Automated tools
Scientists’ main concern about checklists is the time it takes to assess each study, says Wilkinson. A Cochrane review might involve anything from a few studies to dozens of trials, but in systematic reviews outside of medicine, there could be hundreds of papers to examine. That’s a bigger worry, say Kim Wever, a meta-science researcher who specializes in analysing systematic reviews, and René Aquarius, a neurosurgery researcher, both at Radboud University Medical Center in the Netherlands. In work not yet published, they have found many flawed papers among more than 600 studies on animal models of haemorrhagic stroke, as Retraction Watch and Science have reported.
Reviewers of preclinical work also typically have less funding for their studies, and must examine papers that tend to have fewer signals of stringent reporting — such as being recorded on an official registry — than do clinical trials, adds Torsten Rackoll, a systematic-review methodologist at the Berlin Institute of Health.
In the past few years, however, automated software tools have sprung up that can help with some checks. Software such as Imagetwin looks for duplicated images in papers, for instance, and tools such as Signals, Papermill Alarm and Argos raise alarm bells about author retraction records, the studies a paper is citing and other warning signs.
At a meeting in London on 3 December, computer scientist Daniel Acuña at the University of Colorado, Boulder, described a new tool called Reviewer Zero, which promises to check for statistical inconsistencies as well as image manipulation.
“These tools are essential,” Wever and Aquarius wrote to Nature, noting that they have collaborations with Signals and Imagetwin.
Unfortunately, most of the tools also require subscriptions, says Otto Kalliokoski, an animal behaviourist at the University of Copenhagen, who this February reported spotting more than 100 studies with questionable images while doing a systematic review of depression studies in rats5. His team used a beta version of Imagetwin, he says, but no longer has access to it.
Patrick Starke, the co-founder and chief executive of Imagetwin in Vienna, says the team can afford to work for free with researchers and integrity sleuths only on selected projects. “Financial support from government institutions or funders would be needed to enable us to provide the service to researchers on a larger scale,” he says.
Help for systematic reviewers
Tools that flag issues using a paper’s metadata, such as author records or cited studies, are more openly available. Anyone can check papers — one at a time — at Signals; this month the London-based firm Research Signals, which makes the tool, announced it would incorporate reviews from outside experts into its paper checks. Elliott Lumb, the co-founder of the firm, says it also offers free bulk access to research-integrity sleuths and other researchers who provide contributions, such as reviews, to the tool.
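For reviewers without access to commercial services, the simplest metadata check, screening a review’s included studies against a public list of retracted papers, can be reproduced with a short script. The sketch below is illustrative only and does not describe how Signals or any other named tool works; it assumes a retraction database has been downloaded as a CSV (for example, the freely available Retraction Watch data) and that the column holding retracted DOIs is called OriginalPaperDOI, which may differ in practice.

```python
# Minimal sketch (assumed file layout, not any vendor's tool): flag studies
# included in a review whose DOIs appear in a downloaded retraction database.

import csv

def load_retracted_dois(path: str, doi_column: str = "OriginalPaperDOI") -> set[str]:
    """Build a set of normalised DOIs from a retraction-database CSV export."""
    retracted = set()
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            doi = (row.get(doi_column) or "").strip().lower()
            if doi:
                retracted.add(doi)
    return retracted

def flag_retracted(included_dois: list[str], retracted: set[str]) -> list[str]:
    """Return the DOIs of included studies that match a retraction record."""
    return [doi for doi in included_dois if doi.strip().lower() in retracted]

# Usage (paths and DOIs are placeholders):
# retracted = load_retracted_dois("retraction_watch_export.csv")
# print(flag_retracted(["10.1000/example.trial.doi"], retracted))
```

In a real review, the list of included DOIs would come from the review’s reference-manager export, and any matches would be prompts for further checks rather than grounds for automatic exclusion.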
Systematic reviewers think it would help if journals got better at blocking untrustworthy or fake papers during peer review, or at retracting them once problems are detected. “The main problem for me is still that journals take an enormous amount of time before they retract, if they retract at all,” says Ben Mol, an obstetrician and gynaecologist at Monash University in Melbourne, Australia. Mol was part of the INSPECT-SR collaboration and says that his sleuthing work has now led to almost 300 retractions of fake trials.
If papers in a systematic review are retracted, Cochrane policy is that reviewers should revise the work to remove those studies. Ella Flemyng, who is based in London and manages projects to improve the methods of Cochrane reviews, says the team is now aware of dozens of published Cochrane reviews that are affected by retractions of included studies.
doi: https://doi.org/10.1038/d41586-024-04206-3
This story originally appeared in Nature. Author: Richard Van Noorden