This is a very interesting preprint of what I think is powerful underlying research, and I thank the authors for posting it and considering comments. I have a few suggestions for clarifying some parts I found confusing, before the ink is dry on its final “postprint” form.
Lines 47-48: The introductory paragraph is confusing. “percentage of fabricated papers might be just a fraction of the percentage of self-reported misconduct.” Is this intended to be the other way around, i.e., that self-reported misconduct is likely lower than the actual rate?
Lines 72-74: The statement “mutual criticism and policing of misconduct might be least likely to occur in developing countries in which academia was built on the German model” was confusing, since the German model is nowhere explained and German research fared well in the results. The supporting reference is paywalled, but its abstract contrasts “liberal research regimes adopted by developmental states and marked by freedom from government oversight, and illiberal laboratory cultures imported from Germany and marked by all-powerful lab directors and their vulnerable underlings.” I suggest working such a statement into the text so readers don’t have to conduct their own literature searches just to understand a sentence.
Methods, lines 109 or so: How the direct visual inspection of 20,621 papers containing images of Western blots was conducted could be explained in more detail. Was this manual? In animal behavioral testing the term “direct visual measurement” is sometimes used with automated video capture, but this apparently was all manual? The method is from Bik et al. (2016), but in a quick read-through of that article I didn’t see a complete explanation of how it was done there either. I assumed it had to have been automated in some way, or else it would have taken months to manually inspect 20K papers, with the inspector doing little else. This is remarkable.
The absence of supporting data seems a major limitation that would be best acknowledged in the text. I saw the comment from last author EMB that they didn’t want to list potentially problematic papers by name in the absence of a separate investigation. I suspect the authors also don’t want exposure to potential defamation claims. This is understandable, but it is ironic that a paper on scientific integrity is unwilling to show its data, since transparency and data sharing are hallmarks of efforts to improve all science disciplines. Still, I’d suggest that this work would be more persuasive if it had supporting data. Consider whether the wording could be toned down and the data shown. At a minimum, a frank discussion of this limitation within the body of the final manuscript might head off tedious criticisms later.
Methods descriptions: These are a bit terse. Since readers come from all disciplines and include non-specialists, it would be good not to assume too much statistical understanding. I don’t use the odds ratio test, and I don’t really want to have to look it up and work out when it is appropriate, what the best practices are, how to interpret it, what its limitations are, etc., just to evaluate the figures and results. An explanation of these things, and of why this particular test was used, would be helpful in the methods.
Figures: The unlabeled vertical axis with its strange scale increments made me look twice. Are error bars for the 95% CI that do not overlap 1 the ones considered meaningful? I suggest a more explicit explanation of this.
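To illustrate the kind of explanation I have in mind, here is a minimal sketch of how an odds ratio and its 95% confidence interval are commonly computed from a 2x2 table. I am assuming a standard Wald interval on the log odds ratio; I have not confirmed that this is what the authors did. The function name and the counts are hypothetical and are not taken from the preprint.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    Hypothetical layout (all four counts must be > 0):
      a = flagged papers in group 1,  b = unflagged papers in group 1
      c = flagged papers in group 2,  d = unflagged papers in group 2
    z = 1.96 corresponds to a 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Made-up counts: 30 of 100 papers flagged in group 1, 10 of 100 in group 2.
or_, lo, hi = odds_ratio_ci(30, 70, 10, 90)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

An odds ratio of 1 means the two groups have the same odds, so an interval that lies entirely above or below 1 is what is usually read as a meaningful difference. A sentence or two to that effect in the methods or the figure legends would answer my question.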
Looking forward to more contributions from these authors, and more importantly, serious discussions of how to change underlying incentives and attitudes leading to these problems.