Open preprint reviews by David Baltrus

Preprinting Microbiology

Patrick D Schloss

(Stepping up to break the ice and comment formally here instead of just on twitter)

1. I think the Ben Schwessinger experience described here (https://blushgreengrassatafrid...) is worth a mention for a couple of different reasons. It's the first time that I can recall that a journal had to step up and actually deal with a situation where scooping by preprint (or because of preprint) may have occurred. As a result, the policy at PLoS has been refined. When things change, there are always uneasy situations like this that force people to make difficult (and sometimes wrong) decisions.

2. I think it's also worthwhile to mention sites like PubPeer. Public reviews and comments on preprints are part of overlapping discussions but aren't necessarily the same discussion. Feels like there's something to be said about that although I'm not sure what that is right now.

3. My whole take on "but it's not peer reviewed" is that those who will be reading the preprints in order to cite them are well qualified as reviewers themselves. If you don't trust the paper or don't like it, don't cite it. If you read through the paper and don't see fault with the experiments, why not cite it? We all have blind spots, but it's not like we don't review papers all the time and critique them anyway, even if they've been through peer review.

4. I think we should make a greater effort to write positive comments on preprints and not just use this as a forum for review. Positive comments can help those who maybe aren't in the literature figure out which preprints are great and which have holes (by their lack of positive comments). I see this as important if preprints are going to be written about by the popular press and digested by those who aren't necessarily experts. We as experts need to endorse good papers just as we will trash the bad papers.

5. I had the first preprint in bioRxiv under Microbiology, why are you taking this achievement away from me Schloss?

6. Looping back on number 4...if we are going to be the ones reviewing grants and papers and we see a preprint cited, we can actually review this work. Some are going to use it to get around page limits but, like you point out, we as scientists should be pretty good at sniffing out shoddy and rushed work, so this could also theoretically backfire on the person trying an end run around page limits. Sure, it may give you more space to write, but if you do a terrible job you may poison the impression of a grant reviewer who might otherwise like your grant. I'm tired of having to see (in press) or (in prep) when work is cited in a paper or grant. If it's an important enough story for the grant, I want to be able to read the story myself, and preprints allow this.

7. There are different costs and benefits for preprints depending on the field you are in and the point in your career. I don't know that we've figured this out at all yet or if there is a great answer across the board. It seems as though the pop gen fields have taken to preprints more than other fields, but in my experience evolutionary biology in general tends to be less "scoopy" or "eat their young" than other fields. I'd like a world to exist where everyone can freely post preprints and get credit, but I can see this going horribly wrong in fields that are much more competitive and potentially contain more selfish PIs. I mean this not as a positive or negative commentary on different fields, but it's quite obvious to me that some fields are more cutthroat than others for a variety of reasons, and the cost/benefit analysis for preprints in these fields will be different.


Couple more points that I think are worth mentioning (and sorry if they are in there already, read it yesterday):

8. Worth mentioning that preprints can be an important "minor league" for journals to scout papers from, and that the more we can show this happening, the more preprints will benefit.

9. There's been discussion about bioRxiv not "allowing" methods papers (there was a twitter scuttlebutt a few weeks ago). Their stance is that (I'm putting words in their mouths) they don't just want to publish step by step protocols that have been published elsewhere (like in Sambrook). They will publish step by steps for new protocols though. Would be good to point out that ProtocolsIO is a "live" preprint server (effectively) for protocols though. I like the interface there better for describing protocols because it's constantly growing and being annotated by those doing the protocols and with nuances for other organisms.

10. Jeff Ross-Ibarra (and probably others) has started using preprints during journal clubs and leaving comments at the various venues. I think this is a great way to help students learn to review, but I also want to highlight that it's important to think about protecting the students' identities when commenting on preprints until this is a commonplace thing.

11. It would be nice if comments left on preprints would populate NCBI comment sections. I don't know if this will ever happen, but it would be nice to know that others have talked about papers before they are published.

12. There are also starting to be "review communities" (like PCI Evolutionary Biology https://evolbiol.peercommunity...) which will evaluate preprints and could (could!) act as reviewers for forums like mSphere Direct or otherwise if we let them.



Swabs to genomes: a comprehensive workflow

Basic reporting

This manuscript from Dunitz et al. covers a lot of ground, as it takes novice researchers from sample isolation to genome sequencing and phylogenetic classification.

In fifth grade I had to write directions on how to make a PB&J sandwich. My directions were 10 pages long and still weren't detailed enough. What I learned from this (at first glance) simple exercise was that, no matter how easy a task seems, it is incredibly difficult to write step by step instructions that all can follow. In writing workflow papers, especially ones that cover so much ground, the authors are inevitably going to have to sacrifice nuance for clarity and descriptiveness for brevity. After reading this manuscript through a couple of times, while there are certainly places where more description would be possible, the authors do a pretty good overall job at capturing the spirit of the analyses and providing a workflow that advanced high school classes could theoretically use. There are a couple of places where more depth is warranted (see below), but overall they do a pretty good job balancing thoroughness and readability.

All that being said, I think it would benefit the manuscript greatly to set up a virtual machine on iPlant (www.iplantcollaborative.org) that contains sample data sets and is set up to run most of the programs in this workflow. This virtual machine would be freely accessible to all (so long as iPlant remains accessible to all) and would provide a means to run the workflow without having to install software, get permissions, etc. Moreover, versions of this software would be frozen on this virtual machine so that anyone looking to repeat these analyses would not have to worry about changes in versions. It's a bit of work to set up one of these virtual machines, but they will be around forever and can be accessed with the click of a link. It just seems to me like it would be good to set up a one stop interface where those who were interested could forever have preprogrammed access to all the programs and analyses described in this manuscript. It's a great resource and seems like a perfect fit for this kind of workflow.

Experimental design

This section isn't quite applicable to this manuscript, but all of the described analyses and programs make logical sense. Following these directions would certainly give a bioinformatically novice user a pretty straightforward path to genome preparation and assembly.

Validity of the findings

Again, this section isn't quite applicable to this manuscript, but all of the described analyses and programs seem like they will work. I will admit to not checking every single link that the authors reference, but from what I've seen this is as good an introductory description as you can get to bacterial sampling and genome analysis.

Comments for the author

The authors have a readable style of writing, but throughout I thought there were some phrases that would be better left out:

Abstract: "has become almost trivial" I would change this wording simply because "almost trivial" just reads a bit off to me in this context (especially because, having done these analyses, they are never trivial).

Line 5: "and difficulty" I don't think you need these two words. IMO it's the drop in cost that is the main driver, and the level of difficulty hasn't changed, it's just been redistributed to bioinformatics.

Line 27: "relatively cheap sequencing" better as "cost efficient sequencing" or something slightly different. The words relatively cheap read too colloquial to me here.

Line 30: "create a large activation energy" again...reads too colloquial to me.

Line 106: "It is customary to offer a small favor or gift" Please leave this line out. I understand the sentiment, but it's really weird to read in a manuscript and hopefully folks have enough humility to be thankful for the help.

Line 128: "Will often result in the isolation of pathogens" better as "can preferentially isolate human pathogens"

Line 142: Put in a temp for room temperature (given how detailed other parts of the manuscript are).

Line 152: Which online tutorial?

Line 152: "or this paper by Baldouf" better as "or Baldouf [5]."

Line 178: It strikes me that if you are going to mention monophyletic clades, a definition of polyphyletic for comparison's sake is warranted.

Line 180: "going back in time" is a bit of an unclear statement for the intended audience.

Line 183: "measure how much a particular part of a phylogenetic tree" better as "measure how well a node is supported"?

Line 191: "sterile swab"...how can you obtain or ensure that the swab is sterile?

Line 193: "for 1-3 days" better as "until colonies of interest appear"

Line 204: "can be easily found online" better as "can be found online"

Line 224: delete "originally developed by Fred Sanger and now"

Line 226: "needs DNA" better as "requires DNA"

Section 6.3: You should describe the entire PCR program (annealing time and extension times, number of cycles, etc.).

Line 266: You should elaborate on "all controls behaved as expected"

Line 281: What about mentioning Science Exchange (www.scienceexchange.com) as a way to shop around for sequencing centers and compare prices?

Line 322: seems like you need quotation marks around "upload the data without well mapping" button

Line 360: please reword "ready to go"

Line 463: "fancy" better as "complex"

Line 502-514: What about the possibility of getting human contamination from outside the sample? What would that look like? Seems like an important thing to mention given who would be using this workflow.

Line 564: What about mentioning the recent preprint from Baym et al. showing $8 library prep? http://biorxiv.org/content/early/2015/01/16/013771

Line 708: Make sure to mention not to copy/paste the carriage return either.
