In this nice data paper, the authors provide a deep Illumina metagenomic and metatranscriptomic data set, assembly, and high-level analysis of a biogas reactor microbial community.
The paper is well written and the data seems to be of good quality, based on their reporting. The paper is also highly reproducible: it comes with a version-controlled workflow (in a Makefile) on GitHub, with an associated Dockerfile. Using Docker is a great idea, but the execution is still imperfect (see below).
All of the raw data is deposited publicly and was available to me.
The authors should include the number of reads that mapped back to the assembly at their 1 kb cutoff, as this would help us gauge the inclusivity of the assembly for both the DNA and RNA reads.
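A minimal sketch of how such a mapping rate could be computed, assuming the reads have already been aligned to the assembly and the alignments are available as SAM text with `@SQ` header lines. The function name `mapping_rate` is hypothetical and not part of the authors' workflow. It counts each primary record once and treats a read as mapped only if its contig meets the length cutoff.

```shell
# Hypothetical helper: fraction of reads mapping to contigs at or above a length cutoff.
# Reads SAM text (including @SQ header lines) on stdin; the cutoff defaults to 1000 bp.
# Prints: <mapped reads> <total reads> <fraction mapped>
mapping_rate() {
  awk -v cutoff="${1:-1000}" -F '\t' '
    # Header: record the length of each contig from its @SQ line.
    /^@SQ/ {
      name = ""; len = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) len = substr($i, 4) + 0
      }
      contig_len[name] = len
      next
    }
    /^@/ { next }                       # skip all other header lines
    {
      flag = $2 + 0
      # Skip secondary (0x100) and supplementary (0x800) records so each read counts once.
      if (int(flag / 256) % 2 == 1 || int(flag / 2048) % 2 == 1) next
      total++
      unmapped = int(flag / 4) % 2      # 0x4 bit: read is unmapped
      if (!unmapped && ($3 in contig_len) && contig_len[$3] >= cutoff) mapped++
    }
    END {
      if (total == 0) { print "0 0 0.0000"; exit }
      printf "%d %d %.4f\n", mapped, total, mapped / total
    }
  '
}
```

With samtools installed, this could be fed from a BAM file (the file name here is a placeholder) via `samtools view -h reads.bam | mapping_rate 1000`, run once for the DNA reads and once for the RNA reads.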
Minor quibbles --
I cannot evaluate the claims of priority. Isn't it sufficient to say deep Illumina metagenomes are rare and leave it at that?
For the assembly, how were these parameters picked, and is there any evaluation of sensitivity or specificity?
The GitHub and Docker URLs in the PDF have a trailing ] that prevents simply clicking on them.
I would suggest toning down the Docker discussion a bit; it didn't work for me. I also have other suggestions for modification. Details below. I might suggest putting a tag on the repo so that you can link to the exact version with which you last ran the Docker container.
docker run -v /path/to/output/directory:/home/biogas/output 2015-biogas-cebitec
didn't work, presumably due to docker version upgrades; I needed to put metagenomics/2015-biogas-cebitec before the last bit.
The raw data is downloaded into the Docker container, which can be a bit of a problem, because on AWS (where I tried to run this container) the containers were stored on the root disk. There are two possible solutions that I can see --
- do as I did in this blog post, and put the data on the host disk and then mirror it into the container:
I think the first solution works better, but in any case, something needs to be done about putting large amounts of data in an opaque and potentially trashable container that consumes all the available disk space as part of the make file :).
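A rough sketch of the host-disk approach, under two assumptions that should be checked against the repository: that the full image name is `metagenomics/2015-biogas-cebitec`, and that the workflow writes its large files under `/home/biogas/output` (if the raw data is downloaded elsewhere in the container, a second `-v` mount for that directory would be needed). The function name and the host path are hypothetical. The function only prints the command so it can be inspected before running.

```shell
# Hypothetical helper: print (rather than run) a docker command that bind-mounts
# a host directory, so large files live on the host disk and not in the container.
biogas_docker_cmd() {
  host_dir="$1"                                   # directory on a volume with enough free space
  image="${2:-metagenomics/2015-biogas-cebitec}"  # assumed full image name
  printf 'docker run -v %s:/home/biogas/output %s\n' "$host_dir" "$image"
}

# Example with a placeholder host path:
biogas_docker_cmd /mnt/data/biogas-output
# prints: docker run -v /mnt/data/biogas-output:/home/biogas/output metagenomics/2015-biogas-cebitec
```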
Third, and most troublingly, my attempt to run the docker container failed with a missing 'unzip' command. This is probably easily fixable but does indicate a mismatch between the workflow and the container.
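One way to catch this kind of mismatch early would be a short check at the start of the workflow that fails fast when a required tool is absent. This is only a sketch: the function name `check_tools` is hypothetical, and the list of tools the workflow actually needs would have to come from the Makefile.

```shell
# Hypothetical helper: report any required command-line tools missing from PATH.
# Returns 0 if every tool is present, 1 otherwise (names are listed on stderr).
check_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing tools:$missing" >&2
    return 1
  fi
  return 0
}
```

Running something like `check_tools make unzip || exit 1` inside the container before the main workflow starts would have surfaced the missing `unzip` immediately, before any large downloads.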
In sum, apart from minor revisions to the Docker container and/or its discussion, and providing mapping rates, this looks great!
Quality of written English: Acceptable
Declaration of competing interests: I declare that I have no competing interests.
Authors' response to reviewers: (http://www.gigasciencejournal.com/imedia/3232474601750548_comment.pdf)