The preprint entitled "Trait components of whole plant water use efficiency are defined by unique, environmentally responsive genetic signatures in the model C4 grass Setaria" investigates components of water use efficiency in 189 genotypes of a recombinant inbred line population using a controlled-environment automated phenotyping system that controls the water content of pots and measures plant size from imagery. Overall, the paper is well written, the methods are largely satisfactory, and the conclusions are valid. However, there may be some gaps in the explanation of the experimental design. While it's understandable that this is a new system generating a ton of data, I don't feel enough is being done to use the time series data, since the analysis focuses on daily calculations. Further discussion of the SLOD method would be appreciated, as it may account for some time dependence. These points are explained in more detail below.
Methods are quoted largely from previous related manuscripts, which is fine with me. However, the number of replicates was not reported. Based on the number of genotypes, 189, and the stated number of individuals, 1138, we can assume there were 3 reps or blocks within which the water treatment levels were randomized along with the genotypes (although there is a remainder when dividing 1138 by 189 - why?). Stating the number of replicates in the methods would be standard. One sentence is confusing, though: "This strategy effectively…within both treatment blocks." Does this imply only two blocks, one for well watered and one for water limited? In that case, there would be only one replicate each of WW and WL, so it would be impossible to make statistical comparisons between the water treatment levels. Personally, I feel some type of schematic of the physical layout is always necessary to ensure a correct description of the design.
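To make the arithmetic behind my guess explicit, here is a quick check. It assumes a fully balanced design with two water treatments and 3 replicates per genotype per treatment, which is my assumption since the manuscript does not state it.

```python
# Numbers stated in the manuscript
genotypes = 189
individuals = 1138

# Assumptions (not stated in the manuscript): two water treatments
# (WW and WL) and 3 replicates per genotype within each treatment.
treatments = 2
assumed_reps = 3

# How many plants per genotype, and how many are left over?
plants_per_genotype, remainder = divmod(individuals, genotypes)

# Total expected under the assumed balanced design
expected_total = genotypes * treatments * assumed_reps
unexplained = individuals - expected_total

print(plants_per_genotype, remainder, expected_total, unexplained)
```

Under these assumptions the balanced design accounts for 1134 plants, which leaves 4 individuals that the methods should explain (for example, extra parental lines or additional replicates of some genotypes).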
Line 171: I believe "multiple linear regression" is the more common term; "multivariate" would imply multiple dependent variables, but I think you only had mass. To avoid confusion, you should also specify whether fresh and dry weight were estimated simultaneously (multivariate) or separately (multiple). As a side note, you could try models that include interactions among the predictors, which amounts to multiplying predictors together to create a new term.
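As an illustration of the interaction suggestion, here is a minimal sketch of a multiple linear regression with one interaction term, fitted by ordinary least squares through the normal equations. The predictor names (plant area and height), the data values, and the coefficients are all hypothetical and chosen only so the fit can be checked; they are not taken from the manuscript.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical predictors (made-up values, arbitrary units)
area = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
height = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

# Design matrix: intercept, area, height, and the interaction area * height.
# The interaction column is simply the product of the two predictors.
design = [[1.0, a, h, a * h] for a, h in zip(area, height)]

# Synthetic single response (mass) generated from known coefficients
true_beta = [0.5, 1.2, 0.8, 0.3]
mass = [sum(c * v for c, v in zip(true_beta, row)) for row in design]

beta = fit_ols(design, mass)
print([round(b, 6) for b in beta])
```

Because there is a single response variable, this is still multiple (not multivariate) regression; in practice a statistics package would also report whether the interaction coefficient is worth keeping.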
Line 224: Maybe I'm missing something, but I don't see how calculations every other day limit replication. Are you saying you use the values for each day as replicates? Is that accurate? That would seem an odd choice to me, based on my understanding. Replication should be the statistical replication, which I think is 3. The explanation of equation 1 partially answers this, but I'm not familiar with that approach. Has it been used elsewhere other than in your own work?
Line 257: Doing the analysis for each time point individually seems like the obvious way, but I'm not sure it makes the most of the power of the time series data. Are there not more complex models that incorporate time series for QTL analysis? How to handle time series data more effectively will be a major consideration for the future of phenomics.
Lines 293-315: This is redundant with the methods and should not be necessary in the Results section. If some of the information is not in the methods, put it there and delete it from here.
Line 209: The talk of both treatment blocks is confusing, as described earlier for statistical design. I think you have three blocks with water and genotype randomized within (at least I hope so).
Line 328: For discussion, is it possible to update the water weight during the experiment using the biomass estimates?
Line 331: Have you considered non-linear curve fitting? Loess shows it is variable during the life cycle, but it also looks like a saturating curve might approximate it. Then the parameter estimates of the curve could be new traits.
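To show what I mean by curve parameters becoming traits, here is a minimal sketch. The saturating form y = a*t/(b + t), the parameter names, and the data are my assumptions for illustration, not something taken from the manuscript. To keep the example dependency-free it fits the curve through the reciprocal linearization 1/y = 1/a + (b/a)(1/t); with noisy real data a proper non-linear least squares fit would be preferable, because the linearization distorts the error structure.

```python
def fit_saturating_curve(t, y):
    """Fit y = a*t/(b + t) by simple linear regression of 1/y on 1/t.

    Returns (a, b), where a is the asymptote and b is the time at which
    the curve reaches half of the asymptote.
    """
    u = [1.0 / ti for ti in t]
    v = [1.0 / yi for yi in y]
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    sxy = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    sxx = sum((ui - mean_u) ** 2 for ui in u)
    slope = sxy / sxx                    # equals b / a
    intercept = mean_v - slope * mean_u  # equals 1 / a
    a = 1.0 / intercept
    b = slope / intercept
    return a, b


# Hypothetical, noise-free trait values for one genotype over time
days = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
true_a, true_b = 8.0, 12.0
trait = [true_a * d / (true_b + d) for d in days]

a_hat, b_hat = fit_saturating_curve(days, trait)
print(round(a_hat, 6), round(b_hat, 6))
```

Fitting one curve per plant or per genotype would yield two numbers (asymptote and half-saturation time) that could themselves be mapped as traits, which uses the whole time course rather than individual days.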
Figures. Put legends on the figures, not just in text. For a good plot, you shouldn't need to read the caption.
-Larry M York, Noble Research Institute