This paper presents a new strategy for annotating the human genome into different “chromatin states” (quiescent, promoter, enhancer, etc.) and visualizing the results. The new strategy involves first applying a segmentation algorithm, here the Segway algorithm, separately to each available cell type. The Segway algorithm uses molecular data on a cell type (e.g. histone marks, DNase hypersensitivity) to segment the genome into what might be called “Segway states”, essentially by clustering together regions that show similar patterns in the molecular data. The paper then uses a random forest classifier to map the Segway states in each cell type to a set of canonical annotations (quiescent, promoter, enhancer, etc.), based on how well the data in each Segway state match these canonical annotations from previous human-guided annotations. The result is that each cell type produces an annotation of the genome that uses the same set of canonical annotations, which facilitates pooling the results across all cell types.
A key innovation is to automate the mapping from Segway states to canonical annotations, using a random forest. In previous applications of these kinds of methods this mapping step has been done by human interpreters. The automation of this step greatly facilitates analysis of many cell types (here 164). Another innovation is the introduction of what the paper calls a “functionality score” to visually weight some of the canonical annotations more heavily than others, depending on how evolutionarily conserved those annotations tend to be. For example, this results in visual upweighting of promoter-annotated regions compared with other annotations.
The automated interpretation strategy is simple and potentially useful, and the production of large-scale functional genomic annotations, together with ways to visualize them, could certainly be a useful resource for the community. However, the utility of the resource is inevitably dependent on the quality of the final annotations, and the main limitation of this work is that it provides very little data to support or validate the quality of the annotations. The enrichment of GWAS hits within regions with high functionality score is the only objective assessment provided, and this assessment is fairly cursory.
To be fair, objective assessment of annotations is not easy. Nonetheless, without this it is hard to judge how useful the annotations are. The GWAS comparisons could usefully be expanded. For example, comparisons with other annotations would help: GWAS hits are also enriched in high DNase regions and near genes; do these annotations have a higher or lower enrichment than functionality score? Can you support your claim that “the functionality score (and the Segway encyclopedia) is the most effective tool for understanding what the function of a known important variant (such as a disease allele).”, by giving an example? Perhaps data on eQTLs could also help support the usefulness of annotations - for example, in cases where annotations differ among cell types, are these predictive of differences in eQTL behaviour? Or perhaps it could be demonstrated that differences in annotations among cell types are predictive of expression. In Figure 4a some cell types have the gene body annotated as “transcribed” whereas others have it annotated as “quiescent”. Is this reflected in differences in expression among cell types? It would greatly improve the impact of the work if the quality and utility of the annotations were better supported.
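To make the suggested comparison concrete, here is a minimal sketch of the kind of calculation we have in mind: the fold enrichment of GWAS hits in one annotation (e.g. high functionality score regions) set against the same quantity for a competing annotation (e.g. DNase-hypersensitive regions). The function name, the toy coordinates, and the interval sets are all hypothetical and invented purely for illustration; they are not taken from the paper or from any real data set.

```python
def fold_enrichment(hit_positions, intervals, genome_length):
    """Fold enrichment of hits inside a set of intervals.

    Defined as (fraction of hits inside the intervals) divided by
    (fraction of the genome covered by the intervals). Intervals are
    assumed to be non-overlapping and half-open: (start, end).
    """
    covered = sum(end - start for start, end in intervals)
    if covered == 0 or not hit_positions:
        raise ValueError("need at least one hit and non-empty intervals")
    hits_inside = sum(
        any(start <= pos < end for start, end in intervals)
        for pos in hit_positions
    )
    # Rearranged so that the toy example uses exact integer products.
    return (hits_inside * genome_length) / (len(hit_positions) * covered)


# Toy data, invented for illustration only.
genome_length = 1000
gwas_hits = [15, 120, 130, 480, 905]
high_score_regions = [(100, 150), (450, 500)]      # covers 10% of the toy genome
dnase_regions = [(0, 50), (100, 200), (900, 950)]  # covers 20% of the toy genome

fe_score = fold_enrichment(gwas_hits, high_score_regions, genome_length)
fe_dnase = fold_enrichment(gwas_hits, dnase_regions, genome_length)
print(f"functionality score: {fe_score:.2f}, DNase: {fe_dnase:.2f}")
```

Reporting the two numbers side by side, ideally matched for genome coverage and with confidence intervals, would show whether the functionality score adds information beyond simpler annotations.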
- The way the paper switches between terms such as “labels”, “states”, and “annotations” is quite confusing. It would greatly help presentation to be consistent about the terms used. For example, “states” could refer to the output of Segway, and “annotations” to the canonical annotations (promoter, enhancer, transcribed, etc.).
- The paper introduces modifications to the existing Segway algorithm, which the authors presumably believe to be improvements, but no data are provided to support this. For example, almost no motivation is provided for switching to a mixture of Gaussians from a single Gaussian, except that the single Gaussian “artifactually divided” weak transcription vs. transcription. Can you demonstrate this? Is it really an artifact? It would seem to us that there will in practice be a quantitative continuum of transcription states and not simply transcribed/not-transcribed. The mini-batch idea makes sense, but again it would seem helpful to demonstrate the benefit (with only 25 iterations it seems that you will use at most 25% of the data; do more iterations help?).
- The definition of “Encyclopedia segment” seems extremely subjective (the parameters were chosen so that “resulting segments matched our intuition about the size and frequency of functional regulatory elements”), and consequently of questionable utility.
- While annotating each cell type separately is certainly convenient, it also has some disadvantages. It will result in less accurate annotations than a joint approach when the annotations are actually consistent across cell types. And any errors in annotations will tend to artificially inflate differences among cell types, which users may be tempted to overinterpret. Some discussion of these issues, and the difficulty of assessing significance and reliability of differences, seems in order.
- The claim that “the functionality score (and the Segway encyclopedia) is the most effective tool for understanding the function of a known important variant (such as a disease allele).” needs support.
- The internet Encyclopedia interface did not work for us: the first example we tried was chr10:73576055-73611126 and this gave an error “Error: Traceback (most recent call last): File "functionality_score_plot.py", line 108, in ann_labels_frames = ann_labels_mat.reshape(num_frames,resolution,len(ann_celltypes))[:,0,:].astype("int8") ValueError: total size of new array must be unchanged”. Did we do something wrong or is this a bug?
- The Encyclopedia interface would also benefit from being more flexible, for example by allowing searches by gene name, or by accepting a position as a single string (chr10:73576055-73611126) rather than in three separate boxes.
- The schematic pipeline in Figure 1 is hard to follow. What does "compare to known phenomena" mean and what are "label features" here? Why do you use Exon as an example annotation when that is not one of the annotations you consider? (And why did you decide not to annotate exons as exons? That would seem like a useful annotation.)
- The definitions of LowConfidence and Unclassified, and the distinction between them, are unclear.
- The description of the categories in section 2.3 is quite long and largely restates previous findings.
- It seems that the annotations chosen are not mutually exclusive. For example, the first exon is highly enriched with promoter, enhancer, and bivalent categories (figure 3a), but is also presumably transcribed.
- The classifier used 17 features, so why are there only 16 features in Figure 2b?
- How are the columns in figure 3b ordered?
- Excluding RNA-seq data because you are interested in transcriptional regulation seems like a weird justification (p3).
- P4, it is unclear where the number 294 came from here.
- How was panel a of Figure 3 computed?
- If the functionality score is the 75th percentile of a distribution, then it should just be a number. How does it itself have a distribution (Figure 3d)? Is this the distribution across the cell types? Please clarify.
- “GWAS cannot disentangle genetic linkage” - this seems misleading (by the classical definition of linkage) because LD actually decays much faster than linkage, and indeed its increased resolution is one of the advantages of GWAS over genetic linkage analysis.
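Returning to the two comments on the Encyclopedia interface above, the following sketch shows (i) how a single region string could be parsed and (ii) one possible cause of the reshape error we encountered. This is a guess on our part: we assume the error arises when the region length is not a multiple of the plotting resolution. The helper names `parse_region` and `to_frames`, the resolution of 100, and the three cell types are hypothetical; we have not seen the source of `functionality_score_plot.py`.

```python
import re

import numpy as np

# Accepts strings such as "chr10:73576055-73611126" (commas in numbers allowed).
REGION_RE = re.compile(r"^(chr[0-9A-Za-z_]+):([\d,]+)-([\d,]+)$")


def parse_region(text):
    """Parse 'chrom:start-end' into a (chrom, start, end) tuple."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"cannot parse region: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start >= end:
        raise ValueError("start must be smaller than end")
    return chrom, start, end


def to_frames(labels, resolution):
    """Reshape a (positions, cell_types) array into whole frames.

    Trailing positions that do not fill a complete frame are dropped,
    so the reshape cannot fail on a size mismatch.
    """
    num_frames = labels.shape[0] // resolution
    trimmed = labels[: num_frames * resolution]
    return trimmed.reshape(num_frames, resolution, labels.shape[1])


chrom, start, end = parse_region("chr10:73576055-73611126")
length = end - start    # 35071 positions, not a multiple of 100
resolution = 100        # hypothetical plotting resolution
num_cell_types = 3      # hypothetical; the paper analyses 164 cell types
labels = np.zeros((length, num_cell_types), dtype="int8")

# Rounding the number of frames up reproduces a size-mismatch ValueError.
naive_frames = -(-length // resolution)
try:
    labels.reshape(naive_frames, resolution, num_cell_types)
    naive_ok = True
except ValueError:
    naive_ok = False

frames = to_frames(labels, resolution)[:, 0, :]  # first position of each frame
```

If this is indeed the cause, trimming or padding the region to a whole number of frames before reshaping should resolve it; if not, the parsing helper is still relevant to the single-string input suggested above.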
This review is signed:
Kevin (Kaixuan) Luo