Review for "Thought experiment: Decoding cognitive processes from the fMRI data of one individual"

Completed on 28 Jun 2018 by Krzysztof Jacek Gorgolewski.


It’s been a while since I read a manuscript as inspiring as “Thought experiment: Decoding cognitive processes from the fMRI data of one individual”. The authors had very limited resources yet managed to design a study that sheds light on the ability to decode different cognitive states in a single individual. The study design has all the hallmarks of excellence: preregistration of hypotheses, out-of-sample prediction, and transparency of methods via code and data sharing. The dataset they provided alongside the manuscript is likely to become an important benchmark as well as a valuable educational tool. Conditional on a few minor fixes, I wholeheartedly endorse this manuscript.

Comments to author

Major issue:

- I applaud the authors for sharing code, statistical maps and preprocessed data. It is really setting the example in the field. I was especially impressed by the detail and clarity of the notebooks. However, I also feel that there is a huge potential for this dataset becoming an important part of many neuroimaging courses as well as a benchmark for other decoding methods. For this to happen one needs to share the raw data. I strongly encourage authors to format the raw data in the Brain Imaging Data Structure (BIDS) and share it on a domain specific repository such as or FCP/INDI.

Minor issues/comments:

- Abstract: the idea of human experts making predictions is not explained well - I was confused when I first read “four blinded teams”

- I found the term “superordinate domain” more confusing than helpful; sticking to just “domain” or “cognitive domain” might make the text more accessible.

- Please provide the full text of the instructions given to the participant.

- It is difficult to follow the study design. It would be very helpful if you included a figure depicting the design with length of blocks and runs as well as the sequence of all different categories.

- The authors z-scored the data across the train and test sets. This procedure causes information leakage. Z-scoring should be done separately on the train and test sets, or the rescaling parameters derived from the train set should be applied to the test set.

- The purpose of having human experts make predictions was not motivated well. Was the idea that humans might be able to make better predictions or combine sources of information, thus suggesting that improvements in automatic methods are possible?

- Page 4: the sentence containing “similar blocks” is repeated.

- What was the motivation for including the anatomical terms from NeuroSynth if the goal of the prediction was to perform cognitive decoding?

- Page 4 section 3.1: missing full stop after “Fig 4”

- The authors decided to use a simple time shift and a boxcar function to average z-scored volumes. What motivated this decision in contrast to HRF convolution, GLM modelling (with explicit modelling of instruction TRs) and contrasts?

- Page 4, “Feature selection for correlation analyses”: it’s not clear whether the prediction accuracy is for domains or content.

- Figures 1, 5 and 9: labels are hard to read because they overlap; the captions would benefit from explaining that colors correspond to K-means derived clusters.

- Page 7: the fact that using only the most significant voxels gave you better prediction accuracy is quite surprising and at odds with Sochat et al. 2015. It would be interesting to hear your opinion on why this is so.

- Page 7: it would be good to express the prediction accuracy of the Neurosynth method in the same way as that of the correlation method so the reader can compare them easily.

- The manuscript would benefit from a figure or table directly comparing the domain and content prediction accuracy made by the correlation, Neurosynth and human experts.

- Is there any information available on what strategies each team of experts used?

- Figures 7 and 10 are missing a colorbar.

- If I understood the manuscript correctly, the correlation method was able to predict domain slightly better than any team of human experts, but human experts predicted content better than any automated method. Although not mathematically impossible, this result is counterintuitive and worth discussing.

- “Analysis of fMRI data could benefit from splitting the data” - is this really supported by your analysis? Wouldn’t a one sample t-test also capture consistency of BOLD response across time?

- “If, on the other hand, the activity pattern for the language task resembled a default mode network, one would conclude that the task was not performed at all and not use the map to determine the degree of lateralization.” This is a tricky argument. First of all, preoperative mapping is necessary because of the abnormal anatomy of patients (due to slow-growing tumors, for example). This makes assumptions about normal/expected activation patterns hard to uphold. Furthermore, the patterns might not be as reliable as you say: consider the secret rest run, during which the DMN pattern was not present.

- “The present study showed how brief periods of self-generated thought can be decoded regarding the superordinate neuropsychological domains involved.” The use of “self-generated” is misleading here: only the secret run was truly self-generated, and that one you were not able to decode. The other runs followed instructions and thus should not be considered self-generated (at least this is how the term is used in the mind-wandering literature).
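
To illustrate the z-scoring comment above, here is a minimal sketch in Python with NumPy of one way to avoid the leakage: the rescaling parameters are estimated on the train set only and then applied unchanged to the test set. It assumes the data are arranged as a volumes-by-voxels array. The array shapes, the random data, and the helper names `fit_zscore` and `apply_zscore` are hypothetical and are not taken from the authors' notebooks.

```python
import numpy as np


def fit_zscore(train):
    """Estimate per-voxel mean and standard deviation from the train set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # Guard against division by zero for voxels that are constant in the train set.
    std = np.where(std == 0, 1.0, std)
    return mean, std


def apply_zscore(data, mean, std):
    """Rescale any set of volumes with parameters estimated elsewhere."""
    return (data - mean) / std


# Hypothetical data: 40 train volumes and 10 test volumes, 10 voxels each.
rng = np.random.default_rng(0)
train = rng.normal(loc=100.0, scale=5.0, size=(40, 10))
test = rng.normal(loc=100.0, scale=5.0, size=(10, 10))

# The test set never contributes to the mean or standard deviation.
mean, std = fit_zscore(train)
train_z = apply_zscore(train, mean, std)
test_z = apply_zscore(test, mean, std)
```

With this arrangement the train set has zero mean and unit variance per voxel by construction, while the test set generally does not, which is expected and is what keeps the out-of-sample prediction honest.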