University of Massachusetts Medical School, Worcester, Massachusetts, USA
1 Correspondence: University of Massachusetts Medical School, 377 Plantation St., Worcester, MA 01605, USA. E-mail: thoru.pederson{at}umassmed.edu
DAVID KAPLAN'S "Statistical analysis in NIH peer review: identifying innovation" proposes that NIH's Center for Scientific Review (CSR) adopt certain statistical concepts in evaluating grant applications. The author's key point is his belief that this practice would improve the detection of innovative proposals. The hypothesis that the present review system is not ideal for detecting (or approving) innovative proposals is actually not easy to document, nor does the author attempt to do so. Nonetheless, his proposal can be evaluated on its intrinsic merits.
Although he does not specify this, it is understood that the applications under discussion are investigator-initiated proposals. Applications solicited by NIH through RFPs are by definition narrow in research focus and are therefore reviewed by a topically more homogeneous panel. Large program project grant applications, including those with investigators at more than one institution, also have review features that differ from those of standard R01-type applications. Let us assume the author is mainly discussing standard R01-type applications.
One of the author's initial presumptions, namely that a given application is reviewed by only two or three individuals, is not true. As he knows, the entire IRG, typically 15–18 members, is charged with reviewing each application. While members not assigned as primary reviewers do not study an application in nearly as much detail, they do look through it as the reviews are read aloud, and non-reviewer members of the IRG almost always participate in the discussion. The key feature of the system is that each IRG member's score reflects what they have heard the reviewers say, what they have heard in the discussion (in some cases including their own comments), and, finally, what they think themselves (in many cases quite independently of the reviews read aloud). As any IRG Executive Secretary knows, the scores usually display considerable variation (and thus variance). One exception used to be the "universally agreed dud": an application that everyone at the table felt was awful. Few of these are seen today, because under the current triage system such applications (and even some pretty good ones) are not discussed at the meeting. The other exception is the application from a true superstar (in terms of track record) who has also written a brilliant proposal with no perceived holes and with extremely high feasibility and potential importance; its scores form a consensus distribution like the one shown in Fig. 1, except clustered around 1.1.
Notwithstanding the full participation of the IRG and its effect on the variance and mean, the author is surely correct that increasing the number of primary reviewers to something like 30 would make the mean more "meaningful" (indeed, the shrinking of the standard error of the mean as reviewers are added is an inescapable fact of statistics). One reaction readers might have is that this proposal is almost certainly too radical a departure from present CSR procedures ever to be adopted. In my view, it would be unfair to penalize the author for this anticipated reaction.
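To illustrate the statistical point conceded above, the following Python sketch compares the expected standard error of a panel's mean score, σ/√n, with a small simulation for panels of 3, 18, and 30 reviewers. It assumes, hypothetically, that individual scores are independent draws from a normal distribution with mean 2.0 and standard deviation 0.5 on the 1.0 (best) to 5.0 (worst) priority-score scale; the function names are illustrative only. Real IRG scores are not independent, since members hear the same presentations and discussion, and with positively correlated scores the actual gain from adding reviewers would be smaller than this idealized calculation suggests.

```python
import math
import random
import statistics


def expected_standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n independent scores with SD sigma."""
    return sigma / math.sqrt(n)


def simulated_standard_error(mu: float, sigma: float, n: int,
                             trials: int = 10_000, seed: int = 1) -> float:
    """Empirical SD of panel means over many simulated panels of size n.

    Assumes (hypothetically) independent, normally distributed reviewer scores.
    """
    rng = random.Random(seed)
    panel_means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(panel_means)


MU, SIGMA = 2.0, 0.5  # hypothetical mean and spread of individual scores

for n in (3, 18, 30):
    theory = expected_standard_error(SIGMA, n)
    sim = simulated_standard_error(MU, SIGMA, n)
    print(f"n = {n:2d}: expected SE {theory:.3f}, simulated SE {sim:.3f}")
```

Under these assumptions the expected standard error falls from about 0.289 with 3 reviewers to about 0.118 with 18 and about 0.091 with 30, which is the sense in which a larger panel's mean is more "meaningful".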
My major reservation about this proposal lies in the author's contention that the so-called "controversy" distribution (Fig. 1, lower right) necessarily indicates an innovative application. Such an extreme division of opinion could arise from a legitimate difference of views on the quality of the applicant's research or its perceived importance, or could even reflect (in admittedly rare circumstances) an applicant's unlucky draw of a relatively large number of enemies on an IRG. I am simply not persuaded by the author's claim that this pattern is synonymous with innovativeness in an application.
There are additional reservations. The very opening is somewhat muddy about what exactly is being proposed, and the statistical concepts are introduced without adequate exposition. The text dealing with consensus as a historical aspect of science is rather distracting and also contains some inaccuracies. (For example, "Powerful new ideas always threaten the current understanding": this is definitely not "always" the case. Mario Capecchi's proposal was not rejected because the reviewers were afraid that more would be learned about gene expression during development; it was most likely not funded because the reviewers weren't confident the idea would work technically. And personally, I have never found the radioimmunoassay story to be that unique, given that it was accepted relatively quickly compared with discoveries made by people like Oswald Avery and Barbara McClintock.)
There are two other minor points to be made. In Fig. 2, I assume that "Variance/Negative Kurtosis" signifies a larger quantity of both moving to the right. Strictly speaking, this is not an x-axis and may confuse the reader (especially because the kurtosis is negative and thus becomes more negative as one moves to the right). Perhaps "Lower Variance/Negative Kurtosis" could be put under the two left panels and "Higher Variance/More Negative Kurtosis" under the right two panels. Also, the vertical line through the drawing suggests a sharp discontinuity between the "consensus" and "innovative" patterns, which is not the case. In general, this figure attempts to portray the author's message in a pseudo-Cartesian form, which may well be more confusing than simply leaving his proposal as text.
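The relationship between variance and kurtosis raised above can be made concrete with a short Python sketch. The three 18-member score sets are invented for illustration (a tight consensus, a broad even spread, and an even split between two camps), and the sketch uses the population, moment-based excess kurtosis m₄/m₂² − 3, which may differ from the estimator used in the original article. Moving from consensus to split, variance rises while excess kurtosis falls, and the intermediate panel shows that both quantities change continuously rather than at a sharp boundary.

```python
import statistics


def excess_kurtosis(scores: list[float]) -> float:
    """Population excess kurtosis m4 / m2**2 - 3 (zero for a normal distribution)."""
    mean = statistics.fmean(scores)
    m2 = statistics.fmean((x - mean) ** 2 for x in scores)  # second central moment
    m4 = statistics.fmean((x - mean) ** 4 for x in scores)  # fourth central moment
    return m4 / m2 ** 2 - 3.0


# Hypothetical 18-member panels on the 1.0 (best) to 5.0 (worst) scale.
consensus_panel = [2.0] * 14 + [1.5] * 2 + [2.5] * 2  # tight, peaked cluster
broad_panel = [1.0 + 0.2 * i for i in range(18)]       # evenly spread, 1.0 to 4.4
split_panel = [1.2] * 9 + [4.0] * 9                    # two opposing camps

for name, panel in [("consensus", consensus_panel),
                    ("broad", broad_panel),
                    ("split", split_panel)]:
    print(f"{name:9s}: variance = {statistics.pvariance(panel):.3f}, "
          f"excess kurtosis = {excess_kurtosis(panel):+.2f}")
```

With these inputs the variances are about 0.056, 1.077, and 1.960, and the excess kurtosis values are about +1.50, −1.21, and −2.00. An even two-camp split always gives −2, the minimum possible value, whatever the reason the camps disagree.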
In summary, I am somewhat skeptical of the author's central contention that the so-called "controversy" distribution equates with innovativeness in a proposal. I do applaud this busy investigator for his service on the NIH Peer Review Advisory Committee and for his obviously sincere and motivated effort.
Related Articles
FASEB J 2007 21: 305-308.
FASEB J 2007 21: 311.