Conservation Letters’ new policy on reporting confidence intervals (CIs) with p values is one among many recent calls for change in statistical reporting practices. It sits in line with the recently developed Tools for Transparency in Ecology and Evolution (TTEE; Parker et al., ; TTEE Working Group, ), which are themselves based on the interdisciplinary Transparency and Openness Promotion Guidelines (Nosek et al., ). Complete and transparent statistical reporting is essential to building a reliable evidence base for practice, and for accumulating and synthesizing scientific knowledge. Conversely, undisclosed analysis practices such as cherry picking “significant” results and p‐hacking (e.g., making decisions about sampling stopping rules, treatment of outliers, transformations, and/or analysis techniques based on whether results meet or fail to meet a statistical significance threshold) have been directly linked to the inability to replicate many important, published experimental effects (Fidler et al., ; Forstmeier, Wagenmakers, & Parker, ; Simmons, Nelson, & Simonsohn, ).
Given Conservation Letters’ focus on publishing science of direct relevance to policy and practice, it is particularly important that the interpretation of statistical analyses, and the conclusions they support, are transparent. From April 2018, Conservation Letters will require: (1) that any article reporting p values also report 95% CIs in the text and in figures and (2) that all figures presenting data used in statistical analyses (whether in the main text or Supporting Information) include error bars. Where possible, these error bars should show 95% CIs, but in all cases authors must be explicit about what the error bars represent.
Conservation Letters’ policy encapsulates seven important messages about CIs and p values, which we explicate below.
While based on the same basic information as p values, CIs make uncertainty in parameter values more explicit than do p values alone. For example, they have been shown experimentally to reduce the misinterpretation of statistical nonsignificance as “no effect” and otherwise improve interpretation (Fidler & Loftus, ).
A CI indicates the precision of a parameter estimate, a concept akin to statistical power. A longer CI indicates less precision; a shorter interval indicates relatively high precision. CIs indicate a set of plausible values for the parameter, with longer intervals encompassing a wider range of plausible values. The figure illustrates possible effect sizes with relevant 95% CIs, relative to levels that are considered important and not important, for five hypothetical results. CI‐A shows a highly imprecise result that, while not statistically significant (the interval includes zero), is wide enough to also include values in the ecologically or theoretically important range. CI‐B shows a statistically significant result (the interval excludes zero), but one still not precise enough to distinguish between ecologically or theoretically important and unimportant values (we discuss “importance” further in message No. 6). CI‐C shows a more precise nonsignificant result; the interval includes zero and is sufficiently narrow to rule out other important values. CI‐D is similarly precise, but at the other end of the spectrum; a result that is both ecologically and statistically significant. CI‐E demonstrates how a statistically significant result can fail to be ecologically important.
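The dependence of interval width on sample size can be made concrete with a minimal sketch (hypothetical data; a normal critical value of 1.96 is used for simplicity, whereas small samples would properly use a t critical value):

```python
import math

def mean_ci(data, z=1.96):
    """Approximate 95% CI for a mean, using a normal critical value.
    (A t critical value would be more appropriate for small n.)"""
    n = len(data)
    m = sum(data) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))
    half = z * sd / math.sqrt(n)
    return m - half, m + half

small = [4.1, 5.0, 3.8, 6.2, 4.9]   # n = 5
large = small * 8                    # same values repeated, n = 40 (illustration only)
lo_s, hi_s = mean_ci(small)
lo_l, hi_l = mean_ci(large)
print(hi_s - lo_s, hi_l - lo_l)      # the larger sample yields a narrower interval
```

With similar spread in the data, the interval width shrinks roughly in proportion to the square root of the sample size, which is why a long CI signals an imprecise, low-power result.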
Simply noting whether zero (or some other null value) falls inside or outside a CI ignores other important information CIs have to offer, most notably that the interval width is a guide to the precision of the result. It also fails to recognize that plausibility is not uniform across the interval: values closer to the middle of the interval are (usually) more likely to represent the parameter than those toward the edges.
In many instances, authors who report error bars fail to specify precisely what the bars represent. Error bars in figures should clearly identify whether the bars represent standard deviations, standard errors, or CIs, and the source of that variation (e.g., variation among vs. within sites). When reporting CIs, always ensure that the level of the confidence (e.g., 95%) is noted.
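The practical stakes of this labeling can be shown with a small sketch (invented data): for the same sample, the standard deviation, standard error, and 95% CI half-width differ severalfold, so an unlabeled error bar is ambiguous.

```python
import math

def error_bar_quantities(data):
    """SD, SE, and normal-approximation 95% CI half-width for one sample.
    These can differ severalfold, so a figure must state which is plotted."""
    n = len(data)
    m = sum(data) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))
    se = sd / math.sqrt(n)
    return {"sd": sd, "se": se, "ci95_half": 1.96 * se}

print(error_bar_quantities([12.0, 9.5, 11.1, 10.4, 13.2, 8.8]))
```

A bar drawn at one standard error is roughly half the length of a 95% CI bar and conveys different information than a standard deviation bar, which describes spread in the data rather than uncertainty in the mean.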
If you are also reporting the outcomes of null hypothesis significance tests (i.e., p values), below are some further important messages.
Often, there are practical constraints on sample size, and therefore statistical power. It will not always be possible to increase sample size or to otherwise achieve a statistical power level of 80% or more. This does not necessarily mean the research is not worthwhile. It is essential, however, that even when power is low, it is calculated (through an a priori power analysis) and reported for any hypothesis testing result.
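An a priori power calculation need not be elaborate; the following is a rough stdlib-only sketch using a normal approximation for a two-sided, two-sample test of a standardized effect size (dedicated power software gives exact t-based answers):

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def approx_power_two_sample(d, n_per_group, alpha_z=1.96):
    """Normal-approximation power of a two-sided two-sample test for
    standardized effect size d with n observations per group."""
    ncp = d * math.sqrt(n_per_group / 2)  # approximate noncentrality
    return 1 - normal_cdf(alpha_z - ncp) + normal_cdf(-alpha_z - ncp)

# Medium effect (d = 0.5) with 20 per group: power well below 0.8.
print(round(approx_power_two_sample(0.5, 20), 2))
# About 64 per group is needed to reach the conventional 80% power.
print(round(approx_power_two_sample(0.5, 64), 2))
```

Running such a check before data collection makes the power limitation explicit, so that a later nonsignificant result can be reported honestly as inconclusive rather than as evidence of no effect.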
The vast majority of papers in conservation science fail to acknowledge the prospect of type II (false negative) errors. Independent calculations suggest that the average statistical power in ecology and related research is low. For example, Jennions and Møller () estimated the average power of behavioral ecology to be approximately 40% to 47% for medium (typical) effect sizes. Smith, Gammel, and Hardy's () estimate was even lower, at 23% to 26%. This means that the chance of detecting a real effect of medium size in this field is considerably worse than flipping a coin. Parris and McCarthy's () study of the effects of toe‐clipping of frogs revealed similarly low power; at best 60% for a large effect of 40% population decline. If power is not reported, it is safest for editors, reviewers, and policy makers to assume it is low.
Failure to reject a null hypothesis does not provide evidence that the null hypothesis is true. When power is unknown, statistical nonsignificance is uninterpretable. Although this advice may seem obvious, it is unfortunately common for authors in conservation science to present statistical nonsignificance as evidence that the null is true. For example, in a study by Pavone and Boonstra (), the average lifespan of toe‐clipped voles was not significantly different from that of control animals; toe‐clipping was interpreted as having no effect on survival. However, the authors should also have noted that the size of the effect was also not significantly different from a 40% reduction in lifespan due to toe‐clipping, a potentially large impact. So while they were unable to rule out no effect of toe‐clipping, the data were also insufficient to rule out a large effect. Misinterpreting statistical nonsignificance as “no effect” can lead to failures to act to protect biodiversity.
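The vole example can be sketched numerically (the figures below are hypothetical, not from Pavone and Boonstra): a nonsignificant estimate whose CI spans both zero and a 40% reduction is compatible with no effect and with a large, damaging one.

```python
# Hypothetical numbers: estimated change in lifespan of -18%,
# with a standard error of 14 percentage points.
est, se = -18.0, 14.0
z = est / se                              # about -1.29: not significant at .05
lo, hi = est - 1.96 * se, est + 1.96 * se
print(round(lo, 1), round(hi, 1))         # interval spans 0 AND -40

compatible_with_no_effect = lo <= 0 <= hi
compatible_with_large_decline = lo <= -40 <= hi
print(compatible_with_no_effect, compatible_with_large_decline)
```

Because both statements print True, the honest summary is "the data cannot distinguish no effect from a 40% reduction," not "toe-clipping has no effect."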
Many studies in conservation science and ecology equate statistical significance with ecological or theoretical importance, as the example above illustrates. Unfortunately, statistical and ecological significance have little to do with one another. Broadly speaking, the effect size measures the magnitude of the change in a parameter that one observes, or expects to observe, from a treatment or exposure to a causal variable. A study result is compelling evidence of an effect only if the effect is large enough to be ecologically or theoretically interesting and unusual enough not to have arisen by chance.
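The divergence between the two kinds of significance is easy to demonstrate with invented numbers: a trivially small effect becomes statistically significant given a large enough sample, while an ecologically large effect stays nonsignificant in a small one.

```python
import math

def z_and_p(effect, sd, n):
    """Two-sided normal-approximation p value for a one-sample mean test."""
    z = effect / (sd / math.sqrt(n))
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# A trivial 0.5% change is "significant" with a very large sample...
_, p_big = z_and_p(effect=0.005, sd=1.0, n=500_000)
# ...while an ecologically large 30% change is "nonsignificant" with n = 10.
_, p_small = z_and_p(effect=0.30, sd=1.0, n=10)
print(p_big < 0.05, p_small > 0.05)
```

Statistical significance reflects effect size and sample size jointly; ecological importance depends on the effect size alone, judged against substantive benchmarks.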
The messages above are not only relevant to researchers conducting t‐tests and ANOVAs as their core analyses. We have on occasion, anecdotally, heard colleagues and peers claim that they are involved in modeling, not null hypothesis testing, and as such do not need to consider statistical power or effect size. On closer inspection, many such cases do involve null hypothesis testing as part of a larger procedure, for example, parameters selected for inclusion in models on the grounds that they reached p < .05, or goodness‐of‐fit statistics later subjected to statistical significance analysis. Another often overlooked instance of null hypothesis testing occurs in tests of statistical assumptions (e.g., homogeneous variance). Such tests may return nonsignificant results which form the basis of decisions about further analysis, for example, decisions to combine groups of data that show “no difference.” It is important to recognize that these instances are null hypothesis testing, and as such, require power calculations and all the same considerations as tests of primary hypotheses.
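A hidden screening step can be simulated to show it behaves exactly like an explicit hypothesis test (a simplified stdlib-only sketch: a one-sample z test stands in for a model-selection screen, and all data are simulated):

```python
import math
import random

def screening_p_value(data):
    """Two-sided normal-approximation p value that a sample mean is zero,
    standing in for the p value used to screen a candidate predictor."""
    n = len(data)
    m = sum(data) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))
    z = m / (sd / math.sqrt(n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
trials = 2000
# The screened quantity is truly null, so every "p < .05, keep it"
# decision below is a false positive of a genuine hypothesis test.
kept = sum(screening_p_value([random.gauss(0, 1) for _ in range(30)]) < 0.05
           for _ in range(trials))
print(kept / trials)  # close to the 5% type I error rate
```

The rejection rate hovers near 5%, confirming that "screening at p < .05" carries the same type I and type II error-rate considerations as any primary hypothesis test.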
A number of resources are available to support implementation of Conservation Letters’ policy. Cumming () and Cumming and Calin‐Jageman () provide useful information on reporting and interpreting CIs. Cumming also provides explanatory YouTube videos that include visual aids and simulations to improve statistical inference.
© 2018. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Details
1 School of BioSciences, University of Melbourne, Australia; School of Historical and Philosophical Studies, University of Melbourne, Australia
2 School of BioSciences, University of Melbourne, Australia
3 The Nature Conservancy, South Brisbane, Australia; University of Queensland, St. Lucia, Australia