The following list of papers discussing null hypothesis significance testing as a method of inference covers the period 2001-2011. The list is intended to supplement two previous compilations: one through 1997 by Marks Nester at http://warnercnr.colostate.edu/~anderson/nester.html, and another through 2001 by Bill Thompson at http://warnercnr.colostate.edu/~anderson/thompson1.html. The list includes a few papers prior to 2001 that did not appear in either of these two previous compilations. The list may not be complete, either because a paper was simply missed, or because the primary focus of a paper did not seem to be relevant to significance testing. Internet links are provided where available. Abstracts are usually available at no charge, but some sites require a fee or subscription for the full text.
Anderson, D. R. (2008). Model Based Inference in the Life Sciences: A Primer on Evidence. New York: Springer.
Anttonen, R. G. (1970). The significance of the null. The Journal of Educational Research 63(10): 438-440.
Balluerka Lasa, N., A. I. Vergara Iraeta, et al. (2009). Calculating the main alternatives to null-hypothesis-significance testing in between-subject experimental designs. Psicothema 21(1): 141-151.
Balluerka, N., J. Gómez, et al. (2005). The controversy over null hypothesis significance testing revisited. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences 1(2): 55-70.
Barber, J. J., and K. Ogle (2014). To P or not to P? Ecology 95: 621-626.
Bayarri, M. J., and J. O. Berger (2004). The interplay of Bayesian and frequentist analysis. Statistical Science 19: 58-80.
Beaulieu-Prévost, D. (2007). Statistical decision and falsification in science: Going beyond the null hypothesis. In Cognitive Decision-Making: Empirical and Foundational Issues, ed. Hardy-Vallee, B. Cambridge: Cambridge Scholars Publishing.
Beninger, P. G., I. Boldina, and S. Katsanevakis (2012). Strengthening statistical usage in marine ecology. Journal of Experimental Marine Biology and Ecology 426-427: 97-108.
Berger, J. O. (2003). Could Fisher, Jeffreys and Neyman have agreed on testing? Statistical Science 18(1): 1-32.
Berger, J. O., and J. Mortera (1991). Interpreting the stars in precise hypothesis testing. International Statistical Review 59: 337-353.
Blaich, C. F. (1998). The null-hypothesis significance-test procedure: Can't live with it, can't live without it. Behavioral and Brain Sciences 21(2): 194-195.
Bonett, D., and T. Wright (2007). Comments and recommendations regarding the hypothesis testing controversy. Journal of Organizational Behavior 28(6): 647-659.
Bookstein, F. (1998). Statistical significance testing was not meant for weak corroborations of weaker theories. Behavioral and Brain Sciences 21(2): 195-196.
Boyce, M. S. (2002). Statistics as viewed by biologists. Journal of Agricultural, Biological, and Environmental Statistics 7: 306-312.
Brosi, B. J., and E. G. Biber (2009). Statistical inference, Type II error, and decision making under the U.S. Endangered Species Act. Frontiers in Ecology and Environment 7(9): 487-494.
Buckland, S. T., D. R. Anderson, K. P. Burnham, J. L. Laake, D. L. Borchers and L. Thomas (2001). Introduction to Distance Sampling: Estimating Abundance of Biological Populations. New York: Oxford University Press.
Burnham, K. P., and D. R. Anderson (2014). P values are only an index to evidence: 20th- vs. 21st-century statistical science. Ecology 95: 627-630.
Burnham, K. P., and D. R. Anderson (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. New York: Springer.
Butcher, J. A., J. E. Groce, et al. (2007). Persistent controversy in statistical approaches in wildlife sciences: A perspective of students. The Journal of Wildlife Management 71(7): 2142-2144.
Camp, R. J., N. E. Seavy, et al. (2008). A statistical test to show negligible trend: comment. Ecology 89(5): 1469-1472.
Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review 48: 378-399.
Chinipardaz, R., and A. Abtahi (2008). Testing a point null hypothesis: The comparison of p-Values and Bayesian evidence in multivariate normal distribution. Pakistan Journal of Statistics 24(2): 123-133.
Chisholm, R., and R. Taylor (2007). Null-hypothesis significance testing and the critical weight range for Australian mammals. Conservation Biology 21(6): 1641-1645.
Chow, S. (1988). Significance test or effect size? Psychological Bulletin 103(1): 105-110.
Chow, S. (1998). The null-hypothesis significance-test procedure is still warranted. Behavioral and Brain Sciences 21(2): 228-235.
Chow, S. L. (1998). Précis of Statistical significance: Rationale, validity, and utility. Behavioral and Brain Sciences 21: 169-239.
Cole, R., and G. McBride (2004). Assessing impacts of dredge spoil disposal using equivalence tests: Implications of a precautionary (proof of safety) approach. Marine Ecology Progress Series 279: 63-72.
Colegrave, N., and G. D. Ruxton (2003). Confidence intervals are a more useful complement to nonsignificant tests than are power calculations. Behavioral Ecology 14: 446-450.
Cormack, R. M. (1988). Statistical challenges in the environmental sciences: A personal view. Journal of the Royal Statistical Society A 151: 201-210.
Cowgill, G. (1977). The trouble with significance tests and what we can do about it. American Antiquity 42(3): 350-368.
Dahiru, T. (2008). P-value, a true test of statistical significance? A cautionary note. Annals of Ibadan Postgraduate Medicine 6: 21-26.
Dar, R. (1998). Null hypothesis tests and theory corroboration: Defending NHSTP out of context. Behavioral and Brain Sciences 21(2): 196-197.
de Valpine, P. (2014). The common sense of P values. Ecology 95: 617-621.
Denis, D. (2003). Alternatives to null hypothesis significance testing. Theory & Science 4(1).
D'Errico, G. E. (2009). Issues in significance testing. Measurement 42(10): 1478-1481.
Dienes, Z. (2011). Bayesian versus orthodox statistics: Which side are you on? Perspectives on Psychological Science 6(3): 274-290.
Dixon, P. M., and J. H. K. Pechmann (2008). A statistical test to show negligible trend: A reply. Ecology 89(5): 1473.
Eberhardt, L. L. (2003). What should we do about hypothesis testing? The Journal of Wildlife Management 67(2): 241-247.
Eguchi, T., and T. Gerrodette (2009). A Bayesian approach to line-transect analysis for estimating abundance. Ecological Modelling 220: 1620-1630.
Ellison, A. M., N. J. Gotelli, B. D. Inouye, and D. R. Strong (2014). P values, hypothesis testing, and model selection: It’s déjà vu all over again. Ecology 95: 609-610.
Erwin, E. (1998). The logic of null hypothesis testing. Behavioral and Brain Sciences 21(2): 197-198.
Fairweather, P. G. (1991). Statistical power and design requirements for environmental monitoring. Australian Journal of Marine and Freshwater Research 42: 555-567.
Fernandez-Duque, E. (1997). Comparing and combining data across studies: Alternatives to significance testing. Oikos 79(3): 616-618.
Fidler, F. (2002). The fifth edition of the APA Publication Manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement 62(5): 749-770.
Fidler, F., M. A. Burgman, et al. (2006). Impact of criticism of null-hypothesis significance testing on statistical reporting practices in conservation biology. Conservation Biology 20(5): 1539-1544.
Finch, S., G. Cumming, et al. (2001). Reporting of statistical inference in the Journal of Applied Psychology: Little evidence of reform. Educational and Psychological Measurement 61(2): 181-210.
Fowler, N. (1990). The 10 most common statistical errors. Bulletin of the Ecological Society of America 71: 161-164.
Fraley, R., and M. Marks (2007). The null hypothesis significance-testing debate and its implications for personality research. In Handbook of Research Methods in Personality Psychology, ed. Robins, R. W., R. C. Fraley and R. F. Krueger, 149-169. New York: The Guilford Press.
Freedman, D. A. (1983). A note on screening regression equations. The American Statistician 37: 152-155.
Frick, R. (1998). Chow's defense of null-hypothesis testing: Too traditional? Behavioral and Brain Sciences 21(2): 199.
Gelman, A., J. B. Carlin, H. S. Stern and D. B. Rubin (2004). Bayesian Data Analysis. Boca Raton, FL: Chapman & Hall/CRC.
Gelman, A., and H. Stern (2006). The difference between 'significant' and 'not significant' is not itself statistically significant. The American Statistician 60(4): 328-331.
Gerrodette, T. (2011). Inference without significance: Measuring support for hypotheses rather than rejecting them. Marine Ecology 32: 404-418.
Gerrodette, T., B. L. Taylor, R. Swift, S. Rankin, L. A. Jaramillo and L. Rojas-Bracho (2011). A combined visual and acoustic estimate of 2008 abundance, and change in abundance since 1997, for the vaquita, Phocoena sinus. Marine Mammal Science 27(2): E79-E100.
Gibbons, J. M., N. M. J. Crout, et al. (2007). What role should null-hypothesis significance tests have in statistical education and hypothesis falsification? Trends in Ecology and Evolution 22(9): 445-446.
Glück, J., and O. Vitouch (1998). Stranded statistical paradigms: The last crusade. Behavioral and Brain Sciences 21(2): 200-201.
Good, I. J. (1985). Tail-area probabilities and Bayes factors as distances from the null hypothesis. Journal of Statistical Computation and Simulation 20(4): 325-325.
Good, I. J. (1958). Significance tests in parallel and in series. Journal of the American Statistical Association 53: 799-813.
Goodie, A. S. (2004). Null hypothesis statistical testing and the balance between positive and negative approaches. Behavioral and Brain Sciences 27(3): 338-339.
Goodman, S. N. (1999). Toward evidence-based medical statistics. 2: The Bayes factor. Annals of Internal Medicine 130: 1005-1013.
Goodman, S. N. (2001). Of p-values and Bayes: A modest proposal. Epidemiology 12(3): 295-297.
Goodman, D. (2004a). Methods for joint inference from multiple data sources for improved estimates of population size and survival rates. Marine Mammal Science 20(3): 401-423.
Goodman, D. (2004b). Taking the prior seriously: Bayesian analysis without subjective probability. In The Nature of Scientific Evidence: Statistical, Philosophical, and Empirical Considerations, ed. M. L. Taper and S. R. Lele. Chicago: University of Chicago Press.
Goodman, S. N., and S. Greenland (2007). Why most published research findings are false: Problems in the analysis. PLoS Medicine 4(4): e168.
Gray, B. R., and M. M. Burlew (2007). Estimating trend precision and power to detect trends across grouped count data. Ecology 88(9): 2364-2372.
Gregg, A. P., and C. Sedikides (2004). Is social psychological research really so negatively biased? Behavioral and Brain Sciences 27(3): 340-341.
Guthery, F. S. (2008). Statistical ritual versus knowledge accrual in wildlife science. Journal of Wildlife Management 72(8): 1872-1875.
Guthery, F. S., J. J. Lusk, et al. (2001). The fall of the null hypothesis: Liabilities and opportunities. The Journal of Wildlife Management 65(3): 379-384.
Hagen, R. (1997). In praise of the null hypothesis significance test. American Psychologist 52(1): 15-24.
Hamilton, W. I. (1993). Testing the Null Hypothesis. Journal of Forestry 91(1): 5.
Harris, R. (1998). 'With friends like this...': Three flaws in Chow's defense of significance testing. Behavioral and Brain Sciences 21(2): 202-203.
Hobbs, N. T., and R. Hilborn (2006). Alternatives to statistical hypothesis testing in ecology: A guide to self teaching. Ecological Applications 16(1): 5-19.
Hobbs, N. T., S. Twombly, et al. (2006). Deepening ecological insights using contemporary statistics. Ecological Applications 16(1): 3-4.
Hoekstra, R., S. Finch, et al. (2006). Probability as certainty: Dichotomous thinking and the misuse of p values. Psychonomic Bulletin & Review 13(6): 1033-1037.
Hoenig, J. M., and D. M. Heisey (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician 55(1): 19-24.
Hunter, J. E. (1997). Needed: A ban on the significance test. Psychological Science 8: 3-7.
Hunter, J. (1998). Testing significance testing: A flawed defense. Behavioral and Brain Sciences 21(2): 204.
Hurlbert, S. H., and C. M. Lombardi (2009). Final collapse of the Neyman-Pearson decision theoretic framework and rise of the neoFisherian. Annales Zoologici Fennici 46(5): 311-349.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine 2(8): e124.
Ioannidis, J. P. A. (2007). Why most published research findings are false: Author's reply to Goodman and Greenland. PLoS Medicine 4(6): e215.
Jaramillo-Legorreta, A., L. Rojas-Bracho, E. L. Brownell, Jr., A. J. Read, R. R. Reeves, K. Ralls and B. L. Taylor (2007). Saving the vaquita: Immediate action, not more data. Conservation Biology 21(6): 1653-1655.
Johnson, D. H. (2002). The role of hypothesis testing in wildlife science. The Journal of Wildlife Management 66(2): 272-276.
Jones, L., and J. Tukey (2000). A sensible formulation of the significance test. Psychological Methods 5(4): 411-414.
Kelley, J. (2009). The perils of p-values: Why tests of statistical significance impede the progress of research. Handbook of Evidence-Based Psychodynamic Psychotherapy: 367-377.
Kihlstrom, J. (1998). If you've got an effect, test its significance; if you've got a weak effect, do a meta-analysis. Behavioral and Brain Sciences 21(2): 205-206.
Kluger, A. N., and J. Tikochinsky (2001). The error of accepting the "theoretical" null hypothesis: The rise, fall, and resurrection of commonsense hypotheses in psychology. Psychological Bulletin 127(3): 408-423.
Krueger, L. (1998). The Ego has landed! The .05 level of statistical significance is soft (Fisher) rather than hard (Neyman/Pearson). Behavioral and Brain Sciences 21(2): 207-208.
Läärä, E. (2009). Statistics: reasoning on uncertainty, and the insignificance of testing null. Annales Zoologici Fennici 46(2): 138-157.
Lecoutre, B., M.-P. Lecoutre, et al. (2001). Uses, abuses and misuses of significance tests in the scientific community: Won't the Bayesian choice be unavoidable? International Statistical Review 69(3): 399-417.
Lee, M. D., and K. J. Pope (2006). Model selection for the rate problem: A comparison of significance testing, Bayesian, and minimum description length statistical inference. Journal of Mathematical Psychology 50(2): 193-202.
Lee, M. D., and E.-J. Wagenmakers (2005). Bayesian statistical inference in psychology: Comment on Trafimow (2003). Psychological Review 112(3): 662-668.
Lew, M. J. (2006). Principles: When there should be no difference - how to fail to reject the null hypothesis. Trends in Pharmacological Sciences 27(5): 274-278.
Lindley, D. V. (1957). A statistical paradox. Biometrika 44: 187-192.
Link, W. A., and R. J. Barker (2010). Bayesian Inference with Ecological Applications. New York: Academic Press.
Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science 5(6): 161-171.
Lombardi, C. M., and S. H. Hurlbert (2009). Misprescription and misuse of one-tailed tests. Austral Ecology 34(4): 447-468.
Luft, H. S. (2000). Identifying and assessing the null hypothesis. Health Services Research 34(6): 1265-1271.
Mackintosh, N. J. (1987). From null hypothesis to null dogma. Behavioral and Brain Sciences 10(4): 689-695.
Malgady, R. (2000). Myths about the null hypothesis and the path to reform. In Handbook of Cross-cultural and Multicultural Personality Assessment, ed. R. H. Dana, 49-62. Mahwah, NJ: Lawrence Erlbaum Associates.
Martien, K. K., and B. L. Taylor (2003). Limitations of hypothesis-testing in defining management units for continuously distributed species. Journal of Cetacean Research and Management 5(3): 213-218.
Martínez del Rio, C., S. W. Buskirk, et al. (2007). Response to Gibbons et al.: Null-hypothesis significance tests in education and inference. Trends in Ecology and Evolution 22(9): 446.
McBride, G. B. (2002). Statistical methods helping and hindering environmental science and management. Journal of Agricultural, Biological, and Environmental Statistics 7(3): 300-305.
McIlroy, D. R. (2005). Failing to reject the null hypothesis does not mean that the null hypothesis is true. Anesthesia and Analgesia 100(6): 1868-1869.
Meeks, S. L., and R. B. Dagostino (1983). A note on the use of confidence-limits following rejection of a null hypothesis. American Statistician 37(2): 134-136.
Mogie, M. (2004). In support of null hypothesis significance testing. Proceedings: Biological Sciences. London: The Royal Society. 271(3): S82-S84.
Morrison, D. E., and R. Henkel, Eds. (1970). The Significance Test Controversy: A Reader. New Brunswick, NJ: Transaction Publishers (reprinting).
Mundry, R., and C. L. Nunn (2009). Stepwise model fitting and statistical inference: Turning noise into signal pollution. The American Naturalist 173(1): 119-123.
Murtaugh, P. A. (2014). In defense of P values. Ecology 95: 611-617.
Nakagawa, S., and I. C. Cuthill (2007). Effect size, confidence interval and statistical significance: A practical guide for biologists. Biological Reviews 82(4): 591-605.
Nester, M. R. (1998). Significance tests cannot be justified in theory-corroboration experiments. Behavioral and Brain Sciences 21(2): 213.
Ngatia, M., D. Gonzalez, et al. (2010). Equivalence versus classical statistical tests in water quality assessments. Journal of Environmental Monitoring 12(1): 172-177.
Nicholls, N. (2001). The insignificance of significance testing. Bulletin of the American Meteorological Society 82(5): 981-986.
Oakes, W. F. (1975). On alleged falsity of null hypothesis. Psychological Record 25(2): 265-272.
Paley, J., H. Cheyne, et al. (2008). The null hypothesis: A reply. Journal of Advanced Nursing 64(2): 209-210.
Palm, G. (1998). Significance testing–does it need this defence? Behavioral and Brain Sciences 21(2): 214-215.
Peres-Neto, P. (1999). How many statistical tests are too many? The problem of conducting multiple ecological inferences revisited. Marine Ecology Progress Series 176: 303-306.
Poitevineau, J., and B. Lecoutre (1998). Some statistical misconceptions in Chow's statistical significance. Behavioral and Brain Sciences 21(2): 215.
Rigby, A. (1999). Getting past the statistical referee: Moving away from p-values and towards interval estimation. Health Education Research 14(6): 713.
Rindskopf, D. (1998). Null-hypothesis tests are not completely stupid, but Bayesian statistics are better. Behavioral and Brain Sciences 21(2): 215-216.
Robinson, D. H., and H. Wainer (2002). On the past and future of null hypothesis significance testing. The Journal of Wildlife Management 66(2): 263-271.
Rojas-Bracho, L., R. R. Reeves and A. Jaramillo-Legorreta (2006). Conservation of the vaquita, Phocoena sinus. Mammal Review 36(3): 179-216.
Salsburg, D. (2001). The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. New York: W. H. Freeman / Holt Paperbacks.
Schwann, N. M., and J. Horrow (2005). Failing to reject the null hypothesis does not mean that the null hypothesis is true - In response. Anesthesia and Analgesia 100(6): 1869.
Scialli, A. R. (1992). Confidence and the null hypothesis. Reproductive Toxicology 6(5): 383-384.
Sedlmeier, P. (2009). Beyond the significance test ritual. Zeitschrift für Psychologie/Journal of Psychology 217(1): 1-5.
Sellke, T., M. J. Bayarri, et al. (2001). Calibration of p values for testing precise null hypotheses. The American Statistician 55(1): 62-71.
Shrader-Frechette, K. (2011). Randomization and rules for causal inferences in biology: When the biological emperor (significance testing) has no clothes. Biological Theory 6: 154-161.
Shrout, P. E. (1997). Should significance tests be banned? Introduction to a special section exploring the pros and cons. Psychological Science 8: 1-2.
Siegfried, T. (2010). Odds are, it's wrong. Science News 177(7): 26.
Silva-Aycaguer, L. C., P. Suarez-Gil, et al. (2010). The null hypothesis significance test in health sciences research (1995-2006): Statistical analysis and interpretation. BMC Medical Research Methodology 10: 44.
Simonne, E., M. Ozores-Hampton, et al. (2007). So, you wanted to accept the null hypothesis? Analysis and interpretation of fertilizer trials in the BMP era. HortScience 42(3): 440.
Smedslund, G. (2008). All bachelors are unmarried men (p < 0.05). Quality & Quantity 42(1): 53-73.
Sohn, D. (2000). Does the finding of statistical significance justify the rejection of the null hypothesis? Behavioral and Brain Sciences 23(2): 293-294.
Soper, H. V., D. V. Cicchetti, et al. (1988). Null hypothesis disrespect in neuropsychology: Dangers of alpha and beta errors. Journal of Clinical and Experimental Neuropsychology 10(2): 255-270.
Stallings, W., and S. Singhal (1969). Confidence level and significance level: Semantic confusion or logical fallacy. The Journal of Experimental Education 37(4): 57-59.
Stam, H., and G. Pasay (1998). The historical case against null-hypothesis significance testing. Behavioral and Brain Sciences 21(2): 219-220.
Stephens, P. A., S. W. Buskirk, et al. (2006). Inference in ecology and evolution. Trends in Ecology and Evolution 22(4): 192-197.
Sterne, J. A. C., and G. Davey Smith (2001). Sifting the evidence - what's wrong with significance tests? Physical Therapy 81(8): 1464-1469.
Stürzebecher, E., M. Cebulla, et al. (2005). Automated auditory response detection: Statistical problems with repeated testing. International Journal of Audiology 44(2): 110-117.
Tachibana, T. (1982). A comment on confusion in open-field studies: Abuse of null-hypothesis significance test. Physiology & Behavior 29(1): 159-161.
Taper, M. L., and S. R. Lele (2004). The Nature of Scientific Evidence: Statistical, Philosophical, and Empirical Considerations. Chicago: The University of Chicago Press.
Tassinary, L. (1998). Significance tests: Necessary but not sufficient. Behavioral and Brain Sciences 21(2): 221-222.
Thomas, L., S. T. Buckland, E. A. Rexstad, J. L. Laake, S. Strindberg, S. L. Hedley, J. R. B. Bishop, T. A. Marques and K. P. Burnham (2010). Distance software: Design and analysis of distance sampling surveys for estimating population size. Journal of Applied Ecology 47: 5-14.
Thompson, C. F., and A. J. Neill (1993). Statistical power and accepting the null hypothesis. Animal Behaviour 46(5): 1012.
Trafimow, D. (2003). Hypothesis testing and theory evaluation at the boundaries: Surprising insights from Bayes's theorem. Psychological Review 110(3): 526-535.
Trout, J. D. (1999). Measured realism and statistical inference: An explanation for the fast progress of 'hard' psychology. Philosophy of Science 66(3): S260-S272.
Turan, F. N., and M. Senocak (2007). Evaluating 'superiority', 'equivalence' and 'non-inferiority' in clinical trials. Annals of Saudi Medicine 27(4): 284-288.
Vokey, J. (1998). Statistics without probability: Significance testing as typicality and exchangeability in data analysis. Behavioral and Brain Sciences 21(2): 225-226.
Wagenmakers, E. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review 14(5): 779-804.
Whittingham, M. J., P. A. Stephens, et al. (2006). Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology 75(5): 1182-1189.
Zumbo, B. (1998). A viable alternative to null-hypothesis testing. Behavioral and Brain Sciences 21(2): 227-228.