The following list of papers discussing null hypothesis significance testing as a method of inference covers the period 2001-2011. The list is intended to supplement previous compilations through 1997 by Marks Nester at http://warnercnr.colostate.edu/~anderson/nester.html, and another through 2001 by Bill Thompson at http://warnercnr.colostate.edu/~anderson/thompson1.html. The list includes a few papers prior to 2001 that did not appear in either of these two previous compilations. The list may not be complete, either because a paper was simply missed, or because the primary focus of a paper did not seem to be relevant to significance testing. Internet links are provided where available. Abstracts are usually available at no charge, but some sites require a fee or subscription for the full text.
Anderson, D. R. (2008). Model Based Inference in the Life Sciences: A Primer on Evidence. New York: Springer. 
Anttonen, R. G. (1970). "The significance of the null." The Journal of Educational Research 63(10): 438-440. 
Balluerka Lasa, N., A. I. Vergara Iraeta, et al. (2009). "Calculating the main alternatives to null-hypothesis-significance testing in between-subject experimental designs." Psicothema 21(1): 141-151. 
Balluerka, N., J. Gómez, et al. (2005). "The controversy over null hypothesis significance testing revisited." Methodology: European Journal of Research Methods for the Behavioral and Social Sciences 1(2): 55-70. 
Beaulieu-Prévost, D. (2007). "Statistical decision and falsification in science: Going beyond the null hypothesis." In Cognitive Decision-Making: Empirical and Foundational Issues, ed. Hardy-Vallee, B. Cambridge: Cambridge Scholars Publishing.
Beninger, P. G., I. Boldina, and S. Katsanevakis. (2012). "Strengthening statistical usage in marine ecology." Journal of Experimental Marine Biology and Ecology 426-427: 97-108. 
Berger, J. O. (2003). "Could Fisher, Jeffreys and Neyman have agreed on testing?" Statistical Science 18(1): 1-32. 
Blaich, C. F. (1998). "The null-hypothesis significance-test procedure: Can't live with it, can't live without it." Behavioral and Brain Sciences 21(2): 194-195. 
Bonett, D., and T. Wright (2007). "Comments and recommendations regarding the hypothesis testing controversy." Journal of Organizational Behavior 28(6): 647-659.
Bookstein, F. (1998). "Statistical significance testing was not meant for weak corroborations of weaker theories." Behavioral and Brain Sciences 21(2): 195-196. 
Brosi, B. J., and E. G. Biber (2009). "Statistical inference, Type II error, and decision making under the U.S. Endangered Species Act." Frontiers in Ecology and the Environment 7(9): 487-494.
Buckland, S. T., D. R. Anderson, K. P. Burnham, J. L. Laake, D. L. Borchers and L. Thomas (2001). Introduction to Distance Sampling: Estimating Abundance of Biological Populations. New York: Oxford University Press.
Burnham, K. P., and D. R. Anderson (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. New York: Springer. 
Butcher, J. A., J. E. Groce, et al. (2007). "Persistent controversy in statistical approaches in wildlife sciences: a perspective of students." The Journal of Wildlife Management 71(7): 2142-2144. 
Camp, R. J., N. E. Seavy, et al. (2008). "A statistical test to show negligible trend: comment." Ecology 89(5): 1469-1472. 
Chinipardaz, R., and A. Abtahi (2008). "Testing a point null hypothesis: The comparison of p-Values and Bayesian evidence in multivariate normal distribution." Pakistan Journal of Statistics 24(2): 123-133.
Chisholm, R., and R. Taylor (2007). "Null-hypothesis significance testing and the critical weight range for Australian mammals." Conservation Biology 21(6): 1641-1645. 
Chow, S. (1988). "Significance test or effect size?" Psychological Bulletin 103(1): 105-110. 
Chow, S. (1998). "The null-hypothesis significance-test procedure is still warranted." Behavioral and Brain Sciences 21(2): 228-235. 
Chow, S. L. (1998). "Précis of Statistical significance: Rationale, validity, and utility." Behavioral and Brain Sciences 21: 169-239. 
Cole, R., and G. McBride (2004). "Assessing impacts of dredge spoil disposal using equivalence tests: implications of a precautionary (proof of safety) approach." Marine Ecology Progress Series 279: 63-72.
Cormack, R. M. (1988). "Statistical challenges in the environmental sciences: A personal view." Journal of the Royal Statistical Society A 151: 201-210.
Cowgill, G. (1977). "The trouble with significance tests and what we can do about it." American Antiquity 42(3): 350-368. 
Dahiru, T. (2008). "P-value, a true test of statistical significance? A cautionary note." Annals of Ibadan Postgraduate Medicine 6: 21-26.
Dar, R. (1998). "Null hypothesis tests and theory corroboration: Defending NHSTP out of context." Behavioral and Brain Sciences 21(2): 196-197. 
Denis, D. (2003). "Alternatives to null hypothesis significance testing." Theory & Science 4(1). 
D'Errico, G. E. (2009). "Issues in significance testing." Measurement 42(10): 1478-1481. 
Dienes, Z. (2011). "Bayesian versus orthodox statistics: Which side are you on?" Perspectives on Psychological Science 6(3): 274-290. 
Dixon, P. M., and J. H. K. Pechmann (2008). "A statistical test to show negligible trend: A reply." Ecology 89(5): 1473. 
Eberhardt, L. L. (2003). "What should we do about hypothesis testing?" The Journal of Wildlife Management 67(2): 241-247. 
Eguchi, T., and T. Gerrodette (2009). "A Bayesian approach to line-transect analysis for estimating abundance." Ecological Modelling 220: 1620-1630.
Erwin, E. (1998). "The logic of null hypothesis testing." Behavioral and Brain Sciences 21(2): 197-198. 
Fernandez-Duque, E. (1997). "Comparing and combining data across studies: Alternatives to significance testing." Oikos 79(3): 616-618. 
Fidler, F. (2002). "The fifth edition of the APA Publication Manual: Why its statistics recommendations are so controversial." Educational and Psychological Measurement 62(5): 749-770. 
Fidler, F., M. A. Burgman, et al. (2006). "Impact of criticism of null-hypothesis significance testing on statistical reporting practices in conservation biology." Conservation Biology 20(5): 1539-1544. 
Finch, S., G. Cumming, et al. (2001). "Reporting of statistical inference in the Journal of Applied Psychology: Little evidence of reform." Educational and Psychological Measurement 61(2): 181-210. 
Fraley, R., and M. Marks (2007). "The null hypothesis significance-testing debate and its implications for personality research." In Handbook of Research Methods in Personality Psychology, ed. Robins, R. W., R. C. Fraley and R. F. Krueger, 149-169. New York: The Guilford Press. 
Freedman, D. A. (1983). "A note on screening regression equations." The American Statistician 37: 152-155.
Frick, R. (1998). "Chow's defense of null-hypothesis testing: Too traditional?" Behavioral and Brain Sciences 21(2): 199. 
Gelman, A., J. B. Carlin, H. S. Stern and D. B. Rubin (2004). Bayesian Data Analysis. Boca Raton, FL: Chapman & Hall/CRC. 
Gelman, A., and H. Stern (2006). "The difference between 'significant' and 'not significant' is not itself statistically significant." The American Statistician 60(4): 328-331. 
Gerrodette, T. (2011). "Inference without significance: Measuring support for hypotheses rather than rejecting them." Marine Ecology 32: 404-418.
Gerrodette, T., B. L. Taylor, R. Swift, S. Rankin, L. A. Jaramillo and L. Rojas-Bracho (2011). "A combined visual and acoustic estimate of 2008 abundance, and change in abundance since 1997, for the vaquita, Phocoena sinus." Marine Mammal Science 27(2): E79-E100.
Gibbons, J. M., N. M. J. Crout, et al. (2007). "What role should null-hypothesis significance tests have in statistical education and hypothesis falsification?" Trends in Ecology and Evolution 22(9): 445-446. 
Glück, J., and O. Vitouch (1998). "Stranded statistical paradigms: The last crusade." Behavioral and Brain Sciences 21(2): 200-201. 
Good, I. J. (1985). "Tail-area probabilities and Bayes factors as distances from the null hypothesis." Journal of Statistical Computation and Simulation 20(4): 325-325. 
Good, I. J. (1958). "Significance tests in parallel and in series." Journal of the American Statistical Association 53: 799-813.
Goodie, A. S. (2004). "Null hypothesis statistical testing and the balance between positive and negative approaches." Behavioral and Brain Sciences 27(3): 338-339. 
Goodman, S. N. (2001). "Of p-values and Bayes: A modest proposal." Epidemiology 12(3): 295-297.
Goodman, D. (2004a). "Methods for joint inference from multiple data sources for improved estimates of population size and survival rates." Marine Mammal Science 20(3): 401-423.
Goodman, D. (2004b). "Taking the prior seriously: Bayesian analysis without subjective probability." In The Nature of Scientific Evidence: Statistical, Philosophical, and Empirical Considerations, ed. M.L. Taper and S. R. Lele. Chicago: University of Chicago Press.
Goodman, S. N., and S. Greenland (2007). "Why most published research findings are false: Problems in the analysis." PLoS Medicine 4(4): e168. 
Gray, B. R., and M. M. Burlew (2007). "Estimating trend precision and power to detect trends across grouped count data." Ecology 88(9): 2364-2372.
Gregg, A. P., and C. Sedikides (2004). "Is social psychological research really so negatively biased?" Behavioral and Brain Sciences 27(3): 340-341. 
Guthery, F. S. (2008). "Statistical ritual versus knowledge accrual in wildlife science." Journal of Wildlife Management 72(8): 1872-1875.
Guthery, F. S., J. J. Lusk, et al. (2001). "The fall of the null hypothesis: Liabilities and opportunities." The Journal of Wildlife Management 65(3): 379-384. 
Hagen, R. (1997). "In praise of the null hypothesis significance test." American Psychologist 52(1): 15-24. 
Hamilton, W. I. (1993). "Testing the Null Hypothesis." Journal of Forestry 91(1): 5.
Harris, R. (1998). "'With friends like this...': Three flaws in Chow's defense of significance testing." Behavioral and Brain Sciences 21(2): 202-203. 
Hobbs, N. T., and R. Hilborn (2006). "Alternatives to statistical hypothesis testing in ecology: A guide to self teaching." Ecological Applications 16(1): 5-19. 
Hobbs, N. T., S. Twombly, et al. (2006). "Deepening ecological insights using contemporary statistics." Ecological Applications 16(1): 3-4. 
Hoekstra, R., S. Finch, et al. (2006). "Probability as certainty: Dichotomous thinking and the misuse of p values." Psychonomic Bulletin & Review 13(6): 1033-1037.
Hoenig, J. M., and D. M. Heisey (2001). "The abuse of power: The pervasive fallacy of power calculations for data analysis." The American Statistician 55(1): 19-24. 
Hunter, J. (1998). "Testing significance testing: A flawed defense." Behavioral and Brain Sciences 21(2): 204. 
Hurlbert, S. H., and C. M. Lombardi (2009). "Final collapse of the Neyman-Pearson decision theoretic framework and rise of the neoFisherian." Annales Zoologici Fennici 46(5): 311-349. 
Ioannidis, J. P. A. (2005). "Why most published research findings are false." PLoS Medicine 2(8): e124. 
Ioannidis, J. P. A. (2007). "Why most published research findings are false: Author's reply to Goodman and Greenland." PLoS Medicine 4(6): e215. 
Jaramillo-Legorreta, A., L. Rojas-Bracho, R. L. Brownell, Jr., A. J. Read, R. R. Reeves, K. Ralls and B. L. Taylor (2007). "Saving the vaquita: Immediate action, not more data." Conservation Biology 21(6): 1653-1655.
Johnson, D. H. (2002). "The role of hypothesis testing in wildlife science." The Journal of Wildlife Management 66(2): 272-276. 
Jones, L., and J. Tukey (2000). "A sensible formulation of the significance test." Psychological Methods 5(4): 411-414. 
Kelley, J. (2009). "The perils of p-values: Why tests of statistical significance impede the progress of research." Handbook of Evidence-Based Psychodynamic Psychotherapy: 367-377. 
Kihlstrom, J. (1998). "If you've got an effect, test its significance; if you've got a weak effect, do a meta-analysis." Behavioral and Brain Sciences 21(2): 205-206. 
Kluger, A. N., and J. Tikochinsky (2001). "The error of accepting the 'theoretical' null hypothesis: The rise, fall, and resurrection of commonsense hypotheses in psychology." Psychological Bulletin 127(3): 408-423.
Krueger, L. (1998). "The Ego has landed! The .05 level of statistical significance is soft (Fisher) rather than hard (Neyman/Pearson)." Behavioral and Brain Sciences 21(2): 207-208. 
Läärä, E. (2009). "Statistics: reasoning on uncertainty, and the insignificance of testing null." Annales Zoologici Fennici 46(2): 138-157.
Lecoutre, B., M.-P. Lecoutre, et al. (2001). "Uses, abuses and misuses of significance tests in the scientific community: Won't the Bayesian choice be unavoidable?" International Statistical Review 69(3): 399-417. 
Lee, M. D., and K. J. Pope (2006). "Model selection for the rate problem: A comparison of significance testing, Bayesian, and minimum description length statistical inference." Journal of Mathematical Psychology 50(2): 193-202. 
Lee, M. D., and E.-J. Wagenmakers (2005). "Bayesian statistical inference in psychology: Comment on Trafimow (2003)." Psychological Review 112(3): 662-668. 
Lew, M. J. (2006). "Principles: When there should be no difference - how to fail to reject the null hypothesis." Trends in Pharmacological Sciences 27(5): 274-278. 
Link, W. A., and R. J. Barker (2010). Bayesian Inference with Ecological Applications. New York: Academic Press. 
Loftus, G. R. (1996). "Psychology will be a much better science when we change the way we analyze data." Current Directions in Psychological Science 5(6): 161-171. 
Lombardi, C. M., and S. H. Hurlbert (2009). "Misprescription and misuse of one-tailed tests." Austral Ecology 34(4): 447-468. 
Luft, H. S. (2000). "Identifying and assessing the null hypothesis." Health Services Research 34(6): 1265-1271.
Mackintosh, N. J. (1987). "From null hypothesis to null dogma." Behavioral and Brain Sciences 10(4): 689-695. 
Malgady, R. (2000). "Myths about the null hypothesis and the path to reform." In Handbook of Cross-cultural and Multicultural Personality Assessment, ed. R. H. Dana, 49-62. Mahwah, NJ: Lawrence Erlbaum Associates.
Martien, K. K., and B. L. Taylor (2003). "Limitations of hypothesis-testing in defining management units for continuously distributed species." Journal of Cetacean Research and Management 5(3): 213-218.
Martínez del Rio, C., S. W. Buskirk, et al. (2007). "Response to Gibbons et al.: Null-hypothesis significance tests in education and inference." Trends in Ecology and Evolution 22(9): 446. 
McBride, G. B. (2002). "Statistical methods helping and hindering environmental science and management." Journal of Agricultural, Biological, and Environmental Statistics 7(3): 300-305. 
McIlroy, D. R. (2005). "Failing to reject the null hypothesis does not mean that the null hypothesis is true." Anesthesia and Analgesia 100(6): 1868-1869.
Meeks, S. L., and R. B. Dagostino (1983). "A note on the use of confidence-limits following rejection of a null hypothesis." American Statistician 37(2): 134-136. 
Mogie, M. (2004). "In support of null hypothesis significance testing." Proceedings: Biological Sciences. London: The Royal Society. 271(3): S82-S84. 
Morrison, D. E., and R. Henkel, Eds. (1970). The Significance Test Controversy: A Reader. New Brunswick, NJ: Transaction Publishers (reprinting).
Mundry, R., and C. L. Nunn (2009). "Stepwise model fitting and statistical inference: Turning noise into signal pollution." The American Naturalist 173(1): 119-123. 
Nakagawa, S., and I. C. Cuthill (2007). "Effect size, confidence interval and statistical significance: A practical guide for biologists." Biological Reviews 82(4): 591-605. 
Nester, M. R. (1998). "Significance tests cannot be justified in theory-corroboration experiments." Behavioral and Brain Sciences 21(2): 213. 
Ngatia, M., D. Gonzalez, et al. (2010). "Equivalence versus classical statistical tests in water quality assessments." Journal of Environmental Monitoring 12(1): 172-177. 
Nicholls, N. (2001). "The insignificance of significance testing." Bulletin of the American Meteorological Society 82(5): 981-986. 
Oakes, W. F. (1975). "On alleged falsity of null hypothesis." Psychological Record 25(2): 265-272.
Paley, J., H. Cheyne, et al. (2008). "The null hypothesis: A reply." Journal of Advanced Nursing 64(2): 209-210. 
Palm, G. (1998). "Significance testing–does it need this defence?" Behavioral and Brain Sciences 21(2): 214-215. 
Peres-Neto, P. (1999). "How many statistical tests are too many? The problem of conducting multiple ecological inferences revisited." Marine Ecology Progress Series 176: 303-306.
Poitevineau, J., and B. Lecoutre (1998). "Some statistical misconceptions in Chow's statistical significance." Behavioral and Brain Sciences 21(2): 215. 
Rigby, A. (1999). "Getting past the statistical referee: Moving away from p-values and towards interval estimation." Health Education Research 14(6): 713. 
Rindskopf, D. (1998). "Null-hypothesis tests are not completely stupid, but Bayesian statistics are better." Behavioral and Brain Sciences 21(2): 215-216. 
Robinson, D. H., and H. Wainer (2002). "On the past and future of null hypothesis significance testing." The Journal of Wildlife Management 66(2): 263-271. 
Rojas-Bracho, L., R. R. Reeves and A. Jaramillo-Legorreta (2006). "Conservation of the vaquita, Phocoena sinus." Mammal Review 36(3): 179-216.
Salsburg, D. (2001). The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. New York: W. H. Freeman / Holt Paperbacks. 
Schwann, N. M., and J. Horrow (2005). "Failing to reject the null hypothesis does not mean that the null hypothesis is true - In response." Anesthesia and Analgesia 100(6): 1869. 
Scialli, A. R. (1992). "Confidence and the null hypothesis." Reproductive Toxicology 6(5): 383-384. 
Sedlmeier, P. (2009). "Beyond the significance test ritual." Zeitschrift für Psychologie/Journal of Psychology 217(1): 1-5. 
Sellke, T., M. J. Bayarri, et al. (2001). "Calibration of p values for testing precise null hypotheses." The American Statistician 55(1): 62-71. 
Shrader-Frechette, K. (2011). "Randomization and rules for causal inferences in biology: When the biological emperor (significance testing) has no clothes." Biological Theory 6: 154-161.
Shrout, P. E. (1997). "Should significance tests be banned? Introduction to a special section exploring the pros and cons." Psychological Science 8: 1-2.
Siegfried, T. (2010). "Odds are, it's wrong." Science News 177(7): 26. 
Silva-Aycaguer, L. C., P. Suarez-Gil, et al. (2010). "The null hypothesis significance test in health sciences research (1995-2006): Statistical analysis and interpretation." BMC Medical Research Methodology 10: 44. 
Simonne, E., M. Ozores-Hampton, et al. (2007). "So, you wanted to accept the null hypothesis? Analysis and interpretation of fertilizer trials in the BMP era." HortScience 42(3): 440.
Smedslund, G. (2008). "All bachelors are unmarried men (p < 0.05)." Quality & Quantity 42(1): 53-73. 
Sohn, D. (2000). "Does the finding of statistical significance justify the rejection of the null hypothesis?" Behavioral and Brain Sciences 23(2): 293-294. 
Soper, H. V., D. V. Cicchetti, et al. (1988). "Null hypothesis disrespect in neuropsychology: Dangers of alpha and beta errors." Journal of Clinical and Experimental Neuropsychology 10(2): 255-270. 
Stallings, W., and S. Singhal (1969). "Confidence level and significance level: Semantic confusion or logical fallacy." The Journal of Experimental Education 37(4): 57-59.
Stam, H., and G. Pasay (1998). "The historical case against null-hypothesis significance testing." Behavioral and Brain Sciences 21(2): 219-220. 
Stephens, P. A., S. W. Buskirk, et al. (2006). "Inference in ecology and evolution." Trends in Ecology and Evolution 22(4): 192-197. 
Sterne, J. A. C., and G. Davey Smith (2001). "Sifting the evidence - what's wrong with significance tests?" Physical Therapy 81(8): 1464-1469. 
Stürzebecher, E., M. Cebulla, et al. (2005). "Automated auditory response detection: Statistical problems with repeated testing." International Journal of Audiology 44(2): 110-117. 
Tachibana, T. (1982). "A comment on confusion in open-field studies: Abuse of null-hypothesis significance test." Physiology & Behavior 29(1): 159-161. 
Taper, M. L., and S. R. Lele (2004). The Nature of Scientific Evidence: Statistical, Philosophical, and Empirical Considerations. Chicago: The University of Chicago Press.
Tassinary, L. (1998). "Significance tests: Necessary but not sufficient." Behavioral and Brain Sciences 21(2): 221-222. 
Thomas, L., S. T. Buckland, E. A. Rexstad, J. L. Laake, S. Strindberg, S. L. Hedley, J. R. B. Bishop, T. A. Marques and K. P. Burnham (2010). "Distance software: Design and analysis of distance sampling surveys for estimating population size." Journal of Applied Ecology 47: 5-14.
Thompson, C. F., and A. J. Neill (1993). "Statistical power and accepting the null hypothesis." Animal Behaviour 46(5): 1012.
Trafimow, D. (2003). "Hypothesis testing and theory evaluation at the boundaries: Surprising insights from Bayes's theorem." Psychological Review 110(3): 526-535. 
Trout, J. D. (1999). "Measured realism and statistical inference: An explanation for the fast progress of 'hard' psychology." Philosophy of Science 66(3): S260-S272. 
Turan, F. N., and M. Senocak (2007). "Evaluating 'superiority', 'equivalence' and 'non-inferiority' in clinical trials." Annals of Saudi Medicine 27(4): 284-288. 
Vokey, J. (1998). "Statistics without probability: Significance testing as typicality and exchangeability in data analysis." Behavioral and Brain Sciences 21(2): 225-226. 
Wagenmakers, E. (2007). "A practical solution to the pervasive problems of p values." Psychonomic Bulletin & Review 14(5): 779-804. 
Whittingham, M. J., P. A. Stephens, et al. (2006). "Why do we still use stepwise modelling in ecology and behaviour?" Journal of Animal Ecology 75(5): 1182-1189. 
Zumbo, B. (1998). "A viable alternative to null-hypothesis testing." Behavioral and Brain Sciences 21(2): 227-228.