Comments on: P-values are widely used in the social sciences, but often misunderstood: and that's a problem. https://ensr.oii.ox.ac.uk/many-of-us-scientists-dont-understand-p-values-and-thats-a-problem/

By: Otto (14 Mar 2016)

Thanks for a nice post, guys. This is an extremely important topic. It is surprising that it has been glossed over for so long, especially since most undergraduate statistics students should already know the problems associated with p-values.

I have a question and a comment. First the comment: Andrew Gelman and his co-authors have written extensively about the 'garden of forking paths' (e.g. here: http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf). Their point is that p-values can be invalid even if the researcher has not actively done any "p-hacking". The mere fact that a researcher adjusts her analyses (data transformations, outlier filtering, choice of statistical test, etc.) according to the data she has at hand invalidates the theoretical interpretation of p-values. This is because the analyses depend on the data; if the data were different, the analyses might also change! This seems like a profound problem in all empirical work, and it also seems very poorly understood.
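To make this concrete, here is a crude simulation (my own sketch in Python, not an example from Gelman's paper): even a single data-dependent choice, such as deciding only after looking at the data whether to trim extreme observations, pushes the false-positive rate above the nominal 5% when the null hypothesis is actually true.

```python
# Crude illustration of how a data-dependent analysis choice inflates the
# false-positive rate. Both groups are drawn from the SAME distribution,
# so any "significant" difference is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n = 5000, 40
fixed_hits = flexible_hits = 0

for _ in range(n_sims):
    x = rng.normal(0, 1, n)   # data generated under the null
    y = rng.normal(0, 1, n)

    # Pre-specified analysis: one fixed t-test.
    p_fixed = stats.ttest_ind(x, y).pvalue

    # Data-dependent analysis: also try the test after trimming the most
    # extreme observation from each tail, and report whichever p is smaller.
    p_trim = stats.ttest_ind(np.sort(x)[1:-1], np.sort(y)[1:-1]).pvalue
    p_flexible = min(p_fixed, p_trim)

    fixed_hits += p_fixed < 0.05
    flexible_hits += p_flexible < 0.05

print(f"false-positive rate, fixed analysis:    {fixed_hits / n_sims:.3f}")    # about 0.05
print(f"false-positive rate, flexible analysis: {flexible_hits / n_sims:.3f}")  # above 0.05
```

The real forking-paths problem is subtler than this, since the researcher need not actually run both analyses on the same data; it is enough that she would have chosen differently had the data looked different. But the simulation shows why data-dependent choices break the nominal error rate.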

Then the question: after reading the ASA's (and your) warnings, should I stop using p-values? Wouldn't many of the alternatives proposed by the ASA, such as Bayes factors, likelihood ratios, (Bayesian) standard error bands, etc., suffer from many of the same problems? If I can 'hack' a p-value, I should be just as able to hack a Bayes factor, right? In addition, when it comes to large datasets, I might encounter parameter estimates that have a very large Bayes factor, yet the estimates could still be too small to be of any practical relevance, or could be the result of an invalid research design.
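To illustrate the large-dataset worry with a toy example (my own, with made-up numbers): with a million observations per group, a true difference of 0.01 standard deviations, which would be negligible in most practical settings, still produces an astronomically small p-value, and an evidence measure like a Bayes factor computed on the same comparison would be similarly overwhelming.

```python
# Toy example: statistical significance without practical relevance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.normal(0.00, 1, n)
y = rng.normal(0.01, 1, n)   # true difference of 0.01 SD -- negligible in practice

t, p = stats.ttest_ind(x, y)
print(f"estimated difference: {y.mean() - x.mean():.4f}")
print(f"p-value: {p:.1e}")   # typically far below 0.05 despite the tiny effect
```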

In addition, if we move the goalposts so that instead of p < 0.05 we set the threshold for "significant" at p < 0.01, or some other number smaller than 0.05, wouldn't that just encourage hacking whenever a result is close to p = 0.01?

I admit that this is more a question for the ASA than for you guys, but the ASA does not have as nice a blog as you do. 🙂
