{"id":3604,"date":"2016-03-07T18:53:29","date_gmt":"2016-03-07T18:53:29","guid":{"rendered":"http:\/\/blogs.oii.ox.ac.uk\/policy\/?p=3604"},"modified":"2020-12-07T14:24:52","modified_gmt":"2020-12-07T14:24:52","slug":"many-of-us-scientists-dont-understand-p-values-and-thats-a-problem","status":"publish","type":"post","link":"https:\/\/ensr.oii.ox.ac.uk\/many-of-us-scientists-dont-understand-p-values-and-thats-a-problem\/","title":{"rendered":"P-values are widely used in the social sciences, but often misunderstood: and that&#8217;s a problem."},"content":{"rendered":"<p><em>P-values are widely used in the social sciences, especially &#8216;big data&#8217; studies, to calculate statistical significance. Yet they are\u00a0widely criticized for being easily hacked, and for not telling us what we want to know. Many have argued that, as a result, research is wrong far more often than we realize. In their recent article\u00a0<a href=\"http:\/\/journal.frontiersin.org\/article\/10.3389\/fphy.2016.00006\/full\">P-values: Misunderstood and Misused<\/a>\u00a0OII Research Fellow <a href=\"http:\/\/www.oii.ox.ac.uk\/people\/yasseri\/\">Taha Yasseri<\/a> and doctoral student <a href=\"http:\/\/www.oii.ox.ac.uk\/people\/?id=451\">Bertie Vidgen<\/a> argue that we need to make standards for interpreting p-values more stringent, and also improve transparency in the academic reporting process, if we are to maximise the\u00a0value of statistical analysis.<\/em><\/p>\n<figure id=\"attachment_3609\" aria-describedby=\"caption-attachment-3609\" style=\"width: 369px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blogs.oii.ox.ac.uk\/policy\/wp-content\/uploads\/sites\/77\/2016\/03\/significant.png\"><img loading=\"lazy\" class=\"wp-image-3609 size-large\" src=\"http:\/\/blogs.oii.ox.ac.uk\/policy\/wp-content\/uploads\/sites\/77\/2016\/03\/significant-369x1024.png\" alt=\"\u201cSignificant\u201d: an illustration of selective reporting and statistical significance from XKCD. 
Available online at http:\/\/xkcd.com\/882\/\" width=\"369\" height=\"1024\" srcset=\"https:\/\/ensr.oii.ox.ac.uk\/wp-content\/uploads\/sites\/77\/2016\/03\/significant-369x1024.png 369w, https:\/\/ensr.oii.ox.ac.uk\/wp-content\/uploads\/sites\/77\/2016\/03\/significant-108x300.png 108w, https:\/\/ensr.oii.ox.ac.uk\/wp-content\/uploads\/sites\/77\/2016\/03\/significant.png 540w\" sizes=\"(max-width: 369px) 100vw, 369px\" \/><\/a><figcaption id=\"caption-attachment-3609\" class=\"wp-caption-text\">\u201cSignificant\u201d: an illustration of selective reporting and<br \/> statistical significance from XKCD. Available online at<br \/> <a href=\"http:\/\/xkcd.com\/882\/\">http:\/\/xkcd.com\/882\/<\/a><\/figcaption><\/figure>\n<p>In an unprecedented move, the American Statistical Association recently released a statement (March 7 2016) <a href=\"http:\/\/www.amstat.org\/newsroom\/pressreleases\/P-ValueStatement.pdf\">warning against how p-values are currently used<\/a>. This reflects a growing concern in academic circles that whilst a lot of attention is paid to the huge impact of big data and algorithmic decision-making, there is considerably less focus on the crucial role played by statistics in enabling effective analysis of big data sets, and making sense of the complex relationships contained within them. Because much as datafication has created huge social opportunities, it has also brought to the fore many problems and limitations with current statistical practices. In particular, the deluge of data has made it crucial that we can work out whether studies are \u2018significant\u2019. 
<a href=\"http:\/\/journal.frontiersin.org\/article\/10.3389\/fphy.2016.00006\/full\">In our paper<\/a>, published three days before the ASA&#8217;s statement, we argued that the most commonly used tool in the social sciences for calculating significance \u2013 the p-value \u2013 is misused, misunderstood and, most importantly, <em>doesn\u2019t tell us what we want to know<\/em>.<\/p>\n<p>The basic problem of \u2018significance\u2019 is simple: it is impractical to repeat an experiment an infinite number of times to make sure that what we observe is \u201cuniversal\u201d. The same applies to our sample size: we are often unable to analyse a \u201cwhole population\u201d sample and so have to generalize from our observations on a limited-size sample to the whole population. The obvious problem here is that what we observe is based on a limited number of experiments (sometimes only one experiment) and on a limited-size sample, and as such could have been generated by chance rather than by an underlying universal mechanism! We might find it impossible to make the same observation if we were to replicate the same experiment multiple times or analyse a larger sample. If this is the case then we will mischaracterise what is happening \u2013 which is a really big problem given the growing importance of \u2018evidence-based\u2019 public policy. If our evidence is faulty or unreliable then we will create policies, or intervene in social settings, in an equally faulty way.<\/p>\n<p>The way that social scientists have got round this problem (that samples might not be representative of the population) is through the \u2018p-value\u2019. The p-value tells you the probability of making a similar observation in a sample with the same size and in the same number of experiments, by pure chance. In other words, it tells you how likely it is that you would see the same relationship between X and Y <em>even if no relationship exists between them<\/em>. 
On the face of it this is pretty useful, and in the social sciences we normally say that a p-value below 1 in 20 (0.05) means the results are significant. Yet as the American Statistical Association has just noted, even though p-values are incredibly widespread, many researchers misinterpret what they really mean.<\/p>\n<p>In our paper we argued that p-values are misunderstood and misused because people think the p-value tells you much more than it really does. In particular, people think the p-value tells you (i) how likely it is that a relationship between X and Y really exists and (ii) the percentage of all findings that are false (which is actually something different called the <a href=\"https:\/\/en.wikipedia.org\/wiki\/False_discovery_rate\">False Discovery Rate<\/a>). As a result, we are far too confident that academic studies are correct. Some commentators have argued that <a href=\"http:\/\/rsos.royalsocietypublishing.org\/content\/1\/3\/140216\">at least 30% of studies are wrong<\/a> because of problems related to p-values: a huge figure. One of the main problems is that p-values can be \u2018hacked\u2019 and as such easily manipulated to show significance when none exists.<\/p>\n<p>If we are going to base public policy (and as such public funding) on \u2018evidence\u2019 then we need to make sure that the evidence used is reliable. P-values need to be used far more rigorously, with significance levels of 0.01 or 0.001 seen as standard. We also need to start being more open and transparent about how results are recorded. There is a fine line between data exploration (a legitimate academic exercise) and \u2018<a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_dredging\">data dredging<\/a>\u2019 (where results are manipulated in order to find something noteworthy). Only if researchers are honest about what they are doing will we be able to maximise the potential benefits offered by Big Data. 
Luckily there are some great initiatives \u2013 like the <a href=\"https:\/\/osf.io\">Open Science Framework<\/a> \u2013 which improve transparency around the research process, and we fully endorse researchers making use of these platforms.<\/p>\n<p>Scientific knowledge advances through corroboration and incremental progress, and it is crucial that we use and interpret statistics appropriately to ensure this progress continues. As our knowledge and use of big data methods increase, we need to ensure that our statistical tools keep pace.<\/p>\n<p><strong>Read the full paper:<\/strong>\u00a0Vidgen, B. and Yasseri, T., (2016) P-values: Misunderstood and Misused, Frontiers in Physics, 4:6. <a href=\"http:\/\/journal.frontiersin.org\/article\/10.3389\/fphy.2016.00006\/full\">http:\/\/dx.doi.org\/10.3389\/fphy.2016.00006<\/a><\/p>\n<hr \/>\n<p><a href=\"http:\/\/www.oii.ox.ac.uk\/people\/?id=451\">Bertie Vidgen<\/a>\u00a0is a doctoral student at the Oxford Internet Institute researching\u00a0far-right extremism in online contexts. He is supervised by <a href=\"http:\/\/www.oii.ox.ac.uk\/people\/yasseri\/\">Dr Taha Yasseri<\/a>, a research fellow at the Oxford Internet Institute interested in\u00a0how\u00a0Big Data can be used to understand human dynamics,\u00a0government-society interactions,\u00a0mass collaboration, and\u00a0opinion\u00a0dynamics.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>P-values are widely used in the social sciences, especially &#8216;big data&#8217; studies, to calculate statistical significance. Yet they are\u00a0widely criticized for being easily hacked, and for not telling us what we want to know. Many have argued that, as a result, research is wrong far more often than we realize. 
In their recent article\u00a0P-values: Misunderstood [&hellip;]<\/p>\n","protected":false},"author":22,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[11,16,17],"tags":[36,37,42,45,47,64,99,116,179,190,225,228,229,232],"_links":{"self":[{"href":"https:\/\/ensr.oii.ox.ac.uk\/wp-json\/wp\/v2\/posts\/3604"}],"collection":[{"href":"https:\/\/ensr.oii.ox.ac.uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ensr.oii.ox.ac.uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ensr.oii.ox.ac.uk\/wp-json\/wp\/v2\/users\/22"}],"replies":[{"embeddable":true,"href":"https:\/\/ensr.oii.ox.ac.uk\/wp-json\/wp\/v2\/comments?post=3604"}],"version-history":[{"count":1,"href":"https:\/\/ensr.oii.ox.ac.uk\/wp-json\/wp\/v2\/posts\/3604\/revisions"}],"predecessor-version":[{"id":3664,"href":"https:\/\/ensr.oii.ox.ac.uk\/wp-json\/wp\/v2\/posts\/3604\/revisions\/3664"}],"wp:attachment":[{"href":"https:\/\/ensr.oii.ox.ac.uk\/wp-json\/wp\/v2\/media?parent=3604"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ensr.oii.ox.ac.uk\/wp-json\/wp\/v2\/categories?post=3604"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ensr.oii.ox.ac.uk\/wp-json\/wp\/v2\/tags?post=3604"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}