In a world of “connective action” — what makes an influential Twitter user? https://ensr.oii.ox.ac.uk/in-a-world-of-connective-action-what-makes-an-influential-twitter-user/ Sun, 10 Jun 2018 08:07:45 +0000 http://blogs.oii.ox.ac.uk/policy/?p=4183 A significant part of political deliberation now takes place on online forums and social networking sites, leading to the idea that collective action might be evolving into “connective action”. The new level of connectivity (particularly of social media) raises important questions about its role in the political process. But understanding important phenomena such as social influence, social forces, and digital divides requires analysis of very large social systems, which has traditionally been a challenging task in the social sciences.

In their Policy & Internet article “Understanding Popularity, Reputation, and Social Influence in the Twitter Society”, David Garcia, Pavlin Mavrodiev, Daniele Casati, and Frank Schweitzer examine popularity, reputation, and social influence on Twitter using network information on more than 40 million users. They integrate measurements of popularity, reputation, and social influence to evaluate what keeps users active, what makes them more popular, and what determines their influence in the network.

Popularity in the Twitter social network is often quantified as the number of followers of a user. That implies that it doesn’t matter why some user follows you, or how important she is: your popularity only measures the size of your audience. Reputation, on the other hand, is a more complicated concept associated with centrality. Being followed by a highly reputed user has a stronger effect on one’s reputation than being followed by someone with low reputation. Thus, the simple number of followers does not capture the recursive nature of reputation.
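
To make the distinction concrete, here is a minimal sketch (not the measure used in the paper) contrasting raw follower counts with a recursive centrality such as PageRank on a tiny, made-up “who-follows-whom” graph, using the networkx library:

```python
# Toy follower graph: an edge A -> B means "A follows B".
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("alice", "celeb"), ("bob", "celeb"), ("carol", "celeb"),  # celeb: 3 followers
    ("dave", "expert"), ("celeb", "expert"),                   # expert: 2 followers, one of them popular
])

# Popularity: audience size, i.e. the number of followers (in-degree).
popularity = dict(G.in_degree())

# Reputation: a recursive notion -- being followed by well-followed accounts
# counts for more. PageRank over the follow graph is one standard proxy.
reputation = nx.pagerank(G, alpha=0.85)

for user in G.nodes():
    print(f"{user:7s} followers={popularity[user]}  pagerank={reputation[user]:.3f}")
# "expert" has fewer followers than "celeb" but a higher recursive score,
# because one of expert's followers is itself widely followed.
```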

In their article, the authors examine the distinct roles of popularity and reputation in the process of social influence. They find that there is a range of values in which the risk of a user becoming inactive grows with popularity and reputation. Popularity on Twitter resembles a proportional growth process that is faster in its strongly connected component, and that can be accelerated by reputation when users are already popular. They find that social influence on Twitter is mainly related to popularity rather than reputation, but that this growth of influence with popularity is sublinear. In sum, global network metrics are better predictors of inactivity and social influence, calling for analyses that go beyond local metrics like the number of followers.
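
As a rough illustration of what “sublinear” means here, the sketch below (with synthetic data, not the authors’ measurements) fits the exponent of an assumed power-law relation influence ∝ popularity^α; an exponent below 1 means influence grows more slowly than popularity:

```python
import numpy as np

rng = np.random.default_rng(0)
popularity = rng.lognormal(mean=5, sigma=2, size=5000)             # e.g. follower counts
true_alpha = 0.6                                                   # assumed sublinear exponent
influence = popularity**true_alpha * rng.lognormal(0, 0.3, 5000)   # noisy power law

# A power law is a straight line in log-log space, so its exponent is the slope.
alpha_hat, _ = np.polyfit(np.log(popularity), np.log(influence), deg=1)
print(f"estimated scaling exponent: {alpha_hat:.2f}")              # close to 0.6, i.e. sublinear
```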

We caught up with the authors to discuss their findings:

Ed.: Twitter is a convenient data source for political scientists, but they tend to get criticised for relying on something that represents only a tiny facet of political activity. But Twitter is presumably very useful as a way of uncovering more fundamental / generic patterns of networked human interaction?

David: Twitter as a data source to study human behaviour is both powerful and limited. Powerful because it allows us to quantify and analyze human behaviour at scales and resolutions that are simply impossible to reach with traditional methods, such as experiments or surveys. But also limited because not every aspect of human behaviour is captured by Twitter and using its data comes with significant methodological challenges, for example regarding sampling biases or platform changes. Our article is an example of an analysis of general patterns of popularity and influence that are captured by spreading information in Twitter, which only make sense beyond the limitations of Twitter when we frame the results with respect to theories that link our work to previous and future scientific knowledge in the social sciences.

Ed.: How often do theoretical models (i.e. describing the behaviour of a network in theory) get linked up with empirical studies (i.e. of a network like Twitter in practice) but also with qualitative studies of actual Twitter users? And is Twitter interesting enough in itself for anyone to attempt to develop an overall theoretico-empirico-qualitative theory about it?

David: The link between theoretical models and large-scale data analyses of social media is less frequent than we all wish. But the gap between disciplines seems to be narrowing in recent years, with more social scientists using online data sources and computer scientists engaging more with theories and previous results in the social sciences. What seems to be quite undeveloped is the interface with qualitative methods, especially with large-scale analyses like ours.

Qualitative methods can provide what data science cannot: questions about important and relevant phenomena that can then be explained within a wider theory if validated against data. While this seems to me a fertile ground for interdisciplinary research, I doubt that Twitter in particular should be the paragon of such a combination of approaches. I advocate for starting research from the aspect of human behaviour that is the subject of study, and not from a particularly popular social media platform that happens to be used a lot today but might not be the standard tomorrow.

Ed.: I guess I’ve seen a lot of Twitter networks in my time, but not much in the way of directed networks, i.e. showing the direction of flow of content (i.e. influence, basically) — or much in the way of a time element (i.e. turning static snapshots into dynamic networks). Is that fair, or am I missing something? I imagine it would be fun to see how (e.g.) fake news or political memes propagate through a network?

David: While Twitter provides amazing volumes of data, its programming interface is notorious for the absence of two key sources: the date when follower links are created and the precise path of retweets. The reason for the general picture of snapshots over time is that researchers cannot fully trace back the history of a follower network; they can only monitor it at a certain frequency to work around the fact that links do not have a date attached.

The picture of information flows is generally missing because, when looking up a retweet, we can see the original tweet that is being retweeted, but not whether the retweet is of a friend’s retweet. As a result, without special access to Twitter data or alternative sources, all information flows look like stars around the original tweet, rather than propagation trees through a social network that would allow the precise analysis of fake news or memes.
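
A small illustration of this limitation (with hypothetical retweet records, not real API output): even if a tweet actually travelled A → B → C → D, the records only reference the original author, so the reconstructed graph is a star:

```python
import networkx as nx

# Hypothetical retweet records: each one only names the original author.
retweets = [
    {"retweeter": "B", "original_author": "A"},  # B retweeted A directly
    {"retweeter": "C", "original_author": "A"},  # C actually saw it via B
    {"retweeter": "D", "original_author": "A"},  # D actually saw it via C
]

observed = nx.DiGraph((r["original_author"], r["retweeter"]) for r in retweets)
print(sorted(observed.edges()))  # [('A', 'B'), ('A', 'C'), ('A', 'D')] -- a star around A
# The true propagation tree A -> B -> C -> D cannot be recovered from these
# records alone; that would require follower timelines or special data access.
```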

Ed.: Given all the work on Twitter, how well placed do you think social scientists would be to advise a political campaign on “how to create an influential network”, beyond just the obvious (tweet well and often, and maybe hire a load of bots)? That is, are there any “general rules” about communication structure that would be practically useful to campaigning organisations?

David: When we talk about influence on Twitter, we usually talk about rather superficial behaviour, such as retweeting content or clicking on a link. This should not be mistaken for a more substantial kind of influence, the kind that makes people change their opinion or go out and vote. Evaluating the real impact of Twitter influence is a bottleneck for how much social scientists can advise a political campaign. I would say that rather than providing general rules that can be applied everywhere, social scientists and computer scientists can be much more useful when advising, tracking, and optimizing individual campaigns that take into account the details and idiosyncrasies of the people who might be influenced by the campaign.

Ed.: Random question: but where did “computational social science” emerge from – is it actually quite dependent on Twitter (and Wikipedia?), or are there other commonly-used datasets? And are computational social science, “big data analytics”, and (social) data science basically describing the same thing?

David: Tracing back the meaning and influence of “computational social science” could take a whole book! My impression is that the concept started a few decades ago as a spin on “sociophysics”, where the term “computational” was used as in “computational model”, emphasizing a focus on social science rather than on toy-model applications imported from physics. Then the influential Science article by David Lazer and colleagues in 2009 defined the term as the application of digital trace datasets to test theories from the social sciences, leaving computational modelling outside the frame. In that case, “computational” was used more as it is used in “computational biology”, to refer to social science with increased power and speed thanks to computer-based technologies. Later it seems to have converged back into a combination of both the modelling and the data analysis trends, as in the “Manifesto of computational social science” by Rosaria Conte and colleagues in 2012, inspired by the fact that we need computational modelling techniques from complexity science to understand what we observe in the data.

The Twitter and Wikipedia dependence of the field is just a path dependency due to the ease and openness of access to those datasets, and a key turning point for the field is to be able to generalize beyond those “model organisms”, as Zeynep Tufekci calls them. One can observe these fads in the latest computer science conferences, with Reddit and GitHub on the rise, or when looking at earlier research that heavily used product reviews and blog datasets. Computational social science seems to be maturing as a field, making sense of those datasets rather than just telling cool data-driven stories about one website or another. Perhaps we are beyond the peak of inflated expectations of the hype curve and the best part is yet to come.

With respect to big data and social data science, it is easy to get lost in a field of buzzwords. Big data analytics only deals with the technologies necessary to process large volumes of data, which could come from any source: not just social networks but also telescopes, seismographs, and any other kind of sensor. These kinds of techniques are only sometimes necessary in computational social science, and they are far from the core topics of the field.

Social data science is closer, but puts a stronger emphasis on problem-solving rather than testing theories from the social sciences. When using “data science” we usually try to emphasize a predictive or explorative aspect, rather than the confirmatory or generative approach of computational social science. The emphasis on theory and modelling of computational social science is the key difference here, linking back to my earlier comment about the role of computational modelling and complexity science in the field.

Ed.: Finally, how successful do you think computational social scientists will be in identifying any underlying “social patterns” — i.e. would you agree that the Internet is a “Hadron Collider” for social science? Or is society fundamentally too chaotic and unpredictable?

David: As web scientists like to highlight, the Web (not the Internet, which is the technical infrastructure connecting computers) is the largest socio-technical artifact ever produced by humanity. Rather than a Hadron Collider, which is a tool for running experiments, I would say that the Web can be the Hubble telescope of social science: it lets us observe human behaviour at an amazing scale and resolution, capturing not only big data but also fast, long, deep, mixed, and weird data that we never imagined before.

While I doubt that we will be able to predict society in some sort of “psychohistory” manner, I think that the Web can help us to understand much more about ourselves, including our incentives, our feelings, and our health. That can be useful knowledge to make decisions in the future and to build a better world without the need to predict everything.

Read the full article: Garcia, D., Mavrodiev, P., Casati, D., and Schweitzer, F. (2017) Understanding Popularity, Reputation, and Social Influence in the Twitter Society. Policy & Internet 9 (3) doi:10.1002/poi3.151

David Garcia was talking to blog editor David Sutcliffe.

Did you consider Twitter’s (lack of) representativeness before doing that predictive study? https://ensr.oii.ox.ac.uk/did-you-consider-twitters-lack-of-representativeness-before-doing-that-predictive-study/ Mon, 10 Apr 2017 06:12:36 +0000 http://blogs.oii.ox.ac.uk/policy/?p=4062 Twitter data have many qualities that appeal to researchers. They are extraordinarily easy to collect. They are available in very large quantities. And with a simple 140-character text limit they are easy to analyze. As a result of these attractive qualities, over 1,400 papers have been published using Twitter data, including many attempts to predict disease outbreaks, election results, film box office gross, and stock market movements solely from the content of tweets.

The easy availability of Twitter data links nicely to a key goal of computational social science: if researchers can find ways to impute user characteristics from social media, then its capabilities would be greatly extended. However, few papers consider the digital divide among Twitter users, even though the question of who uses Twitter has major implications for any attempt to use the content of tweets for inference about population behaviour. Do Twitter users share the characteristics of the population of interest? For what populations are Twitter data actually appropriate?

A new article by Grant Blank published in Social Science Computer Review provides a multivariate empirical analysis of the digital divide among Twitter users, comparing Twitter users and nonusers with respect to their characteristic patterns of Internet activity and to certain key attitudes. It thereby fills a gap in our knowledge about an important social media platform, and it joins a surprisingly small number of studies that describe the population that uses social media.

Comparing British (OxIS survey) and US (Pew) data, Grant finds that generally, British Twitter users are younger, wealthier, and better educated than other Internet users, who in turn are younger, wealthier, and better educated than the offline British population. American Twitter users are also younger and wealthier than the rest of the population, but they are not better educated. Twitter users are disproportionately members of elites in both countries. Twitter users also differ from other groups in their online activities and their attitudes.

Under these circumstances, any collection of tweets will be biased, and inferences based on analysis of such tweets will not match the population characteristics. A biased sample can’t be corrected by collecting more data, and these biases have important implications for research based on Twitter data, suggesting that Twitter data are not suitable for research where representativeness is important, such as forecasting elections or gaining insight into the attitudes, sentiments, or activities of large populations.

Read the full article: Blank, G. (2016) The Digital Divide Among Twitter Users and Its Implications for Social Research. Social Science Computer Review. DOI: 10.1177/0894439316671698

We caught up with Grant to explore the implications of the findings:

Ed.: Despite your cautions about lack of representativeness, you mention that the bias in Twitter could actually make it useful to study (for example) elite behaviours: for example in political communication?

Grant: Yes. If you want to study elites and channels of elite influence then Twitter is a good candidate. Twitter data could be used as one channel of elite influence, along with other online channels like social media or blog posts, and offline channels like mass media or lobbying. There is an ecology of media and Twitter is one part.

Ed.: You also mention that Twitter is actually quite successful at forecasting certain offline, commercial behaviours (e.g. box office receipts).

Grant: Right. Some commercial products are disproportionately used by wealthier or younger people. That certainly would include certain forms of mass entertainment like cinema. It also probably includes a number of digital products like smartphones, especially more expensive phones, and wearable devices like a Fitbit. If a product is disproportionately bought by the same population groups that use Twitter then it may be possible to forecast sales using Twitter data. Conversely, products disproportionately used by poorer or older people are unlikely to be predictable using Twitter.

Ed.: Is there a general trend towards abandoning expensive, time-consuming, multi-year surveys and polling? And do you see any long-term danger in that? i.e. governments and media (and academics?) thinking “Oh, we can just get it off social media now”.

Grant: Yes and no. There are certainly people who are thinking about it and trying to make it work. The ease and low cost of social media is very seductive. However, that has to be balanced against major weaknesses. First the population using Twitter (and other social media) is unclear, but it is not a random sample. It is just a population of Twitter users, which is not a population of interest to many.

Second, tweets are even less representative. As I point out in the article, over 40% of people with a Twitter account have never sent a tweet, and the top 15% of users account for 85% of tweets. So tweets are even less representative of any real-world population than Twitter users. What these issues mean is that you can’t calculate measures of error or confidence intervals from Twitter data. This is crippling for many academic and government uses.

Third, Twitter’s limited message length and simple interface tends to give it advantages on devices with restricted input capability, like phones. It is well-suited for short, rapid messages. These characteristics tend to encourage Twitter use for political demonstrations, disasters, sports events, and other live events where reports from an on-the-spot observer are valuable. This suggests that Twitter usage is not like other social media or like email or blogs.

Fourth, researchers attempting to extract the meaning of words have only 140 characters to analyze, and those characters are littered with abbreviations, slang, non-standard English, misspellings, and links to other documents. The measurement issues are immense. Measurement is hard enough in surveys, even when researchers have control over question wording and can do cognitive interviews to understand how people interpret words.

With Twitter (and other social media) researchers have no control over the process that generated the data, and no theory of the data generating process. Unlike surveys, social media analysis is not a general-purpose tool for research. Except in limited areas where these issues are less important, social media is not a promising tool.

Ed.: How would you respond to claims that for example Facebook actually had more accurate political polling than anyone else in the recent US Election? (just that no-one had access to its data, and Facebook didn’t say anything)?

Grant: That is an interesting possibility. The problem is matching Facebook data with other data, like voting records. Facebook doesn’t know where people live. Finding their location would not be an easy problem. It is simpler because Facebook would not need an actual address; it would only need to locate the correct voting district or the state (for the Electoral College in US Presidential elections). Still, there would be error of unknown magnitude, probably impossible to calculate. It would be a very interesting research project. Whether it would be more accurate than a poll is hard to say.

Ed.: Do you think social media (or maybe search data) scraping and analysis will ever successfully replace surveys?

Grant: Surveys are such versatile, general-purpose tools. They can be used to elicit many kinds of information on all kinds of subjects from almost any population. These are not characteristics of social media. There is no real danger that surveys will be replaced in general.

However, I can see certain specific areas where analysis of social media will be useful. Most of these are commercial areas, like consumer sentiments. If you want to know what people are saying about your product, then going to social media is a good, cheap source of information. This is especially true if you sell a mass market product that many people use and talk about; think: films, cars, fast food, breakfast cereal, etc.

These are important topics to some people, but they are a subset of things that surveys are used for. Too many things are not talked about, and some are very important. For example, there is the famous British reluctance to talk about money. Things like income, pensions, and real estate or financial assets are not likely to be common topics. If you are a government department or a researcher interested in poverty, the effect of government assistance, or the distribution of income and wealth, you have to depend on a survey.

There are a lot of other situations where surveys are indispensable. For example, if the OII wanted to know what kind of jobs OII alumni had found, it would probably have to survey them.

Ed.: Finally … 1,400 Twitter articles in … do we actually know enough now to say anything particularly useful or concrete about it? Are we creeping towards a Twitter revelation or consensus, or is it basically 1,400 articles saying “it’s all very complicated”?

Grant: Mostly researchers have accepted Twitter data at face value. Whatever people write in a tweet, it means whatever the researcher thinks it means. This is very easy and it avoids a whole collection of complex issues. All the hard work of understanding how meaning is constructed in Twitter and how it can be measured is yet to be done. We are a long way from understanding Twitter.


Grant Blank was talking to blog editor David Sutcliffe.

Edit wars! Examining networks of negative social interaction https://ensr.oii.ox.ac.uk/edit-wars-examining-networks-of-negative-social-interaction/ Fri, 04 Nov 2016 10:05:06 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3893
Network of all reverts done in the English language Wikipedia within one day (January 15, 2010). Read the full article for details.
While network science has significantly advanced our understanding of the structure and dynamics of the human social fabric, much of the research has focused on positive relations and interactions such as friendship and collaboration. Considerably less is known about networks of negative social interactions such as distrust, disapproval, and disagreement. While these interactions are less common, they strongly affect people’s psychological well-being, physical health, and work performance.

Negative interactions are also rarely explicitly declared and recorded, making them hard for scientists to study. In their new article on the structural and temporal features of negative interactions in the community, Milena Tsvetkova, Ruth García-Gavilanes and Taha Yasseri use complex network methods to analyze patterns in the timing and configuration of reverts of article edits to Wikipedia. In large online collaboration communities like Wikipedia, users sometimes undo or downrate contributions made by other users; most often to maintain and improve the collaborative project. However, it is also possible that these actions are social in nature, with previous research acknowledging that they could also imply negative social interactions.

The authors find evidence that Wikipedia editors systematically revert the same person, revert back their reverter, and come to defend a reverted editor. However, they don’t find evidence that editors “pay forward” a revert, coordinate with others to revert an editor, or revert different editors serially. These interactions can be related to the status of the editors. Even though the individual reverts might not necessarily be negative social interactions, their analysis points to the existence of certain patterns of negative social dynamics within the editorial community. Some of these patterns have not been previously explored and certainly carry implications for Wikipedia’s own knowledge collection practices — and can also be applied to other large-scale collaboration networks to identify the existence of negative social interactions.
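
As a minimal sketch of how such patterns can be detected (using hypothetical revert events rather than the authors’ full method), the snippet below counts “revert back your reverter” events, i.e. a revert from A to B that is later answered by a revert from B to A; in a real analysis such counts would be compared against a null model, for example one with shuffled timestamps:

```python
# Each event: (reverter, reverted_editor, timestamp)
reverts = [
    ("A", "B", 1), ("B", "A", 2),   # B later reverts back their reverter A
    ("C", "D", 3), ("C", "D", 5),   # C repeatedly reverts the same person
    ("E", "F", 4),
]

first_revert = {}                   # (u, v) -> earliest time u reverted v
revert_back = 0
for u, v, t in sorted(reverts, key=lambda e: e[2]):
    if (v, u) in first_revert and first_revert[(v, u)] < t:
        revert_back += 1            # u had been reverted by v before, and now reverts v back
    first_revert.setdefault((u, v), t)

print("revert-back events:", revert_back)   # 1 (the B -> A revert at t=2)
```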

Read the full article: Milena Tsvetkova, Ruth García-Gavilanes and Taha Yasseri (2016) Dynamics of Disagreement: Large-Scale Temporal Network Analysis Reveals Negative Interactions in Online Collaboration. Scientific Reports 6 doi:10.1038/srep36333

We caught up with the authors to explore the implications of the work.

Ed: You find that certain types of negative social interactions and status considerations interfere with knowledge production on Wikipedia. What could or should Wikipedia do about it — or is it not actually a significant problem?

Taha: We believe it is an issue to consider. While the Wikipedia community might not be able to directly cope with it, as negative social interactions are intrinsic to human societies, an important consequence of our report would be to use the information in Wikipedia articles with extra care — and also to bear in mind that Wikipedia content might carry more subjectivity compared to a professionally written encyclopaedia.

Ed: Does reverting behaviour correlate with higher quality articles (i.e. with a lot of editorial attention) or simply with controversial topics — i.e. do you see reverting behaviour as generally a positive or negative thing?

Taha: In a different project we looked at the correlation between controversy and quality. We observed that controversy, up to a certain level, is correlated with higher article quality, specifically as far as the completeness of the article is concerned. However, articles with very high controversy scores started to show lower quality. In short, a certain amount of controversy helps articles become more complete, but too much controversy is a bad sign.

Ed: Do you think your results say more about the structure of Wikipedia, the structure of crowds, or about individuals?

Taha: Our results shed light on some of the most fundamental patterns in human behavior. It is one of the few examples in which a large dataset of negative interactions is analysed and the dynamics of negativity are studied. In this sense, this article is more about human behavior in interaction with other community members in a collaborative environment. However, because our data come from Wikipedia, I believe there are also lessons to be learnt about Wikipedia itself.

Ed: You note that by focusing on the macro-level you miss the nuanced understanding that thick ethnographic descriptions can produce. How common is it for computational social scientists to work with ethnographers? What would you look at if you were to work with ethnographers on this project?

Taha: One of the drawbacks of big data analysis in computational social science is the limited depth of the analysis. We lack any demographic information about the individuals we study. We can draw conclusions about the community of Wikipedia editors in a certain language, but that is by no means specific enough. An ethnographic approach, which would benefit our research tremendously, would go deeper in analyzing individuals and studying the features and attributes that lead to certain behavior. For example, we report, at a high level, that “status” determines editors’ actions to a good extent, but of course the mechanisms behind this observation can only be explained through ethnographic analysis.

Ed: I guess Wikipedia (whether or not unfairly) is commonly associated with edit wars — while obviously also being a gigantic success: how about other successful collaborative platforms — how does Wikipedia differ from Zooniverse, for example?

Taha: There is no doubt that Wikipedia is a huge success and probably the largest collaborative project in the history of mankind. Our research mostly focuses on its dark side, but it does not question its success and value. Compared to other collaborative projects, such as Zooniverse, the main difference is in the management model. Wikipedia is managed and run by the community of editors; very little top-down management is employed. In Zooniverse, by contrast, the overall structure of the project is designed by a few researchers and the crowd can only act within a pre-determined framework. For more comparisons of this sort, I suggest looking at our HUMANE project, in which we provide a typology and comparison for a wide range of Human-Machine Networks.

Ed: Finally — do you edit Wikipedia? And have you been on the receiving end of reverts yourself?

Taha: I used to edit Wikipedia much more. And naturally I have had my own share of reverts, at both ends!


Taha Yasseri was talking to blog editor David Sutcliffe.

P-values are widely used in the social sciences, but often misunderstood: and that’s a problem. https://ensr.oii.ox.ac.uk/many-of-us-scientists-dont-understand-p-values-and-thats-a-problem/ Mon, 07 Mar 2016 18:53:29 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3604 P-values are widely used in the social sciences, especially ‘big data’ studies, to calculate statistical significance. Yet they are widely criticized for being easily hacked, and for not telling us what we want to know. Many have argued that, as a result, research is wrong far more often than we realize. In their recent article “P-values: Misunderstood and Misused”, OII Research Fellow Taha Yasseri and doctoral student Bertie Vidgen argue that we need to make standards for interpreting p-values more stringent, and also improve transparency in the academic reporting process, if we are to maximise the value of statistical analysis.

“Significant”: an illustration of selective reporting and statistical significance from XKCD. Available online at http://xkcd.com/882/

In an unprecedented move, the American Statistical Association recently released a statement (March 7 2016) warning against how p-values are currently used. This reflects a growing concern in academic circles that whilst a lot of attention is paid to the huge impact of big data and algorithmic decision-making, there is considerably less focus on the crucial role played by statistics in enabling effective analysis of big data sets and making sense of the complex relationships contained within them. Much as datafication has created huge social opportunities, it has also brought to the fore many problems and limitations of current statistical practices. In particular, the deluge of data has made it crucial that we can work out whether studies are ‘significant’. In our paper, published three days before the ASA’s statement, we argued that the most commonly used tool in the social sciences for calculating significance – the p-value – is misused, misunderstood and, most importantly, doesn’t tell us what we want to know.

The basic problem of ‘significance’ is that it is impractical to repeat an experiment an infinite number of times to make sure that what we observe is “universal”. The same applies to our sample size: we are often unable to analyse a “whole population” sample and so have to generalize from our observations on a limited-size sample to the whole population. The obvious problem here is that what we observe is based on a limited number of experiments (sometimes only one experiment) and on a limited-size sample, and as such could have been generated by chance rather than by an underlying universal mechanism! We might find it impossible to make the same observation if we were to replicate the same experiment multiple times or analyse a larger sample. If this is the case then we will mischaracterise what is happening – which is a really big problem given the growing importance of ‘evidence-based’ public policy. If our evidence is faulty or unreliable then we will create policies, or intervene in social settings, in an equally faulty way.

The way that social scientists have got round this problem (that samples might not be representative of the population) is through the ‘p-value’. The p-value tells you the probability of making a similar observation, in a sample of the same size and in the same number of experiments, by pure chance. In other words, what it is actually telling you is how likely it is that you would see the same relationship between X and Y even if no relationship exists between them. On the face of it this is pretty useful, and in the social sciences we normally say that a p-value of 1 in 20 (0.05) means the results are significant. Yet as the American Statistical Association has just noted, even though p-values are incredibly widespread, many researchers misinterpret what they really mean.
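
A simple way to see what this probability means is a permutation test on synthetic data (an illustration only, not part of the paper): shuffling Y destroys any real relationship with X, so the shuffled correlations show what “pure chance” produces in a sample of this size.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x = rng.normal(size=n)
y = 0.2 * x + rng.normal(size=n)     # a weak true relationship, for illustration

observed = abs(np.corrcoef(x, y)[0, 1])

# Null distribution: correlations between x and randomly shuffled y.
null = np.array([abs(np.corrcoef(x, rng.permutation(y))[0, 1]) for _ in range(10_000)])
p_value = (null >= observed).mean()
print(f"observed |r| = {observed:.3f}, permutation p-value = {p_value:.4f}")
# A small p-value says: a correlation this strong would rarely arise by chance
# alone -- it does NOT say how likely it is that the relationship is real.
```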

In our paper we argued that p-values are misunderstood and misused because people think the p-value tells you much more than it really does. In particular, people think the p-value tells you (i) how likely it is that a relationship between X and Y really exists and (ii) the percentage of all findings that are false (which is actually something different called the False Discovery Rate). As a result, we are far too confident that academic studies are correct. Some commentators have argued that at least 30% of studies are wrong because of problems related to p-values: a huge figure. One of the main problems is that p-values can be ‘hacked’ and as such easily manipulated to show significance when none exists.
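
The ease of ‘hacking’ can be demonstrated with a short simulation (again an illustration, assuming NumPy and SciPy are available): run enough tests on pure noise and some will cross the 0.05 threshold by chance alone, and reporting only those is the essence of selective reporting.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_tests, n = 100, 50
false_positives = 0
for _ in range(n_tests):
    a = rng.normal(size=n)           # two groups drawn from the SAME distribution,
    b = rng.normal(size=n)           # so any "significant" difference is spurious
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_tests} noise-only tests were 'significant' at p < 0.05")
# On average about 5 of 100 such tests come out "significant" -- the expected
# false-positive rate at the 0.05 threshold.
```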

If we are going to base public policy (and as such public funding) on ‘evidence’ then we need to make sure that the evidence used is reliable. P-values need to be used far more rigorously, with significance levels of 0.01 or 0.001 seen as standard. We also need to start being more open and transparent about how results are recorded. It is a fine line between data exploration (a legitimate academic exercise) and ‘data dredging’ (where results are manipulated in order to find something noteworthy). Only if researchers are honest about what they are doing will we be able to maximise the potential benefits offered by Big Data. Luckily there are some great initiatives – like the Open Science Framework – which improve transparency around the research process, and we fully endorse researchers making use of these platforms.

Scientific knowledge advances through corroboration and incremental progress, and it is crucial that we use and interpret statistics appropriately to ensure this progress continues. As our knowledge and use of big data methods increase, we need to ensure that our statistical tools keep pace.

Read the full paper: Vidgen, B. and Yasseri, T., (2016) P-values: Misunderstood and Misused, Frontiers in Physics, 4:6. http://dx.doi.org/10.3389/fphy.2016.00006


Bertie Vidgen is a doctoral student at the Oxford Internet Institute researching far-right extremism in online contexts. He is supervised by Dr Taha Yasseri, a research fellow at the Oxford Internet Institute interested in how Big Data can be used to understand human dynamics, government-society interactions, mass collaboration, and opinion dynamics.
