Ralph Schroeder, Oxford Internet Institute, University of Oxford
Eric Meyer, Oxford Internet Institute, University of Oxford
Abstract
Arguments abound that big data is old wine in new bottles. Alternatively, it is claimed that the impact of big data is unprecedented. Both claims are wrong, and neither can be substantiated, because neither rests on an understanding of what makes big data more powerful.
Currently, there is no systematic sociological analysis of big data, and indeed a paucity of social science understanding of the role of data in society in general (the exceptions, including Star and Bowker, will be reviewed in the full paper). This paper will start to address this gap, focusing in particular on digital research. It will develop a typology of uses of digital data, examine a number of cases, and draw out some lessons for policy. In particular, it will distinguish between areas where there are genuine policy challenges and areas where there are not, since the two are often confused.
First, it is necessary to define big data: the capture, aggregation and manipulation of digital information on scales made possible by unprecedented distributed storage and analysis capacity. Big data has recently become widespread, but it is useful to distinguish between knowledge in three domains: scientific, governmental, and commercial. These may often be difficult to separate in practice, but, as we shall see, they give rise to quite different societal issues.
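To make the scale-dependent part of this definition concrete, the following is a minimal, purely illustrative sketch (the records and field names are invented, not drawn from any actual system) of the map-and-reduce style of aggregation that distributed big data frameworks perform across many machines:

```python
from collections import Counter

# Hypothetical clickstream records, standing in for digital information
# captured at scale (in practice, billions of such records).
events = [
    {"user": "u1", "page": "news"},
    {"user": "u2", "page": "sports"},
    {"user": "u1", "page": "news"},
]

# Map step: emit a (key, 1) pair per record; a distributed framework
# would run this in parallel over many partitions of the data.
mapped = ((event["page"], 1) for event in events)

# Reduce step: aggregate the counts per key into a single summary.
counts = Counter()
for key, n in mapped:
    counts[key] += n

print(counts)  # Counter({'news': 2, 'sports': 1})
```

The point of the sketch is only that aggregation of this kind is computationally trivial per record; what is new about big data is the capacity to run it over entire populations of records.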
Scientific research includes social science, but for academic social science knowledge to be used in the manipulation of human behaviour it must typically be translated into policy, such as health or education policy, thus taking it effectively into the realm of government (or, more rarely, the commercial realm). As long as it does not translate into the manipulation of human behaviour, the challenges are mainly those of research ethics. This puts into perspective a recent article (Burrows and Savage) which generated much debate. The article argued that the massive growth of data available in the commercial world, from store loyalty cards, CCTV cameras, the tracking of communication interactions and the like, means that social scientists will be outflanked by commercial research in the ability to understand society. Perhaps, but it is worth bearing in mind the different aims of research: academic social science is constrained and enabled to use digital data about human behaviour towards advancing social science knowledge per se, whereas commercial research is enabled (by resources, for example) to advance knowledge with the limited (but also powerful) aim of influencing economically relevant behaviour, an aim which, in turn, has limited bearing on social scientific knowledge.
In the commercial realm, there has been a growth of analytics about online behaviour, drawing on the use of search engines, social networking sites, online shopping and financial ‘moods’. Companies are able to manipulate human behaviour on the basis of the data they gather about people. This has been going on for some time (Beniger, Yates), and it is illegal only if it runs afoul of laws such as those pertaining to privacy and data retention. It is nevertheless currently a topic of discussion because, again, an increase in the power of knowledge is raising the question of the manipulability of human behaviour. The same applies, though with the different constraints of citizens’ rights vis-a-vis governments, to government efforts to collect and use data; these uses of data, in this case about the politically relevant behaviour of populations, have also recently become more powerful. The key issue in both cases is that people’s behaviour is being ‘nudged’: because of the information available about our behaviour, we can be induced to make choices, politically or economically relevant ones, that we would not otherwise have made. An increasing ability to do so means that there will be slippery slopes and that boundaries will be tested; limits should be set where such manipulation violates citizens’ or consumers’ rights regarding political or economic behaviour.
A distinction must therefore be made between big data aimed at human populations for the purpose of social (economic or political) engineering, as against big data aimed at the advancement of knowledge. Clearly, there will be overlap between the two, but the separation is essential for understanding why knowledge is thought to be ‘threatening’. The ‘threat’ posed by big data can be explained by reference to a general unease about the advance of knowledge, and to the assumption that expanded knowledge must lead to increasingly intrusive invasions of people’s personal lives. This unease rests on a perception of technological determinism, the idea that technology exercises inexorable control over society, with an attendant feeling of powerlessness.
Yet this fear is also widely misunderstood: the advance of scientific knowledge and technology, properly understood, leads, in the first instance, only to more powerful knowledge of the physical world and transformations of the natural environment, without penetrating into everyday culture. Much big data, like scientific knowledge generally, simply yields greater control over the natural environment, humans excluded, and the consequences of this enhanced power are therefore also confined to that environment. Greater manipulability of the natural environment (a disenchanted and impersonal world) can coexist with a re-enchanted cultural life, and scientific knowledge about human behaviour does not necessarily diminish the ethical value and responsibility of human beings: only a zero-sum view of scientific and technological determinism leads to a mistaken fear of advancing knowledge.
This distinction, between advances in science and technology for the human-made control of the natural environment on the one hand, and the political and commercial consequences of big data and its possibilities for manipulating behaviour on the other, can thus pinpoint specific threats (to anonymity and privacy, from surveillance, and the like). When knowledge about human behaviour is subject to symbolic manipulation (‘mathematization’), that behaviour becomes easier to control. When big data can anticipate what we are thinking (our search queries, the products or places we are seeking online, or the connections to people we are trying to make), this will seem uncanny or creepy, and there will be a ‘big brother’ feel to the information that governments and companies can use. This increased power of knowledge is a product of greater probabilistic and taxonomic power: being able to predict our searches and our demand for products and services, and being able to pinpoint who we are in order to do so.
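What this probabilistic and taxonomic power amounts to can be illustrated with a deliberately naive sketch (the query log is invented, and the method is a toy stand-in for the far richer models that companies actually deploy) of predicting a search completion from past frequencies:

```python
from collections import Counter, defaultdict

# Hypothetical search log, standing in for the vastly larger logs
# that search engines accumulate.
log = ["weather london", "weather paris", "weather london", "flights paris"]

# Taxonomic step: classify past queries by their first word.
completions = defaultdict(Counter)
for query in log:
    head, _, tail = query.partition(" ")
    if tail:
        completions[head][tail] += 1

def predict(prefix):
    """Probabilistic step: return the most frequent past completion."""
    counts = completions.get(prefix)
    return counts.most_common(1)[0][0] if counts else None

print(predict("weather"))  # 'london' (seen twice, versus once for 'paris')
```

Even this crude frequency count ‘anticipates’ a query; the uncanny quality discussed above arises when such predictions are computed over the aggregated behaviour of millions of identifiable individuals rather than a four-line log.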
Armed with these distinctions, we can identify where fears of big data are baseless, brought about by misunderstandings of scientific and technological determinism, but also anticipate where boundaries are being crossed in manipulating behaviour, and thus determine and set limits to these uses of knowledge.