Could Counterfactuals Explain Algorithmic Decisions Without Opening the Black Box? https://ensr.oii.ox.ac.uk/could-counterfactuals-explain-algorithmic-decisions-without-opening-the-black-box/ Mon, 15 Jan 2018 10:37:21 +0000

The EU General Data Protection Regulation (GDPR) has sparked much discussion about the “right to explanation” for the algorithm-supported decisions made about us in our everyday lives. While there’s an obvious need for transparency in the automated decisions that are increasingly being made in areas like policing, education, healthcare and recruitment, explaining how these complex algorithmic decision-making systems arrive at any particular decision is a technically challenging problem—to put it mildly.

In their article “Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR”, forthcoming in the Harvard Journal of Law & Technology, Sandra Wachter, Brent Mittelstadt, and Chris Russell present the concept of “unconditional counterfactual explanations” as a novel type of explanation of automated decisions that could address many of these challenges. Counterfactual explanations describe the minimum conditions that would have led to an alternative decision (e.g. a bank loan being approved), without the need to describe the full logic of the algorithm.

Relying on counterfactual explanations as a means to help us act rather than merely to understand could help us gauge the scope and impact of automated decisions in our lives. They might also help bridge the gap between the interests of data subjects and data controllers, which might otherwise be a barrier to a legally binding right to explanation.

We caught up with the authors to explore the role of algorithms in our everyday lives, and how a “right to explanation” for decisions might be achievable in practice:

Ed: There’s a lot of discussion about algorithmic “black boxes” — where decisions are made about us, using data and algorithms about which we (and perhaps the operator) have no direct understanding. How prevalent are these systems?

Sandra: Basically, every decision that can be made by a human can now be made by an algorithm. Which can be a good thing. Algorithms (when we talk about artificial intelligence) are very good at spotting patterns and correlations that even experienced humans might miss, for example in predicting disease. They are also very cost efficient—they don’t get tired, and they don’t need holidays. This could help to cut costs, for example in healthcare.

Algorithms are also certainly more consistent than humans in making decisions. We have the famous example of judges varying the severity of their judgements depending on whether or not they’ve had lunch. That wouldn’t happen with an algorithm. That’s not to say algorithms are always going to make better decisions: but they do make more consistent ones. If the decision is bad, it’ll be distributed equally, but still be bad. Of course, in a certain way humans are also black boxes—we don’t understand what humans do either. But you can at least try to understand an algorithm: it can’t lie, for example.

Brent: In principle, any sector involving human decision-making could be prone to decision-making by algorithms. In practice, we already see algorithmic systems either making automated decisions or producing recommendations for human decision-makers in online search, advertising, shopping, medicine, criminal justice, etc. The information you consume online, the products you are recommended when shopping, the friends and contacts you are encouraged to engage with, even assessments of your likelihood to commit a crime in the immediate and long-term future—all of these tasks can currently be affected by algorithmic decision-making.

Ed: I can see that algorithmic decision-making could be faster and better than human decisions in many situations. Are there downsides?

Sandra: Simple algorithms that follow a basic decision tree (with parameters decided by people) can be easily understood. But we’re now also using much more complex systems like neural nets that act in a very unpredictable way, and that’s the problem. The system is also starting to become autonomous, rather than being under the full control of the operator. You will see the output, but not necessarily why it got there. This also happens with humans, of course: I could be told by a recruiter that my failure to land a job had nothing to do with my gender (even if it did); an algorithm, however, would not intentionally lie. But of course the algorithm might be biased against me if it’s trained on biased data—thereby reproducing the biases of our world.

We have seen that the COMPAS algorithm used by US judges to calculate the probability of re-offending when making sentencing and parole decisions is a major source of discrimination. Data provenance is massively important, and probably one of the reasons why we have biased decisions. We don’t necessarily know where the data comes from, and whether it’s accurate, complete, biased, etc. We need to have lots of standards in place to ensure that the data set is unbiased. Only then can the algorithm produce nondiscriminatory results.

A more fundamental problem with predictions is that you might never know what would have happened—as you’re just dealing with probabilities; with correlations in a population, rather than with causalities. Another problem is that algorithms might produce correct decisions, but not necessarily fair ones. We’ve been wrestling with the concept of fairness for centuries, without consensus. But lack of fairness is certainly something the system won’t correct itself—that’s something that society must correct.

Brent: The biases and inequalities that exist in the real world and in real people can easily be transferred to algorithmic systems. Humans training learning systems can inadvertently or purposefully embed biases into the model, for example through labelling content as ‘offensive’ or ‘inoffensive’ based on personal taste. Once learned, these biases can spread at scale, exacerbating existing inequalities. Eliminating these biases can be very difficult, hence we currently see much research done on the measurement of fairness or detection of discrimination in algorithmic systems.
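As a concrete illustration of what “measurement of fairness” can look like in practice, here is a minimal sketch (not taken from the interview or from any particular research project) of one widely used check, the demographic parity gap: the difference in favourable-outcome rates between groups in a model’s decisions. The variable names and example data are illustrative only.

```python
# A minimal sketch of one common fairness measurement, demographic parity:
# compare the rate of favourable outcomes across groups in a model's output.
# `predictions` and `groups` are illustrative placeholders, not study data.
from collections import defaultdict

def demographic_parity_gap(predictions, groups):
    """Difference between the highest and lowest favourable-outcome rate
    across groups; 0.0 means all groups are treated alike on this measure."""
    totals, favourable = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        favourable[group] += int(pred == 1)
    rates = [favourable[g] / totals[g] for g in totals]
    return max(rates) - min(rates)

# e.g. demographic_parity_gap([1, 0, 1, 1, 0, 0], ["A", "A", "A", "B", "B", "B"])
# returns 0.33..., i.e. group A receives favourable outcomes twice as often as B.
```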

These systems can also be very difficult—if not impossible—to understand, for experts as well as the general public. We might traditionally expect to be able to question the reasoning of a human decision-maker, even if imperfectly, but the rationale of many complex algorithmic systems can be highly inaccessible to people affected by their decisions. These potential risks aren’t necessarily reasons to forego algorithmic decision-making altogether; rather, they can be seen as potential effects to be mitigated through other means (e.g. a loan programme weighted towards historically disadvantaged communities), or at least to be weighed against the potential benefits when choosing whether or not to adopt a system.

Ed: So it sounds like many algorithmic decisions could be too complex to “explain” to someone, even if a right to explanation became law. But you propose “counterfactual explanations” as an alternative — i.e. explaining to the subject what would have to change (e.g. about a job application) for a different decision to be arrived at. How does this simplify things?

Brent: So rather than trying to explain the entire rationale of a highly complex decision-making process, counterfactuals allow us to provide simple statements about what would have needed to be different about an individual’s situation to get a different, preferred outcome. You basically work from the outcome: you say “I am here; what is the minimum I need to do to get there?” By providing simple statements that are generally meaningful, and that reveal a small bit of the rationale of a decision, the individual has grounds to change their situation or contest the decision, regardless of their technical expertise. Understanding even a bit of how a decision is made is better than being told “sorry, you wouldn’t understand”—at least in terms of fostering trust in the system.

Sandra: And the nice thing about counterfactuals is that they work with highly complex systems, like neural nets. They don’t explain why something happened, but they explain what happened. And three things people might want to know are:

(1) What happened: why did I not get the loan (or get refused parole, etc.)?

(2) Information so I can contest the decision if I think it’s inaccurate or unfair.

(3) Even if the decision was accurate and fair, tell me what I can do to improve my chances in the future.

Machine learning and neural nets make use of so much information that individuals really have no oversight of what is being processed, so it’s much easier to give someone an explanation in terms of the key variables that affected the decision. With the counterfactual idea of a “close possible world” you give an indication of the minimal changes required to get what you actually want.
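To make the “close possible world” idea concrete, here is a minimal, hypothetical sketch of a counterfactual search. It assumes a scikit-learn-style classifier (a `model` exposing `predict` and `predict_proba`) and purely numeric features on comparable scales; the paper formulates counterfactual generation as an optimisation problem, whereas the greedy loop below is only an illustration, and every name in it is a placeholder.

```python
import numpy as np

def find_counterfactual(model, x, target_class=1, step=0.05, max_iter=2000):
    """Nudge one feature at a time until the model's decision flips to the
    preferred outcome, keeping the change small (a 'close possible world')."""
    x_cf = np.array(x, dtype=float)
    for _ in range(max_iter):
        if model.predict([x_cf])[0] == target_class:
            return x_cf  # the counterfactual: "had your features been this..."
        best_candidate, best_score = None, -np.inf
        for i in range(len(x_cf)):            # try a small move on each feature
            for direction in (+1.0, -1.0):
                candidate = x_cf.copy()
                candidate[i] += direction * step
                score = model.predict_proba([candidate])[0][target_class]
                if score > best_score:
                    best_candidate, best_score = candidate, score
        x_cf = best_candidate                 # take the most promising move
    return None  # no counterfactual found within the search budget
```

Reporting the difference between the returned point and the original input then gives exactly the kind of statement described above: the minimal changes (for example, a slightly higher income) that would have led to the preferred decision.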

Ed: So would a series of counterfactuals (e.g. “over 18” “no prior convictions” “no debt”) essentially define a space within which a certain decision is likely to be reached? This decision space could presumably be graphed quite easily, to help people understand what factors will likely be important in reaching a decision?

Brent: This would only work for highly simplistic, linear models, which are not normally the type that confound human capacities for understanding. The complex systems that we refer to as ‘black boxes’ are highly dimensional and involve a multitude of (probabilistic) dependencies between variables that can’t be graphed simply. It may be the case that if I were aged between 35 and 40 with an income of £30,000, I would not get a loan. But I could be told that if I had an income of £35,000, I would have gotten the loan. I may then assume that an income over £35,000 guarantees me a loan in the future. But it may turn out that I would be refused a loan with an income above £40,000 because of a change in tax bracket. Non-linear relationships of this type can make it misleading to graph decision spaces. For simple linear models such a graph may be a very good idea, but not for black box systems; there, such graphs could in fact be highly misleading.

Chris: As Brent says, we’re concerned with understanding complicated algorithms that don’t just use hard cut-offs based on binary features. To use your example, maybe a little bit of debt is acceptable, but it would increase your risk of default slightly, so the amount of money you need to earn would go up. Or maybe certain past convictions also only increase your risk of defaulting slightly, and can be compensated for with a higher salary. It’s not at all obvious how you could graph these complicated interdependencies over many variables together. This is why we settled on counterfactuals as a way to give people a direct and easy-to-understand path to move from the decision they got now to a more favourable one at a later date.
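Brent’s loan example above can be written down as a toy decision rule, which shows why a counterfactual (“with an income of £35,000 you would have been approved”) can be locally true while “more income always helps” is false. The thresholds below come straight from the interview’s hypothetical and are illustrative only.

```python
def loan_decision(income: float) -> bool:
    """Toy, non-monotonic rule: approved between 35,000 and 40,000, refused
    below (insufficient income) and above (the hypothetical tax-bracket change)."""
    return 35_000 <= income <= 40_000

assert loan_decision(30_000) is False  # refused, as in the example
assert loan_decision(35_000) is True   # the counterfactual income is approved
assert loan_decision(45_000) is False  # yet a higher income is refused again
```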

Ed: But could a counterfactual approach just end up kicking the can down the road, if we know “how” a particular decision was reached, but not “why” the algorithm was weighted in such a way to produce that decision?

Brent: It depends what we mean by “why”. If this is “why” in the sense of “why was the system designed this way, to consider this type of data for this task?”, then we should be asking these questions while these systems are designed and deployed. Counterfactuals address decisions that have already been made, but they can still reveal uncomfortable knowledge about a system’s design and functionality. So they can certainly inform “why” questions.

Sandra: Just to echo Brent, we don’t want to imply that asking the “why” is unimportant—I think it’s very important, and interpretability as a field has to be pursued, particularly if we’re using algorithms in highly sensitive areas. Even if we have the “what”, the “why” question is still necessary to ensure the safety of those systems.

Chris: And anyone who’s talked to a three-year-old knows there is an endless stream of “why” questions that can be asked. But already, counterfactuals provide a major step forward in answering why, compared to previous approaches that were concerned with providing approximate descriptions of how algorithms make decisions—but not the “why” or the external facts leading to that decision. I think when judging the strength of an explanation, you also have to look at questions like “How easy is this to understand?” and “How does this help the person I’m explaining things to?” For me, counterfactuals are a more immediately useful explanation than something which explains where the weights came from. Even if you did know, what could you do with that information?

Ed: I guess the question of algorithmic decision-making in society involves a hugely complex intersection of industry, research, and policy-making? Are we in control of things?

Sandra: Artificial intelligence (and the technology supporting it) is an area where many sectors are now trying to work together, including in the crucial areas of fairness, transparency and accountability of algorithmic decision-making. I feel at the moment we see a very multi-stakeholder approach, and I hope that continues in the future. We can see for example that industry is very concerned with it—the Partnership on AI is addressing these topics and trying to come up with a set of industry guidelines, recognising the responsibilities inherent in producing these systems. There are also lots of data scientists (e.g. at the OII and the Alan Turing Institute) working on these questions. Policy-makers around the world (e.g. in the UK, EU, US and China) are preparing their countries for the AI future, so it’s on everybody’s mind at the moment. It’s an extremely important topic.

Law and ethics obviously have an important role to play. The opacity and unpredictability of AI, and its potentially discriminatory nature, require that we think about the legal and ethical implications very early on. That starts with educating the coding community, and ensuring diversity. At the same time, it’s important to have an interdisciplinary approach. At the moment we’re focusing a bit too much on the STEM subjects; there’s a lot of funding going to those areas (which makes sense, obviously), but the social sciences are currently a bit neglected despite the major role they play in recognising things like discrimination and bias, which you might not recognise from just looking at code.

Brent: Yes—and we’ll need much greater interaction and collaboration between these sectors to stay ‘in control’ of things, so to speak. Policy always has a tendency to lag behind technological developments; the challenge here is to stay close enough to the curve to prevent major issues from arising. The potential for algorithms to transform society is massive, so ensuring a quicker and more reflexive relationship between these sectors than normal is absolutely critical.

Read the full article: Sandra Wachter, Brent Mittelstadt, Chris Russell (2018) Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR. Harvard Journal of Law & Technology (Forthcoming).

This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1.


Sandra Wachter, Brent Mittelstadt and Chris Russell were talking to blog editor David Sutcliffe.

Latest Report by UN Special Rapporteur for the Right to Freedom of Expression is a Landmark Document https://ensr.oii.ox.ac.uk/latest-report-by-un-special-rapporteur-for-the-right-to-freedom-of-expression-is-a-landmark-document/ Thu, 15 Jun 2017 12:15:31 +0000

“The digital access industry is in the business of digital expression (…). Since privately owned networks are indispensable to the contemporary exercise of freedom of expression, their operators also assume critical social and public functions. The industry’s decisions (…) can directly impact freedom of expression and related human rights in both beneficial and detrimental ways.” [Report of the Special Rapporteur on the right to freedom of expression, June 2017]

The Internet is often portrayed as a disruptive equalizer: an information medium able to give individuals direct access to information and a platform to share their opinions unmediated. But the Internet is also a tool for surveillance, censorship, and information warfare. States often drive such practices, but increasingly the private sector plays a role too. While states have a clear obligation to protect human rights on the Internet, the human rights accountability of the private sector remains unclear. This raises the question: what responsibility does the private sector, which owns and runs much of the Internet, bear towards human rights?

During the 35th session of the United Nations (UN) Human Rights Council this month, David Kaye, UN Special Rapporteur (UNSR) for the right to freedom of expression, presented his latest report [1], which focuses on the role of the private sector in the provision of Internet and telecommunications access. The UNSR on freedom of expression is an independent expert, appointed by the Human Rights Council to analyse, document, and report on the state of freedom of expression globally [2]. The rapporteur is also expected to make recommendations towards ‘better promoting and protection of the right to freedom of expression’ [3]. In recent years, the UNSRs on freedom of expression have increasingly focused on the intersection between access to information, expression, and the Internet [4].

This most recent report is a landmark document. Its focus on the role and responsibilities of the private sector towards the right to freedom of expression presents a necessary step forward in the debate about the responsibility for the realization of human rights online. The report takes on the legal difficulties surrounding the increased reliance of states on access to privately owned networks and data, whether by necessity, through cooperation, or through coercion, for surveillance, security, and service provision. It also tackles the legal responsibilities that private organizations have to respect human rights.

The first half of Kaye’s report emphasises the role of states in protecting the right to freedom of expression and access to information online, in particular in the context of state-mandated Internet shutdowns and private-public data sharing. Kaye highlights several major Internet shutdowns across the world and argues that considering ‘the number of essential activities and services they affect, shutdowns restrict expression and interfere with other fundamental rights’ [5]. In order to address this issue, he recommends that the Human Rights Council supplement and specify resolution 32/13, on ‘the promotion, protection and enjoyment of human rights on the Internet’ [6], in which it condemns such disruptions to the network. On the interaction between private actors and the state, Kaye walks a delicate line. On the one hand, he argues that governments should not pressure or threaten companies to provide them with access to data. On the other hand, he also argues that states should not allow companies to make network management decisions that treat data differentially based on its origin.

The second half of the report focusses on the responsibility of the private sector. In this context, the UNSR highlights the responsibilities of private actors towards the right to freedom of expression. Kaye argues that this sector plays a crucial role in providing access to information and communication services to millions across the globe. He looks specifically at the role of telecommunication and Internet service providers, Internet exchange points, content delivery networks, network equipment vendors, and other private actors. He argues that four contextual factors are relevant to understanding the responsibility of private actors vis-à-vis human rights:

(1) private actors provide access to ‘a public good’,
(2) due to the technical nature of the Internet, any restrictions on access affect freedom of expression on a global level,
(3) the private sector is vulnerable to state pressure,
(4) but it is also in a unique position to respect users’ rights.

The report draws out the dilemma of the boundaries of responsibility. When should companies decide to comply with state policies that might undermine the rights of Internet end-users? What remedies should they offer end-users if they are complicit in human rights violations? How can private actors assess what impact their technologies might have on human rights?

Private actors across the spectrum, from multinational social media platforms to garage-based start-ups, are likely to run into these questions. As the Internet underpins a large part of the functioning of our societies, and will only continue to do so as physical devices increasingly become part of the network (the so-called Internet of Things), it is all the more important to understand and allocate private sector responsibility for protecting human rights.

The report has a dedicated addendum [7] that specifically details the responsibility of Internet Standard Developing Organizations (SDOs). In it, Kaye relies on the article written by Corinne Cath and Luciano Floridi of the Oxford Internet Institute (OII) entitled ‘The Design of the Internet’s Architecture by the Internet Engineering Task Force (IETF) and Human Rights’ [8] to support his argument that SDOs should take on a credible approach to human rights accountability.

Overall, Kaye argues that companies should adopt the UN Guiding Principles on Business and Human Rights [9], which would provide a ‘minimum baseline for corporate human rights accountability’. To operationalize this commitment, the private sector will need to take several urgent steps. It should ensure that sufficient resources are reserved for meeting its responsibility towards human rights, and it should integrate the principles of due diligence, human rights by design, stakeholder engagement, mitigation of the harms of government-imposed restrictions, transparency, and effective remedies to complement its ‘high level commitment to human rights’.

While this report is not binding [10] on states or companies, it does set out a much-needed and detailed blueprint for how to address questions of corporate responsibility towards human rights in the digital age.

References

[1] https://documents-dds-ny.un.org/doc/UNDOC/GEN/G17/077/46/PDF/G1707746.pdf?OpenElement
[2] http://www.ijrcenter.org/un-special-procedures/
[3] http://www.ohchr.org/EN/Issues/FreedomOpinion/Pages/OpinionIndex.aspx
[4] http://www2.ohchr.org/english/bodies/hrcouncil/docs/17session/A.HRC.17.27_en.pdf
[5] The author of this blog has written about this issue here: https://www.cfr.org/blog-post/should-technical-actors-play-political-role-internet-age
[6] http://ap.ohchr.org/documents/dpage_e.aspx?si=A/HRC/32/L.20
[7] https://documents-dds-ny.un.org/doc/UNDOC/GEN/G17/141/31/PDF/G1714131.pdf?OpenElement
[8] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2912308
[9] http://www.ohchr.org/Documents/Publications/GuidingPrinciplesBusinessHR_EN.pdf
[10] http://www.ohchr.org/Documents/Publications/FactSheet27en.pdf

Our knowledge of how automated agents interact is rather poor (and that could be a problem) https://ensr.oii.ox.ac.uk/our-knowledge-of-how-automated-agents-interact-is-rather-poor-and-that-could-be-a-problem/ Wed, 14 Jun 2017 15:12:05 +0000

Recent years have seen a huge increase in the number of bots online — including search engine Web crawlers, online customer service chat bots, social media spambots, and content-editing bots in online collaborative communities like Wikipedia. (Bots are important contributors to Wikipedia, completing about 15% of all Wikipedia edits in 2014 overall, and more than 50% in certain language editions.)

While the online world has turned into an ecosystem of bots (by which we mean computer scripts that automatically handle repetitive and mundane tasks), our knowledge of how these automated agents interact with each other is rather poor. But since bots are automata without the capacity for emotions, meaning-making, creativity, or sociality, we might expect their interactions to be relatively predictable and uneventful.

In their PLOS ONE article “Even good bots fight: The case of Wikipedia“, Milena Tsvetkova, Ruth García-Gavilanes, Luciano Floridi, and Taha Yasseri analyze the interactions between bots that edit articles on Wikipedia. They track the extent to which bots undid each other’s edits over the period 2001–2010, model how pairs of bots interact over time, and identify different types of interaction outcomes. Although Wikipedia bots are intended to support the encyclopaedia — identifying and undoing vandalism, enforcing bans, checking spelling, creating inter-language links, importing content automatically, mining data, identifying copyright violations, greeting newcomers, etc. — the authors find they often undid each other’s edits, with these sterile “fights” sometimes continuing for years.
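As a rough sketch of how such undoing can be detected, the snippet below flags “identity reverts”, where a revision restores the exact content (the same SHA-1 checksum) of an earlier revision, and keeps only those cases where both editors are bots. It assumes revision records of the kind returned by the MediaWiki API, in chronological order, and is an illustration rather than the authors’ actual pipeline.

```python
def find_bot_reverts(revisions, bot_names):
    """Flag 'identity reverts' (a revision restoring an earlier content hash)
    in which both the reverting and the reverted editor are bots.
    `revisions` is a chronological list of dicts with 'user' and 'sha1' keys."""
    reverts = []
    earliest_index = {}  # sha1 -> index of first revision with that content
    for i, rev in enumerate(revisions):
        sha1 = rev["sha1"]
        if sha1 in earliest_index:
            j = earliest_index[sha1]
            # Everything edited between j and i has been undone by revision i;
            # record only the bot-versus-bot pairs.
            for undone in revisions[j + 1:i]:
                if rev["user"] in bot_names and undone["user"] in bot_names:
                    reverts.append((rev["user"], undone["user"], i))
        else:
            earliest_index[sha1] = i
    return reverts
```

Counting such pairs per pair of bots over time is the kind of bookkeeping that reveals the long-running, sterile back-and-forth described in the article.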

They suggest that even relatively “dumb” bots may give rise to complex interactions, carrying important implications for Artificial Intelligence research. Understanding these bot-bot interactions will be crucial for managing social media, providing adequate cyber-security, and designing autonomous vehicles (that don’t crash..).

We caught up with Taha Yasseri and Luciano Floridi to discuss the implications of the findings:

Ed.: Is there any particular difference between the way individual bots interact (and maybe get bogged down in conflict), and lines of vast and complex code interacting badly, or having unforeseen results (e.g. flash-crashes in automated trading): i.e. is this just (another) example of us not always being able to anticipate how code interacts in the wild?

Taha: There are similarities and differences. The most notable difference is that here the bots are not competing. They all work according to the same rules and, more importantly, towards the same goal: to increase the quality of the encyclopaedia. Considering these features, the rather antagonistic interactions between the bots come as a surprise.

Ed.: Wikipedia have said that they know about it, and that it’s a minor problem: but I suppose Wikipedia presents a nice, open, benevolent system to make a start on examining and understanding bot interactions. What other bot-systems are you aware of, or that you could have looked at?

Taha: In terms of content-generating bots, Twitter bots have turned out to be very important for online propaganda. Crawler bots that collect information from social media or the web (such as personal information or email addresses) are also heavily deployed. In fact, we have come up with a first typology of Internet bots based on their type of action and their intentions (benevolent vs malevolent), which is presented in the article.

Ed.: You’ve also done work on human collaborations (e.g. in the citizen science projects of the Zooniverse) — is there any work comparing human collaborations with bot collaborations — or even examining human-bot collaborations and interactions?

Taha: In the present work we do compare bot-bot interactions with human-human interactions to observe similarities and differences. The most striking difference is in the dynamics of negative interactions. While human conflicts heat up very quickly and then disappear after a while, bots undoing each other’s contributions comes as a steady flow that might persist over years. In the HUMANE project, we discuss the co-existence of humans and machines in the digital world from a theoretical point of view, and there we discuss such ecosystems in detail.

Ed.: Humans obviously interact badly, fairly often (despite being a social species) .. why should we be particularly worried about how bots interact with each other, given humans seem to expect and cope with social inefficiency, annoyances, conflict and break-down? Isn’t this just more of the same?

Luciano: The fact that bots can be as bad as humans is far from reassuring. That this happens even when they are programmed to collaborate is more disconcerting than what happens among humans when they compete, or fight each other. Here we have very elementary mechanisms that, through simple interactions, generate messy and conflictual outcomes. One may hope this is not evidence of what may happen when more complex systems and interactions are in question. The lesson I learnt from all this is that without rules, or some kind of normative framework that promotes collaboration, not even good mechanisms ensure a good outcome.

Read the full article: Tsvetkova M, Garcia-Gavilanes R, Floridi, L, Yasseri T (2017) Even good bots fight: The case of Wikipedia. PLoS ONE 12(2): e0171774. doi:10.1371/journal.pone.0171774


Taha Yasseri and Luciano Floridi were talking to blog editor David Sutcliffe.

Exploring the world of digital detoxing https://ensr.oii.ox.ac.uk/exploring-the-world-of-digital-detoxing/ Thu, 02 Mar 2017 10:50:06 +0000

As our social interactions become increasingly entangled with the online world, there are some who insist on the benefits of disconnecting entirely from digital technology. These advocates of “digital detoxing” view digital communication as eroding our ability to concentrate, to empathise, and to have meaningful conversations.

A 2016 survey by OnePoll found that 40% of respondents felt they had “not truly experienced valuable moments such as a child’s first steps or graduation” because “technology got in the way”, and Ofcom’s 2016 survey showed that 15 million British Internet users (a third of those online) have already tried a digital detox. In recent years, America has sought to pathologise a perceived over-use of digital technology as “Internet addiction”. While the term is not recognised by the DSM, the idea is commonly used in media rhetoric and forms an important backdrop to digital detoxing.

The article Disconnect to reconnect: The food/technology metaphor in digital detoxing (First Monday) by Theodora Sutton presents a short ethnography of the digital detoxing community in the San Francisco Bay Area. Her informants attend Camp Grounded, an annual four-day digital detox and summer camp for adults held in the Californian forest. She attended two Camp Grounded sessions in 2014, and followed up with semi-structured interviews with eight detoxers.

We caught up with Theodora to examine the implications of the study and to learn more about her PhD research, which focuses on the same field site.

Ed.: In your forthcoming article you say that Camp Grounded attendees used food metaphors (and words like “snacking” and “nutrition”) to understand their own use of technology and behaviour. How useful is this as an analogy?

Theodora: The food/technology analogy is an incredibly neat way to talk about something we think of as immaterial in a more tangible way. We know that our digital world relies on physical connections, but we forget that all the time. Another thing the analogy does, in lending a dietary connotation, is imply that we should regulate our consumption of digital use; that there are healthy and unhealthy, or inappropriate, ways of using it.

I explore more pros and cons to the analogy in the paper, but the biggest con in my opinion is that while it’s neat, it’s often used to make value judgments about technology use. For example, saying that online sociality is like processed food is implying that it lacks authenticity. So the food analogy is a really useful way to understand how people are interpreting technology culturally, but it’s important to be aware of how it’s used.

Ed.: How do people rationalise ideas of the digital being somehow “less real” or “genuine” (less “nourishing”), despite the fact that it obviously is all real: just different? Is it just a peg to blame an “other” and excuse their own behaviour .. rather than just switching off their phones and going for a run / sail etc. (or any other “real” activity..).

Theodora: The idea of new technologies being somehow less real or less natural is a pretty established Western concept, and it’s been fundamental in moral panics following new technologies. That digital sociality is different, not lesser, is something we can academically agree on, but people very often believe otherwise.

My personal view is that figuring out what kind of digital usage suits you and then acting in moderation is ideal, without the need for extreme lengths, but in reality moderation can be quite difficult to achieve. And the thing is, we’re not just talking about choosing to text rather than meet in person, or read a book instead of go on Twitter. We’re talking about digital activities that are increasingly inescapable and part of life, like work e-mail or government services being moved online.

The ability to go for a run or go sailing are again privileged activities for people with free time. Many people think getting back to nature or meeting in person are really important for human needs. But increasingly, not everyone has the ability to get away from devices, especially if you don’t have enough money to visit friends or travel to a forest, or you’re just too tired from working all the time. So Camp Grounded is part of what they feel is an urgent conversation about whether the technology we design addresses human, emotional needs.

Ed.: You write in the paper that “upon arrival at Camp Grounded, campers are met with hugs and milk and cookies” .. not to sound horrible, but isn’t this replacing one type of (self-focused) reassurance with another? I mean, it sounds really nice (as does the rest of the Camp), but it sounds a tiny bit like their “problem” is being fetishised / enjoyed a little bit? Or maybe that their problem isn’t to do with technology, but rather with confidence, anxiety etc.

Theodora: The people who run Camp Grounded would tell you themselves that digital detoxing is not really about digital technology. That’s just the current scapegoat for all the alienating aspects of modern life. They also take away real names, work talk, watches, and alcohol. One of the biggest things Camp Grounded tries to do is build up attendees’ confidence to be silly and playful and have their identities less tied to their work persona, which is a bit of a backlash against Silicon Valley’s intense work ethic. Milk and cookies comes from childhood, or America’s summer camps which many attendees went to as children, so it’s one little thing they do to get you to transition into that more relaxed and childlike way of behaving.

I’m not sure about “fetishized,” but Camp Grounded really jumps on board with the technology idea, using really ironic things like an analog dating service called “embers,” a “human powered search” where you pin questions on a big noticeboard and other people answer, and an “inbox” where people leave you letters.

And you’re right, there is an aspect of digital detoxing which is very much a “middle class ailment” in that it can seem rather surface-level and indulgent, and tickets are pretty pricey, making it quite a privileged activity. But at the same time I think it is a genuine conversation starter about our relationship with technology and how it’s designed. I think a digital detox is more than just escapism or reassurance, for them it’s about testing a different lifestyle, seeing what works best for them and learning from that.

Ed.: Many of these technologies are designed to be “addictive” (to use the term loosely: maybe I mean “seductive”) in order to drive engagement and encourage retention: is there maybe an analogy here with foods that are too sugary, salty, fatty (i.e. addictive) for us? I suppose the line between genuine addiction and free choice / agency is a difficult one; and one that may depend largely on the individual. Which presumably makes any attempts to regulate (or even just question) these persuasive digital environments particularly difficult? Given the massive outcry over perfectly rational attempts to tax sugar, fat etc.

Theodora: The analogy between sugary, salty, or fatty foods and seductive technologies is drawn a lot — it was even made by danah boyd in 2009. Digital detoxing comes from a standpoint that tech companies aren’t necessarily working to enable meaningful connection, and are instead aiming to “hook” people in. That’s often compared to food companies that exist to make a profit rather than improve your individual nutrition, using whatever salt, sugar, flavourings, or packaging they have at their disposal to make you keep coming back.

There are two different ways of “fixing” perceived problems with tech: there’s technical fixes that might only let you use the site for certain amounts of time, or re-designing it so that it’s less seductive; then there’s normative fixes, which could be on an individual level deciding to make a change, or even society wide, like the French labour law giving the “right to disconnect” from work emails on evenings and weekends.

One that sort of embodies both of these is The Time Well Spent project, run by Tristan Harris and the OII’s James Williams. They suggest different metrics for tech platforms, such as how well they enable good experiences away from the computer altogether. Like organic food stickers, they’ve suggested putting a stamp on websites whose companies have these different metrics. That could encourage people to demand better online experiences, and encourage tech companies to design accordingly.

So that’s one way that people are thinking about regulating it, but I think we’re still in the stages of sketching out what the actual problems are and thinking about how we can regulate or “fix” them. At the moment, the issue seems to depend on what the individual wants to do. I’d be really interested to know what other ideas people have had to regulate it, though.

Ed.: Without getting into the immense minefield of evolutionary psychology (and whether or not we are creating environments that might be detrimental to us mentally or socially: just as the Big Mac and Krispy Kreme are not brilliant for us nutritionally) — what is the lay of the land — the academic trends and camps — for this larger question of “Internet addiction” .. and whether or not it’s even a thing?

Theodora: In my experience academics don’t consider it a real thing, just as you wouldn’t say someone had an addiction to books. But again, that doesn’t mean it isn’t used all the time as a shorthand. And there are some academics who use it, like Kimberly Young, who proposed it in the 1990s. She still runs an Internet addiction treatment centre in New York, and there’s another in Fall City, Washington state.

The term certainly isn’t going away any time soon and the centres treat people who genuinely seem to have a very problematic relationship with their technology. People like the OII’s Andrew Przybylski (@ShuhBillSkee) are working on untangling this kind of problematic digital use from the idea of addiction, which can be a bit of a defeatist and dramatic term.

Ed.: As an ethnographer working at the Camp according to its rules (hand-written notes, analogue camera) .. did it affect your thinking or subsequent behaviour / habits in any way?

Theodora: Absolutely. In a way that’s a struggle, because I never felt that I wanted or needed a digital detox, yet having been to it three times now I can see the benefits. Going to camp made a strong case for the argument to be more careful with my technology use, for example not checking my phone mid-conversation, and I’ve been much more aware of it since. For me, that’s been part of an on-going debate that I have in my own life, which I think is a really useful fuel towards continuing to unravel this topic in my studies.

Ed.: So what are your plans now for your research in this area — will you be going back to Camp Grounded for another detox?

Theodora: Yes — I’ll be doing an ethnography of the digital detoxing community again this summer for my PhD and that will include attending Camp Grounded again. So far I’ve essentially done just preliminary fieldwork and visited to touch base with my informants. It’s easy to listen to the rhetoric around digital detoxing, but I think what’s been missing is someone spending time with them to really understand their point of view, especially their values, that you can’t always capture in a survey or in interviews.

In my PhD I hope to understand things like: how digital detoxers even think about technology, what kind of strategies they have to use it appropriately once they return from a detox, and how metaphor and language work in talking about the need to “unplug.” The food analogy is just one preliminary finding that shows how fascinating the topic is as soon as you start scratching away the surface.

Read the full article: Sutton, T. (2017) Disconnect to reconnect: The food/technology metaphor in digital detoxing. First Monday 22 (6).


OII DPhil student Theodora Sutton was talking to blog editor David Sutcliffe.

Five Pieces You Should Probably Read On: Fake News and Filter Bubbles https://ensr.oii.ox.ac.uk/five-pieces-you-should-probably-read-on-fake-news-and-filter-bubbles/ Fri, 27 Jan 2017 10:08:39 +0000

This is the second post in a series that will uncover great writing by faculty and students at the Oxford Internet Institute, things you should probably know, and things that deserve to be brought out for another viewing. This week: Fake News and Filter Bubbles!

Fake news, post-truth, “alternative facts”, filter bubbles — this is the news and media environment we apparently now inhabit, and that has formed the fabric and backdrop of Brexit (“£350 million a week”) and Trump (“This was the largest audience to ever witness an inauguration — period”). Do social media divide us, hide us from each other? Are you particularly aware of what content is personalised for you, what it is you’re not seeing? How much can we do with machine-automated or crowd-sourced verification of facts? And are things really any worse now than when Bacon complained in 1620 about the false notions that “are now in possession of the human understanding, and have taken deep root therein”?

 

1. Bernie Hogan: How Facebook divides us [Times Literary Supplement]

27 October 2016 / 1000 words / 5 minutes

“Filter bubbles can create an increasingly fractured population, such as the one developing in America. For the many people shocked by the result of the British EU referendum, we can also partially blame filter bubbles: Facebook literally filters our friends’ views that are least palatable to us, yielding a doctored account of their personalities.”

Bernie Hogan says it’s time Facebook considered ways to use the information it has about us to bring us together across political, ideological and cultural lines, rather than hide us from each other or push us into polarized and hostile camps. He says it’s not only possible for Facebook to help mitigate the issues of filter bubbles and context collapse; it’s imperative, and it’s surprisingly simple.

 

2. Luciano Floridi: Fake news and a 400-year-old problem: we need to resolve the ‘post-truth’ crisis [the Guardian]

29 November 2016 / 1000 words / 5 minutes

“The internet age made big promises to us: a new period of hope and opportunity, connection and empathy, expression and democracy. Yet the digital medium has aged badly because we allowed it to grow chaotically and carelessly, lowering our guard against the deterioration and pollution of our infosphere. […] some of the costs of misinformation may be hard to reverse, especially when confidence and trust are undermined. The tech industry can and must do better to ensure the internet meets its potential to support individuals’ wellbeing and social good.”

The Internet echo chamber satiates our appetite for pleasant lies and reassuring falsehoods, and has become the defining challenge of the 21st century, says Luciano Floridi. So far, the strategy for technology companies has been to deal with the ethical impact of their products retrospectively, but this is not good enough, he says. We need to shape and guide the future of the digital, and stop making it up as we go along. It is time to work on an innovative blueprint for a better kind of infosphere.

 

3. Philip Howard: Facebook and Twitter’s real sin goes beyond spreading fake news

3 January 2017 / 1000 words / 5 minutes

“With the data at their disposal and the platforms they maintain, social media companies could raise standards for civility by refusing to accept ad revenue for placing fake news. They could let others audit and understand the algorithms that determine who sees what on a platform. Just as important, they could be the platforms for doing better opinion, exit and deliberative polling.”

Only Facebook and Twitter know how pervasive fabricated news stories and misinformation campaigns have become during referendums and elections, says Philip Howard — and allowing fake news and computational propaganda to target specific voters is an act against democratic values. But in a time of weakening polling systems, withholding data about public opinion is actually their major crime against democracy, he says.

 

4. Brent Mittelstadt: Should there be a better accounting of the algorithms that choose our news for us?

7 December 2016 / 1800 words / 8 minutes

“Transparency is often treated as the solution, but merely opening up algorithms to public and individual scrutiny will not in itself solve the problem. Information about the functionality and effects of personalisation must be meaningful to users if anything is going to be accomplished. At a minimum, users of personalisation systems should be given more information about their blind spots, about the types of information they are not seeing, or where they lie on the map of values or criteria used by the system to tailor content to users.”

A central ideal of democracy is that political discourse should allow a fair and critical exchange of ideas and values. But political discourse is unavoidably mediated by the mechanisms and technologies we use to communicate and receive information, says Brent Mittelstadt. And content personalization systems and the algorithms they rely upon create a new type of curated media that can undermine the fairness and quality of political discourse.

 

5. Heather Ford: Verification of crowd-sourced information: is this ‘crowd wisdom’ or machine wisdom?

19 November 2013 / 1400 words / 6 minutes

“A key question being asked in the design of future verification mechanisms is the extent to which verification work should be done by humans or non-humans (machines). Here, verification is not a binary categorisation, but rather there is a spectrum between human and non-human verification work, and indeed, projects like Ushahidi, Wikipedia and Galaxy Zoo have all developed different verification mechanisms.”

‘Human’ verification, a process of checking whether a particular report meets a group’s truth standards, is an acutely social process, says Heather Ford. If code is law and if other aspects in addition to code determine how we can act in the world, it is important that we understand the context in which code is deployed. Verification is a practice that determines how we can trust information coming from a variety of sources — only by illuminating such practices and the variety of impacts that code can have in different environments can we begin to understand how code regulates our actions in crowdsourcing environments.

 

.. and just to prove we’re capable of understanding and acknowledging and assimilating multiple viewpoints on complex things, here’s Helen Margetts, with a different slant on filter bubbles: “Even if political echo chambers were as efficient as some seem to think, there is little evidence that this is what actually shapes election results. After all, by definition echo chambers preach to the converted. It is the undecided people who (for example) the Leave and Trump campaigns needed to reach. And from the research, it looks like they managed to do just that.”

 

The Authors

Bernie Hogan is a Research Fellow at the OII; his research interests lie at the intersection of social networks and media convergence.

Luciano Floridi is the OII’s Professor of Philosophy and Ethics of Information. His research areas are the philosophy of information, information and computer ethics, and the philosophy of technology.

Philip Howard is the OII’s Professor of Internet Studies. He investigates the impact of digital media on political life around the world.

Brent Mittelstadt is an OII Postdoc. His research interests include the ethics of information handled by medical ICT, theoretical developments in discourse and virtue ethics, and the epistemology of information.

Heather Ford completed her doctorate at the OII, where she studied how Wikipedia editors write history as it happens. She is now a University Academic Fellow in Digital Methods at the University of Leeds. Her forthcoming book “Fact Factories: Wikipedia’s Quest for the Sum of All Human Knowledge” will be published by MIT Press.

Helen Margetts is the OII’s Director, and Professor of Society and the Internet. She specialises in digital era government, politics and public policy, and data science and experimental methods. Her most recent book is Political Turbulence (Princeton).

 

Coming up! .. It’s the economy, stupid / Augmented reality and ambient fun / The platform economy / Power and development / Internet past and future / Government / Labour rights / The disconnected / Ethics / Staying critical

Should there be a better accounting of the algorithms that choose our news for us? https://ensr.oii.ox.ac.uk/should-there-be-a-better-accounting-of-the-algorithms-that-choose-our-news-for-us/ Wed, 07 Dec 2016 14:44:31 +0000

A central ideal of democracy is that political discourse should allow a fair and critical exchange of ideas and values. But political discourse is unavoidably mediated by the mechanisms and technologies we use to communicate and receive information — and content personalization systems (think search engines, social media feeds and targeted advertising), and the algorithms they rely upon, create a new type of curated media that can undermine the fairness and quality of political discourse.

A new article by Brent Mittelstadt explores the challenges of enforcing a political right to transparency in content personalization systems. First, he explains the value of transparency to political discourse and suggests how content personalization systems undermine the open exchange of ideas and evidence among participants: at a minimum, personalization systems can undermine political discourse by curbing the diversity of ideas that participants encounter. Second, he explores work on the detection of discrimination in algorithmic decision-making, including techniques of algorithmic auditing that service providers can employ to detect political bias. Third, he identifies several factors that inhibit auditing and thus indicate reasonable limitations on the ethical duties incurred by service providers—content personalization systems can function opaquely and be resistant to auditing because of poor accessibility and interpretability of decision-making frameworks. Finally, Brent concludes with reflections on the need for regulation of content personalization systems.

He notes that no matter how auditing is pursued, standards to detect evidence of political bias in personalized content are urgently required. Methods are needed to routinely and consistently assign political value labels to content delivered by personalization systems. This is perhaps the most pressing area for future work—to develop practical methods for algorithmic auditing.

The right to transparency in political discourse may seem unusual and far-fetched. However, standards already set by the U.S. Federal Communications Commission’s fairness doctrine — no longer in force — and the British Broadcasting Corporation’s fairness principle both demonstrate the importance of the idealized version of political discourse described here. Both precedents promote balance in public political discourse by setting standards for the delivery of politically relevant content. Whether it is appropriate to hold service providers that use content personalization systems to a similar standard remains a crucial question.

Read the full article: Mittelstadt, B. (2016) Auditing for Transparency in Content Personalization Systems. International Journal of Communication 10(2016), 4991–5002.

We caught up with Brent to explore the broader implications of the study:

Ed: We basically accept that the tabloids will be filled with gross bias, populism and lies (in order to sell copy) — and editorial decisions are not generally transparent to us. In terms of their impact on the democratic process, what is the difference between the editorial boardroom and a personalising social media algorithm?

Brent: There are a number of differences. First, although not necessarily transparent to the public, one hopes that editorial boardrooms are at least transparent to those within the news organisations. Editors can discuss and debate the tone and factual accuracy of their stories, explain their reasoning to one another, reflect upon the impact of their decisions on their readers, and generally have a fair debate about the merits and weaknesses of particular content.

This is not the case for a personalising social media algorithm; those working with the algorithm inside a social media company are often unable to explain why the algorithm is functioning in a particular way, or determined a particular story or topic to be ‘trending’ or displayed to particular users, while others are not. It is also far more difficult to ‘fact check’ algorithmically curated news; a news item can be widely disseminated merely by many users posting or interacting with it, without any purposeful dissemination or fact checking by the platform provider.

Another big difference is the degree to which users can be aware of the bias of the stories they are reading. Whereas a reader of The Daily Mail or The Guardian will have some idea of the values of the paper, the same cannot be said of platforms offering algorithmically curated news and information. The platform can be neutral insofar as it disseminates news items and information reflecting a range of values and political viewpoints. A user will encounter items reflecting her particular values (or, more accurately, her history of interactions with the platform and the values inferred from them), but these values, and their impact on her exposure to alternative viewpoints, may not be apparent to the user.

Ed: And how is content “personalisation” different to content filtering (e.g. as we see with the Great Firewall of China) that people get very worked up about? Should we be more worried about personalisation?

Brent: Personalisation and filtering are essentially the same mechanism; information is tailored to a user or users according to some prevailing criteria. One difference is whether content is merely infeasible to access, or technically inaccessible. Content of all types will typically still be accessible in principle when personalisation is used, but the user will have to make an effort to access content that is not recommended or otherwise given special attention. Filtering systems, in contrast, will impose technical measures to make particular content inaccessible from a particular device or geographical area.

Another difference is the source of the criteria used to set the visibility of different types of content. In the case of personalisation, these criteria are typically based on the user’s (inferred) interests, values, past behaviours and explicit requests. Critically, these values are not necessarily apparent to the user. For filtering, the criteria are typically externally determined by a third party, often a government. Some types of information are set off limits, according to the prevailing values of the third party. It is the imposition of external values, which limits the capacity of users to access content of their choosing, that often causes an outcry against filtering and censorship.

Importantly, the two mechanisms do not necessarily differ in terms of the transparency of the limiting factors or rules to users. In some cases, such as the recently proposed ban in the UK of adult websites that do not provide meaningful age verification mechanisms, the criteria that determine whether sites are off limits will be publicly known at a general level. In other cases, and especially with personalisation, the user inside the ‘filter bubble’ will be unaware of the rules that determine whether content is (in)accessible. And it is not always the case that the platform provider intentionally keeps these rules secret. Rather, the personalisation algorithms and background analytics that determine the rules can be too complex, inaccessible or poorly understood even by the provider to give the user any meaningful insight.
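To make this distinction concrete, here is a deliberately toy sketch in Python. The catalogue, interest scores and blocklist are invented for illustration and do not describe any real platform: personalisation re-ranks content so that some items are merely unlikely to be seen, while filtering removes items from what can be served at all.

```python
# Toy illustration only: invented catalogue, scores and blocklist.
CATALOGUE = ["local_news", "opposition_politics", "sports", "celebrity_gossip"]

def personalise(items, interest_scores, top_n=2):
    """Everything remains accessible in principle; low-scoring items are
    simply ranked out of the user's view."""
    ranked = sorted(items, key=lambda i: interest_scores.get(i, 0), reverse=True)
    return ranked[:top_n]

def filter_out(items, blocklist):
    """Blocked items are technically unavailable, whatever the user wants."""
    return [i for i in items if i not in blocklist]

inferred_scores = {"sports": 0.9, "celebrity_gossip": 0.8,
                   "local_news": 0.3, "opposition_politics": 0.1}

print("Personalised feed:", personalise(CATALOGUE, inferred_scores))
print("Filtered catalogue:", filter_out(CATALOGUE, {"opposition_politics"}))
```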

Ed: Where are these algorithms developed: are they basically all proprietary? i.e. how would you gain oversight of massively valuable and commercially sensitive intellectual property?

Brent: Personalisation algorithms tend to be proprietary, and thus are not normally open to public scrutiny in any meaningful sense. In one sense this is understandable; personalisation algorithms are valuable intellectual property. At the same time the lack of transparency is a problem, as personalisation fundamentally affects how users encounter and digest information on any number of topics. As recently argued, it may be the case that personalisation of news impacts on political and democratic processes. Existing regulatory mechanisms have not been successful in opening up the ‘black box’ so to speak.

It can be argued, however, that legal requirements should be adopted to require these algorithms to be open to public scrutiny due to the fundamental way they shape our consumption of news and information. Oversight can take a number of forms. As I argue in the article, algorithmic auditing is one promising route, performed both internally by the companies themselves, and externally by a government agency or researchers. A good starting point would be for the companies developing and deploying these algorithms to extend their cooperation with researchers, thereby allowing a third party to examine the effects these systems are having on political discourse, and society more broadly.

Ed: By “algorithm audit” — do you mean examining the code and inferring what the outcome might be in terms of bias, or checking the outcome (presumably statistically) and inferring that the algorithm must be introducing bias somewhere? And is it even possible to meaningfully audit personalisation algorithms, when they might rely on vast amounts of unpredictable user feedback to train the system?

Brent: Algorithm auditing can mean both of these things, and more. Audit studies are a tool already in use, whereby human participants introduce different inputs into a system, and examine the effect on the system’s outputs. Similar methods have long been used to detect discriminatory hiring practices, for instance. Code audits are another possibility, but are generally prohibitive due to problems of access and complexity. Also, even if you can access and understand the code of an algorithm, that tells you little about how the algorithm performs in practice when given certain input data. Both the algorithm and input data would need to be audited.

Alternatively, auditing can assess just the outputs of the algorithm; recent work to design mechanisms to detect disparate impact and discrimination, particularly in the Fairness, Accountability and Transparency in Machine Learning (FAT-ML) community, is a great example of this type of auditing. Algorithms can also be designed to attempt to prevent or detect discrimination and other harms as they occur. These methods are as much about the operation of the algorithm as they are about the nature of the training and input data, which may itself be biased. In short, auditing is very difficult, but there are promising avenues of research and development. Once we have reliable auditing methods, the next major challenge will be to tailor them to specific sectors; a one-size-fits-all approach to auditing is not on the cards.
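As a purely illustrative sketch of the output-focused auditing described above, the following Python snippet applies the common ‘four-fifths’ rule of thumb to a made-up set of decisions. The group labels, sample data and 0.8 threshold are assumptions for the example, not a method prescribed by the FAT-ML literature.

```python
# Illustrative output audit: invented decisions, four-fifths rule of thumb.
from collections import defaultdict

def selection_rates(decisions):
    """decisions: iterable of (group, favourable_outcome) pairs."""
    favourable, total = defaultdict(int), defaultdict(int)
    for group, outcome in decisions:
        total[group] += 1
        favourable[group] += int(outcome)
    return {g: favourable[g] / total[g] for g in total}

def disparate_impact_ratio(decisions, protected, reference):
    rates = selection_rates(decisions)
    return rates[protected] / rates[reference]

# Hypothetical audit sample: which group received a favourable outcome?
audit_sample = ([("A", True)] * 40 + [("A", False)] * 60 +
                [("B", True)] * 25 + [("B", False)] * 75)

ratio = disparate_impact_ratio(audit_sample, protected="B", reference="A")
print(f"Disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:  # four-fifths rule of thumb
    print("Possible disparate impact - flag the system for closer review.")
```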

Ed: Do you think this is a real problem for our democracy? And what is the solution if so?

Brent: It’s difficult to say, in part because access and data to study the effects of personalisation systems are hard to come by. It is one thing to prove that personalisation is occurring on a particular platform, or to show that users are systematically displayed content reflecting a narrow range of values or interests. It is quite another to prove that these effects are having an overall harmful effect on democracy. Digesting information is one of the most basic elements of social and political life, so any mechanism that fundamentally changes how information is encountered should be subject to serious and sustained scrutiny.

Assuming personalisation actually harms democracy or political discourse, mitigating its effects is quite a different issue. Transparency is often treated as the solution, but merely opening up algorithms to public and individual scrutiny will not in itself solve the problem. Information about the functionality and effects of personalisation must be meaningful to users if anything is going to be accomplished.

At a minimum, users of personalisation systems should be given more information about their blind spots, about the types of information they are not seeing, or where they lie on the map of values or criteria used by the system to tailor content to users. A promising step would be proactively giving the user some idea of what the system thinks it knows about them, or how they are being classified or profiled, without the user first needing to ask.


Brent Mittelstadt was talking to blog editor David Sutcliffe.

]]>
Alan Turing Institute and OII: Summit on Data Science for Government and Policy Making https://ensr.oii.ox.ac.uk/alan-turing-institute-and-oii-summit-on-data-science-for-government-and-policy-making/ Tue, 31 May 2016 06:45:39 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3804 The benefits of big data and data science for the private sector are well recognised. So far, considerably less attention has been paid to the power and potential of the growing field of data science for policy-making and public services. On Monday 14th March 2016 the Oxford Internet Institute (OII) and the Alan Turing Institute (ATI) hosted a Summit on Data Science for Government and Policy Making, funded by the EPSRC. Leading policy makers, data scientists and academics came together to discuss how the ATI and government could work together to develop data science for the public good. The convenors of the Summit, Professors Helen Margetts (OII) and Tom Melham (Computer Science), report on the day’s proceedings.

The Alan Turing Institute will build on the UK’s existing academic strengths in the analysis and application of big data and algorithm research to place the UK at the forefront of world-wide research in data science. The University of Oxford is one of five university partners, and the OII is the only partnering department in the social sciences. The aim of the summit on Data Science for Government and Policy-Making was to understand how government can make better use of big data and the ATI – with the academic partners in listening mode.

We hoped that the participants would bring forward their own stories, hopes and fears regarding data science for the public good. Crucially, we wanted to work out a roadmap for how different stakeholders can work together on the distinct challenges facing government, as opposed to commercial organisations. At the same time, data science research and development has much to gain from the policy-making community. Some of the things that government does – collect tax from the whole population, or give money away at scale, or possess the legitimate use of force – it does by virtue of being government. So the sources of data and some of the data science challenges that public agencies face are unique and tackling them could put government working with researchers at the forefront of data science innovation.

During the Summit a range of stakeholders provided insight from their distinctive perspectives: the Government Chief Scientific Advisor, Sir Mark Walport; the Deputy Director of the ATI, Patrick Wolfe; the National Statistician and Director of ONS, John Pullinger; and the Director of Data at the Government Digital Service, Paul Maltby. Representatives of frontline departments recounted how algorithmic decision-making is already bringing predictive capacity into operational business, improving efficiency and effectiveness.

Discussion revolved around the challenges of how to build core capability in data science across government, rather than outsourcing it (as happened in an earlier era with information technology) or confining it to a data science profession. Some delegates talked of being in the ‘foothills’ of data science. The scale, heterogeneity and complexity of some government departments currently work against data science innovation, particularly when larger departments can operate thousands of databases, creating legacy barriers to interoperability. Outdated policies can work against data science methodologies. Attendees repeatedly voiced concerns about sharing data across government departments, in some cases because of limitations of legal protections; in others because people were unsure what they can and cannot do.

The potential power of data science creates an urgent need for discussion of ethics. Delegates and speakers repeatedly affirmed the importance of an ethical framework and of thought leadership in this area, so that ethics is ‘part of the science’. The clear emergent option was a national Council for Data Ethics (along the lines of the Nuffield Council on Bioethics) convened by the ATI, as recommended in the recent Science and Technology parliamentary committee report The big data dilemma and the government response. Luciano Floridi (OII’s professor of the philosophy and ethics of information) warned that we cannot reduce ethics to mere compliance. Ethical problems do not normally have a single straightforward ‘right’ answer, but require dialogue and thought and extend far beyond individual privacy. There was consensus that the UK has the potential to provide global thought leadership and to set the standard for the rest of Europe. It was announced during the Summit that an ATI Working Group on the Ethics of Data Science has been confirmed, to take these issues forward.

So what happens now?

Throughout the Summit there were calls from policy makers for more data science leadership. We hope that the ATI will be instrumental in providing this, and an interface both between government, business and academia, and between separate Government departments. This Summit showed just how much real demand – and enthusiasm – there is from policy makers to develop data science methods and harness the power of big data. No-one wants to repeat with data science the history of government information technology – where in the 1950s and 60s, government led the way as an innovator, but has struggled to maintain this position ever since. We hope that the ATI can act to prevent the same fate for data science and provide both thought leadership and the ‘time and space’ (as one delegate put it) for policy-makers to work with the Institute to develop data science for the public good.

Since the Summit, in response to the clear need that emerged from the discussion and other conversations with stakeholders, the ATI has been designing a Policy Innovation Unit, with the aim of working with government departments on ‘data science for public good’ issues. Activities could include:

  • Secondments at the ATI for data scientists from government
  • Short term projects in government departments for ATI doctoral students and postdoctoral researchers
  • Developing ATI as an accredited data facility for public data, as suggested in the current Cabinet Office consultation on better use of data in government
  • ATI pilot policy projects, using government data
  • Policy symposia focused on specific issues and challenges
  • ATI representation in regular meetings at the senior level (for example, between Chief Scientific Advisors, the Cabinet Office, the Office for National Statistics, GO-Science).
  • ATI acting as an interface between public and private sectors, for example through knowledge exchange and the exploitation of non-government sources as well as government data
  • ATI offering a trusted space, time and a forum for formulating questions and developing solutions that tackle public policy problems and push forward the frontiers of data science
  • ATI as a source of cross-fertilization of expertise between departments
  • Reviewing the data science landscape in a department or agency, identifying feedback loops – or lack thereof – between policy-makers, analysts and front-line staff, and identifying possibilities for an ‘intelligent centre’ model through strategic development of expertise.

The Summit, and a series of Whitehall Roundtables convened by GO-Science which led up to it, have initiated a nascent network of stakeholders across government, which we aim to build on and develop over the coming months. If you are interested in being part of this, please do be in touch with us.

Helen Margetts, Oxford Internet Institute, University of Oxford (director@oii.ox.ac.uk)

Tom Melham, Department of Computer Science, University of Oxford

]]>
Exploring the Ethics of Monitoring Online Extremism https://ensr.oii.ox.ac.uk/exploring-the-ethics-of-monitoring-online-extremism/ Wed, 23 Mar 2016 09:59:02 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3616 (Part 2 of 2) The Internet serves not only as a breeding ground for extremism, but also offers myriad data streams which potentially hold great value to law enforcement. The report by the OII’s Ian Brown and Josh Cowls for the VOX-Pol project: Check the Web: Assessing the Ethics and Politics of Policing the Internet for Extremist Material explores the complexities of policing the web for extremist material, and its implications for security, privacy and human rights. In the second of a two-part post, Josh Cowls and Ian Brown discuss the report with blog editor Bertie Vidgen. Read the first post.

Surveillance in NYC’s financial district. Photo by Jonathan McIntosh (flickr).

Ed: Josh, political science has long posed a distinction between public spaces and private ones. Yet it seems like many platforms on the Internet, such as Facebook, cannot really be categorized in such terms. If this correct, what does it mean for how we should police and govern the Internet?

Josh: I think that is right – many online spaces are neither public nor private. This is also an issue for some privacy legal frameworks (especially in the US). A lot of the covenants and agreements were written forty or fifty years ago, long before anyone had really thought about the Internet. That has now forced governments, societies and parliaments to adapt these existing rights and protocols for the online sphere. I think that we have some fairly clear laws about the use of human intelligence sources, and police law in the offline sphere. The interesting question is how we can take that online. How can the pre-existing standards, like the requirement that procedures are necessary and proportionate, or the ‘right to appeal’, be incorporated into online spaces? In some cases there are direct analogies. In other cases there needs to be some re-writing of the rule book to try to figure out what we mean. And, of course, it is difficult because the internet itself is always changing!

Ed: So do you think that concepts like proportionality and justification need to be updated for online spaces?

Josh: I think that at a very basic level they are still useful. People know what we mean when we talk about something being necessary and proportionate, and about the importance of having oversight. I think we also have a good idea about what it means to be non-discriminatory when applying the law, though this is one of those areas that can quickly get quite tricky. Consider the use of online data sources to identify people. On the one hand, the Internet is ‘blind’ in that it does not automatically codify social demographics. In this sense it is not possible to profile people in the same way that we can offline. On the other hand, it is in some ways the complete opposite. It is very easy to directly, and often invisibly, create really firm systems of discrimination – and, most problematically, to do so opaquely.

This is particularly challenging when we are dealing with extremism because, as we pointed out in the report, extremists are generally pretty unremarkable in terms of demographics. It perhaps used to be true that extremists were more likely to be poor or to have had challenging upbringings, but many of the people going to fight for the Islamic State are middle class. So we have fewer demographic pointers to latch onto when trying to find these people. Of course, insofar as there are identifiers they won’t be released by the government. The real problem for society is that there isn’t very much openness and transparency about these processes.

Ed: Governments are increasingly working with the private sector to gain access to different types of information about the public. For example, in Australia a Telecommunications bill was recently passed which requires all telecommunication companies to keep the metadata – though not the content data – of communications for two years. A lot of people opposed the Bill because metadata is still very informative, and as such there are some clear concerns about privacy. Similar concerns have been expressed in the UK about an Investigatory Powers Bill that would require new Internet Connection Records about customers’ online activities. How much do you think private corporations should protect people’s data? And how much should concepts like proportionality apply to them?

Ian: To me the distinction between metadata and content data is fairly meaningless. For example, often just knowing when and who someone called and for how long can tell you everything you need to know! You don’t have to see the content of the call. There are a lot of examples like this which highlight the slightly ludicrous nature of distinguishing between metadata and content data. It is all data. As has been said by former US CIA and NSA Director Gen. Michael Hayden, “we kill people based on metadata.”

One issue that we identified in the report is the increased onus on companies to monitor online spaces, and all of the legal entanglements that come from this given that companies might not be based in the same country as the users. One of our interviewees called this new international situation a ‘very different ballgame’. Working out how to deal with problematic online content is incredibly difficult, and some huge issues of freedom of speech are bound up in this. On the one hand, there is a government-led approach where we use the law to take down content. On the other hand is a broader approach, whereby social networks voluntarily take down objectionable content even if it is permissible under the law. This causes much more serious problems for human rights and the rule of law.

Read the full report: Brown, I., and Cowls, J., (2015) Check the Web: Assessing the Ethics and Politics of Policing the Internet for Extremist Material. VOX-Pol Publications.


Ian Brown is Professor of Information Security and Privacy at the OII. His research is focused on surveillance, privacy-enhancing technologies, and Internet regulation.

Josh Cowls is a student and researcher based at MIT, working to understand the impact of technology on politics, communication and the media.

Josh and Ian were talking to Blog Editor Bertie Vidgen.

]]>
Assessing the Ethics and Politics of Policing the Internet for Extremist Material https://ensr.oii.ox.ac.uk/assessing-the-ethics-and-politics-of-policing-the-internet-for-extremist-material/ Thu, 18 Feb 2016 22:59:20 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3558 The Internet serves not only as a breeding ground for extremism, but also offers myriad data streams which potentially hold great value to law enforcement. The report by the OII’s Ian Brown and Josh Cowls for the VOX-Pol project: Check the Web: Assessing the Ethics and Politics of Policing the Internet for Extremist Material explores the complexities of policing the web for extremist material, and its implications for security, privacy and human rights. Josh Cowls discusses the report with blog editor Bertie Vidgen.*

*please note that the views given here do not necessarily reflect the content of the report, or those of the lead author, Ian Brown.

In terms of counter-speech there are different roles for government, civil society, and industry. Image by Miguel Discart (Flickr).

 

Ed: Josh, could you let us know the purpose of the report, outline some of the key findings, and tell us how you went about researching the topic?

Josh: Sure. In the report we take a step back from the ground-level question of ‘what are the police doing?’ and instead ask, ‘what are the ethical and political boundaries, rationale and justifications for policing the web for these kinds of activity?’ We used an international human rights framework as an ethical and legal basis to understand what is being done. We also tried to further the debate by clarifying a few things: what has already been done by law enforcement, and, really crucially, what the perspectives are of all those involved, including lawmakers, law enforcers, technology companies, academia and many others.

We derived the insights in the report from a series of workshops, one of which was held as part of the EU-funded VOX-Pol network. The workshops involved participants who were quite high up in law enforcement, the intelligence agencies, the tech industry, civil society, and academia. We followed these up with interviews with other individuals in similar positions and conducted background policy research.

Ed: You highlight that many extremist groups (such as Isis) are making really significant use of online platforms to organize, radicalize people, and communicate their messages.

Josh: Absolutely. A large part of our initial interest when writing the report lay in finding out more about the role of the Internet in facilitating the organization, coordination, recruitment and inspiration of violent extremism. The impact of this has been felt very recently in Paris and Beirut, and many other places worldwide. This report pre-dates these most recent developments, but was written in the context of these sorts of events.

Given the Internet is so embedded in our social lives, I think it would have been surprising if political extremist activity hadn’t gone online as well. Of course, the Internet is a very powerful tool and in the wrong hands it can be a very destructive force. But other research, separate from this report, has found that the Internet is not usually people’s first point of contact with extremism: more often than not that actually happens offline through people you know in the wider world. Nonetheless it can definitely serve as an incubator of extremism and can serve to inspire further attacks.

Ed: In the report you identify different groups in society that are affected by, and affecting, issues of extremism, privacy, and governance – including civil society, academics, large corporations and governments.

Josh: Yes, in the later stages of the report we do divide society into these groups, and offer some perspectives on what they do, and what they think about counter-extremism. For example, in terms of counter-speech there are different roles for government, civil society, and industry. There is this idea that ISIS are really good at social media, and that that is how they are powering a lot of their support; but one of the people that we spoke to said that it is not the case that ISIS are really good, it is just that governments are really bad!

We shouldn’t ask government to participate in the social network: bureaucracies often struggle to be really flexible and nimble players on social media. In contrast, civil society groups tend to be more engaged with communities and know how to “speak the language” of those who might be vulnerable to radicalization. As such they can enter that dialogue in a much more informed and effective way.

The other tension, or paradigm, that we offer in this report is the distinction between whether people are ‘at risk’ or ‘a risk’. What we try to point to is that people can go from one to the other. They start by being ‘at risk’ of radicalization, but if they do get radicalized and become a violent threat to society, which only happens in the minority of cases, then they become ‘a risk’. Engaging with people who are ‘at risk’ highlights the importance of having respect and dialogue with communities that are often the first to be lambasted when things go wrong, but which seldom get all the help they need, or the credit when they get it right. We argue that civil society is particularly suited for being part of this process.

Ed: It seems like the things that people do or say online can only really be understood in terms of the context. But often we don’t have enough information, and it can be very hard to just look at something and say ‘This is definitely extremist material that is going to incite someone to commit terrorist or violent acts’.

Josh: Yes, I think you’re right. In the report we try to take what is a very complicated concept – extremist material – and divide it into more manageable chunks of meaning. We talk about three hierarchical levels. The degree of legal consensus over whether content should be banned decreases as it gets less extreme. The first level we identified was straight up provocation and hate speech. Hate speech legislation has been part of the law for a long time. You can’t incite racial hatred, you can’t incite people to crimes, and you can’t promote terrorism. Most countries in Europe have laws against these things.

The second level is the glorification and justification of terrorism. This is usually more post-hoc as by definition if you are glorifying something it has already happened. You may well be inspiring future actions, but that relationship between the act of violence and the speech act is different than with provocation. Nevertheless, some countries, such as Spain and France, have pushed hard on criminalising this. The third level is non-violent extremist material. This is the most contentious level, as there is very little consensus about what types of material should be called ‘extremist’ even though they are non-violent. One of the interviewees that we spoke to said that often it is hard to distinguish between someone who is just being friendly and someone who is really trying to persuade or groom someone to go to Syria. It is really hard to put this into a legal framework with the level of clarity that the law demands.

There is a proportionality question here. When should something be considered specifically illegal? And, then, if an illegal act has been committed what should the appropriate response be? This is bound to be very different in different situations.

Ed: Do you think that there are any immediate or practical steps that governments can take to improve the current situation? And do you think that there any ethical concerns which are not being paid sufficient attention?

Josh: In the report we raised a few concerns about existing government responses. There are lots of things beside privacy that could be seen as fundamental human rights and that are being encroached upon. Freedom of association and assembly is a really interesting one. We might not have the same reverence for a Facebook event plan or discussion group as we would a protest in a town hall, but of course they are fundamentally pretty similar.

The wider danger here is the issue of mission creep. Once you have systems in place that can do potentially very powerful analytical investigatory things then there is a risk that we could just keep extending them. If something can help us fight terrorism then should we use it to fight drug trafficking and violent crime more generally? It feels to me like there is a technical-military-industrial complex mentality in government where if you build the systems then you just want to use them. In the same way that CCTV cameras record you irrespective of whether or not you commit a violent crime or shoplift, we need to ask whether the same panoptical systems of surveillance should be extended to the Internet. Now, to a large extent they are already there. But what should we train the torchlight on next?

This takes us back to the importance of having necessary, proportionate, and independently authorized processes. When you drill down into how rights like privacy should be balanced with security, it gets really complicated. But the basic process-driven things that we identified in the report are far simpler: if we accept that governments have the right to take certain actions in the name of security, then, no matter how important or life-saving those actions are, there are still protocols that governments must follow. We really wanted to infuse these issues into the debate through the report.

Read the full report: Brown, I., and Cowls, J., (2015) Check the Web: Assessing the Ethics and Politics of Policing the Internet for Extremist Material. VOX-Pol Publications.


Josh Cowls is a student and researcher based at MIT, working to understand the impact of technology on politics, communication and the media.

Josh Cowls was talking to Blog Editor Bertie Vidgen.

]]>
Government “only” retaining online metadata still presents a privacy risk https://ensr.oii.ox.ac.uk/government-only-retaining-online-metadata-still-presents-a-privacy-risk/ Mon, 30 Nov 2015 08:14:56 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3514 Issues around data capture, retention and control are gaining significant attention in many Western countries — including in the UK. In this piece originally posted on the Ethics Centre Blog, the OII’s Brent Mittelstadt considers the implications of metadata retention for privacy. He argues that when considered in relation to individuals’ privacy, metadata should not be viewed as fundamentally different to data about the content of a communication.

Since 13 October 2015 telecommunications providers in Australia have been required to retain metadata on communications for two years. Image by r2hox (Flickr).

Australia’s new data retention law for telecommunications providers, comparable to extant UK and US legislation, came into effect 13 October 2015. Telecoms and ISPs are now required to retain metadata about communications for two years to assist law enforcement agencies in crime and terrorism investigation. Despite now being in effect, the extent and types of data to be collected remain unclear. The law has been widely criticised for violating Australians’ right to privacy by introducing overly broad surveillance of civilians. The Government has argued against this portrayal. They argue the content of communications will not be retained but rather the “data about the data” – location, time, date and duration of a call.

Metadata retention raises complex ethical issues often framed in terms of privacy which are relevant globally. A popular argument is that metadata offers a lower risk of violating privacy compared to primary data – the content of communication. The distinction between the “content” and “nature” of a communication implies that if the content of a message is protected, so is the privacy of the sender and receiver.

The assumption that metadata retention is more acceptable because of its lower privacy risks is unfortunately misguided. Sufficient volumes of metadata offer comparable opportunities to generate invasive information about civilians. Consider a hypothetical. I am given access to a mobile carrier’s dataset that specifies time, date, caller and receiver identity in addition to a continuous record of location constructed with telecommunication tower triangulation records. I see from this that when John’s wife Jane leaves the house, John often calls Jill and visits her for a short period afterwards. From this I conclude that John may be having an affair with Jill. Now consider the alternative. Instead of metadata I have access to recordings of the calls between John and Jill with which I reach the same conclusion.

From a privacy perspective, the difference between the two methods I used to infer something about John’s marriage is trivial. In both cases I am making an intrusive inference about John based on data that describes his behaviours. I cannot be certain but in both cases I am sufficiently confident that my inference is correct based on the data available. My inferences are actionable – I treat them as if they are reliable, accurate knowledge when interacting with John. It is this willingness to act on uncertainty (which is central to ‘Big Data’) that makes metadata ethically similar to primary data. While it is comparatively difficult to learn something from metadata, the potential is undeniable. Both types allow for invasive inferences to be made about the lives and behaviours of people.
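The following Python sketch, using entirely invented records, shows how little is needed to operationalise the hypothetical above: a few comparisons over call metadata and tower-location records are enough to surface an ‘actionable’ pattern, without touching the content of any call.

```python
# Invented data only: call metadata plus coarse location records.
from datetime import datetime

# (caller, receiver, time of call)
calls = [
    ("john", "jill", datetime(2015, 11, 2, 18, 5)),
    ("john", "jill", datetime(2015, 11, 9, 18, 12)),
    ("john", "jill", datetime(2015, 11, 16, 18, 3)),
]

# (person, time, area inferred from tower triangulation)
locations = [
    ("jane", datetime(2015, 11, 2, 17, 55), "away_from_home"),
    ("john", datetime(2015, 11, 2, 18, 40), "jill_neighbourhood"),
    ("jane", datetime(2015, 11, 9, 18, 0), "away_from_home"),
    ("john", datetime(2015, 11, 9, 18, 45), "jill_neighbourhood"),
]

def pattern_count(calls, locations, window_minutes=60):
    """Count calls from John to Jill that are followed, within the window,
    by John's phone appearing near Jill while Jane's phone is away."""
    count = 0
    for caller, receiver, when in calls:
        if (caller, receiver) != ("john", "jill"):
            continue
        visited = any(p == "john" and area == "jill_neighbourhood"
                      and 0 <= (t - when).total_seconds() <= window_minutes * 60
                      for p, t, area in locations)
        jane_away = any(p == "jane" and area == "away_from_home"
                        and abs((t - when).total_seconds()) <= window_minutes * 60
                        for p, t, area in locations)
        count += int(visited and jane_away)
    return count

print("Occurrences of the suggestive pattern:", pattern_count(calls, locations))
```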

Going further, some would argue that metadata can actually be more invasive than primary data. Variables such as location, time and duration are easier to assemble into a historical record of behaviour than content. These concerns are deepened by the difficulty of “opting out” of metadata surveillance. While a person can hypothetically forego all modern communication technologies, privacy suddenly has a much higher cost in terms of quality of life.

Technologies such as encrypted communication platforms, virtual private networks (VPN) and anonymity networks have all been advocated as ways to subvert metadata collection by hiding aspects of your communications. It is worth remembering that these techniques remain feasible only so long as they remain legal, and only for those who have the technical knowledge and (in some cases) the ability to pay. These technologies raise a question of whether a right to anonymity exists. Perhaps privacy enhancing technologies are immoral? Headlines about digital piracy and the "dark web" show how quickly technologically hiding one’s identity and behaviours can take on a criminal and immoral tone. The status quo of privacy subtly shifts when techniques to hide aspects of one’s personal life are portrayed as necessarily subversive. The technologies to combat metadata retention are not criminal or immoral – they are privacy enhancing technologies.

Privacy is historically a fundamental human value. Individuals have a right to privacy. Violations must be justified by a competing interest. In discussing the ethics of metadata retention and anonymity technologies it is easy to forget this status quo. Privacy is not something that individuals have to justify or argue for – it should be assumed.


Brent Mittelstadt is a Postdoctoral Research Fellow at the Oxford Internet Institute working on the ‘Ethics of Biomedical Big Data‘ project with Prof. Luciano Floridi. His research interests include the ethics of information handled by medical ICT, theoretical developments in discourse and virtue ethics, and epistemology of information.

]]>
Ethics in Networked Systems Research: ACM SigComm Workshop Report https://ensr.oii.ox.ac.uk/ethics-in-networked-systems-research-acm-sigcomm-workshop-report/ Tue, 15 Sep 2015 09:58:17 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3383 Network-home
The image shows the paths taken through the Internet to reach a large number of DNS servers in China used in experiments on DNS censorship by Joss Wright and Ning Wang, where they queried blocked domain names across China to discover patterns in where the network filtered DNS requests, and how it responded.

To maintain an open and working Internet, we need to make sense of how the complex and decentralised technical system operates. Research groups, governments, and companies have dedicated teams working on highly technical research and experimentation to make sense of information flows and how these can be affected by new developments, be they intentional or due to unforeseen consequences of decisions made in another domain.

These teams, composed of network engineers and computer scientists, therefore analyse Internet data transfers, typically by collecting data from devices of large groups of individuals as well as organisations. The Internet, however, has become a complex and global socio-technical information system that mediates a significant amount of our social or professional activities, relationships, as well as mental processes. Experimentation and research on the Internet therefore require ethical scrutiny in order to give useful feedback to engineers and researchers about the social impact of their work.

The organising committee of the Association for Computing Machinery (ACM) SIGCOMM (Special Interest Group on Data Communication) conference has regularly encountered paper submissions that can be considered dubious from an ethical point of view. A strong debate on the research ethics of the ACM was sparked by the paper entitled “Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests,” among others submitted for the 2015 conference. In the study, researchers had unsuspecting Internet users test potential censorship systems in their country by directing their browsers to specified URLs that could be blocked in their jurisdiction. Concerns were raised about whether this could be considered ‘human subject research’ and whether the unsuspecting users could be harmed as a result of this experiment. Consider, for example, a Chinese citizen continuously requesting the Falun Gong website from their Beijing-based laptop with no knowledge of this occurring whatsoever.

As a result of these discussions, the ACM realised that there was no formal procedure or methodology in place to make informed decisions about the ethical dimensions of such research. The conference therefore hosted a one-day workshop led by the OII’s Ethics in Networked Systems Research (ENSR) project. The day brought together 55 participants from different academic disciplines, ranging from computer science to philosophy, law, sociology, and social science. As part of a broader mission to establish ethical guidelines for Internet research, the aim of the workshop was to inform participants about the pressing ethical issues of the network measurement discipline, and to exchange ideas, reasoning, and proposed solutions.

The workshop began with two interactive sessions in which participants split into small, multidisciplinary groups to debate the submitted papers. Participants recorded their thoughts on key issues that emerged in the discussions. The remaining sessions of the day concentrated on the main themes surfacing from these notes as well as the active feedback of attendees. In this manner, participants from both sides of the debate — that is, the technical researchers and the non-technical researchers — were able to continually quiz each other about the strengths and weaknesses of their approach. The workshop’s emphasis on collaboration across academic disciplines, thereby creating an interdisciplinary community of researchers interested in Internet ethics, aimed to create a more solid foundation for building functional ethical standards in this area.

The interactive discussions yielded some particularly interesting recommendations regarding both the general ethical governance of computer science research as well as particular pressing issues. The main suggestion of the workshop was to create a procedure for an iterative approach to ethical review, whereby the relevant authority (e.g. conference programme committee, institutional ethics board, journal editor, funding agencies) and the researchers could engage in a dialogue about the impact of research, rather than have these restricted by a top-down, one-time decision of the authority.

This approach could be supported by the guidelines that the OII’s ENSR project is currently drafting. Further, participants explored to what extent computer ethics can be taught as part of every module of computer science degrees, rather than the current generic ethics courses generally taught to engineering students. This adjustment would thereby allow aspiring technical researchers to develop a hands-on sense of the social and ethical implications of new technologies and methodologies. Participants agreed that this idea would take an intensive department-wide effort, but would be very worthwhile in the end.

In more practical discussions, participants exchanged views on a wide range of potential solutions or approaches to ethical issues resulting from Internet research. For example, technical researchers struggling with obtaining informed consent were advised to focus their efforts on user-risk mitigation (with many nuances that exceed the scope of this blog post). For those studying the Internet in foreign countries, participants recommended running a few probes with the proposed methodology. This exploratory study would then serve to underpin an informed discussion on the possible social implications of the project with organizations and researchers who are more knowledgeable about the local context (e.g. anthropologists, sociologists or NGOs, among others).

Other concrete measures proposed to improve academic research included: fictionalizing rejected case studies to help researchers understand reasons for rejection without creating a ‘hall of shame’; generating a list of basic ethical questions that all papers should answer in the proposal phase; and starting a dialogue with other research communities in analogous situations concerning ethics.

The workshop comprised some high-level discussions to get participants on the same page, and deep dives into specific topics to generate some concrete solutions. As participants wrote down their thoughts on post-it notes, the next steps will be to categorise these notes, develop initial draft guidelines, and discuss these with all participants on the dedicated mailing list.

If you would like to join this mailing list, please e-mail bendert.zevenbergen [at] oii.ox.ac.uk! More detailed write-ups of the workshop outcomes will be published in due course.


Ben Zevenbergen is a student at the Oxford Internet Institute pursuing a DPhil on the intersection of privacy law, technology, social science, and the Internet. He runs a side project that aims to establish ethics guidelines for Internet research, as well as working in multidisciplinary teams such as the EU funded Network of Excellence in Internet Science. He has worked on legal, political and policy aspects of the information society for several years. Most recently he was a policy advisor to an MEP in the European Parliament, working on Europe’s Digital Agenda. Previously Ben worked as an ICT/IP lawyer and policy consultant in the Netherlands. Ben holds a degree in law, specialising in Information Law.

Pamina Smith currently serves as an Assistant Editor at the Oxford Internet Institute and recently completed an MPhil in Comparative Social Policy at the University of Oxford. She previously worked as Assistant Policy Officer at the European Commission, handling broadband policy and telecommunications regulation, and has a degree in the History and Literature of Modern Europe from Harvard College.

 

]]>
Should we use old or new rules to regulate warfare in the information age? https://ensr.oii.ox.ac.uk/should-we-use-old-or-new-rules-to-regulate-warfare-in-the-information-age/ https://ensr.oii.ox.ac.uk/should-we-use-old-or-new-rules-to-regulate-warfare-in-the-information-age/#comments Mon, 09 Mar 2015 12:43:21 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3171 Caption
Critical infrastructures such as electric power grids are susceptible to cyberwarfare, leading to economic disruption in the event of massive power outages. Image courtesy of Pacific Northwest National Laboratory.

Before the pervasive dissemination of Information and Communication Technologies (ICTs), the use of information in war waging referred to intelligence gathering and propaganda. In the age of the information revolution things have radically changed. Information has now acquired a pivotal role in contemporary warfare, for it has become both an effective target and a viable means. These days, we use ‘cyber warfare’ to refer to the use of ICTs by state actors to disruptive (or even destructive) ends.

As contemporary societies grow increasingly dependent on ICTs, any form of attack that involves their informational infrastructures poses serious risks and raises the need for adequate defence and regulatory measures. However, such a need contrasts with the novelty of this phenomenon, with cyber warfare posing a radical shift in the paradigm within which warfare has been conceived so far. In the new paradigm, impairment of functionality, disruption, and reversible damage substitute for bloodshed, destruction, and casualties. At the same time, the intangible environment (the cyber sphere), targets, and agents substitute for beings in blood and flesh, firearms, and physical targets (at least in the non-kinetic instances of cyber warfare).

The paradigm shift raises questions about the adequacy and efficacy of existing laws and ethical theories for the regulation of cyber warfare. Military experts, strategy planners, law- and policy-makers, philosophers, and ethicists all participate in discussions around this problem. The debate is polarised around two main approaches: (1) the analogy approach, and (2) the discontinuous approach. The former stresses that the regulatory gap concerning cyber warfare is only apparent, insofar as cyber conflicts are not radically different from other forms of conflicts. As Schmitt put it, “a thick web of international law norms suffuses cyber-space. These norms both outlaw many malevolent cyber-operations and allow states to mount robust responses.” The UN Charter, NATO Treaty, Geneva Conventions, the first two Additional Protocols, and the Convention restricting or prohibiting the use of certain conventional weapons are more than sufficient to regulate cyber warfare; all that is needed is an in-depth analysis of such laws and an adequate interpretation. This is the approach underpinning, for example, the so-called Tallinn Manual.

The opposite position, the discontinuous approach, stresses the novelty of cyber conflicts and maintains that existing ethical principles and laws are not adequate to regulate this phenomenon. Just War Theory is the main object of contention in this case. Those defending this approach argue that Just War Theory is not the right conceptual tool to address non-kinetic forms of warfare, for it assumes bloody and violent warfare occurring in the physical domain. This view sees cyber warfare as one of the most compelling signs of the information revolution — as Luciano Floridi has put it “those who live by the digit, die by the digit”. As such, it claims that any successful attempt to regulate cyber warfare cannot ignore the conceptual and ethical changes that such a revolution has brought about.

These two approaches have proceeded in parallel over the last decade, stalling rather than fostering a fruitful debate. There is therefore a clear need to establish a coordinated interdisciplinary approach that allows for experts with different backgrounds to collaborate and find a common ground to overcome the polarisation of the discussion. This is precisely the goal of the project financed by the NATO Cooperative Cyber Defence Centre of Excellence (NATO CCD COE) and that I co-led with Lt Glorioso, a representative of the Centre. The project has convened a series of workshops gathering international experts in the fields of law, military strategies, philosophy, and ethics to discuss the ethical and regulatory problems posed by cyber warfare.

The first workshop was held in 2013 at the Centro Alti Studi Difesa in Rome and had the goal of launching an interdisciplinary and coordinated approach to the problems posed by cyber warfare. The second event was hosted last November at Magdalen College, Oxford. It relied on the approach established in 2013 to foster an interdisciplinary discussion on issues concerning attribution, the principle of proportionality, the distinction between combatant and non-combatant, and the one between pre-emption and prevention. A report on the workshop has now been published surveying the main positions and the key discussion points that emerged during the meeting.

One of the most relevant points concerned the risks that cyber warfare poses to the established political equilibrium and to the maintenance of peace. The risk of escalation, both in the nature and in the number of conflicts, was perceived as realistic by both the speakers and the audience attending the workshop. Deterrence therefore emerged as one of the most pressing challenges posed by cyber warfare – and one that experts need to take into account in their efforts to develop new forms of regulation in support of peace and stability in the information age.

Read the full report: Corinne J.N. Cath, Ludovica Glorioso, Maria Rosaria Taddeo (2015) Ethics and Policies for Cyber Warfare [PDF, 400kb]. Report on the NATO CCD COE Workshop on ‘Ethics and Policies for Cyber Warfare’, Magdalen College, Oxford, 11-12 November 2014.


Dr Mariarosaria Taddeo is a researcher at the Oxford Internet Institute, University of Oxford. Her main research areas are information and computer ethics, philosophy of information, philosophy of technology, ethics of cyber-conflict and cyber-security, and applied ethics. She also serves as president of the International Association for Computing and Philosophy.

]]>
https://ensr.oii.ox.ac.uk/should-we-use-old-or-new-rules-to-regulate-warfare-in-the-information-age/feed/ 1
The Future of Europe is Science — and ethical foresight should be a priority https://ensr.oii.ox.ac.uk/the-future-of-europe-is-science/ Thu, 20 Nov 2014 17:15:38 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3014 On October 6 and 7, the European Commission, with the participation of Portuguese authorities and the support of the Champalimaud Foundation, organised in Lisbon a high-level conference on “The Future of Europe is Science”. Mr. Barroso, President of the European Commission, opened the meeting. I had the honour of giving one of the keynote addresses.

The explicit goal of the conference was twofold. On the one hand, we tried to take stock of European achievements in science, engineering, technology and innovation (SETI) during the last 10 years. On the other hand, we looked into potential future opportunities that SETI may bring to Europe, both in economic terms (growth, jobs, new business opportunities) and in terms of wellbeing (individual welfare and higher social standards).

One of the most interesting aspects of the meeting was the presentation of the latest report on “The Future of Europe is Science” by the President’s Science and Technology Advisory Council (STAC). The report addresses some very big questions: How will we keep healthy? How will we live, learn, work and interact in the future? How will we produce and consume and how will we manage resources? It also seeks to outline some key challenges that will be faced by Europe over the next 15 years. It is well written, clear, evidence-based and convincing. I recommend reading it. In what follows, I wish to highlight three of its features that I find particularly significant.

First, it is enormously refreshing and reassuring to see that the report treats science and technology as equally important and intertwined. The report takes this for granted, but anyone stuck in some Greek dichotomy between knowledge (episteme, science) and mere technique (techne, technology) will be astonished. While this divorcing of the two has always been a bad idea, it is still popular in contexts where applied science, e.g. applied physics or engineering, is considered a Cinderella. During my talk, I referred to Galileo as a paradigmatic scientist who had to be innovative in terms of both theories and instruments.

Today, technology is the outcome of innovative science and there is almost no science that is independent of technology, in terms of reliance on digital data and processing or (and this is often an inclusive or) in terms of investigations devoted to digital phenomena, e.g. in the social sciences. Of course, some Fields Medallists may not need computers to work, and may not work on computational issues, but they represent an exception. This year, Hiroshi Amano, Shuji Nakamura and Isamu Akasaki won the Nobel in physics “for the invention of efficient blue light-emitting diodes which has enabled bright and energy-saving white light sources”. Last year, François Englert and Peter Higgs were awarded the Nobel in physics “for the theoretical discovery of a mechanism that contributes to our understanding of the origin of mass of subatomic particles, and which recently was confirmed through the discovery of the predicted fundamental particle, by the ATLAS and CMS experiments at CERN’s Large Hadron Collider”. Without the technologically sophisticated work done at CERN, their theoretical discovery would have remained unsupported. The hope is that universities, research institutions, R&D centres as well as national research agencies will follow the approach espoused by STAC and think strategically in terms of technoscience.

The second point concerns some interesting statistics. The report uses several sources—especially the 2014 Eurobarometer survey of “Public perception of science, research and innovation”—to analyse and advise about the top priorities for SETI over the next 15 years, as identified by EU respondents. The picture that emerges is an ageing population worried, first of all, about its health, then about its children’s jobs, and only after that about the environment: 55 % of respondents identified “health and medical care” as among what they thought should be the main priorities for science and technological development over the next 15 years; 49 % opted for “job creation”; 33 % privileged “education and skills”. So we spent most of the meeting in Lisbon discussing these three areas. Other top priorities include “protection of the environment” (30 %), “energy supply” (25 %) and the “fight against climate change” (22 %).

So far so predictable, although it is disappointing to see such a low concern about the environment, a clear sign that even educated Europeans (with the exception of Danish and Swedish respondents) may not be getting the picture: there is no point in being healthy and employed in a desert. Yet this is not what I wish to highlight. Rather, on p. 14 of the report, the authors themselves admit that: “Contrary to our expectations, citizens do not consider the protection of personal data to be a high priority for SET in the next 15 years (11 %)”. This is very interesting. As a priority, data protection ranks as low as quality of housing: nice, but very far from essential. The authors quickly add that “but this might change in the future if citizens are confronted with serious security problems”.

They are right, but the point remains that, at the moment, all the fuss about privacy in the EU is a political rather than a social priority. Recall that this is an ageing population of grown-ups, not a bunch of teenagers in love with pictures of cats and friends online, allegedly unable to appreciate what privacy means (a caricature increasingly unbelievable anyway). Perhaps we “do not get it” when we should (a bit like the environmental issues) and need to be better informed. Or perhaps we are informed and still think that other issues are much more pressing. Either way, our political representatives should take notice.

Finally, and most importantly, the report contains a recommendation that I find extremely wise and justified. On p. 19, the Advisory Council acknowledges that, among the many foresight activities to be developed by the Commission, one in particular “should also be a priority”: ethical foresight. This must be one of the first times that ethical foresight is theorised as a top priority in the development of science and technology. The recommendation is based on the crucial and correct realisation that ethical choices, values, options and constraints influence the world of SETI much more than any other force. The evaluation of what is morally good, right or necessary shapes public opinion, hence the socially acceptable and the politically feasible and so, ultimately, the legally enforceable.

In the long run, business is constrained by law, which is constrained by ethics. This essential triangle means that—in the context of technoscientific research, development and innovation—ethics cannot be a mere add-on, an afterthought, a latecomer or an owl of Minerva that takes its flight only when the shades of night are gathering, once bad solutions have been implemented and mistakes have been made. Ethics must sit at the table of policy-making and decision-taking procedures from day one. It must inform our strategies about SETI especially at the beginning, when changing the course of action is easier and less costly, in terms of resources and impact. We must think twice but above all we must think before taking important steps, in order to avoid wandering into what Galileo defined as the dark labyrinth of ignorance.

As I stressed at the end of my keynote, the future of Europe is science, and this is why our priority must be ethics now.

Read the editorial: Floridi, L. (2014) Technoscience and Ethics Foresight. Editorial, Philosophy & Technology 27 (4) 499-501.


Luciano Floridi is the OII’s Professor of Philosophy and Ethics of Information. His research areas are the philosophy of information, information and computer ethics, and the philosophy of technology. His most recent book is The Fourth Revolution – How the infosphere is reshaping human reality (2014, Oxford University Press).

Designing Internet technologies for the public good https://ensr.oii.ox.ac.uk/designing-internet-technologies-for-the-public-good/ https://ensr.oii.ox.ac.uk/designing-internet-technologies-for-the-public-good/#comments Wed, 08 Oct 2014 11:48:59 +0000 http://blogs.oii.ox.ac.uk/policy/?p=2887
MEPs failed to support a Green call to protect Edward Snowden as a whistleblower, in order to allow him to give his testimony to the European Parliament in March. Image by greensefa.
Computers have developed enormously since the Second World War: alongside a rough doubling of computer power every two years, communications bandwidth and storage capacity have grown just as quickly. Computers can now store much more personal data, process it much faster, and rapidly share it across networks.

Data is collected about us as we interact with digital technology, directly and via organisations. Many people volunteer data to social networking sites, and sensors – in smartphones, CCTV cameras, and “Internet of Things” objects – are making the physical world as trackable as the virtual. People are very often unaware of how much data is gathered about them – let alone the purposes for which it can be used. Also, most privacy risks are highly probabilistic, cumulative, and difficult to calculate. A student sharing a photo today might not be thinking about a future interview panel, or about how the heart rate data shared from a fitness gadget might affect future decisions by insurance and financial services (Brown 2014).

Rather than organisations waiting for something to go wrong, then spending large amounts of time and money trying (and often failing) to fix privacy problems, computer scientists have been developing methods for designing privacy directly into new technologies and systems (Spiekermann and Cranor 2009). One of the most important principles is data minimization; that is, limiting the collection of personal data to that needed to provide a service, rather than storing everything that can be conveniently retrieved. This limits the impact of data losses and breaches, for example misuse by corrupt staff with authorised access to data – a practice that the UK Information Commissioner’s Office (2006) has shown to be widespread.
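
To make the data minimization principle concrete, here is a minimal sketch in Python (the service and its field names are hypothetical) of whitelisting only the fields needed to provide a service at the point of collection, rather than storing whatever the client sends:

```python
# Minimal illustration of data minimization at the point of collection.
# The service and its field names are hypothetical; the point is the whitelist.

REQUIRED_FIELDS = {"email", "delivery_postcode"}   # needed to provide the service
OPTIONAL_FIELDS = {"marketing_opt_in"}             # kept only with explicit consent

def minimise(submitted: dict) -> dict:
    """Keep only the fields the service actually needs.

    Everything else (device identifiers, precise location, browsing history)
    is dropped before it reaches storage, so a later breach or a corrupt
    insider can expose far less.
    """
    allowed = REQUIRED_FIELDS | {
        f for f in OPTIONAL_FIELDS if submitted.get("consent", {}).get(f)
    }
    return {k: v for k, v in submitted.items() if k in allowed}

record = minimise({
    "email": "user@example.org",
    "delivery_postcode": "OX1 1AA",
    "precise_location": (51.752, -1.258),   # silently discarded
    "consent": {"marketing_opt_in": False},
})
print(record)   # {'email': 'user@example.org', 'delivery_postcode': 'OX1 1AA'}
```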

Privacy by design also protects against function creep (Gürses et al. 2011). When an organisation invests significant resources to collect personal data for one reason, it can be very tempting to use it for other purposes. While this is limited in the EU by data protection law, government agencies are in a good position to push for changes to national laws if they wish, bypassing such “purpose limitations”. Nor do these rules tend to apply to intelligence agencies.

Another key aspect of putting users in control of their personal data is making sure they know what data is being collected and how it is being used – and, ideally, asking for their consent. There have been some interesting experiments with privacy interfaces, for example helping smartphone users understand who is asking for their location data, and what data has recently been shared with whom.

Smartphones have enough storage and computing capacity to do some tasks, such as showing users adverts relevant to their known interests, without sharing any personal data with third parties such as advertisers. This kind of user-controlled data storage and processing has all kinds of applications – for example, with smart electricity meters (Danezis et al. 2013), and congestion charging for roads (Balasch et al. 2010).
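
As a rough sketch of this kind of user-controlled, on-device processing (the ad catalogue, interest tags and matching rule below are invented for illustration), the handset can download a generic catalogue and do the matching locally, so the interest profile never leaves the phone:

```python
# Sketch of on-device ad selection: the interest profile stays on the handset.
# Catalogue entries and interest tags are hypothetical.

AD_CATALOGUE = [   # downloaded in full; identical for every user
    {"id": "ad-01", "tags": {"running", "fitness"}},
    {"id": "ad-02", "tags": {"gardening"}},
    {"id": "ad-03", "tags": {"travel", "fitness"}},
]

local_interests = {"fitness", "travel"}   # derived and stored only on the device

def pick_ad(catalogue, interests):
    """Choose the ad with the most overlapping tags, entirely on the device."""
    return max(catalogue, key=lambda ad: len(ad["tags"] & interests))

print(pick_ad(AD_CATALOGUE, local_interests)["id"])   # prints "ad-03"
```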

What broader lessons can be drawn about shaping technologies for the public good? What is the public good, and who gets to define it? One option is to look at opinion polling about public concerns and values over long periods of time. The European Commission’s Eurobarometer polls reveal that in most European countries (including the UK), people have had significant concerns about data privacy for decades.

A more fundamental view of core social values can be found at the national level in constitutions, and between nations in human rights treaties. As well as the protection of private life and correspondence in the European Convention on Human Rights’ Article 8, the freedom of thought, expression, association and assembly rights in Articles 9-11 (and their equivalents in the US Bill of Rights, and the International Covenant on Civil and Political Rights) are also relevant.

This national and international law restricts how states use technology to infringe human rights – even for national security purposes. There are several US legal challenges to the constitutionality of NSA communications surveillance, with a federal court in Washington DC finding that bulk access to phone records is against the Fourth Amendment [1] (but another court in New York finding the opposite [2]). The UK campaign groups Big Brother Watch, Open Rights Group, and English PEN have taken a case to the European Court of Human Rights, arguing that UK law in this regard is incompatible with the Human Rights Convention.

Can technology development be shaped more broadly to reflect such constitutional values? One of the best-known attempts is the European Union’s data protection framework. Privacy is a core European political value, not least because of the horrors of the Nazi and Communist regimes of the 20th century. Germany, France and Sweden all developed data protection laws in the 1970s in response to the development of automated systems for processing personal data, followed by most other European countries. The EU’s Data Protection Directive (95/46/EC) harmonises these laws, and has provisions that encourage organisations to use technical measures to protect personal data.

An update of this Directive, which the European Parliament has been debating over the last year, more explicitly includes this type of regulation by technology. Under this General Data Protection Regulation, organisations that are processing personal data will have to implement appropriate technical measures to protect the rights set out in the Regulation. By default, organisations should only collect the minimum personal data they need, and allow individuals to control the distribution of their personal data. The Regulation would also require companies to make it easier for users to download all of their data, so that it could be uploaded to a competitor service (for example, one with better data protection) – bringing market pressure to bear (Brown and Marsden 2013).

This type of technology regulation is not uncontroversial. The European Commissioner responsible until July for the Data Protection Regulation, Viviane Reding, said that she had seen unprecedented and “absolutely fierce” lobbying against some of its provisions. Legislators would clearly be foolish to try and micro-manage the development of new technology. But the EU’s principles-based approach to privacy has been internationally influential, with over 100 countries now having adopted data protection laws modelled on or similar to the Data Protection Directive (Greenleaf 2014).

If the EU can find the right balance in its Regulation, it has the opportunity to set the new global standard for privacy-protective technologies – a very significant opportunity indeed in the global marketplace.

[1] Klayman v. Obama, 2013 WL 6571596 (D.D.C. 2013)

[2] ACLU v. Clapper, No. 13-3994 (S.D. New York December 28, 2013)

References

Balasch, J., Rial, A., Troncoso, C., Preneel, B., Verbauwhede, I. and Geuens, C. (2010) PrETP: Privacy-preserving electronic toll pricing. 19th USENIX Security Symposium, pp. 63–78.

Brown, I. (2014) The economics of privacy, data protection and surveillance. In J.M. Bauer and M. Latzer (eds.) Research Handbook on the Economics of the Internet. Cheltenham: Edward Elgar.

Brown, I. and Marsden, C. (2013) Regulating Code: Good Governance and Better Regulation in the Information Age. Cambridge, MA: MIT Press.

Danezis, G., Fournet, C., Kohlweiss, M. and Zanella-Beguelin, S. (2013) Smart Meter Aggregation via Secret-Sharing. ACM Smart Energy Grid Security Workshop.

Greenleaf, G. (2014) Sheherezade and the 101 data privacy laws: Origins, significance and global trajectories. Journal of Law, Information & Science.

Gürses, S., Troncoso, C. and Diaz, C. (2011) Engineering Privacy by Design. Computers, Privacy & Data Protection.

Haddadi, H, Hui, P., Henderson, T. and Brown, I. (2011) Targeted Advertising on the Handset: Privacy and Security Challenges. In Müller, J., Alt, F., Michelis, D. (eds) Pervasive Advertising. Heidelberg: Springer, pp. 119-137.

Information Commissioner’s Office (2006) What price privacy? HC 1056.

Spiekermann, S. and Cranor, L.F. (2009) Engineering Privacy. IEEE Transactions on Software Engineering 35 (1).


Read the full article: Keeping our secrets? Designing Internet technologies for the public good, European Human Rights Law Review 4: 369-377. This article is adapted from Ian Brown’s 2014 Oxford London Lecture, given at Church House, Westminster, on 18 March 2014, supported by Oxford University’s Romanes fund.

Professor Ian Brown is Associate Director of Oxford University’s Cyber Security Centre and Senior Research Fellow at the Oxford Internet Institute. His research is focused on information security, privacy-enhancing technologies, and Internet regulation.

Ethical privacy guidelines for mobile connectivity measurements https://ensr.oii.ox.ac.uk/ethical-privacy-guidelines-for-mobile-connectivity-measurements/ Thu, 07 Nov 2013 16:01:33 +0000 http://blogs.oii.ox.ac.uk/policy/?p=2386
Four of the 6.8 billion mobile phones worldwide. Measuring the mobile Internet can expose information about an individual’s location, contact details, and communications metadata. Image by Cocoarmani.

Ed: GCHQ / the NSA aside … Who collects mobile data and for what purpose? How can you tell if your data are being collected and passed on?

Ben: Data collected from mobile phones is used for a wide range of (divergent) purposes. First and foremost, mobile operators need information about mobile phones in real time to be able to communicate with individual mobile handsets. Apps can also collect all sorts of information, whether to provide entertainment or location-specific services, to conduct network research, or for many other reasons.

Mobile phone users usually consent to the collection of their data by clicking “I agree” or other legally relevant buttons, but this is not always the case. Sometimes data is collected lawfully without consent, for example for the provision of a mobile connectivity service. Other times it is harder to substantiate a relevant legal basis. Many applications keep track of the information that is generated by a mobile phone and it is often not possible to find out how the receiver processes this data.

Ed: How are data subjects typically recruited for a mobile research project? And how many subjects might a typical research data set contain?

Ben: This depends on the research design; some research projects provide data subjects with a specific app, which they can use to conduct measurements (so-called ‘active measurements’). Other apps collect data in the background and, in effect, conduct local surveillance of mobile phone use (so-called ‘passive measurements’). Other research uses existing datasets, for example provided by telecom operators, which will generally be de-identified in some way. We purposely do not use the term anonymisation in the report, because much research and several case studies have shown that real anonymisation is very difficult to achieve if the original raw data is collected about individuals. Datasets can be re-identified by techniques such as fingerprinting or by linking them with existing, auxiliary datasets.
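
As a toy illustration of the linkage attacks Ben mentions (the datasets and quasi-identifiers below are invented), joining a “de-identified” release to an auxiliary dataset on a few shared attributes can be enough to put names back on records:

```python
import pandas as pd

# Hypothetical "de-identified" research release: direct identifiers removed,
# but quasi-identifiers (postcode district, birth year) left in.
released = pd.DataFrame([
    {"postcode": "OX1", "birth_year": 1985, "late_night_calls": 42},
    {"postcode": "OX2", "birth_year": 1990, "late_night_calls": 3},
])

# Hypothetical auxiliary dataset (e.g. a public register or an earlier leak).
auxiliary = pd.DataFrame([
    {"name": "A. Example", "postcode": "OX1", "birth_year": 1985},
    {"name": "B. Example", "postcode": "OX2", "birth_year": 1990},
])

# A plain join on the shared quasi-identifiers re-attaches names to records.
reidentified = released.merge(auxiliary, on=["postcode", "birth_year"])
print(reidentified[["name", "late_night_calls"]])
```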

The size of datasets differs per release. Telecom operators can provide data about millions of users, while it will be more challenging to reach such a number with a research specific app. However, depending on the information collected and provided, a specific app may provide richer information about a user’s behaviour.

Ed: What sort of research can be done with this sort of data?

Ben: Data collected from mobile phones can reveal much interesting and useful information. For example, such data can show exact geographic locations and thus the movements of the owner, which can be relevant for the social sciences. On a larger scale, mass movements of persons can be monitored via mobile phones. This information is useful for public policy objectives such as crowd control, traffic management, identifying migration patterns, emergency aid, etc. Such data can also be very useful for commercial purposes, such as location specific advertising, studying the movement of consumers, or generally studying the use of mobile phones.

Mobile phone data is also necessary to understand the complex dynamics of the underlying Internet architecture. The mobile Internet has different requirements from the fixed-line Internet, so targeted investments in future Internet architecture will need to be assessed by detailed network research. Also, network research can study issues such as censorship or other forms of blocking information and transactions, which are increasingly carried out through mobile phones. This can serve as an early warning system for policy makers, activists and humanitarian aid workers, to name only a few stakeholders.

Ed: Some of these research datasets are later published as ‘open data’. What sorts of uses might researchers (or companies) put these data to? Does it tend to be mostly technical research, or there also social science applications?

Ben: The intriguing characteristic of the open data concept is that secondary uses can be unpredictable. A re-use is not necessarily technical, even if the raw data has been collected for purely technical network research. New social science research could be based on existing technical data, or existing research analyses may be falsified or validated by other researchers. Artists, developers, entrepreneurs or public authorities can also use existing data to create new applications or to enrich existing information systems. There have been many instances when open data has been re-used for beneficial or profitable ends.

However, there is also a flipside to open data, especially when the dataset contains personal information, or information that can be linked to individuals. A working definition of open data is that one makes entire databases available, in standardized, machine readable and electronic format, to any secondary user, free of charge and free of restrictions or obligations, for any purpose. If a dataset contains information about your Internet browsing habits, your movements throughout the day or the phone numbers you have called over a specific period of time, it could be quite troubling if you have no control over who re-uses this information.

The risks and harms of such re-use are very context-dependent, of course. In the Western world, such data could be used for blackmail, stalking, identity theft, unsolicited commercial communications, etc. Further, if there is a chance our telecom operators just share data on how we use our mobile phones, we may refrain from activities such as taking part in demonstrations, attending political gatherings, or accessing certain socially unacceptable information. Such self-censorship will damage the free society we expect. In the developing world, or in authoritarian regimes, risks and harms can be a matter of life and death for data subjects, or at least involve the risk of physical harm. This is true for ordinary citizens, but also for diplomats, aid workers, journalists and social media users.

Finally, we cannot envisage how political contexts will change in the future. Future malevolent governments, even in Europe or the US, could easily use datasets containing sensitive information to harm or control specific groups of society. One need only look at the changing political landscape in Hungary to see how specific groups are suddenly targeted in what we thought was becoming a country that adheres to Western values.

Ed: The ethical privacy guidelines note the basic relation between the level of detail in information collected and the resulting usefulness of the dataset (datasets becoming less powerful as subjects are increasingly de-identified). This seems a fairly intuitive and fundamentally unavoidable problem; is there anything in particular to say about it?

Ben: Research often requires rich datasets for worthwhile analyses to be conducted. These will inevitably sometimes contain personal information, as it can be important to relate specific data to data subjects, whether anonymised, pseudonymised or otherwise. Far reaching deletion, aggregation or randomisation of data can make the dataset useless for the research purposes.

Sophisticated methods of re-identifying datasets, and unforeseen methods which will be developed in future, mean that much information must be deleted or aggregated in order for a dataset containing personal information to be truly anonymous. It has become very difficult to determine when a dataset is sufficiently anonymised to the extent that it can enjoy the legal exception offered by data protection laws around the world and therefore be distributed as open data, without legal restrictions.

As a result, many research datasets cannot simply be released. The guidelines do not force the researcher into a zero-risk situation, where only useless or meaningless datasets can be released. The guidelines force the researcher to think very carefully about the type of data that will be collected, about data processing techniques and different disclosure methods. Although open data is an attractive method of disseminating research data, sometimes managed access systems may be more appropriate. The guidelines constantly prompt the researcher to consider the risks to data subjects in their specific context during each stage of the research design. They serve as a guide, but also as a normative framework for research that is potentially privacy-invasive.

Ed: Presumably mobile companies have a duty to delete their data after a certain period; does this conflict with open datasets, whose aim is to be available indefinitely?

Ben: It is not a requirement for open data to be available indefinitely. However, once information is published freely on the Internet, it is very hard – if not impossible – to delete it. The researcher loses all control over a dataset once it is published online. So, if a dataset is sufficiently de-identified for the re-identification techniques that are known today, this does not mean that future techniques cannot re-identify the dataset. We can’t expect researchers to take into account all science-fiction-type future developments, but the guidelines do force the researcher to consider what successful re-identification would reveal about data subjects.

European mobile phone companies do have a duty to keep logs of communications for 6 months to 2 years, depending on the implementation of the misguided data retention directive. We have recently learned that intelligence services worldwide have more or less unrestricted access to such information. We have no idea how long this information is stored in practice. Recently it has frequently been stated that deleting data has become more expensive than just keeping it. This means that mobile phone operators and intelligence agencies may keep data on our mobile phone use forever. This must be taken into account when assessing which auxiliary datasets could be used to re-identify a research dataset. An IP address could be sufficient to link much information to an individual.

Ed: Presumably it’s impossible for a subject to later decide they want to be taken out of an open dataset; firstly due to cost, but also because (by definition) it ought to be impossible to find them in an anonymised dataset. Does this present any practical or legal problems?

Ben: In some countries, especially in Europe, data subjects have a legal right to object to their data being processed, by withdrawing consent or engaging in a legal procedure with the data processor. Although this is an important right, exercising it may lead to undesirable consequences for research. For example, the underlying dataset will be incomplete for secondary researchers who want to validate findings.

Our guidelines encourage researchers to be transparent about their research design, data processing and foreseeable secondary uses of the data. On the one hand, this builds trust in the network research discipline. On the other, it gives data subjects the necessary information to feel confident to share their data. Still, data subjects should be able to retract their consent via electronic means, instead of sending letters, if they can substantiate an appreciable harm to them.

Ed: How aware are funding bodies and ethics boards of the particular problems presented by mobile research; and are they categorically different from other human-subject research data? (eg interviews / social network data / genetic studies etc.)

Ben: University ethical boards or funding bodies are staffed by experts in a wide range of disciplines. However, this does not mean they understand the intricate details of complex Internet measurements, de-identification techniques or the state of affairs with regard to re-identification techniques, nor the harms a research programme can inflict given a specific context. For example, not everyone’s intuitive moral privacy compass will be activated when they read in a research proposal that the research systems will “monitor routing dynamics, by analysing packet traces collected from cell towers and internet exchanges”, or similar sentences.

Our guidelines encourage the researcher to write up the choices made with regards to personal information in a manner that is clear and understandable for the layperson. Such a level of transparency is useful for data subjects —  as well as ethical boards and funding bodies — to understand exactly what the research entails and how risks have been accommodated.

Ed: Linnet Taylor has already discussed mobile data mining from regions of the world with weak privacy laws: what is the general status of mobile privacy legislation worldwide?

Ben: Privacy legislation itself is about as fragmented and disputed as it gets. The US generally treats personal information as a commodity that can be traded, which enables Internet companies in Silicon Valley to use data as the new raw material in the information age. Europe considers privacy and data protection as a fundamental right, which is currently regulated in detail, albeit based on a law from 1995. The review of European data protection regulation has been postponed to 2015, possibly as a result of the intense lobbying effort in Brussels to either weaken or strengthen the proposed law. Some countries have not regulated privacy or data protection at all. Other countries have a fundamental right to privacy, which is not further developed in a specific data protection law and thus hardly enforced. Another group of countries have transplanted the European approach, but do not have the legal expertise to apply the 1995 law to the digital environment. The future of data protection is very much up in the air and requires much careful study.

The guidelines we have published take the international human rights framework as a base, while drawing inspiration from several existing legal concepts such as data minimisation, purpose limitation, privacy by design and informed consent. The guidelines give a solid base for privacy-aware research design. We do encourage researchers to discuss their projects with colleagues and legal experts as much as possible, though, because best practices and legal subtleties can vary per country, state or region.

Read the guidelines: Zevenbergen, B., Brown, I., Wright, J., and Erdos, D. (2013) Ethical Privacy Guidelines for Mobile Connectivity Measurements. Oxford Internet Institute, University of Oxford.


Ben Zevenbergen was talking to blog editor David Sutcliffe.

The promises and threats of big data for public policy-making https://ensr.oii.ox.ac.uk/promises-threats-big-data-for-public-policy-making/ https://ensr.oii.ox.ac.uk/promises-threats-big-data-for-public-policy-making/#comments Mon, 28 Oct 2013 15:07:29 +0000 http://blogs.oii.ox.ac.uk/policy/?p=2299 The environment in which public policy is made has entered a period of dramatic change. Widespread use of digital technologies, the Internet and social media means both citizens and governments leave digital traces that can be harvested to generate big data. Policy-making takes place in an increasingly rich data environment, which poses both promises and threats to policy-makers.

On the promise side, such data offers a chance for policy-making and implementation to be more citizen-focused, taking account of citizens’ needs, preferences and actual experience of public services, as recorded on social media platforms. As citizens express policy opinions on social networking sites such as Twitter and Facebook; rate or rank services or agencies on government applications such as NHS Choices; or enter discussions on the burgeoning range of social enterprise and NGO sites, such as Mumsnet, 38 degrees and patientopinion.org, they generate a whole range of data that government agencies might harvest to good use. Policy-makers also have access to a huge range of data on citizens’ actual behaviour, as recorded digitally whenever citizens interact with government administration or undertake some act of civic engagement, such as signing a petition.

Data mined from social media or administrative operations in this way also provide a range of new data which can enable government agencies to monitor – and improve – their own performance, for example through usage log data for their own electronic presence or transactions recorded on internal information systems, which are increasingly interlinked. And they can use data from social media for self-improvement, by understanding what people are saying about government, and which policies, services or providers are attracting negative opinions and complaints, enabling identification of a failing school, hospital or contractor, for example. They can solicit such data via their own sites, or those of social enterprises. And they can find out what people are concerned about or looking for, from the Google Search API or Google Trends, which record the search patterns of a huge proportion of internet users.

As for threats, big data is technologically challenging for government, particularly those governments which have always struggled with large-scale information systems and technology projects. The UK government has long been a world leader in this regard and recent events have only consolidated its reputation. Governments have long suffered from information technology skill shortages and the complex skill sets required for big data analytics pose a particularly acute challenge. Even in the corporate sector, over a third of respondents to a recent survey of business technology professionals cited ‘Big data expertise is scarce and expensive’ as their primary concern about using big data software.

And there are particular cultural barriers to government use of social media, with the informal style and blurring of organizational and public-private boundaries which they engender. And gathering data from social media presents legal challenges, as companies like Facebook place barriers to the crawling and scraping of their sites.

More importantly, big data presents new moral and ethical dilemmas to policy makers. For example, it is possible to carry out probabilistic policy-making, where policy is made on the basis of what a small segment of individuals will probably do, rather than what they have done. Predictive policing has had some success, particularly in California, where robberies declined by a quarter after use of the ‘PredPol’ policing software, but can lead to a “feedback loop of injustice” as one privacy advocacy group put it, as policing resources are targeted at increasingly small socio-economic groups. What responsibility does the state have to devote disproportionately more – or less – resources to the education of those school pupils who are, probabilistically, almost certain to drop out of secondary education? Such challenges are greater for governments than corporations. We (reasonably) happily trade privacy to allow Tesco and Facebook to use our data on the basis it will improve their products, but if government tries to use social media to understand citizens and improve its own performance, will it be accused of spying on its citizenry in order to quash potential resistance?

And of course there is an image problem for government in this field – discussion of big data and government puts the word ‘big’ dangerously close to the word ‘government’ and that is an unpopular combination. Policy-makers’ responses to Snowden’s revelations of the US Prism and UK Tempora programmes have done nothing to improve this image, with their focus on the use of big data to track down individuals and groups involved in acts of terrorism and criminality – rather than on anything to make policy-making better, or to use the wealth of information that these programmes collect for the public good.

However, policy-makers have no choice but to tackle some of these challenges. Big data has been the hottest trend in the corporate world for some years now, and commentators from IBM to the New Yorker are starting to talk about the big data ‘backlash’. Government has been far slower to recognize the advantages for policy-making and services. But in some policy sectors, big data poses very fundamental questions which call for an answer: how should governments conduct a census, or produce labour statistics, for example, in the age of big data? Policy-makers will need to move fast to beat the backlash.


This post is based on discussions at the Responsible Research Agendas for Public Policy in the Era of Big Data workshop.

Helen Margetts is the Director of the OII, and Professor of Society and the Internet. She is a political scientist specialising in digital era governance and politics.

Responsible research agendas for public policy in the era of big data https://ensr.oii.ox.ac.uk/responsible-research-agendas-for-public-policy-in-the-era-of-big-data/ Thu, 19 Sep 2013 15:17:01 +0000 http://blogs.oii.ox.ac.uk/policy/?p=2164 Last week the OII went to Harvard. Against the backdrop of a gathering storm of interest around the potential of computational social science to contribute to the public good, we sought to bring together leading social science academics with senior government agency staff to discuss its public policy potential. Supported by the OII-edited journal Policy and Internet and its owners, the Washington-based Policy Studies Organization (PSO), this one-day workshop facilitated a thought-provoking conversation between leading big data researchers such as David Lazer, Brooke Foucault-Welles and Sandra Gonzalez-Bailon, e-government experts such as Cary Coglianese, Helen Margetts and Jane Fountain, and senior agency staff from US federal bureaus including Labor Statistics, Census, and the Office of Management and Budget.

It’s often difficult to appreciate the impact of research beyond the ivory tower, but what this productive workshop demonstrated is that policy-makers and academics share many similar hopes and challenges in relation to the exploitation of ‘big data’. Our motivations and approaches may differ, but insofar as the youth of the ‘big data’ concept explains the lack of common language and understanding, there is value in mutual exploration of the issues. Although it’s impossible to do justice to the richness of the day’s interactions, some of the most pertinent and interesting conversations arose around the following four issues.

Managing a diversity of data sources. In a world where our capacity to ask important questions often exceeds the availability of data to answer them, many participants spoke of the difficulties of managing a diversity of data sources. For agency staff this issue comes into sharp focus when available administrative data that is supposed to inform policy formulation is either incomplete or inadequate. Consider, for example, the challenge of regulating an economy in a situation of fundamental data asymmetry, where private sector institutions track, record and analyse every transaction, whilst the state only has access to far more basic performance metrics and accounts. Such asymmetric data practices also affect academic research, where once again private sector tech companies such as Google, Facebook and Twitter often offer access only to portions of their data. In both cases participants gave examples of creative solutions using merged or blended data sources, which raise significant methodological and also ethical difficulties which merit further attention. The Berkman Center’s Rob Faris also noted the challenges of combining ‘intentional’ and ‘found’ data, where the former allow far greater certainty about the circumstances of their collection.

Data dictating the questions. If participants expressed the need to expend more effort on getting the most out of available but diverse data sources, several also cautioned against the dangers of letting data availability dictate the questions that could be asked. As we’ve experienced at the OII, for example, the availability of Wikipedia or Twitter data means that questions of unequal digital access (to political resources, knowledge production etc.) can often be addressed through the lens of these applications or platforms. But these data can provide only a snapshot, and large questions of great social or political importance may not easily be answered through such proxy measurements. Similarly, big data may be very helpful in providing insights into policy-relevant patterns or correlations, such as identifying early indicators of seasonal diseases or neighbourhood decline, but seems ill-suited to answering difficult questions regarding, say, the efficacy of small-scale family interventions. Just because the latter are harder to answer using currently vogue-ish tools doesn’t mean we should cease to ask these questions.

Ethics. Concerns about privacy are frequently raised as a significant limitation of the usefulness of big data. Given that with two or more data sets even supposedly anonymous data subjects may be identified, the general consensus seems to be that ‘privacy is dead’. Whilst all participants recognised the importance of public debate around this issue, several academics and policy-makers expressed a desire to get beyond this discussion to a more nuanced consideration of appropriate ethical standards. Accountability and transparency are often held up as more realistic means of protecting citizens’ interests, but one workshop participant also suggested it would be helpful to encourage more public debate about acceptable and unacceptable uses of our data, to determine whether some uses might simply be deemed ‘off-limits’, whilst other uses could be accepted as offering few risks.

Accountability. Following on from this debate about the ethical limits of our uses of big data, discussion exposed the starkly differing standards to which government and academics (to say nothing of industry) are held accountable. As agency officials noted on several occasions it matters less what they actually do with citizens’ data, than what they are perceived to do with it, or even what it’s feared they might do. One of the greatest hurdles to be overcome here concerns the fundamental complexity of big data research, and the sheer difficulty of communicating to the public how it informs policy decisions. Quite apart from the opacity of the algorithms underlying big data analysis, the explicit focus on correlation rather than causation or explanation presents a new challenge for the justification of policy decisions, and consequently, for public acceptance of their legitimacy. As Greg Elin of Gitmachines emphasised, policy decisions are still the result of explicitly normative political discussion, but the justifiability of such decisions may be rendered more difficult given the nature of the evidence employed.

We could not resolve all these issues over the course of the day, but they served as pivot points for honest and productive discussion amongst the group. If nothing else, they demonstrate the value of interaction between academics and policy-makers in a research field where the stakes are set very high. We plan to reconvene in Washington in the spring.

*We are very grateful to the Policy Studies Organization (PSO) and the American Public University for their generous support of this workshop. The workshop “Responsible Research Agendas for Public Policy in the Era of Big Data” was held at the Harvard Faculty Club on 13 September 2013.

Also read: Big Data and Public Policy Workshop by Eric Meyer, workshop attendee and PI of the OII project Accessing and Using Big Data to Advance Social Science Knowledge.


Victoria Nash received her M.Phil in Politics from Magdalen College in 1996, after completing a First Class BA (Hons) Degree in Politics, Philosophy and Economics, before going on to complete a D.Phil in Politics from Nuffield College, Oxford University in 1999. She was a Research Fellow at the Institute of Public Policy Research prior to joining the OII in 2002. As Research and Policy Fellow at the OII, her work seeks to connect OII research with policy and practice, identifying and communicating the broader implications of OII’s research into Internet and technology use.

Uncovering the patterns and practice of censorship in Chinese news sites https://ensr.oii.ox.ac.uk/uncovering-the-patterns-and-practice-of-censorship-in-chinese-news-sites/ Thu, 08 Aug 2013 08:17:55 +0000 http://blogs.oii.ox.ac.uk/policy/?p=1992 Ed: How much work has been done on censorship of online news in China? What are the methodological challenges and important questions associated with this line of enquiry?

Sonya: Recent research is paying much attention to social media and aiming to quantify their censorial practices and to discern common patterns in them. Among these empirical studies, Bamman et al.’s (2012) work claimed to be “the first large-scale analysis of political content censorship” that investigates messages deleted from Sina Weibo, a Chinese equivalent to Twitter. On an even larger scale, King et al. (2013) collected data from nearly 1,400 Chinese social media platforms and analyzed the deleted messages. Most studies on news censorship, however, are devoted to narratives of special cases, such as the closure of Freezing Point, an outspoken news and opinion journal, and the blocking of the New York Times after it disclosed the wealth possessed by the family of former Chinese premier Wen Jiabao.

The shortage of news censorship research could be attributed to several methodological challenges. First, it is tricky to detect censorship to begin with, given the word ‘censorship’ is one of the first to be censored. Also, news websites will not simply let their readers hit a glaring “404 page not found”. Instead, they will use a “soft 404”, which returns a “success” code for a request of a deleted web page and takes readers to a (different) existing web page. While humans may be able to detect these soft 404s, it will be harder for computer programs (eg run by researchers) to do so. Moreover, because different websites employ varying soft 404 techniques, much labor is required to survey them and to incorporate the acquired knowledge into a generic monitoring tool.
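
One rough way a monitoring program might flag soft 404s, sketched below in Python (the URL is a placeholder, and as Sonya notes real sites need per-site tuning): request a deliberately nonsensical URL on the same site to learn what that site’s error page looks like, then compare candidate pages against it.

```python
import uuid
import requests
from difflib import SequenceMatcher

def looks_like_soft_404(url: str, threshold: float = 0.9) -> bool:
    """Heuristic soft-404 check.

    Fetch a URL on the same site that almost certainly does not exist to see
    what its error page looks like, then compare the candidate page to it.
    A page that is near-identical to the error page is probably a soft 404.
    """
    base = url.rsplit("/", 1)[0]
    junk_url = f"{base}/{uuid.uuid4().hex}.html"   # should not exist

    candidate = requests.get(url, timeout=10).text
    junk = requests.get(junk_url, timeout=10).text

    similarity = SequenceMatcher(None, candidate, junk).ratio()
    return similarity >= threshold

# Usage, with a placeholder URL:
# looks_like_soft_404("http://news.example.com/2013/08/some-article.html")
```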

Second, high computing power and bandwidth are required to handle the large amount of news publications and the slow network access to Chinese websites. For instance, NetEase alone publishes 8,000 – 10,000 news articles every day. Meanwhile, the Internet connection between the Chinese cyberspace and the outer world is fairly slow and it takes more than a second to check one link because the Great Firewall checks both incoming and outgoing Internet traffic. These two factors translate to 2-3 hours for a single program to check one day’s news publications of NetEase alone. If we fire up too many programs to accelerate the progress, the database system and/or the network connection may be challenged. In my case, even though I am using high performance computers at Michigan State University to conduct this research, they are overwhelmed every now and then.
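
The back-of-the-envelope arithmetic behind that estimate, using only the figures quoted above:

```python
# Rough crawl-time estimate for one day of NetEase output, using the figures
# quoted above (8,000-10,000 articles per day, just over a second per link).
articles_per_day = 10_000
seconds_per_link = 1.0          # "more than a second", so this is a lower bound

hours = articles_per_day * seconds_per_link / 3600
print(f"{hours:.1f} hours")     # ~2.8 hours; slower links push this towards 3
```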

Despite all the difficulties, I believe it is of great importance to reveal censored news stories to the public, especially to the audience inside China who do not enjoy a free flow of information. Censored news is a special type of information, as it is too inconvenient to exist in the authorities’ eyes and yet important to citizens’ everyday lives. For example, the outbreak of SARS had been censored from Chinese media presumably to avoid spoiling the harmonious atmosphere created for the 16th National Congress of the Communist Party. This allowed the virus to develop into a worldwide epidemic. Like SARS, a variety of censored issues are not only inconvenient but also crucial, because the authorities would not otherwise allocate substantial resources to monitor or eliminate them if they were merely trivial. Therefore, after censored news is detected, it is vital to seek effective and efficient channels to disclose it to the public so as to counterbalance potential damage that censorship may entail.

Ed: You found that party organs, ie news organizations tightly affiliated with the Chinese Communist Party, published a considerable amount of deleted news. Was this surprising?

Sonya: Yes, I was surprised when looking at the results the first time. To be exact, our finding is that commercial media experience a higher deletion rate, but party organs contribute the most deleted news by sheer volume, reflecting the fact that party organs possess more resources allocated by the central and local governments and therefore have the capacity to produce more news. Consequently, party organs have a higher chance of publishing controversial information that may be deleted in the future, especially when a news story becomes sensitive for some reason that is hard to foresee. For example, investigations of some government officials started when netizens recognized them in the news with different luxury watches and other expensive accessories. As such, even though party organs are obliged to write odes to the party, these odes may eventually backfire on the cadres if the beautiful words are discovered to be too far from reality.

Ed: How sensitive are citizens to the fact that some topics are actively avoided in the news media? And how easy is it for people to keep abreast of these topics (eg the “three Ts” of Tibet, Taiwan, and Tiananmen) from other information sources?

Sonya: This question highlights the distinction between pre-censorship and post-censorship. Our study looked at post-censorship, ie information that is published but subsequently deleted. By contrast, the topics that are “actively avoided” fall under the category of pre-censorship. I am fairly convinced that the current pre- and post-censorship practice is effective in terms of keeping the public from learning inconvenient facts and from mobilizing for collective action. If certain topics are consistently wiped from the mass media, how will citizens ever get to know about them?

The Tiananmen Square protest, for instance, has never been covered by Chinese mass media, leaving the entire generation that has grown up since 1989 ignorant of this historical event. As such, if younger Chinese citizens have never heard of the Tiananmen Square protest, how could they possibly start an inquiry into this incident? Or, if they have heard of it and attempt to learn about it from the Internet, what they will soon realize is that domestic search engines, social media, and news media all fail their requests and foreign ones are blocked. Certainly, they could use circumvention tools to bypass the Great Firewall, but the sad truth is that probably under 1% of them have ever made such an effort, according to the Harvard Berkman Center’s report in 2011.

Ed: Is censorship of domestic news (such as food scares) more geared towards “avoiding panics and maintaining social order”, or just avoiding political embarrassment? For example, do you see censorship of environmental issues and (avoidable) disasters?

Sonya: The government certainly tries to avoid political embarrassment in the case of food scares by manipulating news coverage, but it is also its priority to maintain social order or so-called “social harmony”. Exactly for this reason, Zhao Lianhai, the most outspoken parent of a toxic milk powder victim, was charged with “inciting social disorder” and sentenced to two and a half years in prison. Frustrated by Chinese milk powder, Chinese tourists are aggressively stocking up on milk powder from elsewhere, such as in Hong Kong and New Zealand, causing panics over milk powder shortages in those places.

After the earthquake in Sichuan, another group of grieving parents were arrested on similar charges when they questioned why their children were buried under crumbled schools whereas older buildings remained standing. The high death toll of this earthquake was among the avoidable disasters that the government attempts to mask and force the public to forget. Environmental issues, along with land acquisition, social unrest, and labor exploitation, are other frequently censored topics in the name of “stability maintenance”.

Ed: You plotted a map to show the geographic distribution of news deletion: what does the pattern show?

Sonya: We see an apparent geographic pattern in news deletion, with news about neighboring countries more likely to be deleted than news about distant ones. Border disputes between China and its neighbors may be one cause; for example with Japan over the Diaoyu-Senkaku Islands, with the Philippines over the Huangyan Island-Scarborough Shoal, and with India over South Tibet. Another reason may be a concern over maintaining allies. Burma had the highest deletion rates among all the countries, with the deleted news mostly covering its curb on censorship. Watching this shift, China might worry that media reform in Burma could lead to copycat attempts inside China.

On the other hand, China has given Burma diplomatic cover, considering it as the “second coast” to the Indian Ocean and importing its natural resources (Howe & Knight, 2012). For these reasons, China may be compelled to censor Burma more than other countries, even though they don’t share a border. Nonetheless, although oceans apart, the US topped the list by sheer number of news deletions, reflecting the bittersweet relation between the two nations.

Ed: What do you think explains the much higher levels of censorship reported by others for social media than for news media? How does geographic distribution of deletion differ between the two?

Sonya: The deletion rates of online news are apparently far lower than those of Sina Weibo posts. The overall deletion rates on NetEase and Sina Beijing were 0.05% and 0.17%, compared to 16.25% on the social media platform (Bamman et al., 2012). Several reasons may help explain this gap. First, social media confronts enduring spam that has to be cleaned up constantly, whereas spam is not a problem at all for professional news aggregators. Second, self-censorship practiced by news media plays an important role, because Chinese journalists are more obliged and prepared to self-censor sensitive information, compared to ordinary Chinese citizens. Consequently, news media rarely mention “crown prince party” or “democracy movement”, which were among the most frequently deleted terms on Sina Weibo.

Geographically, the deletion rates across China have distinct patterns on news media and social media. Regarding Sina Weibo, deletion rates increase when the messages are published near the fringe or in the west where the economy is less developed. Regarding news websites, the deletion rates rise as they approach the center and east, where the economy is better developed. In addition, the provinces surrounding Beijing also have more news deleted, meaning that political concerns are a driving force behind content control.

Ed: Can you tell if the censorship process mostly relies on searching for sensitive keywords, or on more semantic analysis of the actual content? ie can you (or the censors..) distinguish sensitive “opinions” as well as sensitive topics?

Sonya: First, the most sensitive topics, such as the Tiananmen Square protest, will never survive pre-censorship or be published on news websites, although they may sneak in on social media with deliberate typos or other circumvention techniques. However, it is clear that censors use keywords to locate articles on sensitive topics. For instance, after the Fukushima earthquake in 2011, rumors spread in Chinese cyberspace that radiation was rising from the Japanese nuclear plant and that iodine would help protect against its harmful effects; this was followed by panic-buying of iodized salt. During this period, “nuclear defense”, “iodized salt” and “radioactive iodine” – among other generally neutral terms – became politically charged overnight, and were censored in the Chinese web sphere. The taboo list of post-censorship keywords evolves continuously to handle breaking news. Beyond keywords, party organs and other online media are trying to automate sentiment analysis and discern more subtle context. People’s Daily, for instance, has been working with elite Chinese universities in this field and has already developed a generic product for other institutes to monitor “public sentiment”.
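
A minimal sketch of the keyword side of this process (the taboo list below contains only the illustrative terms mentioned above; real lists are far larger, secret, and change with the news cycle):

```python
# Toy keyword screen of the kind described above, used as a first pass to
# flag articles for review. The taboo list holds only the terms quoted in
# the text; real lists are much larger and evolve continuously.
TABOO_TERMS = {"nuclear defense", "iodized salt", "radioactive iodine"}

def flag_for_review(article_text: str) -> set:
    """Return the taboo terms found in an article (case-insensitive)."""
    text = article_text.lower()
    return {term for term in TABOO_TERMS if term in text}

article = "Shoppers queue for iodized salt amid fears of radioactive iodine."
print(flag_for_review(article))   # {'iodized salt', 'radioactive iodine'} (order may vary)
```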

Another way to sort out sensitive information is to keep an eye on the most popular stories, because a popular story would represent a greater “threat” to the existing political and social order. In our study, about 47% of the deleted stories were listed among the top 100 most read or discussed stories at some point. This indicates that the more readership a story gains, the more attention it draws from censors.

Although news websites self-censor (therefore experiencing under 1% post-censorship), they are also required to monitor and “clean” comments following each news article. According to my very conservative estimate – if a censor processes 100 comments per minute and works eight hours per day – reviewing comments on Sina Beijing from 11-16 September 2012 would have required 336 censors working full time. In fact, Charles Cao, CEO of Sina, mentioned to Forbes that at least 100 censors were “devoted to monitoring content 24 hours a day”. As new sensitive issues emerge and new circumvention techniques are developed continuously, it is an ongoing battle between the collective intelligence of Chinese netizens and the mechanical work conducted (and artificial intelligence implemented) by a small group of censors.
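
The arithmetic behind that conservative estimate, taking the stated assumptions at face value (the implied comment volume is derived from these figures, not reported directly):

```python
# Per-censor review capacity under the stated assumptions.
comments_per_minute = 100
hours_per_day = 8

per_censor_per_day = comments_per_minute * 60 * hours_per_day
print(per_censor_per_day)        # 48,000 comments per censor per day

# 336 full-time censors therefore correspond to roughly 16 million comments
# of review capacity per day; an implied volume, not a reported figure.
print(336 * per_censor_per_day)  # 16,128,000
```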

Ed: It must be a cause of considerable anxiety for journalists and editors to have their material removed. Does censorship lead to sanctions? Or is the censorship more of an annoyance that must be negotiated?

Sonya: Censorship does indeed lead to sanctions. However, I don’t think “anxiety” would be the right word to describe their feelings, because if they are really anxious they could always choose self-censorship and avoid embarrassing the authorities. Considering it is fairly easy to predict whether a news report will please or irritate officials, I believe what fulfills the whistleblowers when they disclose inconvenient facts is a strong sense of justice and tremendous audacity. Moreover, I could barely discern any “negotiation” in the process of censorship. Negotiation is at least a two-way communication, whereas censorship follows continual orders sent from the authorities to the mass media, and similarly propaganda is a one-way communication from the authorities to the masses via the media. As such, it is common to see disobedient journalists threatened or punished for “defying” censorial orders.

Southern Metropolis Daily is one of China’s most aggressive and most punished newspapers. In 2003, the newspaper broke the news of the SARS epidemic that local officials had wished to hide from the public. Soon after this report, it covered a university graduate beaten to death in police custody because he carried no proper residency papers. Both cases received enormous attention from Chinese authorities and the international community, seriously embarrassing local officials. It is alleged and widely believed that some local officials demanded harsh penalties for the Daily; the director and the deputy editor were sentenced to 11 and 12 years in jail for “taking bribes” and “misappropriating state-owned assets”, and the chief editor was dismissed.

Not only professional journalists but also (broadly defined) citizen journalists could face similar penalties. For instance, Xu Zhiyong, a lawyer who defended journalists on trial, and Ai Weiwei, an artist who tried to investigate collapsed schools after the Sichuan earthquake, have experienced similar penalties: fines for tax evasion, physical attacks, house arrest, and secret detainment; exactly the same censorship tactics that states carried out before the advent of the Internet, as described in Ilan Peleg’s (1993) book Patterns of Censorship Around the World.

Ed: What do you think explains the lack of censorship in the overseas portal? (Could there be a certain value for the government in having some news items accessible to an external audience, but unavailable to the internal one?)

Sonya: It is more costly to control content by searching for and deleting individual news stories than by simply blocking a whole website. For this reason, when a website outside the Great Firewall carries content embarrassing to the Chinese government, Chinese censors will simply block the whole website rather than request deletions. Overseas branches of Chinese media may comply, but foreign media may simply ignore such a deletion request.

Given online users’ behavior, it is effective and efficient to strictly control domestic content. In general, there are two types of Chinese online users, those who only visit Chinese websites operating inside China and those who also consume content from outside the country. Regarding this second type, it is really hard to prescribe what they do and don’t read, because they may be well equipped with circumvention tools and often obtain access to Chinese media published in Hong Kong and Taiwan but blocked in China. In addition, some Western media, such as the BBC, the New York Times, and Deutsche Welle, make media consumption easy for Chinese readers by publishing in Chinese. Of course, this type of Chinese user may be well educated and able to read English and other foreign languages directly. Facing these people, Chinese authorities would see their efforts in vain if they tried to censor overseas branches of Chinese media, because, outside the Great Firewall, there are too many sources for information that lie beyond the reach of Chinese censors.

Chinese authorities are in fact strategically wise to put their efforts into controlling domestic online media, because the first type of user accounts for 99.9% of the whole online population, according to Google’s 2010 estimate. In his 2013 book Rewire, Ethan Zuckerman summarizes this phenomenon: “none of the top ten nations [in terms of online population] looks at more than 7 percent international content in its fifty most popular news sites” (p. 56). Since the majority of the Chinese populace perceives the domestic Internet as “the entire cyberspace”, manipulating the content published inside the Great Firewall means that, from the censors’ point of view, most of the potential time bombs have been defused.


Read the full paper: Sonya Yan Song, Fei Shen, Mike Z. Yao, Steven S. Wildman (2013) Unmasking News in Cyberspace: Examining Censorship Patterns of News Portal Sites in China. Presented at “China and the New Internet World”, International Communication Association (ICA) Preconference, Oxford Internet Institute, University of Oxford, June 2013.

Sonya Y. Song led this study as a Google Policy Fellow in 2012. Currently, she is a Knight-Mozilla OpenNews Fellow and a Ph.D. candidate in media and information studies at Michigan State University. Sonya holds bachelor’s and master’s degrees in computer science from Tsinghua University in Beijing and a master of philosophy in journalism from the University of Hong Kong. She is also an avid photographer, a devotee of literature, and a film buff.

Sonya Yan Song was talking to blog editor David Sutcliffe.

Staying free in a world of persuasive technologies https://ensr.oii.ox.ac.uk/staying-free-in-a-world-of-persuasive-technologies/ Mon, 29 Jul 2013 10:11:17 +0000 http://blogs.oii.ox.ac.uk/policy/?p=1541
We’re living through a crisis of distraction. Image: “What’s on my iPhone” by Erik Mallinson

Ed: What persuasive technologies might we routinely meet online? And how are they designed to guide us towards certain decisions?

There’s a broad spectrum, from the very simple to the very complex. A simple example would be something like Amazon’s “one-click” purchase feature, which compresses the entire checkout process down to a split-second decision. This uses a persuasive technique known as “reduction” to minimise the perceived cost to a user of going through with a purchase, making it more likely that they’ll transact. At the more complex end of the spectrum, you have the whole set of systems and subsystems that is online advertising. As it becomes easier to measure people’s behaviour over time and across media, advertisers are increasingly able to customise messages to potential customers and guide them down the path toward a purchase.
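To make the “reduction” technique concrete, here is a minimal, hypothetical sketch in Python of the difference between a multi-step checkout and a one-click purchase. The class and function names are invented for illustration and do not correspond to any real retailer’s code; the point is simply that stored defaults remove every intermediate decision point at which a user might reconsider.

from dataclasses import dataclass

@dataclass
class StoredDefaults:
    # Details the service has saved in advance so the user never re-enters them.
    payment_method: str
    shipping_address: str

def checkout_multi_step(item: str, confirm) -> bool:
    # Each confirmation is a moment at which the user can reconsider and abandon the purchase.
    steps = [
        f"Add {item} to basket?",
        "Confirm shipping address?",
        "Confirm payment method?",
        "Place order?",
    ]
    return all(confirm(step) for step in steps)

def checkout_one_click(item: str, defaults: StoredDefaults) -> bool:
    # "Reduction": the stored defaults compress the entire purchase into a single act.
    print(f"Ordered {item} using {defaults.payment_method}, "
          f"shipping to {defaults.shipping_address}")
    return True

# Example usage: the one-click path never pauses to ask.
# (The multi-step path would be called as, e.g.,
#  checkout_multi_step("headphones", confirm=lambda q: input(q + " [y/n] ") == "y"))
defaults = StoredDefaults("card ending 1234", "1 Example Street")
checkout_one_click("headphones", defaults)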

It isn’t just commerce, though: mobile behaviour-change apps have seen really vibrant growth in the past couple of years, particularly in health and fitness. Products like Nike+, Map My Run, and Fitbit let you monitor your exercise, share your performance with friends, use social motivation to help you define and reach your fitness goals, and so on. One interesting example I came across recently is called “Zombies, Run!”, which motivates by fright, spawning virtual zombies to chase you down the street while you’re on your run.

As one final example, if you’ve ever tried to deactivate your Facebook account, you’ve probably seen social persuasive technology at work: the screen that comes up saying, “If you leave Facebook, these people will miss you” and then shows you pictures of your friends. Broadly speaking, most of the online services we think we’re using for “free” — that is, the ones we’re paying for with the currency of our attention — have some sort of persuasive design goal. And this can be particularly apparent when people are entering or exiting the system.

Ed: Advertising has been around for centuries, so we might assume that we have become clever about recognizing and negotiating it — what is it about these online persuasive technologies that poses new ethical questions or concerns?

The ethical questions themselves aren’t new, but the environment in which we’re asking them makes them much more urgent. There are several important trends here. For one, the Internet is becoming part of the background of human experience: devices are shrinking, proliferating, and becoming more persistent companions through life. In tandem with this, rapid advances in measurement and analytics are enabling us to more quickly optimise technologies to reach greater levels of persuasiveness. That persuasiveness is further augmented by applying knowledge of our non-rational psychological biases to technology design, which we are doing much more quickly than in the design of slower-moving systems such as law or ethics. Finally, the explosion of media and information has made it harder for people to be intentional or reflective about their goals and priorities in life. We’re living through a crisis of distraction. The convergence of all these trends suggests that we could increasingly live our lives in environments of high persuasive power.

To me, the biggest ethical questions are those that concern individual freedom and autonomy. When, exactly, does a “nudge” become a “push”? When we call these types of technology “persuasive,” we’re implying that they shouldn’t cross the line into being coercive or manipulative. But it’s hard to say where that line is, especially when it comes to persuasion that plays on our non-rational biases and impulses. How persuasive is too persuasive? Again, this isn’t a new ethical question by any means, but it is more urgent than ever.

These technologies also remind us that the ethics of attention is just as important as the ethics of information. Many important conversations are taking place across society that deal with the tracking and measurement of user behaviour. But that information is valuable largely because it can be used to inform some sort of action, which is often persuasive in nature. But we don’t talk nearly as much about the ethics of the persuasive act as we do about the ethics of the data. If we did, we might decide, for instance, that some companies have a moral obligation to collect more of a certain type of user data because it’s the only way they could know if they were persuading a person to do something that was contrary to their well-being, values, or goals. Knowing a person better can be the basis not only for acting more wrongly toward them, but also more rightly.

As users, then, persuasive technologies require us to be more intentional about how we define and express our own goals. The more persuasion we encounter, the clearer we need to be about what it is we actually want. If you ask most people what their goals are, they’ll say things like “spending more time with family,” “being healthier,” “learning piano,” etc. But we don’t all accomplish the goals we have — we get distracted. The risk of persuasive technology is that we’ll have more temptations, more distractions. But its promise is that we can use it to motivate ourselves toward the things we find fulfilling. So I think what’s needed is more intentional and habitual reflection about what our own goals actually are. To me, the ultimate question in all this is how we can shape technology to support human goals, and not the other way around.

Ed: What if a persuasive design or technology is simply making it easier to do something we already want to do: isn’t this just ‘user-centered design’ (i.e. a good thing)?

Yes, persuasive design can certainly help motivate a user toward their own goals. In these cases it generally resonates well with user-centered design. The tension really arises when the design leads users toward goals they don’t already have. User-centered design doesn’t really have a good way to address persuasive situations, where the goals of the user and the designer diverge.

To reconcile this tension, I think we’ll probably need to get much better at measuring people’s intentions and goals than we are now. Longer-term, we’ll probably need to rethink notions like “design” altogether. When it comes to online services, it’s already hard to talk about “products” and “users” as though they were distinct entities, and I think this will only get harder as we become increasingly enmeshed in an ongoing co-evolution.

Ed: Governments and corporations are increasingly interested in “data-driven” decision-making: isn’t that a good thing? Particularly if the technologies now exist to collect ‘big’ data about our online actions (if not intentions)?

I don’t think data ever really drives decisions. It can definitely provide an evidentiary basis, but any data is ultimately still defined and shaped by human goals and priorities. We too often forget that there’s no such thing as “pure” or “raw” data — that any measurement reflects, before anything else, evidence of attention.

That being said, data-based decisions are certainly preferable to arbitrary ones, provided that you’re giving attention to the right things. But data can’t tell you what those right things are. It can’t tell you what to care about. This point seems to be getting lost in a lot of the fervour about “big data,” which as far as I can tell is a way of marketing analytics and relational databases to people who are not familiar with them.

The psychology of that term, “big data,” is actually really interesting. On one hand, there’s a playful simplicity to the word “big” that suggests a kind of childlike awe where words fail. “How big is the universe? It’s really, really big.” It’s the unknown unknowns at scale, the sublime. On the other hand, there’s a physicality to the phrase that suggests an impulse to corral all our data into one place: to contain it, mould it, master it. Really, the term isn’t about data abundance at all – it reflects our grappling with a scarcity of attention.

The philosopher Luciano Floridi likens the “big data” question to being at a buffet where you can eat anything, but not everything. The challenge comes in the choosing. So how do you choose? Whether you’re a government, a corporation, or an individual, it’s your ultimate aims and values — your ethical priorities — that should ultimately guide your choosiness. In other words, the trick is to make sure you’re measuring what you value, rather than just valuing what you already measure.


James Williams is a doctoral student at the Oxford Internet Institute. He studies the ethical design of persuasive technology. His research explores the complex boundary between persuasive power and human freedom in environments of high technological persuasion.

James Williams was talking to blog editor Thain Simon.

Did Libyan crisis mapping create usable military intelligence? https://ensr.oii.ox.ac.uk/did-libyan-crisis-mapping-create-usable-military-intelligence/ Thu, 14 Mar 2013 10:45:22 +0000 http://blogs.oii.ox.ac.uk/policy/?p=817 The Middle East has recently witnessed a series of popular uprisings against autocratic rulers. In mid-January 2011, Tunisian President Zine El Abidine Ben Ali fled his country, and just four weeks later, protesters overthrew the regime of Egyptian President Hosni Mubarak. Yemen’s government was also overthrown in 2011, and Morocco, Jordan, and Oman saw significant governmental reforms leading, if only modestly, toward the implementation of additional civil liberties.

Protesters in Libya called for their own ‘day of rage’ on February 17, 2011, marked by violent protests in several major cities, including the capital, Tripoli. As they transformed from ‘protesters’ to ‘Opposition forces’ they began pushing information onto Twitter, Facebook, and YouTube, reporting their firsthand experiences of what had turned into a civil war virtually overnight. The evolving humanitarian crisis prompted the United Nations to request the creation of the Libya Crisis Map, which was made public on March 6, 2011. Other, more focused crisis maps followed, and were widely distributed on Twitter.

While the map was initially populated with humanitarian information pulled from the media and online social networks, as the imposition of an internationally enforced No Fly Zone (NFZ) over Libya became imminent, information began to appear on it that appeared to be of a tactical military nature. While many people continued to contribute conventional humanitarian information to the map, the sudden shift toward information that could aid international military intervention was unmistakable.

How useful was this information, though? Agencies in the U.S. Intelligence Community convert raw data into usable information (incorporated into finished intelligence) by applying some form of the Intelligence Process. As outlined in the U.S. military’s joint intelligence manual, this consists of six interrelated steps, all centered on a specific mission. It is interesting that many Twitter users, though perhaps unaware of the intelligence process, replicated each step during the Libyan civil war, producing finished intelligence adequate for consumption by NATO commanders and rebel leadership.

It was clear from the beginning of the Libyan civil war that very few people knew exactly what was happening on the ground. Even NATO, according to one of the organization’s spokesmen, lacked the ground-level informants necessary to get a full picture of the situation in Libya. There is no public information about the extent to which military commanders used information from crisis maps during the Libyan civil war. According to one NATO official, “Any military campaign relies on something that we call ‘fused information’. So we will take information from every source we can… We’ll get information from open source on the internet, we’ll get Twitter, you name any source of media and our fusion centre will deliver all of that into useable intelligence.”

The data in these crisis maps came from a variety of sources, including journalists, official press releases, and civilians on the ground who updated blogs and/or maintained telephone contact. The @feb17voices Twitter feed (translated into English and used to support the creation of The Guardian’s and the UN’s Libya Crisis Map) included accounts of live phone calls from people on the ground in areas where the Internet was blocked, and where there was little or no media coverage. Twitter users began compiling data and information; they tweeted and retweeted data they collected, information they filtered and processed, and their own requests for specific data and clarifications.

Information from various Twitter feeds was then published in detailed maps of major events that contained information pertinent to military and humanitarian operations. For example, as fighting intensified, @LibyaMap’s updates began to provide a general picture of the battlefield, including specific, sourced intelligence about the progress of fighting, humanitarian and supply needs, and the success of some NATO missions. Although it did not explicitly state its purpose as spreading mission-relevant intelligence, the nature of the information renders alternative motivations highly unlikely.
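As a purely illustrative sketch of the kind of filtering and mapping step described above, the following Python fragment screens tweet records for crisis-relevant keywords and turns geolocated matches into simple map entries. The keywords, field names, and sample data are assumptions invented for this example; they are not drawn from the actual Libya Crisis Map or from any real feed’s workflow.

# Hypothetical sketch: keep tweets that mention a crisis keyword and carry coordinates,
# and convert them into simple map entries. Everything here is illustrative only.
KEYWORDS = {"shelling", "casualties", "medical", "supplies", "checkpoint"}

def to_map_entries(tweets):
    """Return map entries for tweets that mention a crisis keyword and include a location."""
    entries = []
    for tweet in tweets:
        text = tweet.get("text", "").lower()
        coords = tweet.get("coordinates")  # expected as (latitude, longitude) or None
        if coords and any(word in text for word in KEYWORDS):
            entries.append({
                "lat": coords[0],
                "lon": coords[1],
                "report": tweet["text"],
                "source": tweet.get("user", "unknown"),
            })
    return entries

# Example usage with a single invented record:
sample = [{"text": "Medical supplies needed near the port",
           "coordinates": (32.9, 13.2),
           "user": "example_user"}]
print(to_map_entries(sample))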

Interestingly, the Twitter users featured in a June 2011 article by the Guardian had already explicitly expressed their intention of affecting military outcomes in Libya by providing NATO forces with specific geographical coordinates to target Qadhafi regime forces. We could speculate at this point about the extent to which the Intelligence Community might have guided Twitter users to participate in the intelligence process; while NATO and the Libyan Opposition issued no explicit intelligence requirements to the public, they tweeted stories about social network users trying to help NATO, likely leading their online supporters to draw their own conclusions.

It appears from similar maps created during the ongoing uprisings in Syria that the creation of finished intelligence products by crisis mappers may become a regular occurrence. Future study should focus on determining the motivations of mappers for collecting, processing, and distributing intelligence, particularly as a better understanding of their motivations could inform research on the ethics of crisis mapping. It is reasonable to believe that some (or possibly many) crisis mappers would be averse to their efforts being used by military commanders to target “enemy” forces and infrastructure.

Indeed, some are already questioning the direction of crisis mapping in the absence of professional oversight (Global Brief 2011): “[If] crisis mappers do not develop a set of best practices and shared ethical standards, they will not only lose the trust of the populations that they seek to serve and the policymakers that they seek to influence, but (…) they could unwittingly increase the number of civilians being hurt, arrested or even killed without knowing that they are in fact doing so.”


Read the full paper: Stottlemyre, S., and Stottlemyre, S. (2012) Crisis Mapping Intelligence Information During the Libyan Civil War: An Exploratory Case Study. Policy and Internet 4 (3-4).
