Wikipedia – The Policy and Internet Blog https://ensr.oii.ox.ac.uk Understanding public policy online Mon, 07 Dec 2020 14:25:41 +0000 en-GB hourly 1 Our knowledge of how automated agents interact is rather poor (and that could be a problem) https://ensr.oii.ox.ac.uk/our-knowledge-of-how-automated-agents-interact-is-rather-poor-and-that-could-be-a-problem/ Wed, 14 Jun 2017 15:12:05 +0000 http://blogs.oii.ox.ac.uk/policy/?p=4191 Recent years have seen a huge increase in the number of bots online, including search engine Web crawlers, online customer service chat bots, social media spambots, and content-editing bots in online collaborative communities like Wikipedia. (Bots are important contributors to Wikipedia, completing about 15% of all Wikipedia edits in 2014 overall, and more than 50% in certain language editions.)

While the online world has turned into an ecosystem of bots (by which we mean computer scripts that automatically handle repetitive and mundane tasks), our knowledge of how these automated agents interact with each other is rather poor. Since bots are automata without capacity for emotions, meaning-making, creativity, or sociality, we might expect their interactions to be relatively predictable and uneventful.

In their PLOS ONE article “Even good bots fight: The case of Wikipedia”, Milena Tsvetkova, Ruth García-Gavilanes, Luciano Floridi, and Taha Yasseri analyze the interactions between bots that edit articles on Wikipedia. They track the extent to which bots undid each other’s edits over the period 2001–2010, model how pairs of bots interact over time, and identify different types of interaction outcomes. Although Wikipedia bots are intended to support the encyclopaedia (identifying and undoing vandalism, enforcing bans, checking spelling, creating inter-language links, importing content automatically, mining data, identifying copyright violations, greeting newcomers, etc.), the authors find they often undid each other’s edits, with these sterile “fights” sometimes continuing for years.

They suggest that even relatively “dumb” bots may give rise to complex interactions, carrying important implications for Artificial Intelligence research. Understanding these bot-bot interactions will be crucial for managing social media, providing adequate cyber-security, and designing autonomous vehicles (that don’t crash..).

We caught up with Taha Yasseri and Luciano Floridi to discuss the implications of the findings:

Ed.: Is there any particular difference between the way individual bots interact (and maybe get bogged down in conflict), and lines of vast and complex code interacting badly, or having unforeseen results (e.g. flash-crashes in automated trading): i.e. is this just (another) example of us not always being able to anticipate how code interacts in the wild?

Taha: There are similarities and differences. The most notable difference is that here the bots are not competing. They all work under the same rules and, more importantly, towards the same goal: increasing the quality of the encyclopedia. Given these features, the rather antagonistic interactions between the bots come as a surprise.

Ed.: Wikipedia have said that they know about it, and that it’s a minor problem: but I suppose Wikipedia presents a nice, open, benevolent system to make a start on examining and understanding bot interactions. What other bot-systems are you aware of, or that you could have looked at?

Taha: Among content-generating bots, Twitter bots have turned out to be very important for online propaganda. Crawler bots that collect information from social media or the web (such as personal information or email addresses) are also being heavily deployed. In fact, we have come up with a first typology of Internet bots based on their type of action and their intentions (benevolent vs malevolent), which is presented in the article.

Ed.: You’ve also done work on human collaborations (e.g. in the citizen science projects of the Zooniverse) — is there any work comparing human collaborations with bot collaborations — or even examining human-bot collaborations and interactions?

Taha: In the present work we do compare bot-bot interactions with human-human interactions to observe similarities and differences. The most striking difference is in the dynamics of negative interactions. While human conflicts heat up very quickly and then disappear after a while, bots undoing each other’s contributions comes as a steady flow which might persist over years. In the HUMANE project, we discuss the co-existence of humans and machines in the digital world from a theoretical point of view, and there we discuss such ecosystems in detail.

Ed.: Humans obviously interact badly, fairly often (despite being a social species) .. why should we be particularly worried about how bots interact with each other, given humans seem to expect and cope with social inefficiency, annoyances, conflict and break-down? Isn’t this just more of the same?

Luciano: The fact that bots can be as bad as humans is far from reassuring. The fact that this happens even when they are programmed to collaborate is more disconcerting than what happens among humans when these compete, or fight each other. Here are very elementary mechanisms that through simple interactions generate messy and conflictual outcomes. One may hope this is not evidence of what may happen when more complex systems and interactions are in question. The lesson I learnt from all this is that without rules or some kind of normative framework that promotes collaboration, not even good mechanisms ensure a good outcome.

Read the full article: Tsvetkova M, García-Gavilanes R, Floridi L, Yasseri T (2017) Even good bots fight: The case of Wikipedia. PLoS ONE 12(2): e0171774. doi:10.1371/journal.pone.0171774


Taha Yasseri and Luciano Floridi were talking to blog editor David Sutcliffe.

Can we predict electoral outcomes from Wikipedia traffic? https://ensr.oii.ox.ac.uk/can-we-predict-electoral-outcomes-from-wikipedia-traffic/ Tue, 06 Dec 2016 15:34:31 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3881 As digital technologies become increasingly integrated into the fabric of social life their ability to generate large amounts of information about the opinions and activities of the population increases. The opportunities in this area are enormous: predictions based on socially generated data are much cheaper than conventional opinion polling, offer the potential to avoid classic biases inherent in asking people to report their opinions and behaviour, and can deliver results much quicker and be updated more rapidly.

In their article published in EPJ Data Science, Taha Yasseri and Jonathan Bright develop a theoretically informed prediction of election results from socially generated data combined with an understanding of the social processes through which the data are generated. They can thereby explore the predictive power of socially generated data while enhancing theory about the relationship between socially generated data and real world outcomes. Their particular focus is on the readership statistics of politically relevant Wikipedia articles (such as those of individual political parties) in the time period just before an election.

By applying these methods to a variety of different European countries in the context of the 2009 and 2014 European Parliament elections they firstly show that the relative change in number of page views to the general Wikipedia page on the election can offer a reasonable estimate of the relative change in election turnout at the country level. This supports the idea that increases in online information seeking at election time are driven by voters who are considering voting.
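To make the quantity in this first finding concrete, here is a minimal sketch of a "relative change" calculation in Python. The figures are invented purely for illustration and are not taken from the article; the function name and the variable names are assumptions, and the paper may define or normalise the measure differently.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before


# Hypothetical figures for a single country (not real data).
views_2009, views_2014 = 40_000, 50_000    # views of the general election page
turnout_2009, turnout_2014 = 0.40, 0.44    # share of the electorate that voted

views_change = relative_change(views_2009, views_2014)
turnout_change = relative_change(turnout_2009, turnout_2014)

print(f"page views: {views_change:+.1%}, turnout: {turnout_change:+.1%}")
```

Comparing the two relative changes country by country is the kind of check the finding describes; a single pair of numbers like this says nothing on its own.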

Second, they show that a theoretically informed model based on previous national results, Wikipedia page views, news media mentions, and basic information about the political party in question can offer a good prediction of the overall vote share of the party in question. Third, they present a model for predicting change in vote share (i.e., voters swinging towards and away from a party), showing that Wikipedia page-view data provide an important increase in predictive power in this context.

This relationship is exaggerated in the case of newer parties — consistent with the idea that voters don’t seek information uniformly about all parties at election time. Rather, they behave like ‘cognitive misers’, being more likely to seek information on new political parties with which they do not have previous experience and being more likely to seek information only when they are actually changing the way they vote.

In contrast, there was no evidence of a ‘media effect’: there was little correlation between news media mentions and overall Wikipedia traffic patterns. Indeed, the news media and Wikipedia appeared to be biased towards different things: with the news favouring incumbent parties, and Wikipedia favouring new ones.

Read the full article: Yasseri, T. and Bright, J. (2016) Wikipedia traffic data and electoral prediction: towards theoretically informed models. EPJ Data Science. 5 (1).

We caught up with the authors to explore the implications of the work.

Ed: Wikipedia represents a vast amount of not just content, but also user behaviour data. How did you access the page view stats — but also: is anyone building dynamic visualisations of Wikipedia data in real time?

Taha and Jonathan: Wikipedia makes its page view data available for free (in the same way as it makes all of its information available!). You can find the data here, along with some visualisations.

Ed: Why did you use Wikipedia data to examine election prediction rather than (the, I suppose, more fashionable) Twitter? How do they compare as data sources?

Taha and Jonathan: One of the big problems with using Twitter to predict things like elections is that contributing on social media is a very public thing and people are quite conscious of this. For example, some parties are seen as unfashionable so people might not make their voting choice explicit. Hence overall social media might seem to be saying one thing whereas actually people are thinking another.

By contrast, looking for information online on a website like Wikipedia is an essentially private activity so there aren’t these social biases. In other words, on Wikipedia we can directly have access to transactional data on what people do, rather than what they say or prefer to say.

Ed: How did these results and findings compare with the social media analysis done as part of our UK General Election 2015 Election Night Data Hack? (long title..)

Taha and Jonathan: The GE2015 data hack looked at individual politicians. We found that having a Wikipedia page is becoming increasingly important — over 40% of Labour and Conservative Party candidates had an individual Wikipedia page. We also found that this was highly correlated with Twitter presence — being more active on one network also made you more likely to be active on the other one. And we found some initial evidence that social media reaction was correlated with votes, though there is a lot more work to do here!

Ed: Can you see digital social data analysis replacing (or maybe just complementing) opinion polling in any meaningful way? And what problems would need to be addressed before that happened: e.g. around representative sampling, data cleaning, and weeding out bots?

Taha and Jonathan: Most political pundits are starting to look at a range of indicators of popularity (for example, not just voting intention, but also ratings of leadership competence, economic performance, etc.). We can see good potential for social data to become part of this range of popularity indicators. However, we don’t think it will replace polling just yet; the use of social media is limited to certain demographics. Also, the data collected from social media are often very shallow, not allowing for validation. In the case of Wikipedia, for example, we only know how many times each page is viewed, but we don’t know by how many people and from where.

Ed: You do a lot of research with Wikipedia data — has that made you reflect on your own use of Wikipedia?

Taha and Jonathan: It’s interesting to think about this activity of getting direct information about politicians — it’s essentially a new activity, something you couldn’t do in the pre-digital age. I know that I personally [Jonathan] use it to find out things about politicians and political parties — it would be interesting to know more about why other people are using it as well. This could have a lot of impacts. One thing Wikipedia has is a really long memory, in a way that other means of getting information on politicians (such as newspapers) perhaps don’t. We could start to see this type of thing becoming more important in electoral politics.

[Taha] .. since my research has been mostly focused on Wikipedia edit wars between human and bot editors, I have naturally become more cautious about the information I find on Wikipedia. When it comes to sensitive topics, such as politics, Wikipedia is a good point to start, but not a great point to end the search!


Taha Yasseri and Jonathan Bright were talking to blog editor David Sutcliffe.

Edit wars! Examining networks of negative social interaction https://ensr.oii.ox.ac.uk/edit-wars-examining-networks-of-negative-social-interaction/ Fri, 04 Nov 2016 10:05:06 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3893
Network of all reverts done in the English language Wikipedia within one day (January 15, 2010). Read the full article for details.
While network science has significantly advanced our understanding of the structure and dynamics of the human social fabric, much of the research has focused on positive relations and interactions such as friendship and collaboration. Considerably less is known about networks of negative social interactions such as distrust, disapproval, and disagreement. While these interactions are less common, they strongly affect people’s psychological well-being, physical health, and work performance.

Negative interactions are also rarely explicitly declared and recorded, making them hard for scientists to study. In their new article on the structural and temporal features of negative interactions in the community, Milena Tsvetkova, Ruth García-Gavilanes and Taha Yasseri use complex network methods to analyze patterns in the timing and configuration of reverts of article edits to Wikipedia. In large online collaboration communities like Wikipedia, users sometimes undo or downrate contributions made by other users; most often to maintain and improve the collaborative project. However, it is also possible that these actions are social in nature, with previous research acknowledging that they could also imply negative social interactions.

The authors find evidence that Wikipedia editors systematically revert the same person, revert back their reverter, and come to defend a reverted editor. However, they don’t find evidence that editors “pay forward” a revert, coordinate with others to revert an editor, or revert different editors serially. These interactions can be related to the status of the editors. Even though the individual reverts might not necessarily be negative social interactions, their analysis points to the existence of certain patterns of negative social dynamics within the editorial community. Some of these patterns have not been previously explored and certainly carry implications for Wikipedia’s own knowledge collection practices — and can also be applied to other large-scale collaboration networks to identify the existence of negative social interactions.
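As a rough illustration of two of the patterns named above (reverting the same person again, and reverting back one's reverter), the following Python sketch simply counts them in a time-ordered revert log. The log, the editor labels, and the function name are hypothetical, and raw counting like this is only a stand-in: the article's analysis works on a large-scale temporal network and its statistical approach is not reproduced here.

```python
from collections import Counter

# Hypothetical revert log: (timestamp, reverter, reverted), sorted by time.
reverts = [
    (1, "A", "B"),
    (2, "B", "A"),  # B reverts back their reverter
    (3, "A", "B"),  # A reverts the same person again (and B had reverted A before)
    (4, "C", "D"),
]


def count_patterns(events):
    """Count repeated and reciprocal reverts in a time-ordered log."""
    seen = Counter()  # occurrences so far of each (reverter, reverted) pair
    repeated = 0      # reverter has already reverted this editor earlier
    reciprocal = 0    # reverter was earlier reverted by this editor
    for _, reverter, reverted in events:
        if seen[(reverter, reverted)] > 0:
            repeated += 1
        if seen[(reverted, reverter)] > 0:
            reciprocal += 1
        seen[(reverter, reverted)] += 1
    return {"repeated": repeated, "reciprocal": reciprocal}


counts = count_patterns(reverts)
print(counts)
```

Counts like these only become evidence once they are compared with what would be expected by chance, which is the part this sketch leaves out.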

Read the full article: Milena Tsvetkova, Ruth García-Gavilanes and Taha Yasseri (2016) Dynamics of Disagreement: Large-Scale Temporal Network Analysis Reveals Negative Interactions in Online Collaboration. Scientific Reports 6 doi:10.1038/srep36333

We caught up with the authors to explore the implications of the work.

Ed: You find that certain types of negative social interactions and status considerations interfere with knowledge production on Wikipedia. What could or should Wikipedia do about it — or is it not actually a significant problem?

Taha: We believe it is an issue to consider. While the Wikipedia community might not be able to directly cope with it, as negative social interactions are intrinsic to human societies, an important consequence of our report would be to use the information in Wikipedia articles with extra care — and also to bear in mind that Wikipedia content might carry more subjectivity compared to a professionally written encyclopaedia.

Ed: Does reverting behaviour correlate with higher quality articles (i.e. with a lot of editorial attention) or simply with controversial topics — i.e. do you see reverting behaviour as generally a positive or negative thing?

Taha: In a different project we looked at the correlation between controversy and quality. We observed that controversy, up to a certain level, is correlated with higher quality of the article, specifically as far as the completeness of the article is concerned. However, articles with very high controversy scores started to show lower quality. In short, a certain amount of controversy helps the articles to become more complete, but too much controversy is a bad sign.

Ed: Do you think your results say more about the structure of Wikipedia, the structure of crowds, or about individuals?

Taha: Our results shed light on some of the most fundamental patterns in human behavior. It is one of the few examples in which a large dataset of negative interactions is analysed and the dynamics of negativity are studied. In this sense, this article is more about human behavior in interaction with other community members in a collaborative environment. However, because our data come from Wikipedia, I believe there are also lessons to be learnt about Wikipedia itself.

Ed: You note that by focusing on the macro-level you miss the nuanced understanding that thick ethnographic descriptions can produce. How common is it for computational social scientists to work with ethnographers? What would you look at if you were to work with ethnographers on this project?

Taha: One of the drawbacks of big data analysis in computational social science is the limited depth of the analysis. We lack any demographic information about the individuals that we study. We can draw conclusions about the community of Wikipedia editors in a certain language, but that is by no means specific enough. An ethnographic approach, which would benefit our research tremendously, would go deeper in analyzing individuals and studying the features and attributes which lead to certain behavior. For example, we report, at a high level, that “status” determines editors’ actions to a good extent, but of course the mechanisms behind this observation can only be explained based on ethnographic analysis.

Ed: I guess Wikipedia (whether or not unfairly) is commonly associated with edit wars — while obviously also being a gigantic success: how about other successful collaborative platforms — how does Wikipedia differ from Zooniverse, for example?

Taha: There is no doubt that Wikipedia is a huge success and probably the largest collaborative project in the history of mankind. Our research mostly focuses on its dark side, but it does not question its success and value. Compared to other collaborative projects, such as Zooniverse, the main difference is in the management model. Wikipedia is managed and run by the community of editors, and very little top-down management is employed. In Zooniverse, by contrast, the overall structure of the project is designed by a few researchers and the crowd can only act within a pre-determined framework. For more comparisons of this sort, I suggest looking at our HUMANE project, in which we provide a typology and comparison for a wide range of Human-Machine Networks.

Ed: Finally — do you edit Wikipedia? And have you been on the receiving end of reverts yourself?

Taha: I used to edit Wikipedia much more. And naturally I have had my own share of reverts, at both ends!


Taha Yasseri was talking to blog editor David Sutcliffe.

Why global contributions to Wikipedia are so unequal https://ensr.oii.ox.ac.uk/why-global-contributions-to-wikipedia-are-so-unequal/ Mon, 08 Sep 2014 12:11:51 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3410 The geography of knowledge has always been uneven. Some people and places have always been more visible and had more voices than others. Reposted from The Conversation.

 

The geography of knowledge has always been uneven. Some people and places have always been more visible and had more voices than others. But the internet seemed to promise something different: a greater diversity of voices, opinions and narratives from more places. Unfortunately, this has not come to pass in quite the manner some expected it to. Many parts of the world remain invisible or under-represented on important websites and services.

All of this matters because as geographic information becomes increasingly integral to our lives, places that are not represented on platforms like Wikipedia will be absent from many of our understandings of, and interactions with, the world.

Mapping the differences

Until now, there has been no large-scale analysis of the factors that explain the wide geographical spread of online information. This is something we have aimed to address in our research project on the geography of Wikipedia. Our focus areas were the Middle East and North Africa.

Using statistical models of geotagged Wikipedia data, we identified the necessary conditions to make countries “visible”. This allowed us to map the countries that fare considerably better or worse than expected. We found that a large part of the variation between countries could be explained by just three factors: population, availability of broadband internet, and the number of edits originating in that country.

Areas of Wikipedia hegemony and uneven geographic coverage. Oxford Internet Institute

While these three variables help to explain the sparse amount of content written about much of sub-Saharan Africa, most of the Middle East and North Africa have much less geographic information than might be expected. For example, despite high levels of wealth and connectivity, Qatar and the United Arab Emirates have far fewer articles than we might expect.

Constraints to creating content

These three factors matter independently, but they will also be subject to other constraints. A country’s population will probably affect the number of activities, places, and practices of interest (that is, the number of things one might want to write about). The size of the potential audience might also be influential, encouraging editors in more densely populated regions and those writing in major languages. And social attitudes towards information sharing will probably also change how some people contribute content.

We might also be seeing a principle of increasing informational poverty. Not only is a broad base of source material, such as books, maps, and images, needed to generate any Wikipedia article, but it is also likely that having content online will lead to the production of more content.

There are strict guidelines on how knowledge can be created and represented in Wikipedia, including the need to source key assertions. Editing incentives and constraints probably also encourage work around existing content – which is relatively straightforward to edit – rather than creating entirely new material. So it may be that the very policies and norms that govern the encyclopedia’s structure make it difficult to populate the white space with new content.

We need to recognise that none of the three conditions can ever be sufficient for generating geographic knowledge. As well as highlighting the presences and absences on Wikipedia, we also need to ask what factors encourage or limit production of that content.

Because of the constraints of the Wikipedia model, increasing representation on pages can’t occur in a linear manner. Instead it accelerates in a virtuous cycle, benefiting those with strong cultures of collecting and curating information in local languages. That is why, even after adjusting for their levels of connectivity, population and editors, Britain, Sweden, Japan and Germany are extensively referenced on Wikipedia, but the Middle East and North Africa haven’t kept pace.

If this continues, then those on the periphery might fail to reach the critical mass of editors needed to create content. Worse still, they may even dismiss Wikipedia as a legitimate site for user-generated geographic content. This is a problem that will need to be addressed if Wikipedia is indeed to take steps towards its goal of being the “sum of all human knowledge”.

What explains the worldwide patterns in user-generated geographical content? https://ensr.oii.ox.ac.uk/what-explains-the-worldwide-patterns-in-user-generated-geographical-content/ Mon, 08 Sep 2014 07:20:05 +0000 http://blogs.oii.ox.ac.uk/policy/?p=2908 The geographies of codified knowledge have always been uneven, affording some people and places greater voice and visibility than others. While the rise of the geosocial Web seemed to promise a greater diversity of voices, opinions, and narratives about places, many regions remain largely absent from the websites and services that represent them to the rest of the world. These highly uneven geographies of codified information matter because they shape what is known and what can be known. As geographic content and geospatial information become increasingly integral to our everyday lives, places that are left off the ‘map of knowledge’ will be absent from our understanding of, and interaction with, the world.

We know that Wikipedia is important to the construction of geographical imaginations of place, and that it has immense power to augment our spatial understandings and interactions (Graham et al. 2013). In other words, the presences and absences in Wikipedia matter. If a person’s primary free source of information about the world is the Persian or Arabic or Hebrew Wikipedia, then the world will look fundamentally different from the world presented through the lens of the English Wikipedia. The capacity to represent oneself to outsiders is especially important in those parts of the world that are characterized by highly uneven power relationships: Brunn and Wilson (2013) and Graham and Zook (2013) have already demonstrated the power of geospatial content to reinforce power in a South African township and Jerusalem, respectively.

Until now, there has been no large-scale empirical analysis of the factors that explain information geographies at the global scale; this is something we have aimed to address in this research project on Mapping and measuring local knowledge production and representation in the Middle East and North Africa. Using regression models of geolocated Wikipedia data we have identified what are likely to be the necessary conditions for representation at the country level, and have also identified the outliers, i.e. those countries that fare considerably better or worse than expected. We found that a large part of the variation could be explained by just three factors: namely, (1) country population, (2) availability of broadband Internet, and (3) the number of edits originating in that country. [See the full paper for an explanation of the data and the regression models.]
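The general idea of fitting a regression on three factors and then looking for countries that fall well below the fitted values can be sketched as follows. This is a minimal illustration only: the data are synthetic, the country labels are placeholders, the log-transformed variables and the plain least-squares fit (written in pure Python via the normal equations) are assumptions, and the helper names `solve` and `fit_ols` are hypothetical. The full paper should be consulted for the actual data and model specification.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares with an intercept, via (X^T X) beta = X^T y."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data, invented for illustration only (not from the paper).
# Columns: log(population), log(broadband availability), log(edits from the country)
countries = ["Country A", "Country B", "Country C", "Country D",
             "Country E", "Country F", "Country G"]
predictors = [
    (17.0, 2.0, 10.0),
    (15.0, 2.0, 10.0),
    (16.0, 3.0, 10.0),
    (16.0, 1.0, 10.0),
    (16.0, 2.0, 11.0),
    (16.0, 2.0, 9.0),
    (16.0, 2.0, 10.0),
]
log_articles = [10.4, 9.6, 10.3, 9.7, 10.5, 9.5, 8.6]  # log(geotagged articles)

beta = fit_ols(predictors, log_articles)
fitted = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in predictors]
resid = [obs - fit for obs, fit in zip(log_articles, fitted)]

# The most negative residual marks the country with the least content
# relative to what the fitted model expects.
worst = min(zip(resid, countries))[1]
print(worst)
```

A large negative residual is what "fares considerably worse than expected" means in this setting: the country has less content than its population, connectivity, and editing activity would predict under the fitted model.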

But how do we explain the significant inequalities in the geography of user-generated information that remain after adjusting for differing conditions using our regression model? While these three variables help to explain the sparse amount of content written about much of Sub-Saharan Africa, most of the Middle East and North Africa have quantities of geographic information below their expected values. For example, despite high levels of wealth and connectivity, Qatar and the United Arab Emirates have far fewer articles than we might expect from the model.

These three factors independently matter, but they will also be subject to a number of constraints. A country’s population will probably affect the number of human sites, activities, and practices of interest; i.e. the number of things one might want to write about. The size of the potential audience might also be influential, encouraging editors in more densely populated regions and those writing in major languages. However, societal attitudes towards learning and information sharing will probably also affect the propensity of people in some places to contribute content. Factors discouraging the number of edits to local content might include a lack of local Wikimedia chapters, the attractiveness of writing content about other (better-represented) places, or contentious disputes in local editing communities that divert time into edit wars and away from content generation.

We might also be seeing a principle of increasing informational poverty. Not only is a broader base of traditional source material (such as books, maps, and images) needed for the generation of any Wikipedia article, but it is likely that the very presence of content itself is a generative factor behind the production of further content. This makes information produced about information-sparse regions most useful for people in informational cores — who are used to integrating digital information into their everyday practices — rather than those in informational peripheries.

Various practices and procedures of Wikipedia editing likely amplify this effect. There are strict guidelines on how knowledge can be created and represented in Wikipedia, including a ban on original research, and the need to source key assertions. Editing incentives and constraints probably also encourage work around existing content (which is relatively straightforward to edit) rather than creation of entirely new material. In other words, the very policies and norms that govern the encyclopedia’s structure make it difficult to populate the white space with new geographic content. In addressing these patterns of increasing informational poverty, we need to recognize that no one of these three conditions can ever be sufficient for the generation of geographic knowledge. As well as highlighting the presences and absences in user-generated content, we also need to ask what factors encourage or limit production of that content.

In interpreting our model, we have come to a stark conclusion: increasing representation doesn’t occur in a linear fashion, but it accelerates in a virtuous cycle, benefitting those with strong editing cultures in local languages. For example, Britain, Sweden, Japan and Germany are extensively georeferenced on Wikipedia, whereas much of the MENA region has not kept pace, even accounting for their levels of connectivity, population, and editors. Thus, while some countries are experiencing the virtuous cycle of more edits and broadband begetting more georeferenced content, those on the periphery of these information geographies might fail to reach a critical mass of editors, or even dismiss Wikipedia as a legitimate site for user-generated geographic content: a problem that will need to be addressed if Wikipedia is indeed to be considered as the “sum of all human knowledge”.

Read the full paper: Graham, M., Hogan, B., Straumann, R.K., and Medhat, A. (2014) Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers.

References

Brunn S. D., and M. W. Wilson. 2013. Cape Town’s million plus black township of Khayelitsha: Terrae incognitae and the geographies and cartographies of silence, Habitat International. 39 284-294.

Graham M., and M. Zook. (2013) Augmented Realities and Uneven Geographies: Exploring the Geolinguistic Contours of the Web. Environment and Planning A 45(1): 77–99.

Graham M, M. Zook, and A. Boulton. 2013. Augmented Reality in the Urban Environment: Contested Content and the Duplicity of Code. Transactions of the Institute of British Geographers. 38(3) 464-479.


Mark Graham is a Senior Research Fellow at the OII. His research focuses on Internet and information geographies, and the overlaps between ICTs and economic development.

]]>
Geotagging reveals Wikipedia is not quite so equal after all https://ensr.oii.ox.ac.uk/geotagging-reveals-wikipedia-is-not-quite-so-equal-after-all/ Mon, 18 Aug 2014 12:25:39 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3416 Wikipedia is often seen as a great equaliser. But it’s starting to look like global coverage on Wikipedia is far from equal. Reposted from The Conversation.

 

Wikipedia is often seen as a great equaliser. Every day, hundreds of thousands of people collaborate on a seemingly endless range of topics by writing, editing and discussing articles, and uploading images and video content. But it’s starting to look like global coverage on Wikipedia is far from equal. This now ubiquitous source of information offers everything you could want to know about the US and Europe but far less about any other parts of the world.

This structural openness of Wikipedia is one of its biggest strengths. Academic and activist Lawrence Lessig even describes the online encyclopedia as “a technology to equalise the opportunity that people have to access and participate in the construction of knowledge and culture, regardless of their geographic placing”.

But despite Wikipedia’s openness, there are fears that the platform is simply reproducing the most established worldviews. Knowledge created in the developed world appears to be growing at the expense of viewpoints coming from developing countries. Indeed, there are indications that global coverage in the encyclopedia is far from “equal”, with some parts of the world heavily represented on the platform, and others largely left out.

For a start, if you look at articles published about specific places such as monuments, buildings, festivals, battlefields, countries, or mountains, the imbalance is striking. Europe and North America account for a staggering 84% of these “geotagged” articles. Almost all of Africa is poorly represented in the encyclopedia, too. In fact, there are more Wikipedia articles written about Antarctica (14,959) than any country in Africa. And while there are just over 94,000 geotagged articles related to Japan, there are only 88,342 on the entire Middle East and North Africa region.

Total number of geotagged Wikipedia articles across 44 surveyed languages. Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers (forthcoming).

When you compare the spread of articles with the way the world’s population is distributed, the picture is equally startling. Even though 60% of the world’s population is concentrated in Asia, less than 10% of Wikipedia articles relate to the region. The same is true in reverse for Europe, which is home to around 10% of the world’s population but accounts for nearly 60% of geotagged Wikipedia articles.

Number of regional geotagged articles and population. Graham, M., S. Hale & M. Stephens. 2011. Geographies of the World's Knowledge. Convoco! Edition.

There is an imbalance in the languages used on Wikipedia too. Most articles written about European and East Asian countries are written in their dominant languages. Articles about the Czech Republic, for example, are mostly written in Czech. But for much of the Global South we see a dominance of articles written in English. English dominates across much of Africa and the Middle East and even parts of South and Central America.

Dominant language of Wikipedia articles (by country). Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers (forthcoming).

There are more Wikipedia articles in English than Arabic about almost every Arabic-speaking country in the Middle East. And there are more English articles about North Korea than there are Arabic articles about either Saudi Arabia, Libya, or the United Arab Emirates. In total, there are more than 928,000 geotagged articles written in English, but only 3.23% of them are about Africa and 1.67% are about the Middle East and North Africa.

Number of geotagged articles in the English Wikipedia by country. Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers (forthcoming).

All this matters because fundamentally different narratives can be, and are, created about places and topics in different languages.

Beyond English

Even on the Arabic Wikipedia, there are geographical imbalances. There is a relatively high number of articles about Algeria and Syria, as well as about the US, Italy, Spain, Russia and Greece, but substantially fewer about a number of Arabic-speaking countries, including Egypt, Morocco, and Saudi Arabia. Indeed, there are only 433 geotagged articles about Egypt on the Arabic Wikipedia, but 2,428 about Italy and 1,988 about Spain.

Total number of geotagged articles in the Arabic Wikipedia by country Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers (forthcoming).

By mapping the geography of Wikipedia articles in both global and regional languages, we can begin to examine the layers of representation that “augment” the world we live in. Some parts of the world, including the Middle East, are massively underrepresented – not just in major world languages, but in their own. We like to think of Wikipedia as an opportunity for anyone, anywhere to contribute information about our world, but that doesn’t seem to be happening in practice. Wikipedia might not just be reflecting the world, but also reproducing new, uneven geographies of information.

]]>
What is stopping greater representation of the MENA region? https://ensr.oii.ox.ac.uk/what-is-stopping-greater-representation-of-the-mena-region/ Wed, 06 Aug 2014 08:35:52 +0000 http://blogs.oii.ox.ac.uk/policy/?p=2575
Negotiating the wider politics of Wikipedia can be a daunting task, particularly when it comes to content about the MENA region. Image of the Dome of the Rock (Qubbat As-Sakhrah), Jerusalem, by 1yen

Wikipedia has famously been described as a project that “works great in practice and terrible in theory”. One of the ways in which it succeeds is through its extensive consensus-based governance structure. While this has led to spectacular success (over 4.5 million articles in the English Wikipedia alone), the governance structure is neither obvious nor immediately accessible, and can present a barrier for those seeking entry. Editing Wikipedia can be a tough challenge: an often draining and frustrating task, involving heated disputes and arguments where it is often the most tenacious, belligerent, or connected editor who wins out in the end.

Broadband access and literacy are not the only pre-conditions for editing Wikipedia; ‘digital literacy’ is also crucial. This includes the ability to obtain and critically evaluate online sources, locate Wikipedia’s editorial and governance policies, master Wiki syntax, and confidently articulate and assert one’s views about an article or topic. Experienced editors know how to negotiate the rules, build a consensus with some editors to block others, and how to influence administrators during dispute resolution. This strict adherence to the word (if not the spirit) of Wikipedia’s ‘law’ can lead to marginalization or exclusion of particular content, particularly when editors are scared off by unruly mobs who ‘weaponize’ policies to fit a specific agenda.

Governing such a vast collaborative platform as Wikipedia obviously presents a difficult balancing act between being open enough to attract a large volume of contributions, and moderated enough to ensure their quality. Many editors consider Wikipedia’s governance structure (which varies significantly between the different language versions) essential to ensuring the quality of its content, even if it means that certain editors can (for example) arbitrarily ban other users, lock down certain articles, and exclude moderate points of view. One of the editors we spoke to noted that: “A number of articles I have edited with quality sources, have been subjected to editors cutting information that doesn’t fit their ideas […] I spend a lot of time going back to reinstate information. Today’s examples are in the ‘Battle of Nablus (1918)’ and the ‘Third Transjordan attack’ articles. Bullying does occur from time to time […] Having tried the disputes process I wouldn’t recommend it.” Community building might help support MENA editors faced with discouragement or direct opposition as they try to build content about the region, but easily locatable translations of governance materials would also help. Few of the extensive Wikipedia policy discussions have been translated into Arabic, leading to replication of discussions or ambiguity surrounding correct dispute resolution.

Beyond arguments with fractious editors over minutiae (something that comes with the platform), negotiating the wider politics of Wikipedia can be a daunting task, particularly when it comes to content about the MENA region. It would be an understatement to say that the Middle East is a politically sensitive region, with more than its fair share of apparently unresolvable disputes, competing ideologies (it’s the birthplace of three world religions…), repressive governments, and ongoing and bloody conflicts. Editors shared stories with us about meddling from state actors (eg Tunisia, Iran) and a lack of trust in a platform that is generally considered to be a foreign, and sometimes explicitly American, tool. Rumors abound that several states (eg Israel, Iran) have concerted efforts to work on Wikipedia content, creating a chilling effect for new editors who might feel that editing certain pages might prove dangerous, or simply frustrating or impossible. Some editors spoke of being asked by Syrian government officials for advice on how to remove critical content, or how to identify the editors responsible for putting it there. Again: the effect is chilling.

A lack of locally produced and edited content about the region clearly can’t be blamed entirely on ‘outsiders’. Many editors in the Arabic Wikipedia have felt snubbed by the creation of an explicitly “Egyptian Arabic” Wikipedia, which has not only forked the content and editorial effort, but also stymied any ‘pan-Arab’ identity on the platform. There is a culture of administrators deleting articles they do not think are locally appropriate; often relating to politically (or culturally) sensitive topics. Due to Arabic Wikipedia’s often vicious edit wars, it is heavily moderated (unlike for example the English version), and anonymous edits do not appear instantly.

Some editors at the workshops noted other systemic and cultural issues, for example complaining of an education system that encourages rote learning, reinforcing the notion that only experts should edit (or moderate) a topic, rather than amateurs with local familiarity. Editors also pointed to the marked gender disparities on the site; a longstanding issue for other Wikipedia versions as well. None of these discouragements are helped by what some editors noted as a larger ‘image problem’ with editing in the Arabic Wikipedia, given it would always be overshadowed by the dominant English Wikipedia, one editor commenting that: “the English Wikipedia is vastly larger than its Arabic counterpart, so it is not unthinkable that there is more content, even about Arab-world subjects, in English. From my (unscientific) observation, many times, content in Arabic about a place or a tribe is not very encyclopedic, but promotional, and lacks citations”. Translating articles into Arabic might be seen as menial and unrewarding work, when the exciting debates about an article are happening elsewhere.

When we consider the coming-together of all of these barriers, it might be surprising that Wikipedia is actually as large as it is. However, the editors we spoke with were generally optimistic about the site, considering it an important activity that serves the greater good. Wikipedia is without doubt one of the most significant cultural and political forces on the Internet. Wikipedians are remarkably generous with their time, and it’s their efforts that are helping to document, record, and represent much of the world – including places where documentation is scarce. Most of the editors at our workshop ultimately considered Wikipedia a path to a more just society through consensus, voting, and an aspiration to record certain truths, seeing it not just as a site of conflict, but also as a site of regional (and local) pride. When asked why he writes geographic content, one editor simply replied: “It’s my own town”.


Mark Graham is a Senior Research Fellow at the OII. His research focuses on Internet and information geographies, and the overlaps between ICTs and economic development.

]]>
How well represented is the MENA region in Wikipedia? https://ensr.oii.ox.ac.uk/how-well-represented-is-the-mena-region-in-wikipedia/ Tue, 22 Jul 2014 08:13:02 +0000 http://blogs.oii.ox.ac.uk/policy/?p=2811
There are more Wikipedia articles in English than Arabic about almost every Arabic speaking country in the Middle East. Image of rock paintings in the Tadrart Acacus region of Libya by Luca Galuzzi.
Wikipedia is often seen to be both an enabler and an equalizer. Every day hundreds of thousands of people collaborate on an (encyclopaedic) range of topics; writing, editing and discussing articles, and uploading images and video content. This structural openness combined with Wikipedia’s tremendous visibility has led some commentators to highlight it as “a technology to equalize the opportunity that people have to access and participate in the construction of knowledge and culture, regardless of their geographic placing” (Lessig 2003). However, despite Wikipedia’s openness, there are also fears that the platform is simply reproducing worldviews and knowledge created in the Global North at the expense of Southern viewpoints (Graham 2011; Ford 2011). Indeed, there are indications that global coverage in the encyclopaedia is far from ‘equal’, with some parts of the world heavily represented on the platform, and others largely left out (Hecht and Gergle 2009; Graham 2011, 2013, 2014).

These second-generation digital divides are not merely divides of Internet access (much discussed in the late 1990s), but gaps in representation and participation (Hargittai and Walejko 2008). Whereas most Wikipedia articles written about most European and East Asian countries are written in their dominant languages, for much of the Global South we see a dominance of articles written in English. These geographic differences in the coverage of different language versions of Wikipedia matter, because fundamentally different narratives can be (and are) created about places and topics in different languages (Graham and Zook 2013; Graham 2014).

If we undertake a ‘global analysis’ of this pattern by examining the number of geocoded articles (ie about a specific place) across Wikipedia’s main language versions (Figure 1), the first thing we can observe is the incredible human effort that has gone into describing ‘place’ in Wikipedia. The second is the clear and highly uneven geography of information, with Europe and North America home to 84% of all geolocated articles. Almost all of Africa is poorly represented in the encyclopaedia — remarkably, there are more Wikipedia articles written about Antarctica (14,959) than any country in Africa, and more geotagged articles relating to Japan (94,022) than the entire MENA region (88,342). In Figure 2 it is even more obvious that Europe and North America lead in terms of representation on Wikipedia.

Figure 1. Total number of geotagged Wikipedia articles across all 44 surveyed languages.
Figure 2. Number of regional geotagged articles and population.

Knowing how many articles describe a place only tells a part of the ‘representation story’. Figure 3 adds the linguistic element, showing the dominant language of Wikipedia articles per country. The broad pattern is that some countries largely define themselves in their own languages, and others appear to be largely defined from outside. For instance, almost all European countries have more articles about themselves in their dominant language; that is, most articles about the Czech Republic are written in Czech. Most articles about Germany are written in German (not English).

Figure 3. Language with the most geocoded articles by country (across 44 top languages on Wikipedia).

We do not see this pattern across much of the South, where English dominates across much of Africa, the Middle East, South and East Asia, and even parts of South and Central America. French dominates in five African countries, and German is dominant in one former German colony (Namibia) and a few other countries (e.g. Uruguay, Bolivia, East Timor).

The scale of these differences is striking. Not only are there more Wikipedia articles in English than Arabic about almost every Arabic-speaking country in the Middle East, but there are more English articles about North Korea than there are Arabic articles about Saudi Arabia, Libya, and the UAE. Not only do we see most of the world’s content written about global cores, but it is largely dominated by relatively few languages.

Figure 4 shows the total number of geotagged Wikipedia articles in English per country. The sheer density of this layer of information over some parts of the world is astounding (with 928,542 articles about places in English). Nonetheless, in this layer of geotagged English content, only 3.23% of the articles are about Africa, and 1.67% are about the MENA region.

Figure 4. Number of geotagged articles in the English Wikipedia by country.
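
To give a sense of scale, the percentages quoted for the English-language layer can be converted into approximate article counts. The short Python sketch below does only that arithmetic. The function name is illustrative, and because the published shares are rounded, the results are estimates rather than counts reported in the paper. The Africa and MENA figures may also overlap (North Africa), so they should not be added together.

```python
# Total geotagged English-language articles, as quoted in the text.
TOTAL_ENGLISH_GEOTAGGED = 928_542

# Regional shares (in percent), as quoted in the text.
SHARES_PERCENT = {"Africa": 3.23, "MENA": 1.67}


def approx_article_count(total: int, percent: float) -> int:
    """Convert a percentage share into an approximate article count."""
    return round(total * percent / 100)


estimates = {
    region: approx_article_count(TOTAL_ENGLISH_GEOTAGGED, pct)
    for region, pct in SHARES_PERCENT.items()
}

for region, count in estimates.items():
    print(f"{region}: roughly {count:,} geotagged English articles")
```

On these figures, the English Wikipedia has on the order of 30,000 geotagged articles about Africa and about 15,500 about the MENA region, out of more than 928,000.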

We see a somewhat different pattern when looking at the global geography of the 22,548 geotagged articles of the Arabic Wikipedia (Figure 5). Algeria and Syria are both defined by a relatively high number of articles in Arabic (as are the US, Italy, Spain, Russia and Greece). These information densities are substantially greater than what we see for many other MENA countries in which Arabic is an official language (such as Egypt, Morocco, and Saudi Arabia). This is even more surprising when we realise that the Italian and Spanish populations are smaller than the Egyptian, but there are nonetheless far more geotagged articles in Arabic about Italy (2,428) and Spain (1,988) than about Egypt (433).

Figure 5. Total number of geotagged articles in the Arabic Wikipedia by country.

By mapping the geography of Wikipedia articles in both global and regional languages, we can begin to examine the layers of representation that ‘augment’ the world we live in. We have seen that, notable exceptions aside (e.g. ‘Iran’ in Farsi and ‘Israel’ in Hebrew) the MENA region tends to be massively underrepresented — not just in major world languages, but also in its own: Arabic. Clearly, much is being left unsaid about that part of the world. Although we entered the project anticipating that the MENA region would be under-represented in English, we did not anticipate the degree to which it is under-represented in Arabic.

References

Ford, H. (2011) The Missing Wikipedians. In Critical Point of View: A Wikipedia Reader, ed. G. Lovink and N. Tkacz, 258-268. Amsterdam: Institute of Network Cultures.

Graham, M. (2014) The Knowledge Based Economy and Digital Divisions of Labour. In Companion to Development Studies, 3rd edition, eds. V. Desai and R. Potter. Hodder, pp. 189-195.

Graham, M. (2013) The Virtual Dimension. In Global City Challenges: Debating a Concept, Improving the Practice. Eds. Acuto, M. and Steele, W. London: Palgrave.

Graham, M. (2011) Wiki Space: Palimpsests and the Politics of Exclusion. In Critical Point of View: A Wikipedia Reader. Eds. Lovink, G. and Tkacz, N. Amsterdam: Institute of Network Cultures, pp. 269-282.

Graham M., and M. Zook (2013) Augmented Realities and Uneven Geographies: Exploring the Geolinguistic Contours of the Web. Environment and Planning A 45 (1) 77–99.

Hargittai, E. and G. Walejko (2008) The Participation Divide: Content Creation and Sharing in the Digital Age. Information, Communication and Society 11 (2) 239–256.

Hecht B., and D. Gergle (2009) Measuring self-focus bias in community-maintained knowledge repositories. In Proceedings of the 4th International Conference on Communities and Technologies, Penn State University, 2009, pp. 11–20. New York: ACM.

Lessig, L. (2003) An Information Society: Free or Feudal. Talk given at the World Summit on the Information Society, Geneva, 2003.


Mark Graham is a Senior Research Fellow at the OII. His research focuses on Internet and information geographies, and the overlaps between ICTs and economic development.

]]>
The sum of (some) human knowledge: Wikipedia and representation in the Arab World https://ensr.oii.ox.ac.uk/the-sum-of-some-human-knowledge-wikipedia-and-representation-in-the-arab-world/ Mon, 14 Jul 2014 09:00:14 +0000 http://blogs.oii.ox.ac.uk/policy/?p=2555
Arabic is one of the least represented major world languages on Wikipedia: few languages have more speakers and fewer articles than Arabic. Image of the Umayyad Mosque (Damascus) by Travel Aficionado

Wikipedia currently contains over 9 million articles in 272 languages, far surpassing any other publicly available information repository. Being the first point of contact for most general topics (therefore an effective site for framing any subsequent representations) it is an important platform from which we can learn whether the Internet facilitates increased open participation across cultures — or reinforces existing global hierarchies and power dynamics. Because the underlying political, geographic and social structures of Wikipedia are hidden from users, and because there have not been any large scale studies of the geography of these structures and their relationship to online participation, entire groups of people (and regions) may be marginalized without their knowledge.

This process is important to understand, for the simple reason that Wikipedia content has begun to form a central part of services offered elsewhere on the Internet. When you look for information about a place on Facebook, the description of that place (including its geographic coordinates) comes from Wikipedia. If you want to “check in” to a museum in Doha to show your friends you were there, the place you check in to was created with Wikipedia data. When you Google “House of Saud” you are presented not only with a list of links (with Wikipedia at the top) but also with a special ‘card’ summarising the House. This data comes from Wikipedia. When you look for people or places, Google now has these terms inside its ‘knowledge graph’, a network of related concepts with data coming directly from Wikipedia. Similarly, on Google Maps, Wikipedia descriptions for landmarks are presented as part of the default information.

Ironically, Wikipedia editorship is actually on a slow and steady decline, even as its content and readership increases year on year. Since 2007 and the introduction of significant devolution of administrative powers to volunteers, Wikipedia has not been able to effectively retain newcomers, something which has been noted as a concern by many at the Wikimedia Foundation. Some think Wikipedia might be levelling off because there’s only so much to write about. This is extremely far from the truth; there are still substantial gaps in geographic content in English and overwhelming gaps in other languages. Wikipedia often brands itself as aspiring to contain “the sum of human knowledge”, but behind this mantra lie policy pitfalls, tedious editor debates and delicate sourcing issues that hamper greater representation of the region. Of course these challenges form part of Wikipedia’s continuing evolution as the de facto source for online reference information, but they also (disturbingly) act to entrench particular ways of “knowing” — and ways of validating what is known.

There are over 260,000 articles in Arabic, receiving 240,000 views per hour. This actually translates as one of the least represented major world languages on Wikipedia: few languages have more speakers and fewer articles than Arabic. This relative lack of MENA voice and representation means that the tone and content of this globally useful resource, in many cases, is being determined by outsiders with a potential misunderstanding of the significance of local events, sites of interest and historical figures. In an area that has seen substantial social conflict and political upheaval, greater participation from local actors would help to ensure balance in content about contentious issues. Unfortunately, most research on MENA’s Internet presence has so far been drawn from anecdotal evidence, and no comprehensive studies currently exist.

In this project we wanted to understand where place-based content comes from, to explain reasons for the relative lack of Wikipedia articles in Arabic and about the MENA region, and to understand which parts of the region are particularly underrepresented. We also wanted to understand the relationship between Wikipedia’s administrative structure and the treatment of new editors; in particular, we wanted to know whether editors from the MENA region have less of a voice than their counterparts from elsewhere, and whether the content they create is considered more or less legitimate, as measured through the number of reverts; ie the overriding of their work by other editors.

Our practical objectives involved a consolidation of Middle Eastern Wikipedians through a number of workshops focusing on how to create more equitable and representative content, with the ultimate goal of making Wikipedia a more generative and productive site for reference information about the region. Capacity building among key Wikipedians can create greater understanding of barriers to participation and representation and offset much of the (often considerable) emotional labour required to sustain activity on the site in the face of intense arguments and ideological biases. Potential systematic structures of exclusion that could be a barrier to participation include such competitive practices as content deletion, indifference to content produced by MENA authors, and marginalization through bullying and dismissal.

However, a distinct lack of sources — owing both to a lack of legitimacy for MENA journalism and a paucity of open access government documents — is also inhibiting further growth of content about the region. When inclusion of a topic is contested by editors it is typically because there is not enough external source material about it to establish “notability”. As Ford (2011) has already discussed, notability is often culturally mediated. For example, a story in Al Jazeera would not have been considered a sufficient criterion of notability a couple of years ago. However, this has changed dramatically since its central role in reporting on the Arab Spring.

Unfortunately, notability can create a feedback loop. If an area of the world is underreported, there are no sources. If there are no sources, then journalists do not always have enough information to report about that part of the world. ‘Correct’ sourcing trumps personal experience on Wikipedia; even if an author is from a place, and is watching a building being destroyed, their Wikipedia edit will not be accepted by the community unless the event is discussed in another ‘official’ medium. Often the edit will either be branded with a ‘citation needed’ tag, eliminated, or discussed in the talk page. Particularly aggressive editors and administrators will nominate the page for ‘speedy deletion’ (ie deletion without discussion), a practice that makes responses from an author difficult.

Why does any of this matter in practical terms? For the simple reason that biases, absences and contestations on Wikipedia spill over into numerous other domains that are in regular and everyday use (Graham and Zook, 2013). If a place is not on Wikipedia, this might have a chilling effect on business and stifle journalism; if a place is represented poorly on Wikipedia this can lead to misunderstandings about the place. Wikipedia is not a legislative body. However, in the court of public opinion, Wikipedia represents one of the world’s strongest forces, as it quietly inserts itself into representations of place worldwide (Graham et al. 2013; Graham 2013).

Wikipedia is not merely a site of reference information, but is rapidly becoming the de facto site for representing the world to itself. We need to understand more about that representation.

Further Reading

Allagui, I., Graham, M., and Hogan, B. 2014. Wikipedia Arabe et la Construction Collective du Savoir. In Wikipedia, objet scientifique non identifié. eds. Barbe, L., and Merzeau, L. Paris: Presses Universitaires de Paris Ouest (in press).

Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers (forthcoming).

Graham, M. 2012. Die Welt in Der Wikipedia Als Politik der Exklusion: Palimpseste des Ortes und selective Darstellung. In Wikipedia. eds. S. Lampe, and P. Bäumer. Bundeszentrale für politische Bildung/bpb, Bonn.

Graham, M. 2011. Wiki Space: Palimpsests and the Politics of Exclusion. In Critical Point of View: A Wikipedia Reader. Eds. Lovink, G. and Tkacz, N. Amsterdam: Institute of Network Cultures, 269-282.

References

Ford, H. (2011) The Missing Wikipedians. In Geert Lovink and Nathaniel Tkacz (eds), Critical Point of View: A Wikipedia Reader, Amsterdam: Institute of Network Cultures, 2011. ISBN: 978-90-78146-13-1.

Graham, M., M. Zook, and A. Boulton. 2013. Augmented Reality in the Urban Environment: contested content and the duplicity of code. Transactions of the Institute of British Geographers. 38(3), 464-479.

Graham, M. and M. Zook. 2013. Augmented Realities and Uneven Geographies: Exploring the Geo-linguistic Contours of the Web. Environment and Planning A 45(1) 77-99.

Graham, M. 2013. The Virtual Dimension. In Global City Challenges: debating a concept, improving the practice. eds. M. Acuto and W. Steele. London: Palgrave. 117-139.


Mark Graham is a Senior Research Fellow at the OII. His research focuses on Internet and information geographies, and the overlaps between ICTs and economic development.

]]>
Edit wars! Measuring and mapping society’s most controversial topics https://ensr.oii.ox.ac.uk/edit-wars-measuring-mapping-societys-most-controversial-topics/ Tue, 03 Dec 2013 08:21:43 +0000 http://blogs.oii.ox.ac.uk/policy/?p=2339 Ed: How did you construct your quantitative measure of ‘conflict’? Did you go beyond just looking at content flagged by editors as controversial?

Taha: Yes we did … actually, we have shown that controversy measures based on “controversial” flags are not inclusive at all: although they might have high precision, they have very low recall. Instead, we constructed an automated algorithm to locate and quantify the editorial wars taking place on the Wikipedia platform. Our algorithm is based on reversions, i.e. when editors undo each other’s contributions. We focused specifically on mutual reverts between pairs of editors, and we assigned a maturity score to each editor, based on the total volume of their previous contributions. When counting the mutual reverts, we gave more weight to those committed by or on editors with higher maturity scores, as a revert between two experienced editors indicates a more serious problem. We validated our method throughout and compared it with other methods, using human judgement on a random selection of articles.
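The weighting idea Taha describes can be illustrated with a short Python sketch. This is only a rough illustration, not the authors' published algorithm: the function name `controversy_score`, the input format, and in particular the choice to weight each mutually reverting pair by the lower of the two editors' maturity scores are assumptions made here for the example.

```python
from collections import Counter


def controversy_score(reverts, maturity):
    """Sum a weight over every pair of editors who reverted each other.

    reverts:  iterable of (reverter, reverted) editor-name pairs
    maturity: dict mapping editor name -> maturity score
              (e.g. the editor's number of previous edits)

    ASSUMPTION: each mutual pair is weighted by the lower of the two
    maturity scores, so a clash only counts heavily when both editors
    are experienced. The interview does not specify the exact weight.
    """
    directed = Counter(reverts)   # how often A reverted B
    seen = set()                  # unordered pairs already scored
    score = 0.0
    for reverter, reverted in directed:
        pair = frozenset((reverter, reverted))
        if reverter == reverted or pair in seen:
            continue
        seen.add(pair)
        # Only mutual reverts count: B must also have reverted A.
        if (reverted, reverter) not in directed:
            continue
        score += min(maturity.get(reverter, 0), maturity.get(reverted, 0))
    return score


# Hypothetical revert history for one article.
reverts = [
    ("alice", "bob"), ("bob", "alice"),   # mutual, both experienced
    ("carol", "alice"),                   # one-way, ignored
    ("dave", "erin"), ("erin", "dave"),   # mutual, but dave is a newcomer
]
maturity = {"alice": 500, "bob": 120, "carol": 30, "dave": 5, "erin": 900}

print(controversy_score(reverts, maturity))
```

In this toy history the clash between the two experienced editors contributes far more to the score than the clash involving a newcomer, and the one-way revert contributes nothing.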

Ed: Was there any discrepancy between the content deemed controversial by your own quantitative measure, and what the editors themselves had flagged?

Taha: We were able to capture all the flagged content, but not all the articles found to be controversial by our method are flagged. And when you check the editorial history of those articles, you soon realise that they are indeed controversial but for some reason have not been flagged. It’s worth mentioning that the flagging process is not very well implemented in smaller language editions of Wikipedia. Even if the controversy is detected and flagged in English Wikipedia, it might not be in the smaller language editions. Our model is of course independent of the size and editorial conventions of different language editions.

Ed: Were there any differences in the way conflicts arose / were resolved in the different language versions?

Taha: We found the main differences to be the topics of controversial articles. Although some topics are globally debated, like religion and politics, there are many topics which are controversial only in a single language edition. This reflects the local preferences of different editorial communities and the importance they assign to topics. And then the way editorial wars start, and more importantly fade to consensus, is also different in different language editions. In some languages moderators intervene very soon, while in others the war might go on for a long time without any moderation.

Ed: In general, what were the most controversial topics in each language? And overall?

Taha: Generally, religion, politics, and geographical places like countries and cities (sometimes even villages) are the topics of debates. But each language edition also has its own focus: for example, football in Spanish and Portuguese, animations and TV series in Chinese and Japanese, sex and gender-related topics in Czech, and science and technology-related topics in French Wikipedia are very often behind editing wars.

Ed: What other quantitative studies of this sort of conflict (i.e. over knowledge and points of view) are there?

Taha: My favourite work is one by researchers from Barcelona Media Lab. In their paper Jointly They Edit: Examining the Impact of Community Identification on Political Interaction in Wikipedia, they provide quantitative evidence that editors interested in political topics identify themselves more significantly as Wikipedians than as political activists, even though they try hard to reflect their opinions and political orientations in the articles they contribute to. And I think that’s the key issue here. While there are lots of debates and editorial wars between editors, in the end what really counts for most of them is Wikipedia as a whole project, and the concept of shared knowledge. It might explain how Wikipedia really works despite all the diversity among its editors.

Ed: How would you like to extend this work?

Taha: Of course some of the controversial topics change over time. While Jesus might stay a controversial figure for a long time, I’m sure the article on President (W) Bush will soon reach a consensus and most likely disappear from the list of the most controversial articles. In the current study we examined the aggregated data from the inception of each Wikipedia edition up to March 2010. One possible extension that we are working on now is to study the dynamics of these controversy lists and the positions of topics in them.

Read the full paper: Yasseri, T., Spoerri, A., Graham, M. and Kertész, J. (2014) The most controversial topics in Wikipedia: A multilingual and geographical analysis. In: P. Fichman and N. Hara (eds) Global Wikipedia: International and cross-cultural issues in online collaboration. Scarecrow Press.


Taha was talking to blog editor David Sutcliffe.

Taha Yasseri is the Big Data Research Officer at the OII. Prior to coming to the OII, he spent two years as a Postdoctoral Researcher at the Budapest University of Technology and Economics, working on the socio-physical aspects of the community of Wikipedia editors, focusing on conflict and editorial wars, along with Big Data analysis to understand human dynamics, language complexity, and popularity spread. He has interests in analysis of Big Data to understand human dynamics, government-society interactions, mass collaboration, and opinion dynamics.

]]>
Who represents the Arab world online? https://ensr.oii.ox.ac.uk/arab-world/ Tue, 01 Oct 2013 07:09:58 +0000 http://blogs.oii.ox.ac.uk/policy/?p=2190
Editors from all over the world have played some part in writing about Egypt; in fact, only 13% of all edits actually originate in the country (38% are from the US). More: Who edits Wikipedia? by Mark Graham.

Ed: In basic terms, what patterns of ‘information geography’ are you seeing in the region?

Mark: The first pattern that we see is that the Middle East and North Africa are relatively under-represented in Wikipedia. Even after accounting for factors like population, Internet access, and literacy, we still see less content than would be expected. Second, of the content that exists, a lot of it is in European languages such as French rather than in Arabic (or Farsi or Hebrew). In other words, there is even less in local languages.

And finally, if we look at contributions (or edits), not only do we see a relatively small number of edits originating in the region, but many of those edits are being used to write about other parts of the world rather than their own region. What this broadly seems to suggest is that the participatory potentials of Wikipedia aren’t yet being harnessed in order to even out the differences between the world’s informational cores and peripheries.

Ed: How closely do these online patterns in representation correlate with regional (offline) patterns in income, education, language, access to technology (etc.)? Can you map one to the other?

Mark: Population and broadband availability alone explain a lot of the variance that we see. Other factors like income and education also play a role, but it is population and broadband that have the greatest explanatory power here. Interestingly, it is most countries in the MENA region that fail to fit well to those predictors.

Ed: How much do you think these patterns result from the systematic imposition of a particular view point – such as official editorial policies – as opposed to the (emergent) outcome of lots of users and editors acting independently?

Mark: Particular modes of governance in Wikipedia likely do play a role here. To combat vandalism, the Arabic Wikipedia, for instance, has a feature whereby changes to articles need to be reviewed before being made public. This alone seems to put off some potential contributors. Guidelines around sourcing in places where there are few secondary sources also likely play a role.

Ed: How much discussion (in the region) is there around this issue? Is this even acknowledged as a fact or problem?

Mark: I think it certainly is recognised as an issue now. But there are few viable alternatives to Wikipedia. Our goal is hopefully to identify problems that lead to solutions, rather than simply discouraging people from even using the platform.

Ed: This work has been covered by the Guardian, Wired, the Huffington Post (etc.). How much interest has there been from the non-Western press or bloggers in the region?

Mark: There has been a lot of coverage from the non-Western press, particularly in Latin America and Asia. However, I haven’t actually seen that much coverage from the MENA region.

Ed: As an academic, do you feel at all personally invested in this, or do you see your role to be simply about the objective documentation and analysis of these patterns?

Mark: I don’t believe there is any such thing as ‘objective documentation.’ All research has particular effects in and on the world, and I think it is important to be aware of the debates, processes, and practices surrounding any research project. Personally, I think Wikipedia is one of humanity’s greatest achievements. No previous single platform or repository of knowledge has ever even come close to Wikipedia in terms of its scale or reach. However, that is all the more reason to critically investigate what exactly is, and isn’t, contained within this fantastic resource. By revealing some of the biases and imbalances in Wikipedia, I hope that we’re doing our bit to improve it.

Ed: What factors do you think would lead to greater representation in the region? For example: is this a matter of voices being actively (or indirectly) excluded, or are they maybe just not all that bothered?

Mark: This is certainly a complicated question. I think the most important step would be to encourage participation from the region, rather than just representation of the region. Some of this involves increasing some of the enabling factors that are the prerequisites for participation; factors like: increasing broadband access, increasing literacy, encouraging more participation from women and minority groups.

Some of it is then changing perceptions around Wikipedia. For instance, many people that we spoke to in the region framed Wikipedia as an American or outside project rather than something that is locally created. Unfortunately we seem to be currently stuck in a vicious cycle in which few people from the region participate, therefore fulfilling the very reason why some people think that they shouldn’t participate. There is also the issue of sources. Not only does Wikipedia require all assertions to be properly sourced, but secondary sources themselves can be a great source of raw informational material for Wikipedia articles. However, if few sources about a place exist, then it adds an additional burden to creating content about that place. Again, a vicious cycle of geographic representation.

My hope is that by both working on some of the necessary conditions for participation, and engaging in a diverse range of initiatives to encourage content generation, we can start to break out of some of these vicious cycles.

Ed: The final moonshot question: How would you like to extend this work; time and money being no object?

Mark: Ideally, I’d like us to better understand the geographies of representation and participation outside of just the MENA region. This would involve mixed-methods (large scale big data approaches combined with in-depth qualitative studies) work focusing on multiple parts of the world. More broadly, I’m trying to build a research program that maintains a focus on a wide range of Internet and information geographies. The goal here is to understand participation and representation through a diverse range of online and offline platforms and practices and to share that work through a range of publicly accessible media: for instance the ‘Atlas of the Internet’ that we’re putting together.


Mark Graham was talking to blog editor David Sutcliffe.

Mark Graham is a Senior Research Fellow at the OII. His research focuses on Internet and information geographies, and the overlaps between ICTs and economic development.

]]>