topic modelling – The Policy and Internet Blog https://ensr.oii.ox.ac.uk

“If you’re on Twitter then you’re asking for it” — responses to sexual harassment online and offline https://ensr.oii.ox.ac.uk/if-youre-on-twitter-then-youre-asking-for-it-responses-to-sexual-harassment-online-and-offline/ Fri, 24 Feb 2017 14:00:28 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3952 To encourage new ways of thinking about the problem of sexism in daily life, the OII’s recent Everyday Sexism Datahack brought together twenty people from a range of disciplinary backgrounds to analyse the written accounts of sexism and harassment gathered by the Everyday Sexism project. Founded by Laura Bates in 2012, Everyday Sexism has gathered more than 120,000 accounts submitted by members of the public.

A research team at the OII has already been analysing the content, and provided cleaned data to the datahack participants that could be analysed through qualitative and quantitative methods. Following an introduction to the project by Laura Bates, an outline of the dataset by Taha Yasseri, and a speed-networking session led by Kathryn Eccles we fell into two teams to work with the data.

Our own group wanted to examine the question of how people interact with the threat of public space. We were also interested in how public space is divided between online and offline, and the social perception of being online versus offline. We wanted to explore what sorts of reactions people might have to examples of assault, or strategies or things they might do in response to something happening to them — and how they might differ online and offline.

We spent the first hour collecting keywords that might indicate reactions to either online or offline harassment, including identifying a perceived threat and coping with it. We then searched the raw data for responses like “I tried to ignore it”, “I felt safe / unsafe”, “I identified a risk”, or “I was feeling worried, anxious or nervous”, and also looked at online versus offline actions. For online actions we were looking for specific platforms being named, and people saying things like “comment, response, delete, remove” in relation to social media posts. For offline actions we were looking for things like “I carried a [specific item]”, “I hid or avoided certain areas” or “I walked faster” (etc.).
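As a rough illustration of what this kind of keyword search can look like in practice, the sketch below flags accounts that contain response-related phrases. The accounts and keyword lists here are invented examples for illustration only, not the project's data or the exact queries we used on the day.

```python
# Minimal sketch of keyword-based flagging of responses in accounts.
# The 'accounts' and keyword lists are illustrative placeholders.
import re

accounts = [
    "A man followed me so I walked faster and hid in a shop.",
    "Someone left a sexist comment on my post, I deleted it and blocked him.",
    "I tried to ignore it but I felt unsafe walking home.",
]

offline_keywords = ["walked faster", "hid", "avoided", "ran away", "carried"]
online_keywords = ["comment", "delete", "blocked", "unfollow", "report", "mute"]

def matches(text, keywords):
    """Return the keywords that appear in the text (case-insensitive)."""
    return [kw for kw in keywords if re.search(re.escape(kw), text, re.IGNORECASE)]

for account in accounts:
    print({
        "offline": matches(account, offline_keywords),
        "online": matches(account, online_keywords),
    })
```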

We wanted to know if we could apply ideas of responses to offline space back to online spaces, and how these online spaces fall short. Offline responses are often very individual, whereas you might not have such a direct and individual response to something like a Facebook ad. Taking into account the important caveat that this was just a quick exploration of the data — and that the data were indicative rather than representative (so should in no way be used to extrapolate or infer anything concrete), one of the biggest things we found was that while the offline accounts described quite a lot of action in response to harassment, like running away or hiding in shops and restaurants, there were very few examples of responses in the online accounts.

It actually turned out to be difficult to identify a clear division between online and offline contexts in the data: we saw accounts of people encountering something sexist on social media and logging off, then walking down the street and getting harassed. But it seemed that people were more likely to report something that happened offline to the police than to report something that happened online within the forums themselves. This contrast is very interesting, in terms of whether you can be an active agent in response to something, or whether there’s something about being online that positions you as passive and unable to respond — and what we can do about that.

While we found it difficult to quantify, we did wonder if people might not be giving themselves credit for the kinds of responses they have to examples of sexism online — maybe they aren’t thinking about what they do. Whereas offline they might say “I ran away, because I was so scared”, perhaps when it’s online people just read it and don’t respond; or at least don’t report their responses to the same extent. There were lots of complaints about images, or about the hypocrisy of Facebook’s enforcement of community standards (such as allowing rape jokes, but deleting pictures of breast-feeding), and other things like that. But the accounts don’t say whether the writers reported these posts or took any other action.

This is strange, because in cases of offline harassment in the street, where it escalates into something physical like a fight, women are often at a disadvantage, whereas in the online context women ought to have more leverage — but it doesn’t seem like much reporting is being done. When we examined the themes of how people reacted online, we further differentiated between removing the source of a sexist comment (such as unfriending, unfollowing, muting, or deleting) and removing the self (such as going offline, or leaving the platform altogether). It seemed that removing the source was generally more common than removing the self.

So people might simply be normalising the idea that misogyny and sexism is going to exist in forums. In the data someone reported a Twitter user saying “Well if you’re on Twitter you’re asking for it” — indicative of a “short-skirt” line of thinking about engaging on social media. In this environment people might see unfollowing and unfriending as a form of management and negotiation, as opposed to a fundamental problem with the site itself. It would be interesting to explore the self-censoring that happens before anything happens: quite a few of the examples we read opened with “I wasn’t even wearing anything provocative, but [this] happened…”. And it would be interesting to know if people also think like that in the online context: “I wasn’t even participating in a controversial way, but this still happened”. It’s an interesting parallel, maybe.

Topic modelling content from the “Everyday Sexism” project: what’s it all about? https://ensr.oii.ox.ac.uk/topic-modelling-content-from-the-everyday-sexism-project-whats-it-all-about/ Thu, 03 Mar 2016 09:19:23 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3552 We recently announced the start of an exciting new research project that will involve the use of topic modelling to understand the patterns in stories submitted to the Everyday Sexism website. Here, we briefly explain our text analysis approach, “topic modelling”.

At its very core, topic modelling is a technique that seeks to automatically discover the topics contained within a group of documents. ‘Documents’ in this context could refer to text items as lengthy as individual books, or as short as sentences within a paragraph. Let’s take the idea of sentences-as-documents as an example:

  • Document 1: I like to eat kippers for breakfast.
  • Document 2: I love all animals, but kittens are the cutest.
  • Document 3: My kitten eats kippers too.

Assuming that each sentence contains a mixture of different topics (and that a ‘topic’ can be understood as a collection of words (of any part of speech) that have different probabilities of appearance in passages discussing the topic), how does the topic modelling algorithm ‘discover’ the topics within these sentences?

The algorithm is initiated by setting the number of topics that it needs to extract. Of course, it is hard to guess this number without having some insight into the topics, but one can think of it as a resolution tuning parameter: the smaller the number of topics, the more general the bag of words in each topic will be, and the looser the connections between them.

The algorithm loops through all of the words in each document, assigning every word to one of our topics in a temporary and semi-random manner. This initial assignment is arbitrary, and it is easy to show that different initializations lead to the same results in the long run. Once each word has been assigned a temporary topic, the algorithm then re-iterates through each word in each document to update the topic assignment using two criteria: 1) How prevalent is the word in question across topics? And 2) How prevalent are the topics in the document?

To quantify these two criteria, the algorithm calculates the likelihood of the words appearing in each document, given the current assignment of words to topics and of topics to documents.

Of course words can appear in different topics and more than one topic can appear in a document. But the iterative algorithm seeks to maximize the self-consistency of the assignment by maximizing the likelihood of the observed word-document statistics. 
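The post does not name a specific algorithm, but the procedure described above is close in spirit to collapsed Gibbs sampling for Latent Dirichlet Allocation (LDA). The sketch below is a toy illustration of that iterative re-assignment loop under those assumptions, not the project's actual code; the hyperparameter values and iteration count are arbitrary choices.

```python
# Toy sketch of the iterative re-assignment described above, in the spirit
# of collapsed Gibbs sampling for LDA. Illustration only, not project code.
import random
from collections import defaultdict

random.seed(0)

docs = [
    "i like to eat kippers for breakfast".split(),
    "i love all animals but kittens are the cutest".split(),
    "my kitten eats kippers too".split(),
]
K = 2                    # number of topics (the 'resolution' parameter)
alpha, beta = 0.1, 0.01  # smoothing hyperparameters (arbitrary values)
vocab = {w for doc in docs for w in doc}

# Counters for the two criteria: topic prevalence per document,
# and word prevalence per topic.
doc_topic = [defaultdict(int) for _ in docs]
topic_word = [defaultdict(int) for _ in range(K)]
topic_total = [0] * K

# 1) Temporary, semi-random initial assignment of each word to a topic.
assignments = []
for d, doc in enumerate(docs):
    z = [random.randrange(K) for _ in doc]
    assignments.append(z)
    for w, t in zip(doc, z):
        doc_topic[d][t] += 1
        topic_word[t][w] += 1
        topic_total[t] += 1

# 2) Re-iterate, updating each word's topic using the two criteria.
for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = assignments[d][i]
            # Remove the word's current assignment from the counts.
            doc_topic[d][t] -= 1
            topic_word[t][w] -= 1
            topic_total[t] -= 1
            # weight = (how prevalent is topic k in this document?)
            #        * (how prevalent is word w in topic k?)
            weights = [
                (doc_topic[d][k] + alpha)
                * (topic_word[k][w] + beta) / (topic_total[k] + beta * len(vocab))
                for k in range(K)
            ]
            t = random.choices(range(K), weights=weights)[0]
            assignments[d][i] = t
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1

# Report the share of each document's words assigned to each topic.
for d, doc in enumerate(docs):
    print(f"Document {d + 1}:",
          {k: round(doc_topic[d][k] / len(doc), 2) for k in range(K)})
```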

We can illustrate this process and its outcome by going back to our example. A topic modelling approach might use the process above to discover the following topics across our documents:

  • Document 1: I like to eat kippers for breakfast. [100% Topic A]
  • Document 2: I love all animals, but kittens are the cutest. [100% Topic B]
  • Document 3: My kitten eats kippers too. [67% Topic A, 33% Topic B]

Topic modelling defines each topic as a so-called ‘bag of words’, but it is the researcher’s responsibility to decide upon an appropriate label for each topic based on their understanding of language and context. Going back to our example, the algorithm might group words such as “eat”, “kippers” and “breakfast” under Topic A, which we could then label as ‘food’ based on our understanding of what those words mean. Similarly, words such as “animals”, “kittens” and “cutest” might be classified under a separate topic, Topic B, which we could label ‘animals’. In this simple example the word “eat” appears in a sentence dominated by Topic A, but also in a sentence with some association to Topic B, so it can also be seen as a connector of the two topics. Of course animals eat too, and they like food!
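For readers who would like to try this themselves, here is one possible way to fit a two-topic model to the three toy documents using scikit-learn's LatentDirichletAllocation. This is a hedged sketch rather than the tooling used in our project, and on such a tiny corpus the exact proportions it produces will differ from the illustrative percentages above.

```python
# Sketch: fit a two-topic model to the toy documents with scikit-learn.
# Results on a three-sentence corpus are noisy; this only shows the shape
# of the output (per-topic bags of words, per-document topic mixtures).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "I like to eat kippers for breakfast.",
    "I love all animals, but kittens are the cutest.",
    "My kitten eats kippers too.",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)

# n_components is the preset number of topics: the 'resolution' parameter.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

words = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [words[i] for i in weights.argsort()[::-1][:4]]
    print(f"Topic {k}: {top}")        # the 'bag of words' we would then label

for doc, mixture in zip(documents, doc_topics):
    print(doc, mixture.round(2))      # per-document topic proportions
```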

We are going to use a similar approach to first extract the main topics reflected in the reports submitted to the Everyday Sexism Project website, and then examine the relations between the sexism-related topics and concepts based on the overlap between the bags of words of each topic. Finally, we can also look into the co-appearance of topics in the same document. In this way we aim to draw a linguistic picture of the more than 100,000 submitted reports.
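As a rough sketch of what these two measures might look like once a model has been fitted, the snippet below computes word-overlap between topics (here using the Jaccard index) and counts the documents in which two topics co-appear. The topic names, word lists and proportions are invented placeholders, not results from the project data.

```python
# Sketch of two relations between topics: (1) overlap between their bags of
# words, and (2) co-appearance of topics in the same document.
# All inputs below are placeholder values, not fitted-model output.
from itertools import combinations
import numpy as np

# Placeholder top words per topic (in practice taken from the fitted model).
topic_words = {
    "A": {"work", "boss", "office", "colleague", "meeting"},
    "B": {"street", "walking", "night", "followed", "shouted"},
    "C": {"boss", "comment", "meeting", "online", "colleague"},
}

# Placeholder document-topic proportions (rows: documents; columns: A, B, C).
doc_topics = np.array([
    [0.7, 0.1, 0.2],
    [0.1, 0.8, 0.1],
    [0.4, 0.1, 0.5],
])

# (1) Word overlap between topic pairs, measured with the Jaccard index.
for (a, wa), (b, wb) in combinations(topic_words.items(), 2):
    jaccard = len(wa & wb) / len(wa | wb)
    print(f"overlap({a}, {b}) = {jaccard:.2f}")

# (2) Co-appearance: count documents where both topics carry real weight.
names = list(topic_words)
present = doc_topics >= 0.2          # arbitrary presence threshold
for i, j in combinations(range(len(names)), 2):
    together = int(np.sum(present[:, i] & present[:, j]))
    print(f"co-appearance({names[i]}, {names[j]}) = {together} documents")
```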

As ever, be sure to check back for further updates on our progress!

Creating a semantic map of sexism worldwide: topic modelling of content from the “Everyday Sexism” project https://ensr.oii.ox.ac.uk/creating-a-semantic-map-of-sexism-topic-modelling-of-everyday-sexism-content/ Wed, 07 Oct 2015 10:56:05 +0000 http://blogs.oii.ox.ac.uk/policy/?p=3430
The Everyday Sexism Project catalogues instances of sexism experienced by women on a day to day basis. We will be using computational techniques to extract the most commonly occurring sexism-related topics.

When barrister Charlotte Proudman recently spoke out regarding a sexist comment that she had received on the professional networking website LinkedIn, hundreds of women praised her actions in highlighting the issue of workplace sexism – and many of them began to tell similar stories of their own. It soon became apparent that Proudman was not alone in experiencing this kind of sexism, a fact further corroborated by Laura Bates of the Everyday Sexism Project, who asserted that workplace harassment is “the most reported kind of incident” on the project’s UK website.

Proudman’s experience, and Bates’ comments on the number of submissions to her site concerning harassment at work, provoke a conversation about the nature of sexism, not only in the UK but also at a global level. We know that since its launch in 2012, the Everyday Sexism Project has received over 100,000 submissions in more than 13 different languages, concerning a variety of topics. But what are these topics? As Bates has stated, in the UK, workplace sexism is the most commonly discussed subject on the website – but is this also the case for the Everyday Sexism sites in France, Japan, or Brazil? What are the most common types of sexism globally, and (how) do they relate to each other? Do experiences of sexism change from one country to another?

The multi-lingual reports submitted to the Everyday Sexism project are undoubtedly a gold mine of crowdsourced information with great potential for answering important questions about instances of sexism worldwide, as well as drawing an overall picture of how sexism is experienced in different societies. So far much of the research relating to the Everyday Sexism project has focused on qualitative content analysis, and has been limited to the submissions written in English. Along with Principal Investigators Taha Yasseri and Kathryn Eccles, I will be acting as Research Assistant on a new project funded by the John Fell Oxford University Press Research Fund, that hopes to expand the methods used to investigate Everyday Sexism submission data, by undertaking a large-scale computational study that will enrich existing qualitative work in this area.

Entitled “Semantic Mapping of Sexism: Topic Modelling of Everyday Sexism Content”, our project will take a Natural Language Processing approach, analysing the content of Everyday Sexism reports in different languages, and using topic-modelling techniques to extract the most commonly occurring sexism-related topics and concepts from the submissions. We will map the semantic relations between those topics within and across different languages, comparing and contrasting the ways in which sexism is experienced in everyday life in different cultures and geographies. Ultimately, we hope to create the first data-driven map of sexism on a global scale, forming a solid framework for future studies in growing fields such as online inequality, cyberbullying, and social well-being.

We’re very excited about the project and will be charting our progress via the Policy and Internet Blog, so make sure to check back for further updates!
