Elena Hensinger, University of Bristol
Ilias Flaounas, University of Bristol
Nello Cristianini, University of Bristol
Abstract
We analysed the choices of online newspaper readers in order to model their preferences, using automated methods operating at very large scale. The resulting models are predictive of users' choices, and we applied them to explore the relationship between audience preferences and the topics of news articles. We found that for 12 of the 14 modelled audiences, the presence of “Public Affairs” content, such as “Politics”, reduced the appeal of an article.
Each model describes the appeal of a given article to one audience as a linear function of word frequencies, and is obtained by comparing articles that became “Most Popular” on a given day in a given outlet with articles that did not. We use 2,432,148 such article pairs, collected over a period of more than 1.5 years.
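To make the idea of a linear appeal function learned from article pairs concrete, the following is a minimal sketch and not the method used in this work. It assumes a simple perceptron-style update on the difference between the word-frequency vectors of a popular and a non-popular article; the learning rule, the function names (`word_frequencies`, `train_pairwise`, `appeal`) and the toy article texts are all illustrative assumptions.

```python
from collections import Counter

import numpy as np


def word_frequencies(text, vocab):
    """Relative frequency of each vocabulary word in the text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return np.array([counts[word] / total for word in vocab])


def train_pairwise(pairs, vocab, epochs=20, lr=1.0):
    """Learn weights w so that w.x(popular) > w.x(other) for each pair.

    Assumption: a perceptron-style update stands in for whatever
    preference-learning algorithm the authors actually used.
    """
    w = np.zeros(len(vocab))
    for _ in range(epochs):
        for popular, other in pairs:
            diff = word_frequencies(popular, vocab) - word_frequencies(other, vocab)
            if w @ diff <= 0:  # pair is ranked wrongly (or tied): adjust w
                w += lr * diff
    return w


def appeal(text, w, vocab):
    """Appeal score: a linear function of the article's word frequencies."""
    return float(w @ word_frequencies(text, vocab))


# Hypothetical toy pairs: (article that became popular, article that did not).
pairs = [
    ("football match goal", "parliament budget vote"),
    ("goal striker win", "tax budget policy"),
]
vocab = sorted({word for pair in pairs for text in pair for word in text.split()})
w = train_pairwise(pairs, vocab)

print(appeal("football goal", w, vocab) > appeal("budget vote", w, vocab))
```

Because the model is linear, each learned weight can be read directly as the contribution of one word to an article's appeal for that audience.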
These models are shown to be predictive of user choices, and are then used to compare both the audiences and the contents of various news outlets. First, we visualise the information contained in the models themselves as word clouds. Next, using a dataset of half a million articles spanning one year, we compute each article's appeal score for each modelled audience. Finally, we determine each article's topic affiliation and compare it with its appeal scores. For an average audience, we find significantly less interest in “Public Affairs” topics, such as “Politics” and “Business”, than in “Non-Public Affairs” topics such as “Sport” or “Crime”.
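The comparison of appeal across topic groups can be sketched as follows. This is only an illustration under stated assumptions: the topic-to-group mapping uses just the four example topics named above, the appeal scores are invented placeholder values rather than results, and a plain comparison of group means stands in for whatever significance analysis was actually performed.

```python
from statistics import mean

# Mapping built from the example topics named in the abstract only.
TOPIC_GROUP = {
    "Politics": "Public Affairs",
    "Business": "Public Affairs",
    "Sport": "Non-Public Affairs",
    "Crime": "Non-Public Affairs",
}

# Hypothetical (topic, appeal score) records for one audience.
# The numbers are placeholders for illustration, not measured values.
articles = [
    ("Politics", -0.4),
    ("Business", -0.1),
    ("Sport", 0.5),
    ("Crime", 0.3),
    ("Politics", -0.2),
    ("Sport", 0.2),
]


def mean_appeal_by_group(records):
    """Average the appeal scores of articles within each topic group."""
    groups = {}
    for topic, score in records:
        groups.setdefault(TOPIC_GROUP[topic], []).append(score)
    return {group: mean(scores) for group, scores in groups.items()}


group_means = mean_appeal_by_group(articles)
for group, value in sorted(group_means.items()):
    print(f"{group}: {value:+.3f}")
```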



