The Internet, Policy & Politics Conferences

Oxford Internet Institute, University of Oxford

Alex Rosenblat, Karen Levy, Solon Barocas, Tim Hwang: Discriminating Tastes: Customer Ratings as Vehicles for Bias

Alex Rosenblat, Data & Society

Karen Levy, Solon Barocas, and Tim Hwang

On-demand companies like Uber, Lyft, Handy, and Airbnb have scaled rapidly in part by automating the management of large, disaggregated workforces. Many of these companies prompt consumers to evaluate their experiences with workers through a rating system. This paper uses the Uber system as a case study to explore how discriminatory consumer biases may be folded into worker evaluations through rating systems, and we suggest potential avenues for discovery and remedy.

In the Uber system, passengers are prompted to rate drivers on a 1 to 5 star scale, and drivers must maintain an overall rating that hovers around 4.6 out of 5 stars. Ratings function as Uber’s primary metric for quality control over its drivers. Uber’s policy is that drivers who fall below the performance target in their market and for their tier of service (such as uberX or uberSUV) risk deactivation (temporary suspension or permanent firing) from the system. A driver’s overall rating generally reflects the average of their last 500 rated trips. In this model, consumers are empowered to act, in part, as middle managers of workers, both through the design of the app and through the evaluation functions they perform (Rosenblat & Stark, 2015; Stark & Levy, 2015). Arguably, since each rated trip constitutes a small percentage of a driver’s evaluation (a percentage that shrinks as a driver completes more trips, so that the most active drivers are the least affected by a single bad rating), one biased passenger does not have a significant impact on a driver’s evaluation. It would be easy to dismiss these issues as unimportant aberrations, but *specific* drivers might be subject to systematic bias (Barocas & Selbst, 2015).
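To make the arithmetic concrete, consider a minimal sketch in Python. The 500-trip window and the roughly 4.6-star threshold come from the description above; the specific rating values are illustrative assumptions:

```python
WINDOW = 500  # trips included in the rolling average, per the description above

def rolling_average(ratings, window=WINDOW):
    """Average of the most recent `window` rated trips."""
    recent = ratings[-window:]
    return sum(recent) / len(recent)

# A veteran driver with 500 rated trips averaging 4.8 stars:
veteran = [4.8] * WINDOW
print(rolling_average(veteran))     # 4.8

# One biased 1-star rating shifts that average by only (4.8 - 1) / 500:
veteran.append(1.0)
print(rolling_average(veteran))     # ~4.79, still well above 4.6

# A new driver with 20 rated trips is far more exposed to the same rating:
newcomer = [4.8] * 20 + [1.0]
print(rolling_average(newcomer))    # ~4.62, near the threshold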

The dynamics of implicit and explicit bias have been addressed in a substantial body of social science research, which documents racial bias in performance evaluations: supervisors scrutinize workers with protected-class characteristics more negatively (Stauffer & Buckley, 2005). Research on online marketplaces likewise indicates that we should expect bias to creep into consumer-driven contexts (Nunley, Owens, & Howard, 2011; Doleac & Stein, 2010; Dellarocas, 2015). The rating system thus potentially enables systemic consumer-driven discrimination against minorities and women. In Uber’s case, the biases held by passengers may be funneled through the rating model’s feedback mechanism (Rogers, 2015), and they could have a disproportionate adverse impact on drivers who, for example, are women or people of color. While there isn’t sufficient data to demonstrate that riders are likely to be less generous with, or more critical of, drivers who happen to be members of a protected class, there is also no way for a third party to evaluate whether these concerns have merit, and that is itself a problem. In the 1970s and 1980s, U.S. courts roundly rejected arguments from companies that their discriminatory employment practices merely reflected the preferences of their customers rather than of the companies themselves (e.g., Diaz v. Pan Am. World Airways, Inc., 1971).
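Because ratings aggregate over hundreds of trips, even a modest systematic bias can be decisive at the margin. The following sketch uses entirely hypothetical parameters (a 10% share of passengers deducting two stars) to illustrate the aggregation mechanism, not to estimate any real-world bias:

```python
import random

random.seed(1)

WINDOW = 500      # trips in the rolling average, per the description above
THRESHOLD = 4.6   # approximate deactivation threshold

def simulated_average(bias_rate=0.0, penalty=0):
    """Average rating over WINDOW trips, where a `bias_rate` share of
    passengers deducts `penalty` stars from the rating they would
    otherwise have given. All parameters here are hypothetical."""
    total = 0
    for _ in range(WINDOW):
        rating = random.choice([5, 5, 5, 4])  # unbiased mix averaging 4.75
        if random.random() < bias_rate:
            rating = max(1, rating - penalty)
        total += rating
    return total / WINDOW

# An unbiased driver stays comfortably above the threshold:
print(simulated_average())                          # ~4.75
# An otherwise identical driver facing a small systematic bias falls below it:
print(simulated_average(bias_rate=0.1, penalty=2))  # ~4.55 < 4.6
```

Under these assumed parameters, two drivers who provide identical service end up on opposite sides of the deactivation threshold, which is the disparate-impact concern in miniature.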

Through the rating system, consumers can directly assert their preferences and their biases in ways that companies are prohibited from doing on their behalf. The fact that customers may be racist, for example, does not license a company to consciously or even implicitly consider race in its hiring decisions. The problem here is that Uber can cater to racists, for example, without ever having to consider race, and so never engage in behavior that amounts to disparate treatment. In effect, companies may be able to perpetuate bias without being liable for it. This paper will examine questions of discrimination through the lens of disparate impact doctrine.

Potential broader implications:

Public discourse surrounding the growing sophistication of automation tends to focus on the issue of displacement: i.e., the types of jobs that we anticipate will be replaced by machines in the near future. Less discussed is the impact of a nearer-term, present-day scenario: hybrid organizations that blend automation and human work, particularly platforms designed to automate the management and coordination of workers. Uber is a template for this type of system: a semi-autonomous system that relies on collected user ratings as signals to replace the work of a middle manager in deciding whether to hire or fire human drivers. To that end, the discrimination issues raised in this provocation piece are broader than just Uber. They are potentially latent in the proliferation of automated systems that employ an ad hoc, distributed labor force regulated largely by consumer feedback. The application of Title VII to these situations is significant in part because it spurs discussion of what less discriminatory alternatives could and should look like in these business models. Each intervention will distribute costs and benefits across all players in the system: riders, drivers, and the platform itself. As Uber-like models continue to multiply, employment discrimination may become hotly contested political ground, joining the current employee-versus-contractor debate. From a policy and law perspective, issues like these, rather than the raw question of displacement, may be the most immediate targets for interventions addressing the impact of automation on work and jobs.
