While prior work has generally found that data from Amazon’s Mechanical Turk (MTurk) is of reasonable quality, researchers are frequently encouraged to use its reputation system to exclude poorly performing workers. The purpose of this research is to investigate researchers’ beliefs about the reputation system and the information it contains about worker quality, and the extent to which these beliefs are supported by observable differences in quality. I collect data on 23,136 research-focused HITs and from 1,976 MTurk workers in six studies designed to investigate the informativeness of the reputation system. The findings indicate that MTurk’s reputation system does differentiate worker quality, but only at the extreme end, and that the workers for whom differences in quality are observed represent a small minority of the overall population. While there is evidence for quality sorting based on the reputation system, these differences emerge only at the extremes, beyond the thresholds used in almost all research. Further, these differences are relatively minor, and given the distribution of workers by reputation, they would be unlikely to be meaningful for most research.
