Designing Informative Rating Systems: Evidence from an Online Labor Market

Published: 13 July 2020
DOI: 10.1145/3391403.3399455

Abstract

Platforms critically rely on rating systems to learn the quality of market participants. In practice, however, these ratings are often highly inflated and therefore not very informative. In this paper, we investigate whether a platform can obtain less inflated ratings by altering the meaning and relative importance of the levels in the rating system. We then seek a principled approach for making these design choices.
First, we analyze the results of a randomized controlled trial on an online labor market in which an additional question was added to the feedback form. Across treatment conditions, we vary the question phrasing and answer choices; in particular, the treatment conditions include several positive-skewed verbal rating scales, in which descriptive phrases or adjectives give a specific interpretation to each rating level. The test reveals that current inflationary norms can in fact be countered by re-anchoring the meaning of the levels of the rating system. In particular, the positive-skewed verbal rating scales yield rating distributions that are significantly less inflated and much more informative about seller quality.
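The abstract does not say how inflation and informativeness are measured. As a minimal sketch under that gap, the code below uses two common illustrative measures, which are assumptions and not necessarily the ones used in the paper: the share of ratings at the top level (a proxy for inflation) and the mutual information between a seller-quality group and the rating received (a proxy for informativeness about seller quality). The helper names and all counts are hypothetical, invented for illustration rather than taken from the study.

```python
import math
from typing import Dict, List

# Illustrative measures only (assumed, not the paper's definitions).


def pooled_counts(table: Dict[str, List[int]]) -> List[int]:
    """Sum rating-level counts across all seller groups."""
    n_levels = len(next(iter(table.values())))
    return [sum(row[k] for row in table.values()) for k in range(n_levels)]


def top_box_share(counts: List[int]) -> float:
    """Share of ratings at the highest level (last entry): a simple inflation proxy."""
    total = sum(counts)
    return counts[-1] / total if total else 0.0


def mutual_information(table: Dict[str, List[int]]) -> float:
    """Mutual information, in bits, between seller group and rating level."""
    level_totals = pooled_counts(table)
    total = sum(level_totals)
    mi = 0.0
    for row in table.values():
        p_group = sum(row) / total
        for k, count in enumerate(row):
            if count == 0:
                continue  # empty cells contribute 0 * log 0 = 0
            p_joint = count / total
            p_level = level_totals[k] / total
            mi += p_joint * math.log2(p_joint / (p_group * p_level))
    return mi


# Hypothetical counts (NOT data from the study): rows are seller groups,
# columns are rating levels ordered from lowest to highest.
inflated = {"weaker": [0, 0, 1, 19], "stronger": [0, 0, 0, 20]}
re_anchored = {"weaker": [4, 8, 6, 2], "stronger": [0, 3, 9, 8]}

for name, table in [("inflated", inflated), ("re-anchored", re_anchored)]:
    print(name, round(top_box_share(pooled_counts(table)), 3),
          round(mutual_information(table), 3))
```

When nearly every rating lands in the top level, the rating carries almost no information about which group a seller belongs to, so the mutual information is close to zero; spreading ratings across levels in a quality-dependent way raises it.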
Second, we develop a model-based framework to compare and select among rating system designs. Applying this framework to the data from the online labor market test, we show that it can identify the most informative rating system and substantially improve the quality of information obtained relative to baseline designs.
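The abstract does not specify the model behind the framework. One standard way to score a design, sketched here purely as an assumption, is a large-deviations rate: if the platform ranks two sellers by their average rating score after n ratings each, the probability of ranking the weaker seller first decays roughly like exp(-n * rate), so a larger rate means the scale separates sellers faster. By Cramér's theorem, the rate is -min over θ ≤ 0 of log M_better(θ) + log M_worse(-θ), where M is the moment generating function of one rating's score. The function names, the numeric scores attached to each level (standing in for the levels' relative importance), and the probabilities in `designs` are hypothetical.

```python
import math
from typing import Sequence

# Assumed scoring criterion for illustration; not the paper's stated framework.


def log_mgf(probs: Sequence[float], scores: Sequence[float], theta: float) -> float:
    """log E[exp(theta * S)] for a categorical rating distribution (log-sum-exp)."""
    terms = [math.log(p) + theta * s for p, s in zip(probs, scores) if p > 0]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def misranking_rate(p_better: Sequence[float], p_worse: Sequence[float],
                    scores: Sequence[float], theta_min: float = -50.0,
                    iters: int = 200) -> float:
    """Decay rate of P(avg score of better seller <= avg score of worse seller)
    with n i.i.d. ratings per seller; the probability behaves like exp(-n * rate).

    Cramer's theorem for D = S_better - S_worse gives
    rate = -min_{theta <= 0} [log M_better(theta) + log M_worse(-theta)].
    """
    mean_b = sum(p * s for p, s in zip(p_better, scores))
    mean_w = sum(p * s for p, s in zip(p_worse, scores))
    if mean_b <= mean_w:
        raise ValueError("the 'better' seller must have the higher expected score")

    def cgf(theta: float) -> float:
        return log_mgf(p_better, scores, theta) + log_mgf(p_worse, scores, -theta)

    # Ternary search is valid because the cumulant generating function is convex.
    lo, hi = theta_min, 0.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cgf(m1) < cgf(m2):
            hi = m2
        else:
            lo = m1
    return -cgf((lo + hi) / 2)


# Hypothetical candidate designs (invented numbers): per-level rating
# probabilities for a stronger and a weaker seller, plus per-level scores.
designs = {
    "inflated 5-point": ([0.0, 0.0, 0.02, 0.08, 0.90],
                         [0.0, 0.01, 0.04, 0.10, 0.85], [1, 2, 3, 4, 5]),
    "re-anchored 5-point": ([0.02, 0.08, 0.25, 0.40, 0.25],
                            [0.10, 0.25, 0.35, 0.20, 0.10], [1, 2, 3, 4, 5]),
}
for name, (pb, pw, s) in designs.items():
    print(name, round(misranking_rate(pb, pw, s), 4))
```

Under these assumptions, a platform could compute this rate for each candidate scale and prefer the design with the largest rate, since misranking errors then vanish fastest as ratings accumulate. A full framework would also need to model how seller quality maps to rating probabilities under each scale, which this sketch takes as given.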
Overall, our study illustrates that rating systems that are informative in practice can be designed, and demonstrates how to design them in a principled manner.


Cited By

  • (2024) Picky Eaters Make for Better Raters. Cornell Hospitality Quarterly. DOI: 10.1177/19389655241226557. Online publication date: 7 Feb 2024.
  • (2024) Meeting Effectiveness and Inclusiveness: Large-scale Measurement, Identification of Key Features, and Prediction in Real-world Remote Meetings. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1, 1-39. DOI: 10.1145/3637370. Online publication date: 26 Apr 2024.
  • (2021) The Invisible Cage: Workers’ Reactivity to Opaque Algorithmic Evaluations. Administrative Science Quarterly 66, 4, 945-988. DOI: 10.1177/00018392211010118. Online publication date: 21 Apr 2021.
  • (2021) When Does Dispute Resolution Substitute for a Reputation System? Empirical Evidence from a Service Procurement Platform. Production and Operations Management 30, 6, 1565-1582. DOI: 10.1111/poms.13341. Online publication date: 1 Jun 2021.


Published In

EC '20: Proceedings of the 21st ACM Conference on Economics and Computation
July 2020
937 pages
ISBN: 9781450379755
DOI: 10.1145/3391403
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 July 2020

Author Tags

  1. empirical research
  2. experiments
  3. labor markets
  4. market design
  5. rating systems
  6. stochastic methods

Qualifiers

  • Abstract

Funding Sources

  • National Science Foundation Graduate Research Fellowship
  • National Science Foundation

Conference

EC '20: The 21st ACM Conference on Economics and Computation
July 13-17, 2020
Virtual Event, Hungary

Acceptance Rates

Overall Acceptance Rate 664 of 2,389 submissions, 28%


Article Metrics

  • Downloads (last 12 months): 37
  • Downloads (last 6 weeks): 6
Reflects downloads up to 16 Oct 2024.
