DOI: 10.1145/3411764.3445735
Research article

Planning for Natural Language Failures with the AI Playbook

Published: 07 May 2021

Abstract

Prototyping AI user experiences is challenging, in part because the probabilistic nature of AI models makes it difficult to anticipate, test, and mitigate AI failures before deployment. In this work, we set out to support practitioners with early AI prototyping, with a focus on natural language (NL)-based technologies. Our interviews with 12 NL practitioners from a large technology company revealed that, beyond the general difficulty of prototyping AI, prototyping often did not happen at all or focused only on idealized scenarios, owing to a lack of tools and tight timelines. These findings informed our design of the AI Playbook, an interactive, low-cost tool we developed to encourage proactive and systematic consideration of AI errors before deployment. Our evaluation of the AI Playbook demonstrates its potential to 1) encourage product teams to prioritize both ideal and failure scenarios, 2) standardize the articulation of AI failures from a user experience perspective, and 3) act as a boundary object between user experience designers, data scientists, and engineers.

Information

Published In

CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
May 2021
10862 pages
ISBN:9781450380966
DOI:10.1145/3411764
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. AI failures
  2. Human-AI interaction
  3. natural language technologies
  4. prototyping

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CHI '21

Acceptance Rates

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

Bibliometrics

  • Downloads (last 12 months): 201
  • Downloads (last 6 weeks): 20

Reflects downloads up to 17 Oct 2024

Cited By

  • (2024) Tinker, Tailor, Configure, Customize: The Articulation Work of Contextualizing an AI Fairness Checklist. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1 (1-20). DOI: 10.1145/3653705. Online publication date: 26-Apr-2024.
  • (2024) PromptInfuser: How Tightly Coupling AI and UI Design Impacts Designers’ Workflows. Proceedings of the 2024 ACM Designing Interactive Systems Conference (743-756). DOI: 10.1145/3643834.3661613. Online publication date: 1-Jul-2024.
  • (2024) AINeedsPlanner: A Workbook to Support Effective Collaboration Between AI Experts and Clients. Proceedings of the 2024 ACM Designing Interactive Systems Conference (728-742). DOI: 10.1145/3643834.3661577. Online publication date: 1-Jul-2024.
  • (2024) Understanding Practices around Computational News Discovery Tools in the Domain of Science Journalism. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1 (1-36). DOI: 10.1145/3637419. Online publication date: 26-Apr-2024.
  • (2024) Seamful XAI: Operationalizing Seamful Design in Explainable AI. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1 (1-29). DOI: 10.1145/3637396. Online publication date: 26-Apr-2024.
  • (2024) Unlocking Creator-AI Synergy: Challenges, Requirements, and Design Opportunities in AI-Powered Short-Form Video Production. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-23). DOI: 10.1145/3613904.3642476. Online publication date: 11-May-2024.
  • (2024) A Scoping Study of Evaluation Practices for Responsible AI Tools: Steps Towards Effectiveness Evaluations. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-24). DOI: 10.1145/3613904.3642398. Online publication date: 11-May-2024.
  • (2024) Farsight: Fostering Responsible AI Awareness During AI Application Prototyping. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-40). DOI: 10.1145/3613904.3642335. Online publication date: 11-May-2024.
  • (2023) Acceptability and Effectiveness Analysis of Large Language Model-Based Artificial Intelligence Chatbot Among Arabic Learners. Mantiqu Tayr: Journal of Arabic Language 4:1 (1-20). DOI: 10.25217/mantiqutayr.v4i1.3951. Online publication date: 5-Dec-2023.
  • (2023) The AI Incident Database as an Educational Tool to Raise Awareness of AI Harms: A Classroom Exploration of Efficacy, Limitations, & Future Improvements. Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (1-11). DOI: 10.1145/3617694.3623223. Online publication date: 30-Oct-2023.
