Planning for Natural Language Failures with the AI Playbook

Authors: Matthew K. Hong, Derek DeBellis, Saleema Amershi

CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems

Article No.: 386, Pages 1 - 11

https://doi.org/10.1145/3411764.3445735

Published: 07 May 2021

Abstract

Prototyping AI user experiences is challenging due in part to probabilistic AI models making it difficult to anticipate, test, and mitigate AI failures before deployment. In this work, we set out to support practitioners with early AI prototyping, with a focus on natural language (NL)-based technologies. Our interviews with 12 NL practitioners from a large technology company revealed that, in addition to challenges prototyping AI, prototyping was often not happening at all or focused only on idealized scenarios due to a lack of tools and tight timelines. These findings informed our design of the AI Playbook, an interactive and low-cost tool we developed to encourage proactive and systematic consideration of AI errors before deployment. Our evaluation of the AI Playbook demonstrates its potential to 1) encourage product teams to prioritize both ideal and failure scenarios, 2) standardize the articulation of AI failures from a user experience perspective, and 3) act as a boundary object between user experience designers, data scientists, and engineers.

Published In

CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems

May 2021

10862 pages

ISBN: 9781450380966

DOI: 10.1145/3411764

General Chairs: Yoshifumi Kitamura (Tohoku University, Japan), Aaron Quigley (University of New South Wales, Australia)

Program Chairs: Katherine Isbister (University of California Santa Cruz, USA), Takeo Igarashi (The University of Tokyo, Japan)

Publications Chairs: Pernille Bjørn (University of Copenhagen, Denmark), Steven Drucker (Microsoft Research, USA)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

CHI '21

Sponsor:

SIGCHI

CHI '21: CHI Conference on Human Factors in Computing Systems

May 8 - 13, 2021

Yokohama, Japan

Acceptance Rates

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

Article Metrics

Total Citations: 29

Total Downloads: 1,065

Downloads (Last 12 months): 201

Downloads (Last 6 weeks): 20

Reflects downloads up to 17 Oct 2024

Cited By

Madaio M, Chen J, Wallach H, Wortman Vaughan J (2024). Tinker, Tailor, Configure, Customize: The Articulation Work of Contextualizing an AI Fairness Checklist. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1 (1-20). Online publication date: 26-Apr-2024.
https://dl.acm.org/doi/10.1145/3653705
Petridis S, Terry M, Cai C (2024). PromptInfuser: How Tightly Coupling AI and UI Design Impacts Designers’ Workflows. Proceedings of the 2024 ACM Designing Interactive Systems Conference (743-756). Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.1145/3643834.3661613
Kim D, Shin H, Yadgarova S, Son J, Subramonyam H, Kim J (2024). AINeedsPlanner: A Workbook to Support Effective Collaboration Between AI Experts and Clients. Proceedings of the 2024 ACM Designing Interactive Systems Conference (728-742). Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.1145/3643834.3661577
Nishal S, Sinchai J, Diakopoulos N (2024). Understanding Practices around Computational News Discovery Tools in the Domain of Science Journalism. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1 (1-36). Online publication date: 26-Apr-2024.
https://dl.acm.org/doi/10.1145/3637419
Ehsan U, Liao Q, Passi S, Riedl M, Daumé H (2024). Seamful XAI: Operationalizing Seamful Design in Explainable AI. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1 (1-29). Online publication date: 26-Apr-2024.
https://dl.acm.org/doi/10.1145/3637396
Kim J, Kim H (2024). Unlocking Creator-AI Synergy: Challenges, Requirements, and Design Opportunities in AI-Powered Short-Form Video Production. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-23). Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642476
Berman G, Goyal N, Madaio M (2024). A Scoping Study of Evaluation Practices for Responsible AI Tools: Steps Towards Effectiveness Evaluations. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-24). Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642398
Wang Z, Kulkarni C, Wilcox L, Terry M, Madaio M (2024). Farsight: Fostering Responsible AI Awareness During AI Application Prototyping. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-40). Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642335
Nely Rahmawati Zaimah, Eko Budi Hartanto, Fatchiatu Zahro (2023). Acceptability and Effectiveness Analysis of Large Language Model-Based Artificial Intelligence Chatbot Among Arabic Learners. Mantiqu Tayr: Journal of Arabic Language 4:1 (1-20). Online publication date: 5-Dec-2023.
https://doi.org/10.25217/mantiqutayr.v4i1.3951
Feffer M, Martelaro N, Heidari H (2023). The AI Incident Database as an Educational Tool to Raise Awareness of AI Harms: A Classroom Exploration of Efficacy, Limitations, & Future Improvements. Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (1-11). Online publication date: 30-Oct-2023.
https://dl.acm.org/doi/10.1145/3617694.3623223
