Research Article | Open Access

X-CHAR: A Concept-based Explainable Complex Human Activity Recognition Model

Published: 28 March 2023

Abstract

End-to-end deep learning models are increasingly applied to safety-critical human activity recognition (HAR) applications, e.g., healthcare monitoring and smart home control, to reduce developer burden and to increase the performance and robustness of prediction models. However, integrating HAR models into safety-critical applications requires trust, and recent approaches have aimed to balance the performance of deep learning models with explainable decision-making for complex activity recognition. Prior works have exploited the compositionality of complex HAR (i.e., higher-level activities composed of lower-level activities) to form models with symbolic interfaces, such as concept-bottleneck architectures, that are inherently interpretable. However, feature engineering for symbolic concepts, as well as for the relationships between the concepts, requires precise annotation of lower-level activities by domain experts, usually with fixed time windows, all of which induces a heavy and error-prone workload on the domain expert. In this paper, we introduce X-CHAR, an eXplainable Complex Human Activity Recognition model that does not require precise annotation of low-level activities, offers explanations in the form of human-understandable, high-level concepts, and maintains the robust performance of end-to-end deep learning models for time-series data. X-CHAR learns to model complex activity recognition as a sequence of concepts. For each classification, X-CHAR outputs a sequence of concepts and a counterfactual example as the explanation. We show that the sequence information of the concepts can be modeled using Connectionist Temporal Classification (CTC) loss without accurate start and end times of low-level annotations in the training dataset, significantly reducing developer burden.
We evaluate our model on several complex activity datasets and demonstrate that it offers explanations without compromising prediction accuracy in comparison to baseline models. Finally, we conduct a Mechanical Turk study showing that the explanations provided by our model are more understandable than the explanations from existing methods for complex activity recognition.
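The key training idea above can be illustrated with a minimal sketch (not the authors' code): CTC loss lets a network learn per-time-step concept probabilities from concept *sequences* alone, so annotators never supply start/end times. The `ConceptEncoder` below is a hypothetical stand-in (a GRU over fake sensor streams) for whatever encoder the paper uses; only the use of `torch.nn.CTCLoss` reflects the technique named in the abstract.

```python
# Sketch: training a concept-sequence recognizer with CTC loss.
# Low-level concept labels need only be ordered, not time-aligned.
import torch
import torch.nn as nn

torch.manual_seed(0)
T, N, C = 50, 4, 6  # time steps, batch size, concept classes (index 0 = CTC blank)

class ConceptEncoder(nn.Module):
    """Hypothetical encoder: any net mapping sensor windows to per-step logits."""
    def __init__(self, in_dim=9, hidden=32, n_concepts=C):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden)          # input shape (T, N, in_dim)
        self.head = nn.Linear(hidden, n_concepts)

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.head(h).log_softmax(-1)        # (T, N, C) log-probabilities

model = ConceptEncoder()
x = torch.randn(T, N, 9)                           # fake multimodal sensor streams

# Target concept sequences, concatenated across the batch (labels 1..C-1;
# 0 is reserved for the blank). Only the ORDER of concepts is annotated.
targets = torch.tensor([1, 2, 3,  2, 4,  1, 5, 2,  3, 3])
target_lengths = torch.tensor([3, 2, 3, 2])        # per-example sequence lengths
input_lengths = torch.full((N,), T, dtype=torch.long)

log_probs = model(x)
ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                    # trainable end to end
```

CTC marginalizes over all alignments of each concept sequence to the T time steps, which is what removes the need for precise per-concept timestamps in the training data.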

Supplemental Material

ZIP file (jeyakumar): supplemental movie, appendix, image, and software files for X-CHAR: A Concept-based Explainable Complex Human Activity Recognition Model.




Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies  Volume 7, Issue 1
March 2023
1243 pages
EISSN:2474-9567
DOI:10.1145/3589760
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in IMWUT Volume 7, Issue 1


Author Tags

  1. Activity recognition
  2. Explainable AI
  3. Interpretability
  4. Neural networks

Qualifiers

  • Research-article
  • Research
  • Refereed


Article Metrics

  • Downloads (Last 12 months)687
  • Downloads (Last 6 weeks)88
Reflects downloads up to 22 Oct 2024

Cited By
  • (2025) IoT-FAR: A multi-sensor fusion approach for IoT-based firefighting activity recognition. Information Fusion 113, 102650. DOI: 10.1016/j.inffus.2024.102650. Online publication date: Jan 2025.
  • (2024) XAI for U: Explainable AI for Ubiquitous, Pervasive and Wearable Computing. Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 992--995. DOI: 10.1145/3675094.3677571. Online publication date: 5 Oct 2024.
  • (2024) CrossHAR: Generalizing Cross-dataset Human Activity Recognition via Hierarchical Self-Supervised Pretraining. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 2, 1--26. DOI: 10.1145/3659597. Online publication date: 15 May 2024.
  • (2024) exHAR. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 1, 1--30. DOI: 10.1145/3643500. Online publication date: 6 Mar 2024.
  • (2024) ADMarker: A Multi-Modal Federated Learning System for Monitoring Digital Biomarkers of Alzheimer's Disease. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 404--419. DOI: 10.1145/3636534.3649370. Online publication date: 29 May 2024.
  • (2024) CAvatar. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 4, 1--24. DOI: 10.1145/3631424. Online publication date: 12 Jan 2024.
  • (2024) LLMSense: Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces. 2024 IEEE 3rd Workshop on Machine Learning on Edge in Sensor Systems (SenSys-ML), 9--14. DOI: 10.1109/SenSys-ML62579.2024.00007. Online publication date: 13 May 2024.
  • (2024) An Efficient and Optimized CNN-LSTM Framework for Complex Human Activity Recognition System Using Surface EMG Physiological Sensors and Feature Engineering. 2024 IEEE Students Conference on Engineering and Systems (SCES), 1--6. DOI: 10.1109/SCES61914.2024.10652396. Online publication date: 21 Jun 2024.
  • (2024) Improving Performance and Explainability of Sensor-Based Human Activity Recognition. 2024 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference (ECTI DAMT & NCON), 562--566. DOI: 10.1109/ECTIDAMTNCON60518.2024.10480073. Online publication date: 31 Jan 2024.
  • (2024) Improved Coyote Optimization Algorithm and Deep Learning Driven Activity Recognition in Healthcare. IEEE Access 12, 22158--22166. DOI: 10.1109/ACCESS.2024.3357989. Online publication date: 2024.
