DOI: 10.1145/3583780.3615261
Short Paper

'Choose your Data Wisely': Active Learning based Selection with Multi-Objective Optimisation for Mitigating Stereotypes

Published: 21 October 2023

Abstract

Data-driven (deep) learning methods produce parameterised abstractions of the training data, which often lead to stereotypical societal biases in their predictions, e.g., predicting more frequently that women are weaker than men, or that African Americans are more likely to commit crimes than Caucasians. Standard approaches to mitigating such stereotypical biases in deep neural models include modifying the training dataset (pre-processing) or adjusting the model parameters with a bias-specific objective (in-processing). In our work, we approach bias mitigation from a different perspective: an active learning-based selection of a subset of data instances for training a model optimised for both effectiveness and fairness. Specifically, imbalances in the priors of sensitive attribute values can be alleviated by constructing a balanced subset of the data instances with two selection objectives: first, improving the model's confidence on the primary task itself (a standard practice in active learning), and second, accounting for the parity of the model's predictions with respect to sensitive attributes, such as gender and race. We demonstrate that our proposed selection function achieves better results in terms of both primary task effectiveness and fairness. The results improve further when this active learning-based data selection is combined with an in-process method of multi-objective training.
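To make the two selection objectives concrete, below is a minimal sketch (in Python/NumPy) of how a selection function could combine predictive uncertainty with a fairness term. It is an illustration, not the paper's actual method: it assumes a binary classification task with a binary sensitive attribute, and the names (`select_batch`, `parity_gap`), the greedy scoring scheme, and the mixing weight `alpha` are all hypothetical.

```python
import numpy as np

def entropy(probs: np.ndarray) -> np.ndarray:
    """Predictive entropy per instance; higher means the model is less confident."""
    eps = 1e-12
    return -np.sum(probs * np.log(probs + eps), axis=1)

def parity_gap(preds: np.ndarray, groups: np.ndarray) -> float:
    """Absolute difference in positive-prediction rates between the two
    sensitive-attribute groups (the demographic parity gap)."""
    rate_0 = preds[groups == 0].mean() if np.any(groups == 0) else 0.0
    rate_1 = preds[groups == 1].mean() if np.any(groups == 1) else 0.0
    return abs(rate_0 - rate_1)

def select_batch(pool_probs, pool_groups, lab_preds, lab_groups, k, alpha=0.5):
    """Score each pool instance by (i) predictive entropy and (ii) how much
    adding it, with its current predicted label, would shrink the parity gap
    of the labelled set; return the indices of the top-k instances.
    `alpha` trades off uncertainty against the fairness objective (assumed)."""
    pool_preds = pool_probs.argmax(axis=1)
    unc = entropy(pool_probs)
    unc = (unc - unc.min()) / (unc.max() - unc.min() + 1e-12)  # scale to [0, 1]
    base_gap = parity_gap(lab_preds, lab_groups)
    scores = np.empty(len(pool_preds))
    for i in range(len(pool_preds)):
        new_gap = parity_gap(np.append(lab_preds, pool_preds[i]),
                             np.append(lab_groups, pool_groups[i]))
        # Positive when instance i would narrow the labelled set's parity gap.
        scores[i] = alpha * unc[i] + (1.0 - alpha) * (base_gap - new_gap)
    return np.argsort(-scores)[:k]
```

In each active learning round, `pool_probs` would hold the model's posterior probabilities over the unlabelled pool, `lab_preds` and `lab_groups` would describe the already-labelled set, and the returned indices would be sent for annotation before retraining.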


    Information

    Published In

    CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
    October 2023
    5508 pages
    ISBN: 9798400701245
    DOI: 10.1145/3583780


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. active learning
    2. fairness
    3. multi-objective learning

    Qualifiers

    • Short-paper

    Conference

    CIKM '23

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
