Inherent Discriminability of BERT Towards Racial Minority Associated Data
Conference proceeding   Peer reviewed

Maryam Taeb, Hongmei Chi, Edward L. Jones and Ziheng Chi
Computational Science and Its Applications – ICCSA 2021, Volume 3, Vol.12951, pp.256-271
Lecture Notes in Computer Science
International Conference on Computational Science and Applications (ICCSA 2021), 21st (Cagliari, Italy, 09/13/2021–09/16/2021)
2021
Web of Science ID: WOS:000722406300019

Abstract

AI and BERT (Bidirectional Encoder Representations from Transformers) have been increasingly adopted in the human resources (HR) industry for recruitment. The expected gains, such as increased efficiency and fairness, are meant to help organizations find qualified candidates and to remove bias from machine learning and from the labor market. BERT has further improved the performance of language representation models by using an auto-encoding model that incorporates larger bidirectional contexts. However, the underlying mechanisms that make BERT effective, such as tokenization, masking, and the use of the attention mechanism to compute vector scores, are not well understood. This research analyzes how BERT's architecture and its tokenization protocol affect minority-associated data with a low number of occurrences, as measured by the cosine similarity of its embeddings. Using a dataset of race- and gender-associated personal names and analyzing the interactions of transformers, we present the unfair prejudice of BERT's pre-trained network and auto-encoding model. Furthermore, by analyzing the distance between an initial word's token and its MASK replacement token using cosine similarity, we demonstrate the inherent discriminability introduced during pre-training. Finally, this research proposes potential solutions to mitigate discrimination and bias in BERT by examining its geometric properties.
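
The abstract relies on cosine similarity to compare a word's token embedding with that of its MASK replacement. The following is a minimal sketch of that measure only, not the authors' implementation. The function name `cosine_similarity` and the three toy vectors are hypothetical stand-ins for illustration; real BERT embeddings would come from a pre-trained model and have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors standing in for embeddings (assumption:
# these values are made up and do not come from BERT).
original_token = [1.0, 2.0, 0.0]
mask_replacement = [2.0, 4.0, 0.0]  # same direction as original_token
unrelated = [0.0, 0.0, 3.0]         # orthogonal to original_token

sim_same_direction = cosine_similarity(original_token, mask_replacement)
sim_orthogonal = cosine_similarity(original_token, unrelated)
```

A value near 1 means the two vectors point in nearly the same direction, while a value near 0 means they are close to orthogonal. Under this reading, a low similarity between a name's embedding and its MASK replacement would indicate that the replacement lies far from the original in embedding space.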
