AI systems, and BERT (Bidirectional Encoder Representations from Transformers) in particular, have been increasingly adopted in the human resources (HR) industry for recruitment. Their promise of greater efficiency and fairness is expected to help remove bias from machine-learning-driven hiring, help organizations find qualified candidates, and reduce bias in the labor market. BERT has further improved the performance of language representation models by using an autoencoding objective that incorporates larger bidirectional contexts. However, the underlying mechanisms that make BERT effective, such as tokenization, masking, and the use of the attention mechanism to compute vector scores, are not well understood.
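To make the tokenization concern concrete, the sketch below shows how BERT's WordPiece tokenizer can fragment rarely seen personal names into several subword tokens while keeping common ones whole. It is illustrative only: it assumes the Hugging Face `transformers` library and the public `bert-base-uncased` checkpoint, and the example names are hypothetical stand-ins for the names studied in the paper.

```python
# Illustrative sketch: WordPiece tokenization of personal names.
# Assumes the Hugging Face `transformers` library and the public
# `bert-base-uncased` checkpoint; the names are hypothetical examples.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

for name in ["Emily", "Greg", "Lakisha", "Jamal"]:
    print(f"{name:>8s} -> {tokenizer.tokenize(name)}")

# Names frequent in the pre-training corpus tend to survive as a single
# vocabulary token, while rarer, minority-associated names are often split
# into several subwords (the exact splits depend on the checkpoint's
# vocabulary).
```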
This research analyzes how BERT's architecture and tokenization protocol handle minority-related data that occurs rarely in the pre-training corpus, using the cosine similarity of its embeddings. Using a dataset of racially and gender-associated personal names and analyzing the interactions within the transformer layers, we expose the unfair prejudice in BERT's pre-trained network and autoencoding model. Furthermore, by measuring the cosine similarity between a word's original token embedding and the embedding of its [MASK] replacement, we demonstrate the discriminability inherent in pre-training. Finally, this research delivers potential solutions to mitigate discrimination and bias in BERT by examining its geometric properties.
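A minimal sketch of that original-token vs. [MASK]-replacement comparison follows. It again assumes `transformers`, PyTorch, and `bert-base-uncased`; the sentence and the function name `mask_similarities` are hypothetical, not taken from the paper. For each position, it computes the cosine similarity between the token's contextual embedding and the embedding produced at the same position when that token is replaced with [MASK].

```python
# Minimal sketch of the original-token vs. [MASK]-replacement comparison.
# Assumptions: Hugging Face `transformers`, PyTorch, and the public
# `bert-base-uncased` checkpoint; the example sentence is hypothetical.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def mask_similarities(sentence: str):
    """Cosine similarity between each token's contextual embedding and the
    embedding at the same position after substituting [MASK] there."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        base = model(**enc).last_hidden_state[0]   # (seq_len, hidden)
    ids = enc["input_ids"][0]
    results = []
    for pos in range(1, ids.size(0) - 1):          # skip [CLS] and [SEP]
        masked_ids = enc["input_ids"].clone()
        masked_ids[0, pos] = tokenizer.mask_token_id
        with torch.no_grad():
            hid = model(input_ids=masked_ids,
                        attention_mask=enc["attention_mask"]).last_hidden_state[0]
        sim = torch.cosine_similarity(base[pos], hid[pos], dim=0).item()
        results.append((tokenizer.convert_ids_to_tokens(ids[pos].item()), sim))
    return results

for token, sim in mask_similarities("Lakisha applied for the engineering role."):
    print(f"{token:>12s}  cos = {sim:.3f}")
```

Under the abstract's hypothesis, systematically lower similarities for subwords of minority-associated names would be one observable form of the geometric discriminability the paper describes.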
Details
Title
Inherent Discriminability of BERT Towards Racial Minority Associated Data
Publication Details
Computational Science and Its Applications – ICCSA 2021, Part III, Vol. 12951, pp. 256–271
Resource Type
Conference proceeding
Conference
21st International Conference on Computational Science and Its Applications (ICCSA 2021), Cagliari, Italy, September 13–16, 2021