Name-Based Embedding Debiasing: Analyzing Its Impact on Gender, Religious, and Ethnic Biases

Gianmarco Cafferata; Mariano G. Beiró

doi:10.24215/15146774e102

Authors

Gianmarco Cafferata, Universidad de San Andrés, Argentina. ORCID: https://orcid.org/0009-0006-6606-8448
Mariano G. Beiró, Universidad de San Andrés; Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina. ORCID: https://orcid.org/0000-0002-5474-0309

Keywords:

word embeddings, embedding bias mitigation, language models, fairness

Abstract

Word embeddings were the technical turning point that launched the current state-of-the-art methods for a range of Natural Language Processing (NLP) tasks. Bias metrics and mitigation methods for static embeddings have been studied with moderate success, achieving bias reductions for specific groups and metrics. However, these methods often fail to optimize multiple metrics simultaneously or to have a significant impact on extrinsic tasks. Recent mitigation research has shifted mainly toward contextual representations and large language models (LLMs). This work argues that static representations provide a simpler, more controlled experimental setting for validating hypotheses and techniques, which can later be extrapolated to more complex models. It presents a method that captures multiple demographic dimensions (gender, race, age, etc.) in static representations simultaneously, removing the dependence on specialized tasks or on specific demographic vocabulary.
