Decoding Semantic Ambiguity in Large Language Models: Aligning Human Behavioral Responses with GPT-2’s Internal Representations

Authors

Gianolini, A., Paez, B., Totaro, F., Laurino, J., Travi, F., Fernández Slezak, D., Kaczer, L., Kamienkowski, J. E., & Bianchi, B.

Keywords:

LLMs, disambiguation, neurolinguistics

Abstract

Large Language Models (LLMs), such as GPT-2, exhibit human-like text processing, yet their internal mechanisms for resolving semantic ambiguity remain opaque, much like the “black box” of human cognition. This study investigates how LLMs disambiguate concrete nouns by comparing their semantic biases to human behavioral responses. A corpus of sentences containing ambiguous words (e.g., “note”) paired with biasing contexts (e.g., short paragraphs related to “music” or “education”) was created. Human participants reported the perceived meaning of each ambiguous word in these contexts, establishing a behavioral ground truth (i.e., the human bias). The computer bias was measured via cosine distances between each meaning’s static embedding and the ambiguous word’s contextualized embedding. To improve the computer bias metric, two technical steps were implemented: (1) the model was fine-tuned to obtain a word-based tokenization, and (2) each ambiguous word’s meanings were defined using word lists. Results revealed a nonlinear layer-wise dynamic in GPT-2’s computer bias and an additive effect of the two improvements analyzed in the present work. Additionally, the correlation between human bias and computer bias, measured layer by layer, peaked at the middle layers, in line with previous findings in human-model alignment research. This suggests shared computational principles between human cognition and LLM processing for resolving ambiguity. The study advances interpretability research by linking model-internal representations to human behavioral benchmarks, offering insights into both artificial and biological language systems.
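
As a rough illustration of the metric described above, the following Python sketch computes a layer-wise computer bias for one item with the Hugging Face transformers library. The sentence, the target word, and the two meaning word lists are illustrative placeholders, and the aggregation choices (averaging subword embeddings and word-list embeddings) are assumptions, not the authors' implementation.

```python
# Minimal sketch of the "computer bias" metric: cosine distances between a
# meaning's static embedding and the ambiguous word's contextualized
# embedding, computed at each GPT-2 layer. All item-specific names below
# (sentence, word lists) are hypothetical examples.
import torch
import torch.nn.functional as F
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def static_embedding(words):
    """Average the input (layer-0) embeddings over a meaning's word list."""
    with torch.no_grad():
        vecs = []
        for w in words:
            ids = tokenizer(" " + w, return_tensors="pt")["input_ids"][0]
            vecs.append(model.wte.weight[ids].mean(dim=0))
        return torch.stack(vecs).mean(dim=0)

def contextual_embedding(sentence, target, layer):
    """Hidden state of the target word at a given layer (subwords averaged)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer][0]  # (seq_len, dim)
    tgt = tokenizer(" " + target)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(tgt) + 1):
        if ids[i:i + len(tgt)] == tgt:
            return hidden[i:i + len(tgt)].mean(dim=0)
    raise ValueError(f"{target!r} not found in sentence")

def computer_bias(sentence, target, meaning_a, meaning_b, layer):
    """Difference of cosine distances to the two candidate meanings.
    Positive values mean the contextualized word is closer to meaning A."""
    ctx = contextual_embedding(sentence, target, layer)
    d_a = 1 - F.cosine_similarity(ctx, static_embedding(meaning_a), dim=0)
    d_b = 1 - F.cosine_similarity(ctx, static_embedding(meaning_b), dim=0)
    return (d_b - d_a).item()

# Layer-by-layer bias profile (12 transformer layers in GPT-2 small).
sentence = "The pianist left a note on the piano before the concert."
music = ["melody", "song", "pitch", "chord"]           # meaning A word list
writing = ["memo", "message", "reminder", "letter"]    # meaning B word list
for layer in range(1, model.config.n_layer + 1):
    print(layer, computer_bias(sentence, "note", music, writing, layer))
```

Under this setup, correlating such per-layer bias profiles with the human responses across items would yield the layer-by-layer alignment curve the abstract reports as peaking at the middle layers.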

Published

2025-10-15

Issue

JAIIO 11(1)

Section

ASAID - Argentine Symposium on Artificial Intelligence and Data Science

How to Cite

Gianolini, A., Paez, B., Totaro, F., Laurino, J., Travi, F., Fernández Slezak, D., Kaczer, L., Kamienkowski, J. E., & Bianchi, B. (2025). Decoding Semantic Ambiguity in Large Language Models: Aligning Human Behavioral Responses with GPT-2’s Internal Representations. JAIIO, Jornadas Argentinas de Informática, 11(1), 265-282. https://revistas.unlp.edu.ar/JAIIO/article/view/19824