Decoding Semantic Ambiguity in Large Language Models: Aligning Human Behavioral Responses with GPT-2’s Internal Representations

Authors

Gianolini, A., Paez, B., Totaro, F., Laurino, J., Travi, F., Fernández Slezak, D., Kaczer, L., Kamienkowski, J. E., & Bianchi, B.

Keywords:

LLMs, disambiguation, neurolinguistics

Abstract

Large Language Models (LLMs), such as GPT-2, exhibit human-like text processing, yet their internal mechanisms for resolving semantic ambiguity remain opaque, much like the “black box” of human cognition. This study investigates how LLMs disambiguate concrete nouns by comparing their semantic biases to human behavioral responses. A corpus of sentences containing ambiguous words (e.g., “note”) paired with biasing contexts (e.g., short paragraphs related to “music” or “education”) was created. Human participants reported the perceived meaning of each ambiguous word in these contexts, establishing a behavioral ground truth (i.e., the human bias). The computer bias was measured via cosine distances between each meaning’s static embedding and the ambiguous word’s contextualized embedding. To improve the computer bias metric, two technical steps were implemented: (1) the model was fine-tuned to obtain a word-based tokenization, and (2) each ambiguous word’s meanings were defined using word lists. Results revealed a nonlinear layer-wise dynamic in GPT-2’s computer bias and an additive effect of the two improvements analyzed in the present work. Additionally, the correlation between human bias and computer bias, measured layer by layer, peaked at the middle layers, in line with previous findings in human-model alignment research. This suggests shared computational principles between human cognition and LLM processing for resolving ambiguity. The study advances interpretability research by linking model-internal representations to human behavioral benchmarks, offering insights into both artificial and biological language systems.
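
As a rough illustration of the metric described above, the following Python sketch computes a layer-wise computer bias for one item with the Hugging Face transformers library. The sentence, the target word, and the two meaning word lists are illustrative placeholders, and the aggregation choices (averaging subword embeddings and word-list embeddings) are assumptions, not the authors' implementation.

```python
# Minimal sketch of the "computer bias" metric: cosine distances between a
# meaning's static embedding and the ambiguous word's contextualized
# embedding, computed at each GPT-2 layer. All item-specific names below
# (sentence, word lists) are hypothetical examples.
import torch
import torch.nn.functional as F
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def static_embedding(words):
    """Average the input (layer-0) embeddings over a meaning's word list."""
    with torch.no_grad():
        vecs = []
        for w in words:
            ids = tokenizer(" " + w, return_tensors="pt")["input_ids"][0]
            vecs.append(model.wte.weight[ids].mean(dim=0))
        return torch.stack(vecs).mean(dim=0)

def contextual_embedding(sentence, target, layer):
    """Hidden state of the target word at a given layer (subwords averaged)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer][0]  # (seq_len, dim)
    tgt = tokenizer(" " + target)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(tgt) + 1):
        if ids[i:i + len(tgt)] == tgt:
            return hidden[i:i + len(tgt)].mean(dim=0)
    raise ValueError(f"{target!r} not found in sentence")

def computer_bias(sentence, target, meaning_a, meaning_b, layer):
    """Difference of cosine distances to the two candidate meanings.
    Positive values mean the contextualized word is closer to meaning A."""
    ctx = contextual_embedding(sentence, target, layer)
    d_a = 1 - F.cosine_similarity(ctx, static_embedding(meaning_a), dim=0)
    d_b = 1 - F.cosine_similarity(ctx, static_embedding(meaning_b), dim=0)
    return (d_b - d_a).item()

# Layer-by-layer bias profile (12 transformer layers in GPT-2 small).
sentence = "The pianist left a note on the piano before the concert."
music = ["melody", "song", "pitch", "chord"]           # meaning A word list
writing = ["memo", "message", "reminder", "letter"]    # meaning B word list
for layer in range(1, model.config.n_layer + 1):
    print(layer, computer_bias(sentence, "note", music, writing, layer))
```

Under this setup, correlating such per-layer bias profiles with the human responses across items would yield the layer-by-layer alignment curve the abstract reports as peaking at the middle layers.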

Published

2025-10-15

Issue

JAIIO 11(1)

Section

ASAID - Argentine Symposium on Artificial Intelligence and Data Science

How to Cite

Gianolini, A., Paez, B., Totaro, F., Laurino, J., Travi, F., Fernández Slezak, D., Kaczer, L., Kamienkowski, J. E., & Bianchi, B. (2025). Decoding Semantic Ambiguity in Large Language Models: Aligning Human Behavioral Responses with GPT-2’s Internal Representations. JAIIO, Jornadas Argentinas de Informática, 11(1), 265-282. https://revistas.unlp.edu.ar/JAIIO/article/view/19824