Evaluation of Named Entity Recognition in Historical Argentinian Documents

Facundo Darfe; Eduardo Xamena; Carlos I. Orozco

Authors

Facundo Darfe, Universidad Nacional de Salta, Argentina
Eduardo Xamena, Universidad Nacional de Salta, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina
Carlos I. Orozco, Universidad Nacional de Salta, Argentina

Keywords:

Named Entity Recognition and Classification, Argentinian History, Pretrained Language Models

Abstract

Research on volumes of historical text can be supported by automatic tools that help historians reach more abstract and aggregated points of view. Tasks such as Information Extraction and Text Mining can be performed more efficiently when Machine Learning models are employed. We evaluate several state-of-the-art models on a new dataset for Named Entity Recognition. The dataset was built from a volume of historical texts about General Güemes, a national hero of Argentinian independence. The results show that some models perform better in terms of precision, recall and F1-score for most entity types. Specifically, pretrained language models fine-tuned for this task show considerably higher performance than classical models based on word embeddings and on other kinds of representations. In addition, statistical tests are provided to establish the significance of the differences among the performance values attained. Hence, the contribution of this work is twofold: a new corpus and dataset for Named Entity Recognition, and a complete statistical assessment of the performance of state-of-the-art models on that dataset.

Published

2022-12-14

Issue

Vol. 8 No. 2 (2022): ASAI 2022 - Simposio Argentino de Inteligencia Artificial

Section

ASAI - Simposio Argentino de Inteligencia Artificial

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Under these terms, the material may be shared (copied and redistributed in any medium or format) and adapted (remixed, transformed, and built upon to create another work), provided that a) the authorship and the original source of publication (journal and URL of the work) are credited, b) it is not used for commercial purposes, and c) the same license terms are maintained.

How to Cite

Darfe, F., Xamena, E., & Orozco, C. I. (2022). Evaluation of Named Entity Recognition in Historical Argentinian Documents. JAIIO, Jornadas Argentinas De Informática, 8(2), 98-109. https://revistas.unlp.edu.ar/JAIIO/article/view/18395