Extraction of information from Electronic Medical Records written in Spanish to conduct epidemic intelligence

Authors

  • Javier Petri, Universidad de Buenos Aires, Argentina
  • Pilar Barcena Barbeira, Universidad de Buenos Aires, Argentina
  • Viviana Cotik, Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina

Keywords:

named entity recognition, BioNLP, electronic medical records in Spanish, automatic symptom detection, event-based surveillance

Abstract

Automatic symptom detection from electronic health records is a valuable resource for event-based surveillance systems. In this study, we developed tools to automatically detect symptoms associated with febrile illnesses in electronic health records written in Spanish. To do so, we used a custom corpus comprising 6,228 expert-annotated health reports and approximately 1 million unannotated reports. Our strategy consisted of fine-tuning state-of-the-art named entity recognition models, including BiLSTM-CRF models and transformer-based models such as RoBERTa. To improve performance, we focused on domain-adapted and task-adapted models: the former were pre-trained on biomedical corpora, while the latter were additionally pre-trained on our unannotated health reports. Despite computational limitations, our models showed promising results. In particular, RoBERTa-Clinico, a task-adapted transformer-based model further pre-trained on our unannotated corpus, achieved the best micro recall (79.30), with a micro F1 of 70.83, figures comparable to those reported in similar studies. This work thus contributes to the limited body of research on BioNLP in Spanish.
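
The scores above are micro-averaged, meaning that true positives, false positives and false negatives are pooled over all reports before precision, recall and F1 are computed. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes exact matching of span boundaries and label (the abstract does not state the matching criterion); the function name `micro_prf`, the `SYMPTOM` label and the toy spans are hypothetical.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over entity spans.

    gold, pred: lists with one set per report; each set holds
    (start, end, label) tuples. Exact match is an assumption here.
    """
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, pred):
        tp += len(gold_spans & pred_spans)   # spans found and correct
        fp += len(pred_spans - gold_spans)   # spans predicted but not annotated
        fn += len(gold_spans - pred_spans)   # annotated spans that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (hypothetical): two reports, three annotated symptom spans.
gold = [{(0, 5, "SYMPTOM"), (10, 18, "SYMPTOM")}, {(3, 9, "SYMPTOM")}]
pred = [{(0, 5, "SYMPTOM")}, {(3, 9, "SYMPTOM"), (12, 20, "SYMPTOM")}]

precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

In this toy case two spans are correct, one is spurious and one is missed, so precision, recall and F1 all equal 2/3. In a surveillance setting a missed symptom is typically costlier than a false alarm, which is one reason to look at recall alongside F1.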

Published

2025-10-15

How to Cite

Petri, J., Barcena Barbeira, P., & Cotik, V. (2025). Extraction of information from Electronic Medical Records written in Spanish to conduct epidemic intelligence. JAIIO, Jornadas Argentinas De Informática, 11(1), 102. https://revistas.unlp.edu.ar/JAIIO/article/view/19758