Extraction of information from Electronic Medical Records written in Spanish to conduct epidemic intelligence
Keywords:
named entity recognition, BioNLP, electronic medical records in Spanish, automatic symptom detection, event-based surveillance
Abstract
Automatic symptom detection from electronic health records is a valuable resource for event-based surveillance systems. In this study, we developed tools to automatically detect symptoms associated with febrile illnesses in electronic health records written in Spanish. To do so, we used a custom corpus comprising 6,228 expert-annotated health reports and approximately 1 million unannotated reports. Our strategy consisted of fine-tuning state-of-the-art named entity recognition models, including BiLSTM-CRF models and transformer-based models such as RoBERTa. To improve performance, we focused on domain-adapted and task-adapted models: the former were pre-trained on biomedical corpora, while the latter were additionally pre-trained on our unannotated health reports. Despite computational limitations, our models showed promising results. In particular, RoBERTa-Clinico, a task-adapted transformer-based model pre-trained on our unannotated corpus, achieved the best micro recall (79.30) and a micro F1 of 70.83, figures comparable to those reported in similar studies. In this way, we contribute to the limited body of work on BioNLP in Spanish.
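The micro-averaged scores reported above pool true positives, false positives, and false negatives over all reports before the metrics are computed. The following is a minimal sketch of that calculation, assuming exact-match scoring over (start, end, label) spans. The `micro_scores` function, the `SYMPTOM` label, and the toy data are hypothetical illustrations and are not taken from the study.

```python
from typing import Dict, List, Set, Tuple

# An entity is (start_token, end_token, label). This span layout is an
# assumption made for illustration, not the annotation scheme of the study.
Entity = Tuple[int, int, str]


def micro_scores(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> Dict[str, float]:
    """Pool exact-match counts over all reports, then compute each metric once."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # predicted entities that match a gold entity exactly
        fp += len(p - g)   # predicted entities with no gold match
        fn += len(g - p)   # gold entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data: two reports with hypothetical gold and predicted symptom spans.
gold = [{(0, 1, "SYMPTOM"), (4, 5, "SYMPTOM")}, {(2, 3, "SYMPTOM")}]
pred = [{(0, 1, "SYMPTOM"), (7, 8, "SYMPTOM")}, {(2, 3, "SYMPTOM")}]
scores = micro_scores(gold, pred)
```

Because the counts are pooled before dividing, reports with many entities weigh more heavily in micro-averaged scores than reports with few.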
License
Copyright (c) 2025 Javier Petri, Pilar Barcena Barbeira, Viviana Cotik

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Under these terms, the material may be shared (copied and redistributed in any medium or format) and adapted (remixed, transformed, and built upon to create another work), provided that a) the authorship and the original source of publication (journal and URL of the work) are cited, b) it is not used for commercial purposes, and c) the same license terms are maintained.