Invited Talk: Data quality in the era of Foundational Models

Authors

  • Saúl Marcelo Calderón Ramírez Instituto Tecnológico de Costa Rica, Costa Rica

Abstract

Deep learning models usually need extensive amounts of data, and these data have to be labeled, which becomes a concern in real-world applications. Labeling a dataset is known to be costly in terms of time, money, and resources. Foundational models are becoming a strong trend in different application fields, from natural language processing to image analysis. Commonly, foundational models are pre-trained on very large datasets in a self-supervised fashion, with multi-modal data (text, images, audio, etc.). Using these models in target domains and tasks reduces the need to label very large target datasets, especially under scarcely labeled data regimes: semi-supervised, self-supervised, few-shot learning, etc. However, even in these settings, data quality remains important for both training and evaluating the model. In this talk, we address different data quality attributes for both training and evaluation of the model, which are still relevant for systems based upon foundational models.
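The idea of reusing a frozen foundational model under a scarcely labeled regime can be sketched with a prototype-based few-shot classifier: a pre-trained encoder maps inputs to embeddings, each class is represented by the mean embedding of its few labeled examples, and queries are assigned to the nearest prototype. This is a minimal illustration, not the speaker's method; `embed` below is a hypothetical toy projection standing in for a real pre-trained encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for frozen, pre-trained encoder weights (hypothetical).
W = rng.normal(size=(8, 4))

def embed(x: np.ndarray) -> np.ndarray:
    """Map raw inputs to a normalized embedding space (toy stand-in
    for a self-supervised foundational model)."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def fit_prototypes(support_x: np.ndarray, support_y: np.ndarray):
    """Average the embeddings of the few labeled examples per class."""
    Z = embed(support_x)
    classes = np.unique(support_y)
    prototypes = np.stack([Z[support_y == c].mean(axis=0) for c in classes])
    return classes, prototypes

def predict(classes, prototypes, query_x: np.ndarray) -> np.ndarray:
    """Assign each query to the class whose prototype is most similar
    (cosine similarity on the normalized embeddings)."""
    sims = embed(query_x) @ prototypes.T
    return classes[np.argmax(sims, axis=1)]

# Tiny synthetic demo: two well-separated clusters, three labeled
# examples each -- a "3-shot" labeled-data regime.
support_x = np.vstack([rng.normal(1.0, 0.1, size=(3, 8)),
                       rng.normal(-1.0, 0.1, size=(3, 8))])
support_y = np.array([0, 0, 0, 1, 1, 1])
query_x = np.vstack([rng.normal(1.0, 0.1, size=(5, 8)),
                     rng.normal(-1.0, 0.1, size=(5, 8))])

classes, prototypes = fit_prototypes(support_x, support_y)
preds = predict(classes, prototypes, query_x)
```

Because the encoder stays frozen, only a handful of labeled examples are needed per class; the quality of those few support labels then dominates downstream performance, which is exactly why data quality stays relevant in these settings.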

Published

2023-10-12

Issue

Section

SAIV - Simposio Argentino de Imágenes y Visión

How to Cite

Calderón Ramírez, S. M. (2023). Conferencia Invitada: Data quality in the era of Foundational Models. JAIIO, Jornadas Argentinas De Informática, 9(12). https://revistas.unlp.edu.ar/JAIIO/article/view/18241