Invited Talk: Data quality in the era of Foundational Models
Abstract
Deep learning models usually need extensive amounts of data, and these data have to be labeled, which becomes a concern in real-world applications. Labeling a dataset is known to be costly in terms of time, money, and resources. Foundational models are becoming a strong trend across application fields, from natural language processing to image analysis. Commonly, foundational models are pre-trained on very large datasets in a self-supervised fashion, often with multi-modal data (text, images, audio, etc.). Using these models in target domains and tasks decreases the need to label very large target datasets, especially under scarcely labeled data regimes: semi-supervised, self-supervised, few-shot learning, etc. However, even in these settings, data quality remains important for both training and evaluating the model. In this talk, we address different data quality attributes for both model training and evaluation, which are still relevant for systems built upon foundational models.
License
Copyright (c) 2023 Saúl Marcelo Calderón Ramírez

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Under these terms, the material may be shared (copied and redistributed in any medium or format) and adapted (remixed, transformed, and built upon to create other works), provided that a) the authorship and the original source of publication are cited (journal and URL of the work), b) it is not used for commercial purposes, and c) the same license terms are maintained.