Evaluating posterior probabilities: Decision theory, proper scoring rules, and calibration
Keywords:
proper scoring rules, calibration, classification systems, decision theory
Abstract
Most machine learning classifiers are designed to output posterior probabilities for the classes given the input sample. These probabilities may be used to make the categorical decision on the class of the sample; provided as input to a downstream system; or provided to a human for interpretation. Evaluating the quality of the posteriors generated by these systems is an essential problem which was addressed decades ago with the invention of proper scoring rules (PSRs). Unfortunately, much of the recent machine learning literature uses calibration metrics—most commonly, the expected calibration error (ECE)—as a proxy to assess posterior performance. The problem with this approach is that calibration metrics reflect only one aspect of the quality of the posteriors, ignoring the discrimination performance. For this reason, we argue that calibration metrics should play no role in the assessment of posterior quality and expected PSRs should instead be used for this job. While not useful for performance assessment, calibration metrics may be used as diagnostic tools during system development. With this purpose in mind, we discuss a simple and practical calibration metric, called calibration loss. We compare this metric with the ECE and with the expected score divergence metric and argue that calibration loss is superior to these two metrics.
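The contrast drawn in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: the synthetic data, the function names, the 10-bin binary ECE variant, and the use of cross-entropy (the logarithmic scoring rule) as the expected PSR are all assumptions made for illustration. In particular, the abstract does not define calibration loss; here it is assumed to be the drop in expected PSR obtained after applying a calibration transform, and the transform is assumed to be a single scale factor on the log-odds, fitted by grid search on the evaluation data itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
y = rng.integers(0, 2, size=n)                # binary labels, roughly equal priors
x = rng.normal(loc=2.0 * y - 1.0, scale=1.0)  # class-conditional Gaussians at -1 and +1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(p, y):
    """Average logarithmic score (an expected PSR); lower is better."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def ece(p, y, n_bins=10):
    """Binned ECE on the class-1 posterior (one common binary variant)."""
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # weight of the bin times |mean posterior - fraction of class 1|
            total += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return float(total)

def calibration_loss(p, y, scales=np.linspace(0.1, 3.0, 30)):
    """Assumed definition: cross-entropy before minus after calibration.

    The calibration transform is a hypothetical one-parameter scaling of
    the log-odds, chosen by grid search on the same data.
    """
    p = np.clip(p, 1e-12, 1 - 1e-12)
    logit = np.log(p / (1 - p))
    best = min(cross_entropy(sigmoid(s * logit), y) for s in scales)
    return cross_entropy(p, y) - best

# System A: ignores the input and always outputs the prior.
# It is calibrated but has no discrimination.
p_prior = np.full(n, 0.5)

# System B: discriminative but overconfident. For this data the true
# log-odds are 2x, so using 4x doubles them.
p_over = sigmoid(4.0 * x)

ce_prior, ce_over = cross_entropy(p_prior, y), cross_entropy(p_over, y)
ece_prior, ece_over = ece(p_prior, y), ece(p_over, y)
cl_prior, cl_over = calibration_loss(p_prior, y), calibration_loss(p_over, y)

print(f"prior-only:    CE={ce_prior:.3f}  ECE={ece_prior:.3f}  CalLoss={cl_prior:.3f}")
print(f"overconfident: CE={ce_over:.3f}  ECE={ece_over:.3f}  CalLoss={cl_over:.3f}")
```

Under these assumptions, the prior-only system obtains the lower ECE even though its posteriors carry no information about the class, while the cross-entropy ranks the discriminative system as better. The calibration loss then indicates how much of the discriminative system's cross-entropy could be recovered by recalibration, which is the diagnostic role the abstract assigns to calibration metrics.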
License
Copyright (c) 2025 Luciana Ferrer, Daniel Ramos

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Under these terms, the material may be shared (copied and redistributed in any medium or format) and adapted (remixed, transformed, and built upon to create another work), provided that a) the authorship and the original source of publication (journal and URL of the work) are cited, b) it is not used for commercial purposes, and c) the same license terms are maintained.











