Estimating Degradation of Machine Learning Data Assets

Lara Mauri, Ernesto Damiani

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Large-scale adoption of Artificial Intelligence and Machine Learning (AI-ML) models fed by heterogeneous, possibly untrustworthy data sources has spurred interest in estimating the degradation of such models due to spurious, adversarial, or low-quality data assets. We propose a quantitative estimate of the severity of the degradation of a classifier's training set: an index expressing the deformation of the convex hulls of the classes, computed on a held-out dataset generated via an unsupervised technique. We show that our index is computationally light, can be calculated incrementally, and complements existing quality measures for ML data assets. As an experiment, we present the computation of our index on a benchmark convolutional image classifier.
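
To make the idea of "deformation of a class's convex hull" concrete, the following is a minimal sketch and not the authors' index, whose exact definition is not given in this abstract. It assumes each class is represented by hypothetical 2-D feature points and uses the relative change in hull area as a stand-in deformation score. The names `convex_hull`, `hull_area`, and `hull_deformation` are illustrative only.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hull_area(points: List[Point]) -> float:
    """Area of the convex hull via the shoelace formula."""
    hull = convex_hull(points)
    n = len(hull)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def hull_deformation(reference: List[Point], observed: List[Point]) -> float:
    """Assumed stand-in score: relative change in hull area for one class."""
    ref_area = hull_area(reference)
    if ref_area == 0.0:
        raise ValueError("reference hull is degenerate (zero area)")
    return abs(hull_area(observed) - ref_area) / ref_area


# Hypothetical class points before and after one outlying sample is added.
clean = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
degraded = clean + [(2.0, 2.0)]
score = hull_deformation(clean, degraded)
```

In this toy case a single outlier stretches the hull enough to change the score, which illustrates why a hull-based measure can be sensitive to spurious samples. A real implementation would operate in the classifier's feature space and would need a definition of deformation richer than an area ratio.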

Original language: British English
Article number: 9
Journal: Journal of Data and Information Quality
Volume: 14
Issue number: 2
State: Published - Jun 2022

Keywords

  • Data assets
  • ML models
