Abstract
The large-scale adoption of Artificial Intelligence and Machine Learning (AI-ML) models fed by heterogeneous, possibly untrustworthy data sources has spurred interest in estimating how such models degrade because of spurious, adversarial, or low-quality data assets. We propose a quantitative estimate of the severity of degradation in a classifier's training set: an index expressing the deformation of the classes' convex hulls, computed on a held-out dataset generated via an unsupervised technique. We show that our index is computationally light, can be calculated incrementally, and complements existing quality measures for ML data assets well. As an experiment, we compute our index on a benchmark convolutional image classifier.
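The abstract does not give the index's formula. Purely to illustrate the two properties it claims (a measure built on per-class convex hulls, and incremental computation), the sketch below assumes the held-out points have already been mapped to a 2-D feature space. It uses a hypothetical `deformation_index` (the mean relative change in per-class hull area between labels assigned by a reference model and by a possibly degraded one). This is an assumed stand-in, not the authors' index. Incrementality rests on the identity hull(A ∪ B) = hull(hull(A) ∪ B), so each class only needs to keep its current hull vertices.

```python
# Illustrative sketch only: the index below is a HYPOTHETICAL stand-in, not the
# paper's formula. Assumes held-out points are already mapped to 2-D features.
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Iterable[Point]) -> List[Point]:
    # Andrew's monotone chain; collinear points are dropped (they add no area)
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(hull: Sequence[Point]) -> float:
    # Shoelace formula; degenerate hulls (fewer than 3 vertices) have zero area
    if len(hull) < 3:
        return 0.0
    s = sum(x1 * y2 - x2 * y1
            for (x1, y1), (x2, y2) in zip(hull, list(hull[1:]) + [hull[0]]))
    return abs(s) / 2.0


class IncrementalClassHulls:
    # Stores only hull vertices per class, using hull(A | B) == hull(hull(A) | B)
    def __init__(self) -> None:
        self.hulls: Dict[int, List[Point]] = {}

    def add(self, point: Point, label: int) -> None:
        self.hulls[label] = convex_hull(self.hulls.get(label, []) + [point])

    def areas(self) -> Dict[int, float]:
        return {c: hull_area(h) for c, h in self.hulls.items()}


def class_areas(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, float]:
    # Feed held-out points one at a time, as if they arrived as a stream
    hulls = IncrementalClassHulls()
    for p, y in zip(points, labels):
        hulls.add(p, y)
    return hulls.areas()


def deformation_index(ref: Dict[int, float], new: Dict[int, float]) -> float:
    # HYPOTHETICAL index: mean over classes of |A_new - A_ref| / max(A_new, A_ref),
    # so 0 means no change and 1 means a class hull collapsed to zero area
    classes = set(ref) | set(new)
    if not classes:
        return 0.0
    total = 0.0
    for c in classes:
        a_ref, a_new = ref.get(c, 0.0), new.get(c, 0.0)
        denom = max(a_ref, a_new)
        total += abs(a_new - a_ref) / denom if denom > 0 else 0.0
    return total / len(classes)


if __name__ == "__main__":
    held_out = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
                (3.0, 3.0), (4.0, 3.0), (3.0, 4.0), (4.0, 4.0)]
    ref_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # labels from a reference model
    deg_labels = [0, 0, 0, 1, 1, 1, 1, 1]  # one point flipped by a degraded model
    idx = deformation_index(class_areas(held_out, ref_labels),
                            class_areas(held_out, deg_labels))
    print(round(idx, 4))  # → 0.5833
```

Keeping only hull vertices bounds per-class memory by hull size rather than by the number of held-out points. In higher-dimensional feature spaces, SciPy's `ConvexHull` (which offers an incremental mode) would likely be a more practical basis than this hand-written 2-D routine.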
Original language | British English |
---|---|
Article number | 9 |
Journal | Journal of Data and Information Quality |
Volume | 14 |
Issue number | 2 |
DOIs | |
State | Published - Jun 2022 |
Keywords
- Data assets
- ML models