TY - GEN
T1 - Machine Learning Pipeline for Reusing Pretrained Models
AU - Alshehhi, Maryam
AU - Wang, Di
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/11/2
Y1 - 2020/11/2
N2 - Machine learning methods have proven to be effective in analyzing vast amounts of data in various formats to obtain patterns, detect trends, gain insight, and predict outcomes based on historical data. However, training models from scratch across various real-world applications is costly in terms of both time and data consumption. Model adaptation (Domain Adaptation) is a promising methodology to tackle this problem. It can reuse the knowledge embedded in an existing model to train another model. However, model adaptation is a challenging task due to dataset bias or domain shift. In addition, data access from both the original (source) domain and the destination (target) domain is often an issue in the real world, due to data privacy and cost issues (gathering additional data may cost money). Several domain adaptation algorithms and methodologies have been introduced in recent years; they reuse trained models from one source domain for a different but related target domain. Many existing domain adaptation approaches aim at modifying the trained model structure or adjusting the latent space of the target domain using data from the source domain. Domain adaptation techniques can be evaluated over several criteria, namely, accuracy, knowledge transfer, training time, and budget. In this paper, we start from the notion that in many real-world scenarios, the owner of the trained model restricts access to the model structure and the source dataset. To solve this problem, we propose a methodology to efficiently select data from the target domain (minimizing consumption of target domain data) to adapt the existing model without accessing the source domain, while still achieving acceptable accuracy. Our approach is designed for supervised and semi-supervised learning and is extendable to unsupervised learning.
AB - Machine learning methods have proven to be effective in analyzing vast amounts of data in various formats to obtain patterns, detect trends, gain insight, and predict outcomes based on historical data. However, training models from scratch across various real-world applications is costly in terms of both time and data consumption. Model adaptation (Domain Adaptation) is a promising methodology to tackle this problem. It can reuse the knowledge embedded in an existing model to train another model. However, model adaptation is a challenging task due to dataset bias or domain shift. In addition, data access from both the original (source) domain and the destination (target) domain is often an issue in the real world, due to data privacy and cost issues (gathering additional data may cost money). Several domain adaptation algorithms and methodologies have been introduced in recent years; they reuse trained models from one source domain for a different but related target domain. Many existing domain adaptation approaches aim at modifying the trained model structure or adjusting the latent space of the target domain using data from the source domain. Domain adaptation techniques can be evaluated over several criteria, namely, accuracy, knowledge transfer, training time, and budget. In this paper, we start from the notion that in many real-world scenarios, the owner of the trained model restricts access to the model structure and the source dataset. To solve this problem, we propose a methodology to efficiently select data from the target domain (minimizing consumption of target domain data) to adapt the existing model without accessing the source domain, while still achieving acceptable accuracy. Our approach is designed for supervised and semi-supervised learning and is extendable to unsupervised learning.
KW - Domain Adaptation
KW - Knowledge Transfer
KW - Model Reuse
KW - Supervised Learning
KW - Transfer Learning
UR - http://www.scopus.com/inward/record.url?scp=85097868547&partnerID=8YFLogxK
U2 - 10.1145/3415958.3433054
DO - 10.1145/3415958.3433054
M3 - Conference contribution
AN - SCOPUS:85097868547
T3 - Proceedings of the 12th International Conference on Management of Digital EcoSystems, MEDES 2020
SP - 72
EP - 75
BT - Proceedings of the 12th International Conference on Management of Digital EcoSystems, MEDES 2020
T2 - 12th International Conference on Management of Digital EcoSystems, MEDES 2020
Y2 - 2 November 2020 through 4 November 2020
ER -