TY - GEN
T1 - K-Means Clustering in Dual Space for Unsupervised Feature Partitioning in Multi-view Learning
AU - Mio, Corrado
AU - Gianini, Gabriele
AU - Damiani, Ernesto
N1 - Funding Information:
The point we wanted to make is that methods designed for working in instance space can also be used in feature space. This dual-space approach can be extended to many other situations, which will be the object of future work. The authors acknowledge the support of the Information and Communication Technology Fund (ICT Fund) at EBTIC/Khalifa University, Abu Dhabi, UAE (Project number 88434000029). The work was also partially funded by the European Union's Horizon 2020 research and innovation programme, within the projects Toreador (grant agreement No. 688797), Evotion (grant agreement No. 727521) and Threat-Arrest (Project-ID No. 786890).
Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - In contrast to single-view learning, multi-view learning simultaneously trains distinct algorithms on disjoint subsets of features (the views) and jointly optimizes them so that they reach a consensus. Multi-view learning is typically used when the data are described by a large number of features. It aims to exploit the different statistical properties of distinct views. When the features have no natural grouping, a task must be performed before multi-view learning: multi-view generation (MVG), which consists of partitioning the feature set into subsets (views) characterized by some desired properties. Given a dataset in the form of a table with a large number of columns, the desired solution of the MVG problem is a partition of the columns that optimizes an objective function encoding typical requirements. If the class labels are available, one wants to minimize the inter-view redundancy in target prediction and maximize consistency. If the class labels are not available, one simply wants to minimize inter-view redundancy (that is, minimize the information each view has about the others). In this work, we approach the MVG problem in the latter, unsupervised, setting. Our approach is based on the transposition of the data table: the original instance rows are mapped into columns (the 'pseudo-features'), while the original feature columns become rows (the 'pseudo-instances'). The latter can then be partitioned by any suitable standard instance-partitioning algorithm: the resulting groups can be considered as groups of the original features, i.e., views, which constitute a solution of the MVG problem. We demonstrate the approach using k-means and the standard benchmark MNIST dataset of handwritten digits.
KW - Consensus clustering
KW - Dual space clustering
KW - K-means
KW - Multi-view learning
UR - http://www.scopus.com/inward/record.url?scp=85065919777&partnerID=8YFLogxK
U2 - 10.1109/SITIS.2018.00012
DO - 10.1109/SITIS.2018.00012
M3 - Conference contribution
AN - SCOPUS:85065919777
T3 - Proceedings - 14th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2018
SP - 1
EP - 8
BT - Proceedings - 14th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2018
A2 - Chbeir, Richard
A2 - di Baja, Gabriella Sanniti
A2 - Gallo, Luigi
A2 - Yetongnon, Kokou
A2 - Dipanda, Albert
A2 - Castrillon-Santana, Modesto
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2018
Y2 - 26 November 2018 through 29 November 2018
ER -