Semi-supervised and un-supervised clustering: A review and experimental evaluation

Research output: Contribution to journal › Review article › peer-review

70 Scopus citations

Abstract

Retrieving, analyzing, and processing large data can be challenging. An effective and efficient mechanism for overcoming these challenges is to cluster the data into a compact and meaningful format that reflects the whole data. The learning techniques for clustering can be classified into supervised, semi-supervised, and un-supervised learning. Semi-supervised and un-supervised learning are more advantageous than supervised learning because supervised learning is laborious and prior knowledge is unavailable for most practical real-world problems. To this end, this paper reviews semi-supervised and un-supervised learning methods. Unfortunately, most current survey papers categorize semi-supervised and un-supervised learning algorithms into broad clustering classes and do not draw clear boundaries between the specific techniques employed by the algorithms. To overcome this, we provide a detailed methodology-based taxonomy that categorizes the algorithms into hierarchically nested, specific, and fine-grained classes. The taxonomy is hierarchically nested as follows: clustering categories → clustering methods → clustering sub-methods. First, the algorithms are classified into broad categories. Each category is then classified into various methods, and these methods are in turn classified into sub-methods. We survey and describe over 200 state-of-the-art algorithms that employ the underlying principles of each clustering method/sub-method. We experimentally evaluate and rank the following: (1) the various clustering sub-methods that fall under the same clustering method, (2) the various clustering methods that fall under the same clustering category, and (3) the various clustering categories.

Original language: British English
Article number: 102178
Journal: Information Systems
Volume: 114
State: Published - Mar 2023

Keywords

  • Clustering category
  • Clustering method
  • Semi-supervised learning
  • Un-supervised learning
