Informational rescaling of PCA maps with application to genetic distance

Research output: Contribution to journal › Article › peer-review

Abstract

Principal Component Analysis (PCA) is a powerful multivariate tool allowing the projection of data in low-dimensional representations. Nevertheless, datapoint distances on these low-dimensional projections are challenging to interpret. Here, we propose a computationally simple heuristic to transform a map based on standard PCA (when the variables are asymptotically Gaussian) into an entropy-based map where distances are based on mutual information (MI). Moreover, we show that in certain instances our proposed scaled PCA can improve cluster identification. Rescaling principal component-based distances using MI results in a representation of relative statistical associations when, as in genetics, it is applied on bit measurements between individuals' genomic mutual information. This entropy-rescaled PCA, while preserving order relationships (along a dimension), quantifies relative distances into information units, such as “bits”. We illustrate the effect of this rescaling using genomics data derived from world populations and describe how the interpretation of results is impacted.

Original language: British English
Pages (from-to): 48-56
Number of pages: 9
Journal: Computational and Structural Biotechnology Journal
Volume: 27
State: Published - Jan 2025

Keywords

  • Entropy
  • Genetic distance
  • Genetic maps
  • Information theory
  • Mutual information
