An indexing scheme for fast and accurate chemical fingerprint database searching

Zeyar Aung, See Kiong Ng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Rapid chemical database searching is important for drug discovery. Chemical compounds are represented as long fixed-length bit vectors called fingerprints. The vectors record the presence or absence of particular features or substructures of the corresponding molecules. In a typical drug discovery application, several thousand query fingerprints are screened for similarity against a database of millions of fingerprints to identify suitable drug candidates. The existing methods of full database scan and range search take considerable time for such a task. We present a new index-based search method called "ChemDex" (Chemical fingerprint inDexing) for speeding up the fingerprint database search. We propose a novel chain scoring scheme to calculate the Tanimoto (Jaccard) scores of the fingerprints using an early-termination strategy. We tested our proposed method using 1,000 randomly selected query fingerprints on the NCBI PubChem database containing about 19.5 million fingerprints. Experimental results show that ChemDex is up to 109.9 times faster than the full database scan method, and up to 2.1 times faster than the state-of-the-art range search method for memory-based retrieval. For disk-based retrieval, it is up to 145.7 times and 1.7 times faster than the full scan and the range search, respectively. The speedup is achieved without any loss of accuracy, as ChemDex generates exactly the same results as the full scan and the range search.
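For readers unfamiliar with the Tanimoto (Jaccard) score mentioned in the abstract, the following is a minimal sketch of how it is computed on bit-vector fingerprints, together with a simple threshold search. It is not the ChemDex index or its chain scoring scheme, whose details are not given here. As an assumption for illustration, fingerprints are stored as Python integers, and the function names `popcount`, `tanimoto`, and `threshold_search` are hypothetical. The pruning step uses the standard set-theoretic bound that the score can never exceed min(|A|, |B|) / max(|A|, |B|), so skipped candidates cannot be hits and the results match a full scan.

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a fingerprint stored as a non-negative int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) score: |A AND B| / |A OR B|."""
    union = popcount(a | b)
    if union == 0:
        # Assumed convention: two all-zero fingerprints count as identical.
        return 1.0
    return popcount(a & b) / union


def threshold_search(
    query: int, database: List[int], threshold: float
) -> List[Tuple[int, float]]:
    """Return (index, score) for every fingerprint scoring >= threshold.

    Candidates are skipped early when the bit-count bound
    min(|A|, |B|) / max(|A|, |B|) is already below the threshold.
    """
    q_bits = popcount(query)
    hits = []
    for i, fp in enumerate(database):
        f_bits = popcount(fp)
        lo, hi = min(q_bits, f_bits), max(q_bits, f_bits)
        if hi > 0 and lo / hi < threshold:
            continue  # the score cannot reach the threshold
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((i, score))
    return hits


if __name__ == "__main__":
    # Toy 8-bit fingerprints, invented for illustration only.
    db = [0b1111, 0b1110, 0b0001, 0b11110000, 0]
    print(threshold_search(0b1111, db, 0.7))
```

In this toy run the third and fifth entries are rejected by the bit-count bound alone, while the fourth passes the bound but fails the exact score, which is why the bound can only be used to skip candidates, never to accept them.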

Original language: British English
Title of host publication: Scientific and Statistical Database Management - 22nd International Conference, SSDBM 2010, Proceedings
Pages: 288-305
Number of pages: 18
State: Published - 2010
Event: 22nd International Conference on Scientific and Statistical Database Management, SSDBM 2010 - Heidelberg, Germany
Duration: 30 Jun 2010 → 2 Jul 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6187 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Scientific and Statistical Database Management, SSDBM 2010
Country/Territory: Germany
City: Heidelberg
Period: 30/06/10 → 2/07/10
