Document versioning using feature space distances

Wei Lee Woon, Kuok Shoong Daniel Wong, Zeyar Aung, Davor Svetinovic

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

The automated analysis of documents is an important task given the rapid increase in the availability of digital texts. In an earlier publication, we presented a framework in which the edit distances between documents were used to reconstruct the version history of a set of documents. However, one problem we encountered was the high computational cost of calculating these edit distances. In addition, the number of document comparisons required scales quadratically with the number of documents. In this paper, we propose a simple approximation that retains many of the benefits of the method but greatly reduces the time required to calculate these edit distances. To test the utility of this method, the accuracy of the results obtained using this approximation is compared with the original results.

Original language: British English
Title of host publication: Neural Information Processing - 21st International Conference, ICONIP 2014, Proceedings
Editors: Chu Kiong Loo, Keem Siah Yap, Kok Wai Wong, Andrew Teoh, Kaizhu Huang
Publisher: Springer Verlag
Pages: 487-494
Number of pages: 8
ISBN (Electronic): 9783319126395
State: Published - 2014
Event: 21st International Conference on Neural Information Processing, ICONIP 2014 - Kuching, Malaysia
Duration: 3 Nov 2014 to 6 Nov 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8835
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Neural Information Processing, ICONIP 2014
Country/Territory: Malaysia
City: Kuching
Period: 3/11/14 to 6/11/14

Keywords

  • Data mining
  • Information retrieval
  • String matching
  • Text processing
  • Versioning
