Early detection of lung cancer based on sputum color image analysis

  • Fatma Taher

Student thesis: Doctoral Thesis

Abstract

Lung cancer continues to rank as the leading cause of cancer deaths worldwide. One of the most promising techniques for early detection of cancerous cells relies on sputum cell analysis. This was the motivation behind the design and development of a new computer aided diagnosis (CAD) system for early detection of lung cancer based on the analysis of sputum color images.

The proposed CAD system comprises four main processing steps. The first is the preprocessing step, which uses a heuristic rule-based algorithm and a Bayesian classification method based on histogram analysis. In this step, the region of interest (ROI) representing the sputum cell is detected and extracted. In the second step, mean shift segmentation is applied to separate the nuclei from the cytoplasm. The third step is feature analysis, in which geometric and chromatic features are extracted from the nucleus region. These features are used in the diagnostic process of the sputum images. Finally, diagnosis is performed using a rule-based algorithm, an artificial neural network (ANN), and a support vector machine (SVM) to classify the cells as benign or malignant. For each step, the thesis addresses the technical issues, methodologies, training and testing datasets, and validation methods, and compares performance through a series of experiments.

A database of 100 sputum color images collected by the Tokyo center of lung cancer from different patients was used to test the new CAD system. In the extraction process, the Bayesian classification outperformed the heuristic rule-based classification, attaining a consistent accuracy of 98%. In the segmentation process, the mean shift approach significantly outperformed the Hopfield neural network (HNN) technique, and integrating both spatial and chromatic information further improved the segmentation performance. In the classification process, the SVM achieved a higher classification rate than the rule-based and ANN classifiers. The final results showed that the techniques used outperformed conventional methods. The proposed CAD system achieved an accuracy above 95% with high true positive rates, which can largely meet the requirements of clinical diagnosis.
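
As an illustration of the histogram-based Bayesian classification mentioned for the preprocessing step, the following is a minimal sketch and not the thesis implementation. It assumes pixels are (R, G, B) tuples with 0-255 channels, and the bin count, the add-one smoothing, the function names, and the sample pixel values are all hypothetical choices made for this example.

```python
from collections import Counter


def quantize(pixel, bins=8):
    """Map an (R, G, B) pixel with 0-255 channels to a coarse colour bin."""
    step = 256 // bins
    return tuple(channel // step for channel in pixel)


def build_histogram(pixels, bins=8):
    """Count how many labelled training pixels fall into each colour bin."""
    return Counter(quantize(p, bins) for p in pixels)


def classify_pixel(pixel, cell_hist, background_hist, bins=8):
    """Label a pixel "cell" or "background" by comparing Bayesian posteriors."""
    n_cell = sum(cell_hist.values())
    n_background = sum(background_hist.values())
    total = n_cell + n_background
    key = quantize(pixel, bins)
    n_bins = bins ** 3

    # Likelihoods P(colour | class), with add-one smoothing (an assumption)
    # so that colour bins unseen in training do not get zero probability.
    like_cell = (cell_hist[key] + 1) / (n_cell + n_bins)
    like_background = (background_hist[key] + 1) / (n_background + n_bins)

    # Priors P(class), estimated from the training proportions.
    prior_cell = n_cell / total
    prior_background = n_background / total

    # The evidence P(colour) is shared by both posteriors, so it cancels.
    if like_cell * prior_cell > like_background * prior_background:
        return "cell"
    return "background"


# Hypothetical training pixels: darker stained cells, bright background.
cell_samples = [(120, 40, 130), (110, 50, 140), (125, 45, 120)]
background_samples = [(230, 220, 225), (240, 235, 230), (225, 228, 235)]

cell_hist = build_histogram(cell_samples)
background_hist = build_histogram(background_samples)

label_dark = classify_pixel((118, 42, 128), cell_hist, background_hist)
label_bright = classify_pixel((235, 230, 232), cell_hist, background_hist)
```

Applying such a classifier to every pixel would yield a binary mask from which an ROI could be extracted; how the thesis builds its histograms and chooses its colour space is not stated in the abstract.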
Date of Award: 2014
Original language: American English
Supervisor: Hussain Al Ahmad

Keywords

  • Lung cancer
  • sputum images
  • computer aided diagnosis (CAD) system
  • Bayesian theorem
  • segmentation
  • neural network classifier
  • support vector machine
