Artificial Bee Colony With Differential Evolution Algorithm For Feature Extraction And Selection Of Mass Spectrometry Data

Mohamed Yusoff, Syarifah Adilah (2016) Artificial Bee Colony With Differential Evolution Algorithm For Feature Extraction And Selection Of Mass Spectrometry Data. PhD thesis, Universiti Sains Malaysia.

Abstract

Advances in mass spectrometry techniques for proteomic studies have accelerated the discovery of biomarkers from quantitative proteomic patterns. High-throughput data for a given molecule can give rise to a series of interrelated and overlapping peaks in a mass spectrum. The spectra also suffer from high dimensionality relative to the small sample size. Several studies have proposed statistical and machine learning techniques, such as Principal Component Analysis (PCA), Independent Component Analysis (ICA) and wavelet coefficients, to extract potential features. However, none of these methods seriously addresses the huge number of features relative to the small sample size. This study focused on two stages of mass spectrometry analysis. First, feature extraction methods extract peaks as potential features from which the biological meaning of the data can be inferred. Shrinkage estimation of covariance was proposed to assemble m/z windows and to identify the correlation coefficients among peaks of mass spectrometry data for feature extraction. Second, feature selection techniques search for a parsimonious set of features through the learning model that yields the most accurate results.
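The shrinkage idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's estimator, which the abstract does not specify. It assumes a fixed, hand-chosen shrinkage intensity `lam` and a diagonal target, so off-diagonal covariances between peak intensities are scaled by `(1 - lam)` while variances are kept. The function names and the toy intensity values are hypothetical.

```python
from statistics import mean


def covariance_matrix(rows):
    """Sample covariance (n - 1 denominator); each row is one spectrum,
    each column the intensity of one peak."""
    n = len(rows)
    p = len(rows[0])
    means = [mean(row[j] for row in rows) for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            s = sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows)
            cov[i][j] = cov[j][i] = s / (n - 1)
    return cov


def shrink_covariance(cov, lam):
    """Shrink toward a diagonal target: (1 - lam) * S + lam * diag(S).
    ASSUMPTION: lam is fixed by hand here; data-driven choices exist
    but are not described in the abstract."""
    p = len(cov)
    return [
        [cov[i][j] if i == j else (1.0 - lam) * cov[i][j] for j in range(p)]
        for i in range(p)
    ]


def correlation_from_covariance(cov):
    """Convert a covariance matrix into a correlation matrix."""
    p = len(cov)
    sd = [cov[i][i] ** 0.5 for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


# Hypothetical toy data: 4 spectra, 3 peak intensities each.
spectra = [
    [1.0, 2.1, 5.0],
    [2.0, 3.9, 4.0],
    [3.0, 6.2, 6.0],
    [4.0, 7.8, 5.5],
]

shrunk = shrink_covariance(covariance_matrix(spectra), lam=0.2)
corr = correlation_from_covariance(shrunk)
```

With few samples and many peaks, the raw sample covariance is noisy or singular; pulling it toward a simple target trades a little bias for much lower variance, which is why shrinkage suits the small-sample, high-dimensional setting the abstract describes.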

Item Type: Thesis (PhD)
Subjects: Q Science > QA Mathematics > QA75.5-76.95 Electronic computers. Computer science
Divisions: Pusat Pengajian Sains Komputer (School of Computer Sciences) > Thesis
Depositing User: Mr Noorazilan Noordin
Date Deposited: 03 Mar 2017 08:08
Last Modified: 12 Apr 2019 05:25
URI: http://eprints.usm.my/id/eprint/32298
