Title page for 93522004



Student Number 93522004
Author Jun-Li Kuo
Author's Email Address u9152700@cc.ncu.edu.tw
Statistics This thesis has been viewed 1,631 times and downloaded 1,048 times.
Department Computer Science and Information Engineering
Year 2005
Semester 2
Degree Master
Type of Document Master's Thesis
Language English
Title Gene selection by decision tree and classification for microarray gene expression data
Date of Defense 2006-06-15
Page Count 49
Keywords
  • cancer
  • classification
  • gene selection
  • microarray
Abstract
Gene selection can help in analyzing microarray gene expression data. However, obtaining a satisfactory classification result with machine learning techniques is very difficult because of the curse of dimensionality and overfitting: the number of features (genes) is very large, while the number of samples is very small. We therefore design a system flow that attempts to avoid these two problems and selects a small set of significant biomarker genes for diagnosis, so that samples can be classified correctly. We test the system on several microarray datasets, and its good performance demonstrates that it is useful and reliable.
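The title and the Chapter 3 headings in the table of contents (resampling, tree gathering, gene selecting) suggest an ensemble-of-trees style of gene selection. The sketch below is a generic illustration under stated assumptions. It is not the thesis's actual method.

It works in three steps:
- Draw bootstrap resamples of the samples.
- Fit a depth-1 decision tree (a decision stump chosen by information gain) on each resample. The stump is a stand-in for a full decision-tree learner such as C4.5.
- Count how often each gene is chosen as the split.

The functions `entropy`, `best_stump_gene` and `select_genes`, the parameters `n_rounds`, `top_k` and `seed`, and the toy dataset are all hypothetical.

```python
import random
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_stump_gene(X, y):
    """Index of the gene whose single threshold split gives the highest
    information gain, i.e. the root feature of a depth-1 decision tree."""
    base = entropy(y)
    best_gene, best_gain = None, -1.0
    for g in range(len(X[0])):
        values = sorted({row[g] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[g] <= t]
            right = [lab for row, lab in zip(X, y) if row[g] > t]
            split = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if base - split > best_gain:
                best_gene, best_gain = g, base - split
    return best_gene


def select_genes(X, y, n_rounds=50, top_k=5, seed=0):
    """Hypothetical resampling-based selection: fit one stump per bootstrap
    resample of the samples and keep the genes chosen most often."""
    rng = random.Random(seed)
    n = len(y)
    votes = Counter()
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        gene = best_stump_gene([X[i] for i in idx], [y[i] for i in idx])
        if gene is not None:  # None only if every gene is constant in this resample
            votes[gene] += 1
    return [g for g, _ in votes.most_common(top_k)]


# Toy data (invented for illustration): 20 samples x 10 genes; only gene 2
# differs between the two classes, the other genes are pure noise.
_rng = random.Random(1)
X, y = [], []
for i in range(20):
    label = i % 2
    row = [_rng.gauss(0, 1) for _ in range(10)]
    row[2] = (5.0 if label else 0.0) + _rng.gauss(0, 0.1)
    X.append(row)
    y.append(label)

print(select_genes(X, y, top_k=3))
```

Each gene's vote count is accumulated over many bootstrap resamples. A noise gene that happens to separate the classes in one resample therefore collects few votes overall. This is one generic way resampling can reduce the risk of overfitting when genes far outnumber samples.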
Table of Contents
Chapter 1 Introduction  1
  1.1 Background  2
  1.2 Motivation  4
  1.3 Goal  5
Chapter 2 Related Works  6
  2.1 Other gene selection methods  6
  2.2 WEKA  8
  2.3 KEGG  9
Chapter 3 System Flow  12
  3.1 Data input  13
  3.2 Gene Selection  14
    3.2.1 Resampling  14
    3.2.2 Tree gathering  15
    3.2.3 Gene selecting  17
  3.3 Classification  19
Chapter 4 Materials  22
  4.1 Public datasets  22
  4.2 NTU hospital data  23
Chapter 5 Results  26
  5.1 The performance for public datasets  26
  5.2 The performance for NTU hospital data  27
    5.2.1 Metastasis diagnosis  27
    5.2.2 Her2-positive diagnosis  30
Chapter 6 Discussion  33
References  35
Appendix  38
Advisor
  • Jorng-Tzong Horng
Files
  • 93522004.pdf
  • approve immediately
Date of Submission 2006-07-18