Title page for 89522065



Student Number 89522065
Author Shih-Hsien Wang (王世賢)
Author's Email Address Not Public.
Statistics This thesis has been viewed 1993 times and downloaded 953 times.
Department Computer Science and Information Engineering
Year 2001
Semester 2
Degree Master
Type of Document Master's Thesis
Language English
Title Study of Motif Correlation in Proteins by Data Mining
Date of Defense 2002-06-25
Page Count 40
Keyword
  • mining
  • motif
  • protein
    Abstract In protein sequences, some regions are better conserved than others during evolution. These conserved regions generally play an important role in the function or structure of proteins. Knowledge of the correlation between protein motifs should shed new light on the biological functions of proteins and offer a basis for analyzing evolution in the human genome and other genomes. The aim here is to find motif correlations in protein structures. The protein sequences used in this study come from the PIR-NREF and PROSITE databases. We apply a data mining approach to discover correlations among motifs in protein sequences.
    Table of Contents
    Chapter 1 Introduction
    Chapter 2 Related Work
    2.1 Protein Databases
    2.2 Protein Domain Family Database
    2.3 Protein Structure Related Databases
    2.4 Association Rules
    Chapter 3 Our Approach
    3.1 Materials
    3.2 Preprocessing and Mapping
    3.3 Mining Association Rules
    Chapter 4 Results
    4.1 Environments of Implementation
    4.2 Mining Result
    Chapter 5 Discussion
    Chapter 6 Conclusions
    References
    Appendix A
    Reference [1] Laurent Falquet, Marco Pagni, Philipp Bucher, Nicolas Hulo, Christian J. A. Sigrist, Kay Hofmann, and Amos Bairoch. "The PROSITE database, its status in 2002". Nucleic Acids Res. 2002, 30: 235-238.
    [2] K Hofmann, P Bucher, L Falquet, and A Bairoch. "The PROSITE database, its status in 1999". Nucleic Acids Res. 1999, 27: 215-219.
    [3] A Bairoch, P Bucher, and K Hofmann. "The PROSITE database, its status in 1997". Nucleic Acids Res. 1997, 25: 217-221.
    [4] A Bairoch, P Bucher, and K Hofmann. "The PROSITE database, its status in 1995". Nucleic Acids Res. 1996, 24: 189-196.
    [5] Alex Bateman, Ewan Birney, Richard Durbin, Sean R. Eddy, Kevin L. Howe, and Erik L. L. Sonnhammer. "The Pfam Protein Families Database". Nucleic Acids Res. 2000, 28: 263-266.
    [6] T. K. Attwood, M. J. Blythe, D. R. Flower, A. Gaulton, J. E. Mabey, N. Maudling, L. McGregor, A. L. Mitchell, G. Moulton, K. Paine, and P. Scordis. "PRINTS and PRINTS-S shed light on protein ancestry". Nucleic Acids Res. 2002, 30: 239-241.
    [7] Amos Bairoch and Rolf Apweiler. "The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000". Nucleic Acids Res. 2000, 28: 45-48.
    [8] Loredana Lo Conte, Bart Ailey, Tim J. P. Hubbard, Steven E. Brenner, Alexey G. Murzin, and Cyrus Chothia. "SCOP: a Structural Classification of Proteins database". Nucleic Acids Res. 2000, 28: 257-259.
    [9] R. Apweiler, T. K. Attwood, A. Bairoch, A. Bateman, E. Birney, M. Biswas, P. Bucher, L. Cerutti, F. Corpet, M. D. R. Croning, R. Durbin, L. Falquet, W. Fleischmann, J. Gouzy, H. Hermjakob, N. Hulo, I. Jonassen, D. Kahn, A. Kanapin, Y. Karavidopoulou, R. Lopez, B. Marx, N. J. Mulder, T. M. Oinn, M. Pagni, F. Servant, C. J. A. Sigrist, and E. M. Zdobnov. "InterPro-an integrated documentation resource for protein families, domains and functional sites". Bioinformatics. 2000, 16: 1145-1150.
    [10] A Elofsson and EL Sonnhammer. "A comparison of sequence and structure protein domain families as a basis for structural genomics". Bioinformatics. 1999, 15: 480-500.
    [11] Ernst Kretschmann, Wolfgang Fleischmann, and Rolf Apweiler. "Automatic rule generation for protein annotation with the C4.5 data mining algorithm applied on SWISS-PROT". Bioinformatics. 2001, 17: 920-926.
    [12] SR Eddy. "Profile hidden Markov models". Bioinformatics. 1998, 14: 755-763.
    [13] Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. "Mining association rules between sets of items in large databases". In Proc. of the ACM SIGMOD Conference on Management of Data, 1993.
    [14] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. "Finding Interesting Rules from Large Sets of Discovered Association Rules". CIKM, 1994, 401-407.
    [15] F.C. Tseng and C.C. Hsu. "Generating Frequent Patterns with the Frequent Pattern List". PAKDD 2001.
    [16] S. J. Wheelan, A. Marchler-Bauer, and S. H. Bryant. "Domain size distributions can predict domain boundaries". Bioinformatics. 2000, 16: 613-618.
    [17] Cathy H. Wu, Hongzhan Huang, Leslie Arminski, Jorge Castro-Alvear, Yongxing Chen, Zhang-Zhi Hu, Robert S. Ledley, Kali C. Lewis, Hans-Werner Mewes, Bruce C. Orcutt, Baris E. Suzek, Akira Tsugita, C. R. Vinayaka, Lai-Su L. Yeh, Jian Zhang, and Winona C. Barker. "The Protein Information Resource: an integrated public resource of functional annotation of proteins". Nucleic Acids Res. 2002, 30: 35-37.
    [18] John Westbrook, Zukang Feng, Shri Jain, T. N. Bhat, Narmada Thanki, Veerasamy Ravichandran, Gary L. Gilliland, Wolfgang Bluhm, Helge Weissig, Douglas S. Greer, Philip E. Bourne, and Helen M. Berman. "The Protein Data Bank: unifying the archive". Nucleic Acids Res. 2002, 30: 245-248.
    [19] K Karplus, C Barrett, and R Hughey. "Hidden Markov models for detecting remote protein homologies". Bioinformatics. 1998, 14: 846-856.
    [20] Pearl, F.M.G., Lee, D., Bray, J.E., Sillitoe, I., Todd, A.E., Harrison, A.P., Thornton, J.M., and Orengo, C.A. "Assigning genomic sequences to CATH". Nucleic Acids Res. 2000, 28(1): 277-282.
    Advisor
  • Jorng-Tzong Horng (洪炯宗)
    Files
  • 89522065.pdf (approved immediately)
    Date of Submission 2002-06-25


    If you have dissertation-related questions, please contact the NCU library extension service section.
    Our service phone is (03)422-7151 Ext. 57407; e-mail is also welcome.