Title page for 93441023



Student Number 93441023
Author Yu-Min Su
Author's Email Address Not public.
Statistics This thesis has been viewed 1167 times and downloaded 368 times.
Department Business Administration
Year 2008
Semester 2
Degree Ph.D.
Type of Document Doctoral Dissertation
Language English
Title Weaving the Small Worlds with the Multi-domain Property of Authors and Keywords
Date of Defense 2009-06-17
Page Count 121
Keyword
  • Agglomerative Hierarchical Clustering (AHC)
  • Author co-citation analysis
  • Bridge-keywords
  • Clustering
  • Co-occurrence analysis
  • Co-word analysis
  • Complete author pair algorithm (CAP algorithm)
  • Complete keyword pair algorithm (CKP algorithm)
  • Complete set
  • Computing classification system (CCS)
  • K-means
  • Keyword domains
  • Recommendation systems
    Abstract Grouping documents or authors into related domains is a crucial step in implementing Knowledge Management. Traditionally, each author or document is grouped into one domain only. However, in many applications, authors and documents should be grouped into multiple domains. This dissertation aims to develop a methodology for clustering data items into multiple groups based on co-reference data, namely author co-citation data banks and keyword co-reference data banks.
    The author co-citation analysis (ACA) method is commonly used to group the authors of reference papers. Because the traditional ACA method analyzes only the first authors of reference papers, it disregards the contributions of coauthors and can assign each first author to only one cluster. This study proposes an innovative ACA algorithm, the "Complete Author Pair (CAP) algorithm", which groups the complete author sets of reference papers into clusters and thus finds authors who may have expertise in more than one area. First, the CAP algorithm is applied to a data bank of paper references collected from two IS journals during 2001-2003. The results show that the CAP algorithm can identify multi-expertise authors with 70% precision, recall, and F score when compared against ACM CCS. The results also show that, among the six clustering methods evaluated in this experiment, the CAP algorithm performs best with the K-means method and the complete linkage method. Second, the CAP algorithm is applied to two citation data banks of paper references collected from two ACM journals during 2002-2005. The results show that the CAP algorithm discovers multi-expertise authors with up to 90% average precision in each citation bank when compared against ACM CCS.
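    The idea of working with complete author sets can be illustrated with a minimal sketch. It is not the dissertation's implementation: the toy data, the function names, and the hand-assigned cluster labels are all assumptions made for illustration, and the correlation and clustering steps are skipped. The sketch counts how often two complete author sets are cited together, then derives each author's domains from the clusters of the sets that contain that author.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: each source paper lists the complete author sets
# of the reference papers it cites (not data from the dissertation).
citing_papers = [
    [frozenset({"A", "B"}), frozenset({"C"})],
    [frozenset({"A", "B"}), frozenset({"C"}), frozenset({"A", "D"})],
    [frozenset({"A", "D"}), frozenset({"E"})],
]


def cocitation_counts(papers):
    """Count how many source papers cite each pair of complete author sets."""
    counts = Counter()
    for refs in papers:
        # Each author set counts once per source paper.
        for s1, s2 in combinations(set(refs), 2):
            counts[frozenset({s1, s2})] += 1
    return counts


def author_domains(cluster_of_set):
    """Map each author to every cluster holding an author set that includes them."""
    domains = defaultdict(set)
    for author_set, cluster in cluster_of_set.items():
        for author in author_set:
            domains[author].add(cluster)
    return dict(domains)


def multi_domain_authors(domains):
    """Authors assigned to more than one cluster."""
    return sorted(a for a, d in domains.items() if len(d) > 1)


counts = cocitation_counts(citing_papers)

# Assumed cluster labels, standing in for the output of a clustering step
# such as K-means or complete linkage.
clusters = {
    frozenset({"A", "B"}): 0,
    frozenset({"C"}): 0,
    frozenset({"A", "D"}): 1,
    frozenset({"E"}): 1,
}
domains = author_domains(clusters)
print(multi_domain_authors(domains))
```

    In this toy example author A appears in author sets from both clusters, so A is the only author reported as belonging to more than one domain. A first-author-only analysis that assigns each author to a single cluster could not report that.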
    The co-word analysis method is commonly used to cluster related keywords into the same keyword domain. However, traditional co-word analysis cannot cluster the same keyword into more than one keyword domain, and thus disregards the multi-domain property of keywords. This study proposes an innovative keyword co-citation algorithm, the "Complete Keyword Pair (CKP) algorithm", which groups the complete keyword sets of reference papers into clusters and thus finds keywords belonging to more than one keyword domain. These keywords are termed bridge-keywords. A recommendation system based on CKP can recommend keywords in other domains through the bridge-keywords, helping users extend their document search area. The CKP algorithm is applied to a citation bank built from source papers published in JACM during 2000-2006. The results show that the CKP algorithm can discover bridge-keywords with an average precision of 80% in this citation bank when compared against the benchmark of ACM CCS.
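    The role of bridge-keywords in recommendation can be sketched as follows. This is an illustrative sketch only: the domain names, keywords, and function names are hypothetical, and the keyword domains are given by hand rather than produced by the CKP algorithm. A keyword that appears in more than one domain is treated as a bridge, and a query keyword receives suggestions from the other domains reachable through a bridge in its own domain.

```python
# Hypothetical keyword domains (assumed input, not results from the dissertation).
keyword_domains = {
    "databases": {"query optimization", "indexing", "clustering"},
    "machine learning": {"clustering", "k-means", "classification"},
    "networks": {"routing", "congestion control"},
}


def bridge_keywords(domains):
    """Return keywords that belong to more than one domain, with their domains."""
    membership = {}
    for name, keywords in domains.items():
        for kw in keywords:
            membership.setdefault(kw, set()).add(name)
    return {kw: names for kw, names in membership.items() if len(names) > 1}


def recommend(query, domains):
    """Suggest keywords from other domains reachable through a bridge-keyword."""
    bridges = bridge_keywords(domains)
    home = {name for name, keywords in domains.items() if query in keywords}
    suggestions = set()
    for names in bridges.values():
        if names & home:  # the bridge lives in one of the query's domains
            for other in names - home:
                suggestions |= domains[other]
    # Keep only keywords the user would not already see in the home domains.
    for name in home:
        suggestions -= domains[name]
    return sorted(suggestions)


print(bridge_keywords(keyword_domains))
print(recommend("indexing", keyword_domains))
```

    Here "clustering" bridges the two domains, so a search on "indexing" is extended with "classification" and "k-means", while a query on "routing" gets no suggestions because its domain has no bridge.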
    Table of Contents Abstract (in Chinese)
    ABSTRACT
    TABLE OF CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    1 INTRODUCTION
    1.1  Motivation
    1.2  Objectives
    1.3  Organization of the Dissertation
    2 RELATED WORK
    2.1  Author Co-citation Analysis
    2.2  Co-word Analysis
    3 METHODOLOGY
    3.1  Complete Author Pair (CAP) Method
    3.1.1  Procedure of CAP
    3.1.2  Definition
    3.1.3  Creation of co-citation frequency matrix
    3.1.4  Generation of Pearson's correlation matrix
    3.1.5  Generating clusters of complete author sets
    3.1.6  Deriving author domains
    3.1.7  Algorithms of Complete Author Pair (CAP)
    3.1.8  Reducing number of complete author pairs with author support threshold
    3.2  Complete Keyword Pair (CKP) Method
    3.2.1  Procedure of CKP
    3.2.2  Definition
    3.2.3  Creation of co-citation frequency matrix
    3.2.4  Generation of Pearson's correlation matrix
    3.2.5  Generating clusters of complete keyword sets
    3.2.6  Deriving keyword domains
    3.2.7  Algorithms of Complete Keyword Pair (CKP)
    3.2.8  Reducing number of complete keyword pairs with keyword support threshold
    3.3  Prototyping of CKP Keyword Recommendation System
    3.3.1  Query expansion
    3.3.2  CKP Keyword Recommendation System
    4 EXPERIMENTS
    4.1  Benchmark of Effectiveness Evaluation: ACM CCS
    4.2  Experiment I: Experiment with References in Two IS Journals
    4.2.1  Citation data banks derived from two IS journals
    4.2.2  Discovering multi-expertise authors by CAP algorithm employed six clustering methods against ACM CCS
    4.2.3  Measures
    4.2.3  Evaluation
    4.2.4  Discussion
    4.3  Experiment II: Experiment with References in Two ACM Journals
    4.3.1  Citation data banks derived from two ACM journals
    4.3.2  Discovering multi-expertise authors by CAP algorithm against ACM CCS
    4.3.3  Measures
    4.3.4  Evaluation
    4.3.5  Discussion
    4.4  Experiment III: Experiment with References in JACM
    4.4.1  Citation data bank derived from JACM
    4.4.2  Discovering bridge-keywords by CKP algorithm against ACM CCS
    4.4.3  Measures
    4.4.4  Evaluation
    4.4.5  Tuning Parameter K in CKP
    4.4.6  Analyses of length threshold of complete keyword sets
    4.4.7  Discussion
    5 CONCLUSION
    5.1  Research limitation
    5.2  Conclusion
    5.3  Contribution
    5.4  Future work
    REFERENCES
    APPENDIX: ACM CCS
    Advisor
  • Ping-Yu Hsu
    Files
  • 93441023.pdf
  • approve in 3 years
    Date of Submission 2009-06-26


    If you have dissertation-related questions, please contact the NCU library extension service section.
    Our service phone is (03)422-7151 Ext. 57407; e-mail is also welcome.