Title page for 975202035



Student Number 975202035
Author Chi-I Kuan(官直毅)
Author's Email Address Not public.
Statistics This thesis has been viewed 356 times and downloaded 237 times.
Department Computer Science and Information Engineering
Year 2010
Semester 2
Degree Master
Type of Document Master's Thesis
Language zh-TW.Big5 Chinese
Title Schema Matching for Unsupervised Wrapper Maintenance
Date of Defense 2011-07-25
Page Count 47
Keyword
  • Data Integration
  • Schema Matching
  • Wrapper Maintenance
    Abstract A wrapper is a program that extracts specific data from web pages. Researchers can use wrappers to access that data and apply information integration to turn it into useful information, which in turn supports integrated network services, systems, or data analysis systems.
    However, site developers often modify their websites to meet changing needs, which breaks the original wrapper so that it can no longer extract data. In this situation, the developer has no option but to rewrite or modify the original wrapper. For this reason, unsupervised wrapper induction has been widely discussed in recent years. It automatically builds an extraction module from the regularity of dynamic web pages and extracts data with that module, so programmers do not need to write a wrapper for each specific website.
    A problem that unsupervised wrapper induction may encounter is maintenance. If a website changes over time, we have two sets of extracted data, one at time t and one at time t'. Our goal is to identify the related information in the two sets and integrate them. We use the instance and structure information generated by FiVaTech (the unsupervised wrapper induction tool we use) to match the corresponding attributes.
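    As a rough illustration of the idea in the abstract, the sketch below matches attributes extracted at time t against attributes extracted at time t' by combining an instance score with a structure score. It is not the thesis's algorithm: the Jaccard overlap, the position-based structure score, the weight `alpha`, and the table layout are all assumptions chosen to keep the example small.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: attribute name -> list of extracted values.
Table = Dict[str, List[str]]


def jaccard(a: List[str], b: List[str]) -> float:
    """Instance similarity: overlap between the value sets of two attributes."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def position_similarity(i: int, n_old: int, j: int, n_new: int) -> float:
    """Toy structure similarity: closeness of relative positions in each schema."""
    rel_i = i / max(n_old - 1, 1)
    rel_j = j / max(n_new - 1, 1)
    return 1.0 - abs(rel_i - rel_j)


def match_attributes(old: Table, new: Table, alpha: float = 0.7) -> Dict[str, Tuple[str, float]]:
    """For each attribute at time t, pick the best-scoring attribute at time t'.

    alpha weights the instance score; (1 - alpha) weights the structure score.
    Assumes `new` has at least one attribute.
    """
    old_names, new_names = list(old), list(new)
    result: Dict[str, Tuple[str, float]] = {}
    for i, a in enumerate(old_names):
        scored = [
            (b, alpha * jaccard(old[a], new[b])
                + (1 - alpha) * position_similarity(i, len(old_names), j, len(new_names)))
            for j, b in enumerate(new_names)
        ]
        result[a] = max(scored, key=lambda pair: pair[1])
    return result


# Data extracted before (t) and after (t') a site redesign that renamed
# and reordered the attributes.
old = {"title": ["Dune", "Emma"], "price": ["$9", "$12"]}
new = {"cost": ["$9", "$12"], "name": ["Dune", "Emma", "Ulysses"]}
matches = match_attributes(old, new)
```

    In this toy data the attribute order is reversed between the two extractions, so the structure score alone would pair the wrong attributes. The shared values outweigh it and pair `title` with `name` and `price` with `cost`.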
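    Table of Contents Chinese Abstract I
    English Abstract II
    Table of Contents III
    List of Figures IV
    1. Introduction 1
    2. Related Work 4
    2.1. Research Background 4
    2.2. Schema Matching 5
    2.2.1. Types of Schema Matching 6
    2.2.2. Dual Correlation Mining (DCM) 6
    2.2.3. On-the-fly Data Integration of Homogeneous Web Data 7
    2.2.4. Combining Schema and Instance Information 8
    2.2.5. Improving XML schema matching performance using Prufer sequences 9
    3. Preliminaries 12
    3.1. FiVaTech 12
    3.2. Notation 14
    4. System Architecture 15
    4.1. Instance Information 15
    4.1.1. Data Types 16
    4.1.2. Finding Matching Record Pairs 18
    4.1.3. Selecting Candidate Attributes 18
    4.2. Structure-Level Information 20
    4.2.1. Node Order Similarity 21
    4.2.2. Adjacent Node Similarity 21
    4.2.3. Path Similarity 22
    4.2.4. Parent Node Type Similarity 22
    4.3. Combining Instance-Level and Structure-Level Similarity 23
    5. Experimental Results 24
    5.1. Performance Evaluation Method and Experimental Design 24
    5.2. Threshold for Finding Matching Record Pairs 25
    5.3. Performance of Instance Information Similarity across Domains 26
    5.4. Effect of the Various Structural Similarities 29
    5.5. Performance of Combining Instance and Structural Information Similarity 32
    6. Conclusions and Future Work 36
    7. References 37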
    Reference [1]A. Algergawy, E. Schallehn, G. Saake, A Prufer sequence-based approach for schema matching, in: BalticDB & IS2008, Estonia, 2008.
    [2]A. Algergawy, E. Schallehn, G. Saake. A Sequence-based Ontology Matching Approach. 18th European Conference on Artificial Intelligence Workshop, Greece. 2008.
    [3]A. Algergawy, E. Schallehn, G. Saake. Improving XML schema matching performance using Prufer sequences. Data & Knowledge Engineering, Volume 68, pp. 728–747. 2009.
    [4]A. Algergawy, R. Nayak, G. Saake. Element similarity measures in XML schema matching. Information Sciences Vol.180. pp. 4975-4998. 2010.
    [5]A. Gal, Managing uncertainty in schema matching with top-k schema mappings, Journal on Data Semantics Vol.6 90–114, 2006.
    [6]A. Halevy, A. Rajaraman, J. Ordille. Data Integration: The Teenage Years. Very Large Data Bases, pp. 12-15. 2006.
    [7]B. He, K. C.-C. Chang, and J. Han. Discovering Complex Matching across Web Query Interfaces: A Correlation Mining Approach. In Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data mining, pp. 148-157, 2004.
    [8]C.-C. Huang, C.-H. Chang. On-the-fly Data Integration of Homogeneous Web Data. Master dissertation, National Central University. 2004.
    [9]C.-H. Chang, M. Kayed, M. R. Girgis, K. Shaalan, A Survey of Web Information Extraction Systems, IEEE TKDE (SCI, EI), Vol. 18, No. 10, pp. 1411-1428. 2006.
    [10]E. Rahm, P. A. Bernstein. A survey of approaches to automatic schema matching. The International Journal on Very Large Data Bases, Vol. 10, Issue 4, pp. 334-350. 2001.
    [11]G. Beliakov, A. Pradera, T. Calvo, Aggregation Functions: A Guide for Practitioners, Studies in Fuzziness and Soft Computing, vol. 221, Springer, 2007.
    [12]H. Zhao, Combining schema and instance information for integrating heterogeneous databases: an analytical approach and empirical evaluation, Ph.D. dissertation, University of Arizona, 2002.
    [13]H. Zhao, S. Ram, Clustering schema elements for semantic integration of heterogeneous data sources, Journal of Database Management, Vol. 15, No. 4, pp. 88–106. 2004.
    [14]H. Zhao, S. Ram, Clustering similar schema elements across heterogeneous databases: a first step in database integration. Advanced Topics in Database Research, Vol. 5, pp. 235–256. 2006.
    [15]H. Zhao, S. Ram, Entity identification for heterogeneous database integration—a multiple classifier system approach and empirical evaluation, Information Systems, Vol. 30, pp. 119–132. 2005.
    [16]H. Zhao, S. Ram. Combining schema and instance information for integrating heterogeneous data sources. Data & Knowledge Engineering. 2006.
    [17]J.-H. Li, C.-H. Chang. Differentiating Templates and Data Values from Semi-Structured Web Pages. Master dissertation, National Central University. 2004.
    [18]L.-F. Chang, C.-H. Chung. Generation of Web page Fetchers from Navigation Records. Master dissertation, National Central University. 2005.
    [19]M. Kayed, C.-H. Chang. FiVaTech : Page-Level Web Data Extraction from Template Pages. IEEE Trans. Knowl. Data Eng. Vol. 22, No.2, pp. 249-263, 2010.
    [20]M. Kayed, C.-H. Chang. Page-Level Web Data Extraction from Template Pages. IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 2, pp. 249-263, 2010.
    [21]Y.-L. Lin, C.-H. Chung. Page-level Wrapper Verification based on Structure, Semantic and Schema. Master dissertation, National Central University. 2010.
    [22]Z. Zhang, B. He, and K. C.-C. Chang. On-the-fly constraint mapping across web query interfaces. In Proceedings of the Very Large Data Bases Workshop on Information Integration on the Web, 2004.
    [23]N. Kushmerick. Wrapper Verification. World Wide Web, vol. 3, no 2, pp. 79–94, 2000.
    Advisor
  • Chia-Hui Chang(張嘉惠)
    Files
  • 975202035.pdf
  • approve immediately
    Date of Submission 2011-08-29
