Title page for 89443007



Student Number 89443007
Author Ching-Cheng Shen (沈清正)
Author's Email Address james@msa.vnu.edu.tw
Statistics This thesis has been viewed 2,585 times and downloaded 1,003 times.
Department Information Management
Year 2004
Semester 2
Degree Ph.D.
Type of Document Doctoral Dissertation
Language Traditional Chinese (zh-TW, Big5 encoding)
Title Mining generalized knowledge from ordered data through attribute-oriented induction techniques
Date of Defense 2005-06-23
Page Count 78
Keyword
  • Attribute-Oriented Induction
  • Concept Hierarchy
  • Data Mining
  • Ordered Data
  • Dynamic Programming
  • Relational Data
    Abstract The attribute-oriented induction (AOI) method is one of the most important data mining methods. Its input is a relational table together with a concept tree (concept hierarchy) for each attribute, and its output is a small relation summarizing the general characteristics of the task-relevant data. Although AOI is very useful for inducing general characteristics, it can only be applied to relational data, in which the data items have no order. When the data are ordered, existing AOI methods cannot find the corresponding generalized knowledge. To address this weakness, this dissertation proposes a dynamic programming algorithm, based on AOI techniques, that finds generalized knowledge in an ordered list of data. The algorithm discovers a sequence of K generalized tuples, each describing the general characteristics of a different segment of the list, where K is a user-specified parameter.
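    The abstract states the idea only at a high level. As an illustration, here is a minimal Python sketch of dynamic-programming segmentation over an ordered list, built on assumptions that do not come from the dissertation: each attribute's concept hierarchy is a hypothetical child-to-parent dictionary with a single root, each segment is generalized by replacing every attribute with the lowest concept that covers all of its values in that segment, and a segment's cost is the total number of hierarchy levels climbed. The cost measure and every name in the code (`path_to_root`, `lowest_common_concept`, `segment_cost`, `k_segment`) are illustrative; this is not the dissertation's five-phase OAOI algorithm or its F, E, DI, D and B tables.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: child concept -> parent concept, single root.
Hierarchy = Dict[str, str]


def path_to_root(value: str, parent: Hierarchy) -> List[str]:
    """Return [value, parent, grandparent, ..., root]."""
    path = [value]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def lowest_common_concept(values: Sequence[str], parent: Hierarchy) -> str:
    """Lowest concept in the hierarchy that covers every value."""
    common = path_to_root(values[0], parent)
    for v in values[1:]:
        ancestors = set(path_to_root(v, parent))
        common = [c for c in common if c in ancestors]
    return common[0]


def segment_cost(rows: Sequence[Tuple[str, ...]],
                 hierarchies: List[Hierarchy]) -> Tuple[int, Tuple[str, ...]]:
    """Generalize a segment to one tuple.

    Assumed cost measure (not the dissertation's): total number of
    hierarchy levels climbed by all values in the segment.
    """
    cost, generalized = 0, []
    for a, parent in enumerate(hierarchies):
        column = [row[a] for row in rows]
        concept = lowest_common_concept(column, parent)
        generalized.append(concept)
        cost += sum(path_to_root(v, parent).index(concept) for v in column)
    return cost, tuple(generalized)


def k_segment(data, hierarchies, k):
    """Split the ordered list into k contiguous segments of minimum total cost."""
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(data)")
    # Cost and generalized tuple of every candidate segment data[i:j].
    cost = {(i, j): segment_cost(data[i:j], hierarchies)
            for i in range(n) for j in range(i + 1, n + 1)}
    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]  # best[s][j]: s segments over data[:j]
    back = [[0] * (n + 1) for _ in range(k + 1)]    # start index of the last segment
    best[0][0] = 0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):
                candidate = best[s - 1][i] + cost[i, j][0]
                if candidate < best[s][j]:
                    best[s][j], back[s][j] = candidate, i
    # Walk the back-pointers to recover the K generalized tuples in order.
    segments, j = [], n
    for s in range(k, 0, -1):
        i = back[s][j]
        segments.append((i, j, cost[i, j][1]))
        j = i
    return best[k][n], segments[::-1]


# Toy usage with two hypothetical attributes: major and grade.
major = {"CS": "Science", "Physics": "Science", "History": "Arts",
         "Music": "Arts", "Science": "ANY", "Arts": "ANY"}
grade = {"A": "Good", "B": "Good", "C": "Poor", "D": "Poor",
         "Good": "ANY", "Poor": "ANY"}
data = [("CS", "A"), ("Physics", "B"), ("CS", "A"), ("History", "C"), ("Music", "D")]
print(k_segment(data, [major, grade], 2))
# (10, [(0, 3, ('Science', 'Good')), (3, 5, ('Arts', 'Poor'))])
```

    In this toy list of (major, grade) pairs, K = 2 splits the data where the records change from science majors with good grades to arts majors with poor grades. The sketch recomputes each of the O(n²) candidate segments from scratch, so it only suits small inputs; the complexity results listed for the dissertation (Sections 3.7 and 3.8) concern its own algorithm, not this sketch.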
    Table of Contents
    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    Chapter 2 The AOI Algorithm and Definition of the Ordered Induction Problem
    2.1. Introduction to the AOI Algorithm
    2.2. Definition of the Ordered Induction Problem
    Chapter 3 The Ordered AOI Algorithm
    3.1. Overview of the Ordered AOI Algorithm
    3.2. Phase 1 of the OAOI Algorithm
    3.3. Phase 2 of the OAOI Algorithm
    3.4. Phase 3 of the OAOI Algorithm
    3.5. Phase 4 of the OAOI Algorithm
    3.6. Phase 5 of the OAOI Algorithm
    3.7. Time Complexity of the OAOI Algorithm
    3.8. Space Complexity of the OAOI Algorithm
    Chapter 4 The Extended Ordered AOI Algorithm
    4.1. Constraints on Data Volume
    4.2. The Common-Child Tree
    4.3. Data Preprocessing
    4.4. Performance Evaluation of the EOAOI Algorithm
    4.4.1. Data Generation
    4.4.2. Execution Performance: Running Time
    4.4.3. Output Quality: The Smoothing Degree of the Window
    Chapter 5 An Optimal Algorithm for Building Concept Hierarchies from Numerical Data
    5.1. Problem Definition for Concept Hierarchies of Numerical Data
    5.2. Description of the ONCH Algorithm
    5.3. Performance Evaluation of the ONCH Algorithm
    5.4. Quality Evaluation of EOAOI Using Optimal Numerical Concept Hierarchies
    Chapter 6 Conclusion
    References

    List of Figures
    Figure 1. Size of the original ordered data vs. running time
    Figure 2. Number of generalized tuples vs. running time
    Figure 3. ub/lb ratio vs. running time
    Figure 4. An example of a concept hierarchy
    Figure 5. The difference between QD and QDM
    Figure 6. Data distribution of the attribute "credit card spending amount"
    Figure 7. Running times of different algorithms
    Figure 8. Tree-building distance of different algorithms for the attribute "credit card spending amount"
    Figure 9. Data distribution of the attribute "customer monthly spending amount"
    Figure 10. Tree-building distance of different algorithms for the attribute "customer monthly spending amount"
    Figure 11. Data distribution of attribute s3
    Figure A.1. Concept tree for the attribute "Location of Manufacturer"
    Figure A.2. Concept tree for the attribute "Light Vehicle Model"
    Figure A.3. Concept tree for the attribute "Engine Displacement"
    Figure A.4. Concept tree for the attribute "Price"
    Figure B. A common-child tree covering the range 0 to 100
    Figure C. Concept hierarchy of attribute s3
    Figure E.1. Concept hierarchy of the attribute "frequency"
    Figure E.2. Concept hierarchy of the attribute "age"
    Figure E.4. Concept hierarchy of the attribute "personal monthly income"
    Figure E.5. Concept hierarchy of the attribute "average household monthly income"
    Figure E.6. Concept hierarchy of the attribute "number of people"
    Figure E.7. Concept hierarchy of the attribute "household economic status"
    Figure E.3. Concept hierarchy of the attribute "occupation"

    List of Tables
    Table 1. A sample data table with 10 tuples and 4 attributes
    Table 2. Result of generalizing the data in Table 1 with the AOI method
    Table 3. Result of applying our algorithm to the data in Table 1
    Table 4. F(i, r) values computed from the data in Table 1
    Table 5. E(i, j, r) values computed from the data in Table 4
    Table 6. DI(i, j) values computed from the data in Table 5
    Table 7. The D(i, j, s) matrix computed from the data in Table 6
    Table 8. The B(i, j, s) matrix computed from the data in Table 6
    Table 9. Means and standard deviations of scores in six subjects
    Table 10(a). Output with window smoothing degree R = 1
    Table 10(b). Output with window smoothing degree R = 2
    Table 10(c). Output with window smoothing degree R = 10
    Table 10(d). Output with window smoothing degree R = 50
    Table 11(a). Sequence characteristics obtained by the EOAOI2 algorithm using the optimal concept hierarchy
    Table 11(b). Sequence characteristics obtained by the EOAOI2 algorithm using equal-width partitioned concept hierarchies
    Table D.1. Ordered induction results for the original data
    Table D.2. Average percentage of segment data covered by the most significant leaf-node value
    Table D.3. Ordered induction results for male cardholders
    Table D.4. Ordered induction results for female cardholders
    Table D.5. Ordered induction results for unmarried cardholders
    Table D.6. Ordered induction results for married cardholders
    Table D.7. Ordered induction results for normal cardholders
    Table D.8. Ordered induction results for abnormal cardholders
    References
    Cai, Y., Cercone, N., Han, J., 1990. An attribute-oriented approach for learning classification rules from relational databases. In: Proceedings of the Sixth International Conference on Data Engineering, pp. 281–288.
    Carter, C.L., Hamilton, H.J., 1995. Performance evaluation of attribute-oriented algorithms for knowledge discovery from databases. In: Proceedings of the Seventh International Conference on Tools with Artificial Intelligence, pp. 486–489.
    Carter, C.L., Hamilton, H.J., 1998. Efficient attribute-oriented generalization for knowledge discovery from large databases. IEEE Transactions on Knowledge and Data Engineering, 10 (2), 193–208.
    Chaudhuri, S., Dayal, U., 1997. An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26, 65–74.
    Chen, M.S., Han, J., Yu, P.S., 1996. Data mining: an overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8 (6), 866–883.
    Cheung, D.W., Hwang, H.Y., Fu, A.W., Han, J., 2000. Efficient rule-based attribute-oriented induction for data mining. Journal of Intelligent Information Systems, 15 (2), 175–200.
    Codd, E.F., Codd, S.B., Salley, C.T., 1993. Beyond decision support. Computer World, 27 (30), 87–89.
    Fayyad, U., Irani, K., 1993. Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambery, France, pp. 1022–1029.
    Hamilton, H.J., Hilderman, R.J., Cercone, N., 1996. Attribute-oriented induction using domain generalization graphs. In: Proceedings of the Eighth IEEE International Conference on Tools with Artificial Intelligence, pp. 246–252.
    Han, J., Cai, Y., Cercone, N., 1992. Knowledge discovery in databases: an attribute-oriented approach. In: Proceedings of the International Conference on Very Large Data Bases (VLDB-92), pp. 547–559.
    Han, J., Cai, Y., Cercone, N., 1993. Data-driven discovery of quantitative rules in relational databases. IEEE Transactions on Knowledge and Data Engineering, 5 (1), 29–40.
    Han, J., Fu, Y., 1994. Dynamic generation and refinement of concept hierarchies for knowledge discovery in databases. In: Proceedings of the AAAI'94 Workshop on Knowledge Discovery in Databases, Seattle, WA, pp. 157–168.
    Han, J., Fu, Y., 1995. Discovery of multiple-level association rules from large databases. In: Proceedings of the 21st International Conference on Very Large Data Bases, Zurich, Switzerland, pp. 420–431.
    Han, J., Kamber, M., 2001. Data Mining: Concepts and Techniques. Academic Press.
    Han, J., Nishio, S., Kawano, H., Wang, W., 1998. Generalization-based data mining in object-oriented databases using an object-cube model. Data and Knowledge Engineering, 25, 55–97.
    Hu, X., Cercone, N., 1996. Mining knowledge rules from databases: a rough set approach. In: Proceedings of the Twelfth International Conference on Data Engineering, pp. 96–105.
    Kaufman, L., Rousseeuw, P.J., 1990. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, New York.
    Kerber, R., 1992. Discretization of numeric attributes. In: Proceedings of the Tenth National Conference on Artificial Intelligence, San Jose, California, pp. 123–128.
    Lu, W., Han, J., Ooi, B.C., 1993. Discovery of general knowledge in large spatial databases. In: Proceedings of the 1993 Far East Workshop on Geographic Information Systems (FEGIS-93), pp. 275–289.
    MacQueen, J., 1967. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 281–297.
    McClean, S., Scotney, B., Shapcott, M., 2000. Incorporating domain knowledge into attribute-oriented data mining. International Journal of Intelligent Systems, 15 (6), 535–548.
    Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.
    Shan, N., Hamilton, H.J., Cercone, N., 1995. GRG: knowledge discovery using information generalization, information reduction, and rule generation. In: Proceedings of the Seventh International Conference on Tools with Artificial Intelligence, pp. 372–379.
    Srikant, R., Agrawal, R., 1995. Mining generalized association rules. In: Proceedings of the 21st International Conference on Very Large Data Bases, Zurich, Switzerland, pp. 407–419.
    Tsumoto, S., 2000. Knowledge discovery in clinical databases and evaluation of discovered knowledge in outpatient clinic. Information Sciences, 124 (1), 125–137.
    Advisor
  • Yen-Liang Chen (陳彥良)
    Files
  • 89443007.pdf (approved for public release after 2 years)
    Date of Submission 2005-06-27
