Title page for 91225003



Student Number 91225003
Author Chien-Chang Wen (溫建璋)
Author's Email Address Not public.
Statistics This thesis has been viewed 1589 times and downloaded 896 times.
Department Graduate Institute of Statistics
Year 2003
Semester 2
Degree Master
Type of Document Master's Thesis
Language Traditional Chinese (zh-TW, Big5 encoding)
Title Mixture Model Analysis for Large Data Sets (龐大資料集之混合模型分析)
Date of Defense 2004-06-03
Page Count 45
Keyword
  • mixture model (混合模型)
    Abstract
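      With advances in technology, the spread of information, and rising living standards, data sets in many industries may now contain hundreds of millions of records. For data sets this large, however, the limited storage capacity of computing tools makes traditional methods infeasible. This thesis proposes a segmented weighted-average method to replace conventional parameter estimation in normal mixture models and mixture linear regression models. The data are divided into segments, and within each segment the maximum likelihood estimates of the parameters are obtained with the EM algorithm. The variances of the segment-level estimates are then taken into account, so that estimates from segments with larger variance receive smaller weights, and the properties of the resulting estimator are examined. A method for determining the number of components in large data sets is also proposed.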
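The segmented weighted-average idea in the abstract can be illustrated with a short Python/NumPy sketch. This is not the thesis's own code, and it rests on several assumptions. It fits a univariate K-component normal mixture by EM within each segment. It approximates each segment estimate's variance with simple complete-data formulas: Var(w_k) ≈ w_k(1−w_k)/n, Var(μ_k) ≈ σ_k²/(n w_k), and Var(σ_k²) ≈ 2σ_k⁴/(n w_k). These formulas ignore the missing-information correction (see Louis, 1982, in the references), so they understate the true variance; the thesis may compute segment variances differently. The segments are then combined with inverse-variance weights. The helper names `em_normal_mixture` and `segment_weighted_average` are hypothetical.

```python
import numpy as np


def em_normal_mixture(x, k, n_iter=200, tol=1e-8, seed=0):
    """Fit a univariate k-component normal mixture to x by EM.

    Returns (weights, means, variances), with components sorted by mean so
    that labels line up across segments (a simple guard against label switching).
    """
    rng = np.random.default_rng(seed)
    n = x.size
    w = np.full(k, 1.0 / k)                       # mixing proportions
    mu = rng.choice(x, size=k, replace=False)     # initial means drawn from the data
    var = np.full(k, x.var())                     # start every component wide
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: log(w_k * N(x_i | mu_k, var_k)), shape (n, k)
        log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        loglik = log_norm.sum()
        if loglik - prev < tol:                   # stop once the log-likelihood stalls
            break
        prev = loglik
        resp = np.exp(log_p - log_norm[:, None])  # posterior membership probabilities
        # M-step: closed-form updates
        nk = resp.sum(axis=0)
        w = nk / n
        mu = resp.T @ x / nk
        var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-10)
    order = np.argsort(mu)
    return w[order], mu[order], var[order]


def segment_weighted_average(x, k, n_segments):
    """Fit each segment separately, then combine with inverse-variance weights."""
    estimates, variances = [], []
    for seg in np.array_split(x, n_segments):
        w, mu, var = em_normal_mixture(seg, k)
        n = seg.size
        estimates.append(np.concatenate([w, mu, var]))
        # ASSUMPTION: rough complete-data variances that ignore the missing
        # information, so they understate the true sampling variance.
        variances.append(np.concatenate([
            w * (1 - w) / n,             # Var(w_k)
            var / (n * w),               # Var(mu_k)
            2 * var ** 2 / (n * w),      # Var(sigma_k^2)
        ]))
    est = np.array(estimates)                     # shape (n_segments, 3k)
    precision = 1.0 / np.array(variances)
    weights = precision / precision.sum(axis=0)   # larger variance -> smaller weight
    combined = (weights * est).sum(axis=0)
    combined[:k] /= combined[:k].sum()            # keep mixing proportions summing to 1
    # Variance of the combined estimate if segments are independent.
    return combined, 1.0 / precision.sum(axis=0)


# Usage on simulated data: 30% N(-2, 1) and 70% N(3, 1.5^2).
rng = np.random.default_rng(1)
n = 200_000
z = rng.random(n) < 0.3
x = np.where(z, rng.normal(-2.0, 1.0, n), rng.normal(3.0, 1.5, n))
theta, theta_var = segment_weighted_average(x, k=2, n_segments=20)
print(theta.round(3))  # order: w_1, w_2, mu_1, mu_2, var_1, var_2
```

Sorting components by mean is the simplest way to match labels across segments. It works here because the component means are well separated; overlapping components would need a more careful matching rule. Only one segment is held in the EM step at a time, which is what keeps memory bounded by the segment size rather than the full data set.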
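    Table of Contents
    Chapter 1  Introduction .... 1
    1.1  Research Motivation .... 1
    1.2  Literature Review and Research Methods .... 2
    Chapter 2  Normal Mixture Models .... 7
    2.1  Parameter Estimation with a Known Number of Components .... 7
    2.2  Model Selection .... 12
    2.3  Classification and Prediction .... 14
    Chapter 3  Mixture Linear Regression Models .... 15
    3.1  Mixture Linear Regression Model with Constant Component Probabilities .... 15
    3.2  Mixture Linear Regression Model with Component Probabilities .... 18
    Chapter 4  Segmented Weighting Method .... 21
    4.1  Segmented Weighted-Average Method .... 21
    4.2  Model Selection for Large Data Sets .... 23
    Chapter 5  Simulation Results and Empirical Analysis .... 25
    5.1  Segmentation Simulations for Normal Mixture Models .... 25
    5.1.1  Parameter Estimation with a Fixed Number of Components .... 25
    5.1.2  Model Selection with an Unknown Number of Components .... 28
    5.2  Simulations for Mixture Linear Regression Models .... 32
    5.3  Case Study: Credit Card Data .... 38
    Chapter 6  Discussion and Future Research .... 42
    References .... 43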
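    List of Tables
    Table 5.1: Weighted-average estimates for the normal mixture model .... 27
    Table 5.2: Coverage rates and interval-length ratios of interval estimates based on
      different weighted averages for the normal mixture model (95% confidence level) .... 27
    Table 5.3: Classification results for simulated data from (5.1) .... 27
    Table 5.4: Selection rates of each number of components under different (…, …)
      combinations for simulated data from (5.2) .... 29
    Table 5.5: For (…), selection rates of each number of components under different
      (…, …) combinations .... 31
    Table 5.6: For (…), selection rates of each number of components under different
      (…, …) combinations .... 31
    Table 5.7: Parameter estimates using different weighted averages in Model I .... 33
    Table 5.8: Support ratios for the selected number of components in Model I
      (1000 simulations) .... 34
    Table 5.9: Coverage rates and interval-length ratios of interval estimates using
      optimal versus equal weights in Model I .... 34
    Table 5.10: Classification results for Model I simulated data when (…) .... 35
    Table 5.11: Support ratios for the selected number of components in Model II
      (1000 simulations) .... 36
    Table 5.12: Parameter estimates using different weighted averages in Model II .... 36
    Table 5.13: Coverage rates and interval-length ratios of interval estimates using
      optimal versus equal weights in Model II .... 37
    Table 5.14: Classification results for Model II simulated data when (…) .... 37
    Table 5.15: Support ratios for the number of components of monthly credit card
      spending within segments (…) .... 39
    Table 5.16: Weighted-average estimates of the parameters of the mixture model for
      monthly credit card spending .... 40
    Table 5.17: Support ratios for the number of components in the regression of
      monthly credit card spending on monthly household income .... 41
    Table 5.18: Parameter estimates for the regression of monthly credit card spending
      on monthly household income .... 41
    Table 5.19: Parameter estimates with one component (i.e., the simple regression
      model) .... 41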
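    List of Figures
    Figure 5.1: Estimated density of samples simulated from model (5.2) .... 29
    Figure 5.2: Scatter plot of the response variable against the explanatory variable
      for Model I .... 33
    Figure 5.3: Histogram of monthly credit card spending .... 39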
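Several of the tables listed above (Tables 5.8, 5.11, 5.15 and 5.17) report "support ratios" for the number of components, and Section 4.2 covers model selection for large data sets. The record does not spell out the procedure. The sketch below shows one plausible reading, which is an assumption and not the thesis's confirmed method. Each segment picks the number of components K that minimizes Schwarz's BIC (Schwarz, 1978, in the references). The support ratio for each candidate K is then the fraction of segments that picked it. The function names are hypothetical, and EM is re-implemented so the block runs on its own.

```python
import numpy as np
from collections import Counter


def mixture_loglik(x, k, n_iter=200, tol=1e-8, seed=0):
    """Log-likelihood of a univariate k-component normal mixture fitted by EM."""
    rng = np.random.default_rng(seed)
    n = x.size
    w = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)   # initial means drawn from the data
    var = np.full(k, x.var())
    prev = -np.inf
    for _ in range(n_iter):
        # E-step on the log scale for numerical stability
        log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        loglik = log_norm.sum()
        if loglik - prev < tol:
            break
        prev = loglik
        resp = np.exp(log_p - log_norm[:, None])
        # M-step
        nk = np.maximum(resp.sum(axis=0), 1e-12)   # guard against empty components
        w = nk / n
        mu = resp.T @ x / nk
        var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-10)
    return loglik


def bic(x, k):
    """Schwarz's BIC; a k-component univariate mixture has 3k - 1 free parameters."""
    return -2.0 * mixture_loglik(x, k) + (3 * k - 1) * np.log(x.size)


def support_ratios(x, n_segments, k_max=3):
    """ASSUMPTION: share of segments whose BIC-minimizing k equals each candidate k."""
    candidates = range(1, k_max + 1)
    picks = [min(candidates, key=lambda k: bic(seg, k))
             for seg in np.array_split(x, n_segments)]
    counts = Counter(picks)
    return {k: counts[k] / n_segments for k in candidates}


# Usage on simulated data: 30% N(-2, 1) and 70% N(3, 1.5^2), split into 10 segments.
rng = np.random.default_rng(2)
n = 50_000
z = rng.random(n) < 0.3
x = np.where(z, rng.normal(-2.0, 1.0, n), rng.normal(3.0, 1.5, n))
ratios = support_ratios(x, n_segments=10)
print(ratios)
```

Each segment is fitted independently, so the per-segment BIC comparisons can also run in parallel. A large support ratio for one K indicates that the segments agree on that choice.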
    References
    1. Casella, G. and Berger, R. L. (2001). “Statistical Inference.” 2nd ed.
      Duxbury.
    2. Celeux, G. and Diebolt, J. (1985). “The SEM algorithm: a probabilistic
      teacher algorithm derived from the EM algorithm for the mixture problem.”
      Computational Statistics Quarterly, 2, 73-82.
    3. Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). “Maximum likelihood
      from incomplete data via the EM algorithm (with discussion).” Journal of the
      Royal Statistical Society, Series B, 39, 1-38.
    4. Diebolt, J. and Robert, C. P. (1994). “Estimation of finite mixture
      distributions through Bayesian sampling.” Journal of the Royal Statistical
      Society, Series B, 56, 363-375.
    5. Everitt, B. and Hand, D. (1981). “Finite Mixture Distributions.” London:
      Chapman and Hall.
    6. Feng, Z. D. and McCulloch, C. E. (1996). “Using bootstrap likelihood ratios
      in finite mixture models.” Journal of the Royal Statistical Society, Series
      B, 58(3), 609-617.
    7. Fraley, C. and Raftery, A. E. (1998). “How many clusters? Which clustering
      method? Answers via model-based cluster analysis.” The Computer Journal, 41,
      578-588.
    8. Hurn, M., Justel, A. and Robert, C. P. (2003). “Estimating mixtures of
      regressions.” Journal of Computational and Graphical Statistics, 12(1),
      55-79.
    9. Leytham, K. M. (1984). “Maximum likelihood estimates for the parameters of
      mixture distributions.” Water Resources Research, 20(7), 896-902.
    10. Li, R., Lin, D. K. J., and Li, B. (2003). “Statistical inference on large
      data sets.” Knowledge Discovery, forthcoming.
    11. Louis, T. A. (1982). “Finding the observed information matrix when using
      the EM algorithm.” Journal of the Royal Statistical Society, Series B, 44,
      226-233.
    12. McGilchrist, C. A. and Yau, K. K. W. (1995). “The derivation of BLUP, ML,
      REML estimation methods for generalized linear mixed models.”
      Communications in Statistics - Theory and Methods, 24, 2963-2980.
    13. McLachlan, G. J. (1987). “On bootstrapping the likelihood ratio test
      statistic for the number of components in a normal mixture.” Applied
      Statistics, 36, 318-324.
    14. McLachlan, G. J. and Basford, K. E. (1988). “Mixture Models: Inference and
      Applications to Clustering.” New York: Marcel Dekker.
    15. McLachlan, G. J. and Peel, D. (1997). “On a resampling approach to choosing
      the number of components in normal mixture models.” In Computing Science
      and Statistics, Vol. 28, L. Billard and N. I. Fisher (Eds.), 260-266.
      Fairfax Station, Virginia: Interface Foundation of North America.
    16. Quandt, R. E. and Ramsey, J. B. (1978). “Estimating mixtures of normal
      distributions and switching regressions.” Journal of the American
      Statistical Association, 73, 730-752.
    17. Roeder, K. and Wasserman, L. (1997). “Practical Bayesian density estimation
      using mixtures of normals.” Journal of the American Statistical
      Association, 92, 894-902.
    18. Schwarz, G. (1978). “Estimating the dimension of a model.” The Annals of
      Statistics, 6, 461-464.
    19. Tanner, M. A. and Wong, W. (1987). “The calculation of posterior
      distributions by data augmentation (with discussion).” Journal of the
      American Statistical Association, 82, 528-550.
    20. Titterington, D., Smith, A. F. M. and Makov, U. (1985). “Statistical
      Analysis of Finite Mixture Distributions.” New York: Wiley.
    21. Volinsky, C. T. and Raftery, A. E. (2000). “Bayesian information criterion
      for censored survival models.” Biometrics, 56(1), 256-262.
    22. West, M. (1992). “Modelling with mixtures.” In Bayesian Statistics 4 (eds
      J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith). Oxford:
      Oxford University Press.
    23. Yau, K. K. W., Lee, A. H. and Ng, A. S. K. (2003). “Finite mixture
      regression model with random effects: application to neonatal hospital
      length of stay.” Computational Statistics and Data Analysis, 41, 359-366.
    24. Zen, M. M., Lin, Y. H. and Lin, D. K. (2003). “Simple linear regression
      for large data sets.” Tech. Report, Graduate Institute of Statistics,
      National Cheng Kung University (國立成功大學統計研究所).
    25. 邵利雅 (2003). “Linear Regression Analysis for Large Data Sets
      (龐大資料集之線性迴歸分析).” Master's thesis, Graduate Institute of
      Statistics, National Central University (國立中央大學統計研究所).
    Advisor
      • Tsai-Hung Fan (樊采虹)
    Files
      • 91225003.pdf (access: approve immediately)
    Date of Submission 2004-06-14
