Title page for 985202060



Student Number 985202060
Author Zhen-yu Gu(辜振禹)
Author's Email Address gtsr8888@gmail.com
Statistics This thesis has been viewed 483 times and downloaded 6 times.
Department Computer Science and Information Engineering
Year 2010
Semester 2
Degree Master
Type of Document Master's Thesis
Language English
Title New Segmentation Method and Acoustical Features for Unsupervised Audio Change Detection
Date of Defense 2011-07-21
Page Count 73
Keyword
  • speaker change detection
  • speaker segmentation
Abstract Audio segmentation can be divided into two categories: speech segmentation and environmental sound segmentation. Both divide an audio stream into segments such that each segment contains only one speaker or one environmental sound.
    In speaker segmentation, this thesis proposes a new concept that recasts the traditional speaker change detection problem as a speaker verification problem. To address the problem of insufficient training data, support vector machines (SVMs) are used to train the speaker models. Because SVM training is computationally expensive, a two-stage search strategy is adopted: in the first stage, the generalized likelihood ratio (GLR) is used to find change-point candidates; in the second stage, each candidate is verified by the proposed SVM-based adjacent window similarity criterion. Experimental results show that the proposed criterion outperforms the conventional Bayesian information criterion (BIC).
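The first stage described above can be illustrated with a minimal sketch. This is not the thesis implementation: the window length, hop size, threshold, and the choice of one full-covariance Gaussian per window are illustrative assumptions.

```python
import numpy as np

def glr(x, y):
    """Generalized likelihood ratio between two feature windows.
    Each window, and their union, is modeled as a single full-covariance
    Gaussian; a large GLR suggests a change point between the windows."""
    z = np.vstack([x, y])
    def logdet(w):
        # Small ridge keeps the sample covariance well-conditioned.
        cov = np.cov(w, rowvar=False) + 1e-6 * np.eye(w.shape[1])
        return np.linalg.slogdet(cov)[1]
    return 0.5 * (len(z) * logdet(z) - len(x) * logdet(x) - len(y) * logdet(y))

def candidate_points(feats, win=100, step=10, thresh=200.0):
    """Slide a pair of adjacent windows over the frame-level feature
    sequence and keep frames whose GLR exceeds a threshold as
    change-point candidates for the second-stage verification."""
    cands = []
    for t in range(win, len(feats) - win, step):
        if glr(feats[t - win:t], feats[t:t + win]) > thresh:
            cands.append(t)
    return cands
```

In a full system the candidates returned here would then be accepted or rejected by the second-stage (SVM-based) criterion, which is the expensive step the coarse GLR pass is meant to spare.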
    As for the acoustical features, MFCCs are used for speaker segmentation. For environmental sound, a feature set based on a non-uniform scale frequency map (SFM) is proposed; it is obtained by decomposing the audio signal with the matching pursuit algorithm. Experimental results demonstrate that the proposed non-uniform SFM-based feature set is more robust to noise than MFCC in environmental sound segmentation.
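The matching pursuit decomposition underlying the SFM features can be sketched as follows. This is generic greedy matching pursuit over an arbitrary unit-norm dictionary, not the thesis's dictionary or SFM construction; `dictionary` (rows are atoms) and `n_atoms` are illustrative assumptions.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms=10):
    """Greedy matching pursuit: repeatedly project the residual onto a
    dictionary of unit-norm atoms and subtract the best-matching atom's
    contribution. The selected atoms' parameters (e.g. scale and
    frequency) can then be accumulated into a scale-frequency map."""
    residual = np.asarray(signal, dtype=float).copy()
    atoms = []
    for _ in range(n_atoms):
        corr = dictionary @ residual          # inner products with all atoms
        k = int(np.argmax(np.abs(corr)))      # index of best-matching atom
        atoms.append((k, corr[k]))            # record atom and coefficient
        residual -= corr[k] * dictionary[k]   # remove its contribution
    return atoms, residual
```

With an orthonormal dictionary the algorithm recovers an exact sparse combination in as many iterations as there are active atoms; with overcomplete dictionaries it only approximates the signal greedily.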
    Table of Contents
    Abstract in Chinese ... I
    Abstract in English ... II
    Acknowledgments ... III
    Contents ... IV
    List of Figures ... V
    List of Tables ... VII
    Explanation of Symbols ... VIII
    Chapter 1 Introduction ... 1
      1-1 Motivation ... 1
      1-2 Research Background and Purpose ... 1
      1-3 Thesis Outline ... 3
    Chapter 2 Related Works ... 4
      2-1 Speech Feature Extraction ... 4
      2-2 Strategy of Searching Change Points ... 8
      2-3 Related Research Methods ... 12
    Chapter 3 Speaker Change Detection ... 21
      3-1 Support Vector Machine (SVM) ... 21
      3-2 K-Means Algorithm ... 29
      3-3 Speaker Change Detection Algorithms ... 30
      3-4 Experimental Results ... 38
    Chapter 4 Environmental Sound Change Detection ... 46
      4-1 Non-Uniform Scale-Frequency Map ... 46
      4-2 SFM Descriptors ... 51
      4-3 Experimental Results ... 54
    Chapter 5 Conclusion ... 58
    Advisor Jia-ching Wang (王家慶)
    Files 985202060.pdf (authorization disapproved)
    Date of Submission 2011-08-23



