Title page for 965202056



Student Number 965202056
Author Che-wei Sung (宋哲偉)
Author's Email Address Not public.
Statistics This thesis has been viewed 1063 times and downloaded 657 times.
Department Computer Science and Information Engineering
Year 2008
Semester 2
Degree Master
Type of Document Master's Thesis
Language Chinese (zh-TW, Big5)
Title Affective Classification of Movie Scenes Based on Artificial Neural Network
Date of Defense 2009-06-29
Page Count 61
Keyword
  • affective computing
  • video content-based analysis
Abstract With the development of technology, digital video collections have grown rapidly in recent years. More and more movies are released around the world and play an important role in our lives. Analyzing this huge volume of content to help viewers search for a specific type of video effectively has become a major issue. In general, earlier video content-based analysis includes object-based, genre-based, and event-based classification. With the growth of affective computing, emotion-based classification has also received attention, because the audiovisual cues in movies are useful for inferring affective content.
    The purpose of this study is to construct an affective classification of movie scenes through video content-based analysis. First, a dataset of 119 scenes from eleven movies was labeled manually; each scene can be described by multiple emotional labels, rather than a single label as in earlier studies. Fifty audiovisual features were extracted from all scenes for our classifier, a self-organizing feature map (SOM). A hierarchical agglomerative algorithm was then employed to merge similar clusters into groups. We used the classification results to build a retrieval system in which users can view movie scenes with similar emotional content.
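    The clustering step described above (training a self-organizing feature map on the feature vectors, with similar map units later merged) can be sketched roughly as follows. This is a minimal illustration only, not the thesis's actual implementation: the grid size, learning-rate schedule, and neighborhood schedule are all assumptions.

```python
import numpy as np

def train_som(data, grid_w=4, grid_h=4, epochs=50, lr0=0.5, sigma0=2.0):
    """Train a small self-organizing map on the row vectors in `data`."""
    rng = np.random.default_rng(0)
    n_features = data.shape[1]
    # One weight vector per node on a grid_h x grid_w lattice.
    weights = rng.random((grid_h, grid_w, n_features))
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)              # decaying learning rate
        sigma = sigma0 * (1 - epoch / epochs) + 0.5  # decaying neighborhood width
        for x in data:
            # Winner: the node whose weight vector is closest to the input.
            d = np.linalg.norm(weights - x, axis=2)
            by, bx = np.unravel_index(np.argmin(d), d.shape)
            # A Gaussian neighborhood pulls the winner and nearby nodes
            # toward the input; distant nodes barely move.
            dist2 = (ys - by) ** 2 + (xs - bx) ** 2
            h = np.exp(-dist2 / (2 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
    return weights

def best_matching_unit(weights, x):
    """Grid coordinates of the node closest to input vector `x`."""
    d = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(d), d.shape)
```

    After training, scenes that map to the same (or neighboring) units can be treated as one cluster, and a hierarchical agglomerative pass can merge the most similar units into larger groups.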
    The experiments showed that both average recall and average precision reach 70%, which indicates that our approach is effective.
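    The recall and precision figures quoted above can be computed per query and then averaged, as in this small sketch; the function names are illustrative and not taken from the thesis.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one retrieval result.

    retrieved: ids returned by the system; relevant: ground-truth ids.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def average_scores(queries):
    """Average precision and recall over (retrieved, relevant) pairs."""
    pairs = [precision_recall(r, g) for r, g in queries]
    n = len(pairs)
    return (sum(p for p, _ in pairs) / n,
            sum(r for _, r in pairs) / n)
```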
    Table of Contents
    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    1.1 Research Background
    1.2 Research Motivation
    1.3 Research Purpose
    1.4 Thesis Organization
    Chapter 2 Literature Review
    2.1 Video Content-Based Analysis
    2.2 Affective Computing
    2.3 Self-Organizing Feature Map Networks
    Chapter 3 System Implementation
    3.1 System Architecture
    3.2 Feature Extraction
    3.2.1 Visual Features
    3.2.2 Auditory Features
    3.2.3 Summary of Feature Extraction
    3.3 SOM Clustering
    3.3.1 Lattice-Style Initial Network Weights
    3.3.2 SOM Learning
    3.3.3 Merging Neighboring Clusters
    3.4 Similarity Computation
    Chapter 4 Experimental Results and Discussion
    4.1 Test Data and Parameter Settings
    4.2 Comparison of Winner-Selection Formulas
    4.3 Merging Results of the Hierarchical Agglomerative Algorithm
    4.4 Comparison with Other Algorithms
    Chapter 5 Conclusion
    5.1 Conclusions
    5.2 Research Contributions
    5.3 Future Work
    References
    References
    Bolle, R. M., Yeo, B. L., & Yeung, M. M. (1997). Content-based digital video retrieval. International Broadcasting Convention, 160-165.
    Boutell, M. R., Luo, J., Shen, X., & Brown, C. M. (2004). Learning multi-label scene classification. Pattern Recognition, 37(9), 1757-1771.
    Brezeale, D., & Cook, D. J. (2006). Using closed captions and visual features to classify movies by genre. Poster Session of the Seventh International Workshop on Multimedia Data Mining (MDM/KDD2006).
    Calder, A. J., Burton, A. M., Miller, P., Young, A. W., & Akamatsu, S. (2001). A principal component analysis of facial expressions. Vision Research, 41(9), 1179-1208.
    Carpenter, G. A., & Grossberg, S. (1987). ART 2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26(23), 4919-4930.
    Dellaert, F., Polzin, T., & Waibel, A. (1996). Recognizing emotion in speech. Fourth International Conference on Spoken Language Processing, 1970-1973.
    Dietz, R., & Lang, A. (1999). Affective agents: Effects of agent affect on arousal, attention, liking and learning. Proceedings of Cognitive Technology Conference, San Francisco, CA.
    Ekman, P. (1992). Are there basic emotions? Psychological Review, 99(3), 550-553.
    Ekman, P. (1999). Basic emotions. In Dalgleish, T., & Power, M. J. (Eds.). Handbook of Cognition and Emotion (pp. 45-60). England: Wiley.
    Fischer, S., Lienhart, R., & Effelsberg, W. (1995). Automatic recognition of film genres. Proc. ACM Multimedia, 295-304.
    Gianetti, L. D. (2005). Understanding movies (10th ed.) Pearson Education Canada Inc.
    Gobl, C., & Ní Chasaide, A. (2003). The role of voice quality in communicating emotion, mood and attitude. Speech Communication, 40(1-2), 189-212.
    Godbole, S., & Sarawagi, S. (2004). Discriminative methods for multi-labeled classification. Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 22-30.
    Hanjalic, A. (2004). Content-based analysis of digital video. Norwell, MA: Kluwer Academic Publishers.
    Hanjalic, A. (2006). Extracting moods from pictures and sounds: Towards truly personalized TV. IEEE Signal Processing Magazine, 23(2), 90-100.
    Hanjalic, A., & Xu, L. Q. (2005). Affective video content representation and modeling. IEEE Transactions on Multimedia, 7(1), 143-154.
    Huang, H. Y., Shih, W. S., & Hsu, W. H. (2007). Movie classification using visual effect features. 2007 IEEE Workshop on Signal Processing Systems, 295-300.
    Kang, H. B. (2003). Affective content detection using HMMs. Proceedings of the Eleventh ACM International Conference on Multimedia, 259-262.
    Kobayashi, H., & Hara, F. (1992). Recognition of six basic facial expressions and their strength by neural network. IEEE International Workshop on Robot and Human Communication, 1992. Proceedings. 381-386.
    Kohonen, T. (1989). Self-organization and associative memory (3rd ed.). Berlin New York: Springer-Verlag.
    Kohonen, T. (1995). Self-organizing maps. Berlin New York: Springer-Verlag.
    Li, D., Sethi, I. K., Dimitrova, N., & McGee, T. (2001). Classification of general audio data for content-based retrieval. Pattern Recognition Letters, 22(5), 533-544.
    Li, Y., Narayanan, S., & Kuo, C. C. J. (2004). Content-based movie analysis and indexing based on audiovisual cues. IEEE Transactions on Circuits and Systems for Video Technology, 14(8), 1073-1085.
    Liu, Z., Huang, J., & Wang, Y. (1998). Classification of TV programs based on audio information using hidden Markov model. 1998 IEEE Second Workshop on Multimedia Signal Processing, 27-32.
    Liu, Z., Wang, Y., & Chen, T. (1998). Audio feature extraction and analysis for scene segmentation and classification. The Journal of VLSI Signal Processing, 20(1), 61-79.
    Ortony, A. A., & Collins, A. A. (1988). The cognitive structure of emotions. Cambridge, MA: Cambridge University Press.
    Pavlovic, V. I., Sharma, R., & Huang, T. S. (1997). Visual interpretation of hand gestures for human-computer interaction: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 677-695.
    Pfeiffer, S., Fischer, S., & Effelsberg, W. (1997). Automatic audio content analysis. Proceedings of the Fourth ACM International Conference on Multimedia, 21-30.
    Picard, R. W. (1997). Affective computing. Cambridge, MA: MIT press.
    Rasheed, Z., & Shah, M. (2002). Movie genre classification by exploiting audio-visual features of previews. Proceedings of IEEE International Conference on Pattern Recognition, 2, 1086-1089.
    Rasheed, Z., & Shah, M. (2003). Scene detection in hollywood movies and TV shows. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2, 343-348.
    Rasheed, Z., Sheikh, Y., & Shah, M. (2005). On the use of computable features for film classification. IEEE Transactions on Circuits and Systems for Video Technology, 15(1), 52-64.
    Reilly, W. S. N. (1996). Believable social and emotional agents. Doctoral dissertation, Department of Computer Science, Carnegie Mellon University.
    Roach, M. J., Mason, J. D., & Pawlewski, M. (2001). Video genre classification using dynamics. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 3, 1557-1560.
    Rui, Y., Huang, T. S., & Mehrotra, S. (1998). Exploring video structure beyond the shots. IEEE International Conference on Multimedia Computing and Systems, 1998. Proceedings. 237-240.
    Satoh, S., Nakamura, Y., & Kanade, T. (1999). Name-it: Naming and detecting faces in news videos. IEEE Multimedia, 6(1), 22-35.
    Saunders, J. (1996). Real-time discrimination of broadcast speech/music. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 2, 993-996.
    Scheirer, E., & Slaney, M. (1997). Construction and evaluation of a robust multifeature speech/music discriminator. Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2, 1331-1334.
    Shearer, K., Dorai, C., & Venkatesh, S. (2000). Incorporating domain knowledge with video and voice data analysis in news broadcasts. ACM International Conference on Knowledge Discovery and Data Mining, 46-53.
    Snoek, C. G. M., & Worring, M. (2005). Multimodal video indexing: A review of the state-of-the-art. Multimedia Tools and Applications, 25(1), 5-35.
    Soleymani, M., Chanel, G., Kierkels, J., & Pun, T. (2008). Affective characterization of movie scenes based on multimedia content analysis and user's physiological emotional responses. Proceedings of the 2008 Tenth IEEE International Symposium on Multimedia, 228-235.
    Su, M. C., Liu, T. K., & Chang, H. T. (2002). Improving the self-organizing feature map algorithm using an efficient initialization scheme. Tamkang Journal of Science and Engineering, 5(1), 35-48.
    Sudhir, G., Lee, J. C. M., & Jain, A. K. (1998). Automatic classification of tennis video for high-level content-based retrieval. Proc. IEEE International Workshop on Content-Based Access of Image and Video Database, 81-90.
    Sugano, M., Furuya, M., Nakajima, Y., & Yanagihara, H. (2004). Shot classification and scene segmentation based on MPEG compressed movie analysis. IEEE Pacific Rim Conf. on Multimedia (PCM) 2004, 271-279.
    Sugano, M., Isaksson, R., Nakajima, Y., & Yanagihara, H. (2003). Shot genre classification using compressed audio-visual features. Proceedings of IEEE International Conference Image Processing, 2, 17-20.
    Tao, J., & Tan, T. (2005). Affective computing: A review. Proceedings of the First International Conference on Affective Computing & Intelligent Interaction (ACII’05). LNCS 3784. Springer, 981-995.
    Vasconcelos, N., & Lippman, A. (2000). Statistical models of video structure for content analysis and characterization. IEEE Transactions on Image Processing, 9(1), 3-19.
    Wactlar, H. D. (2001). The challenges of continuous capture, contemporaneous analysis, and customized summarization of video content. Defining a Motion Imagery Research and Development Program Workshop.
    Wang, H. L., & Cheong, L. F. (2006). Affective understanding in film. IEEE Transactions on Circuits and Systems for Video Technology, 16(6), 689-704.
    Wang, Y., Liu, Z., & Huang, J. C. (2000). Multimedia content analysis-using both audio and visual clues. IEEE Signal Processing Magazine, 17(6), 12-36.
    Wei, C. Y., Dimitrova, N., & Chang, S. F. (2004). Color-mood analysis of films based on syntactic and psychological models. IEEE International Conference on Multimedia and Expo, 2, 831-834.
    Xiong, Z., Zhou, X. S., Tian, Q., Rui, Y., & Huang, T. S. (2006). Semantic retrieval of video. IEEE Signal Processing Magazine, 23(2), 18-27.
    Yeung, M., Yeo, B. L., & Liu, B. (1998). Segmentation of video by clustering and graph analysis. Computer Vision and Image Understanding, 71(1), 94-109.
    Yoo, H. W. (2008). Retrieval of movie scenes by semantic matrix and automatic feature weight update. Expert Systems with Applications, 34(4), 2382-2395.
    Zhang, H. J., Wu, J., Zhong, D., & Smoliar, S. W. (1997). An integrated system for content-based video retrieval and browsing. Pattern Recognition, 30(4), 643-658.
    Zhang, S., Tian, Q., Jiang, S., Huang, Q., & Gao, W. (2008). Affective MTV analysis based on arousal and valence features. IEEE International Conference on Multimedia and Expo, 1369-1372.
    Zhang, T., & Kuo, C. C. J. (1999). Hierarchical classification of audio data for archiving and retrieving. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 6, 3001-3004.
    Zhao, L., Qi, W., Wang, Y. J., Yang, S. Q., & Zhang, H. J. (2001). Video shot grouping using best-first model merging. Proc. 13th SPIE Symposium on Electronic Imaging--Storage and Retrieval for Image and Video Databases, 262-269.
    廖家慧 (2007). Emotion mining of movie scenes based on film-making techniques. Master's thesis, Department of Computer Science, National Chengchi University.
    蔡其澂 (2008). Developing a scene-based video corpus retrieval system to support comprehension of spoken English. Master's thesis, Graduate Institute of Network Learning Technology, National Central University.
    蘇木春、張孝德 (1999). Machine learning: Neural networks, fuzzy systems, and genetic algorithms. Taipei: Chuan Hwa Book Co.
    Advisor Jie-chi Yang (楊接期)
    Files
  • 965202056.pdf (approve in 2 years)
    Date of Submission 2009-07-17




    If you have dissertation-related questions, please contact the NCU Library Extension Service Section.
    Our service phone is (03)422-7151 ext. 57407; e-mail is also welcome.