DOI: 10.1145/1646396.1646439
poster

Movie segmentation into scenes and chapters using locally weighted bag of visual words

Published: 08 July 2009

Abstract

Segmenting movies into semantically correlated units is a tedious task because of the "semantic gap". Low-level features do not provide useful information about the semantic correlation between shots and usually fail to detect scenes with constantly dynamic content. In the method we propose herein, local invariant descriptors are used to represent the key-frames of video shots, and a visual vocabulary is created from these descriptors, resulting in a visual-word histogram representation (bag of visual words) for each shot. A key aspect of our method is that, based on an idea from text segmentation, the histogram of visual words corresponding to each shot is further smoothed temporally by taking into account the histograms of neighboring shots. In this way, valuable contextual information is preserved. The final scene and chapter boundaries are determined at the local maxima of the difference between successive smoothed histograms, for low and high values of the smoothing parameter respectively. Numerical experiments indicate that our method provides high detection rates while preserving a good tradeoff between recall and precision.
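
The smoothing and boundary-detection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the kernel, the histogram distance, or how peaks are selected, so the Gaussian kernel over shot index, the Euclidean distance, and the plain local-maximum test are assumptions made here, as are the function names `smooth_histograms` and `boundary_candidates` and the toy data.

```python
import numpy as np

def smooth_histograms(hists, sigma):
    """Temporally smooth per-shot visual-word histograms.

    Assumption: a Gaussian kernel over the shot index; `sigma` plays the
    role of the smoothing parameter mentioned in the abstract.
    """
    hists = np.asarray(hists, dtype=float)
    idx = np.arange(len(hists))
    # weights[i, j]: contribution of shot j to the smoothed histogram of shot i
    weights = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    weights /= weights.sum(axis=1, keepdims=True)
    smoothed = weights @ hists
    # Renormalize so that every smoothed histogram sums to 1
    return smoothed / smoothed.sum(axis=1, keepdims=True)

def boundary_candidates(hists, sigma):
    """Return indices i where the difference between the smoothed histograms
    of shots i and i+1 is a local maximum, plus the difference curve itself.

    Assumption: Euclidean distance between successive smoothed histograms.
    """
    smoothed = smooth_histograms(hists, sigma)
    diffs = np.linalg.norm(np.diff(smoothed, axis=0), axis=1)
    peaks = [i for i in range(1, len(diffs) - 1)
             if diffs[i] > diffs[i - 1] and diffs[i] >= diffs[i + 1]]
    return peaks, diffs

# Toy data (hypothetical): 4 scenes of 6 shots each over a 4-word vocabulary.
# Scenes 1-2 share words 0-1 and scenes 3-4 share words 2-3, mimicking
# two "chapters" of two scenes each.
def scene(hist):
    return np.tile(hist, (6, 1))

hists = np.vstack([
    scene([0.7, 0.3, 0.0, 0.0]),
    scene([0.3, 0.7, 0.0, 0.0]),
    scene([0.0, 0.0, 0.7, 0.3]),
    scene([0.0, 0.0, 0.3, 0.7]),
])

# Low smoothing: candidate scene boundaries
scene_peaks, scene_diffs = boundary_candidates(hists, sigma=1.0)
# High smoothing: a flatter difference curve, intended for coarser (chapter) boundaries
chapter_peaks, chapter_diffs = boundary_candidates(hists, sigma=4.0)
print("low sigma peaks:", scene_peaks)
print("high sigma peaks:", chapter_peaks)
```

A peak at index `i` marks a candidate boundary between shots `i` and `i+1`. Raising `sigma` blends each shot with more of its neighbors, which tends to suppress small differences and leave only the coarser transitions, in line with the abstract's use of low and high smoothing values for scenes and chapters respectively.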

    Published In

    CIVR '09: Proceedings of the ACM International Conference on Image and Video Retrieval
    July 2009
    383 pages
    ISBN: 9781605584805
    DOI: 10.1145/1646396

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. CCH descriptors
    2. Lowbow
    3. SIFT descriptors
    4. chapter detection
    5. scene detection

    Qualifiers

    • Poster

    Article Metrics

    • Downloads (Last 12 months): 3
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 17 Oct 2024

    Cited By

    • (2021) Plot Structure Decomposition in Narrative Multimedia by Analyzing Personalities of Fictional Characters. Applied Sciences 11:4 (1645). DOI: 10.3390/app11041645. Online publication date: 11-Feb-2021
    • (2021) Learning Event Representations for Temporal Segmentation of Image Sequences by Dynamic Graph Embedding. IEEE Transactions on Image Processing 30 (1476-1486). DOI: 10.1109/TIP.2020.3044448. Online publication date: 2021
    • (2021) Temporal video scene segmentation using deep-learning. Multimedia Tools and Applications. DOI: 10.1007/s11042-020-10450-2. Online publication date: 8-Feb-2021
    • (2020) Semantic Analysis of Videos for Tags Prediction and Segmentation. Industrial Internet of Things and Cyber-Physical Systems (296-307). DOI: 10.4018/978-1-7998-2803-7.ch014. Online publication date: 2020
    • (2020) Movie Tags Prediction and Segmentation Using Deep Learning. IEEE Access 8 (6071-6086). DOI: 10.1109/ACCESS.2019.2963535. Online publication date: 2020
    • (2020) Understanding a bag of words by conceptual labeling with prior weights. World Wide Web 23:4 (2429-2447). DOI: 10.1007/s11280-020-00806-x. Online publication date: 14-Apr-2020
    • (2020) Learning Motion Regularity for Temporal Video Segmentation and Anomaly Detection. Pattern Recognition (121-135). DOI: 10.1007/978-3-030-41404-7_9. Online publication date: 23-Feb-2020
    • (2019) Multimedia information retrieval in big data using OpenCV python. Proceedings of the 25th Brazillian Symposium on Multimedia and the Web (25-27). DOI: 10.1145/3323503.3345030. Online publication date: 29-Oct-2019
    • (2019) Modeling affective character network for story analytics. Future Generation Computer Systems 92:C (458-478). DOI: 10.1016/j.future.2018.01.030. Online publication date: 1-Mar-2019
    • (2019) Correlation based feature fusion for the temporal video scene segmentation task. Multimedia Tools and Applications 78:11 (15623-15646). DOI: 10.1007/s11042-018-6959-4. Online publication date: 1-Jun-2019
