poster

Movie segmentation into scenes and chapters using locally weighted bag of visual words

Authors: Vasileios Chasanis, Argyris Kalogeratos, Aristidis Likas

CIVR '09: Proceedings of the ACM International Conference on Image and Video Retrieval

Article No.: 35, Pages 1 - 7

https://doi.org/10.1145/1646396.1646439

Published: 08 July 2009

Abstract

Segmenting movies into semantically correlated units is a tedious task because of the "semantic gap". Low-level features do not provide useful information about the semantic correlation between shots and usually fail to detect scenes with constantly dynamic content. In the method proposed here, local invariant descriptors represent the key-frames of video shots, and a visual vocabulary is built from these descriptors, resulting in a visual-word histogram representation (bag of visual words) for each shot. A key aspect of the method is that, following an idea from text segmentation, the visual-word histogram of each shot is further smoothed temporally by taking into account the histograms of neighboring shots. In this way, valuable contextual information is preserved. The final scene and chapter boundaries are placed at the local maxima of the difference between successive smoothed histograms, for low and high values of the smoothing parameter respectively. Numerical experiments indicate that the method provides high detection rates while preserving a good tradeoff between recall and precision.
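The steps named in the abstract (visual-word histograms per shot, temporal smoothing over neighboring shots, boundaries at local maxima of successive differences) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The Gaussian smoothing kernel, the L1 distance between histograms, the nearest-word quantization, and all function and parameter names (`bow_histograms`, `smooth_histograms`, `detect_boundaries`, `sigma`) are assumptions made for illustration, and the visual vocabulary is taken as given rather than learned.

```python
import numpy as np


def bow_histograms(descriptors_per_shot, vocabulary):
    """Quantize each shot's key-frame descriptors to the nearest visual word
    and return one L1-normalized histogram per shot (one row per shot)."""
    n_words = vocabulary.shape[0]
    hists = np.zeros((len(descriptors_per_shot), n_words))
    for i, desc in enumerate(descriptors_per_shot):
        # Squared Euclidean distance from every descriptor to every word.
        d2 = ((desc[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
        words = d2.argmin(axis=1)
        hists[i] = np.bincount(words, minlength=n_words)
        hists[i] /= max(hists[i].sum(), 1.0)  # guard against empty shots
    return hists


def smooth_histograms(hists, sigma):
    """Replace each shot histogram by a weighted average of the histograms
    of its temporal neighbors. The Gaussian weighting is an assumption."""
    idx = np.arange(hists.shape[0])
    weights = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ hists


def detect_boundaries(hists, sigma):
    """Return the indices of shots that start a new segment, taken at the
    interior local maxima of the L1 difference (assumed distance) between
    successive smoothed histograms."""
    smoothed = smooth_histograms(hists, sigma)
    diff = np.abs(smoothed[1:] - smoothed[:-1]).sum(axis=1)
    peaks = [
        t for t in range(1, len(diff) - 1)
        if diff[t] > diff[t - 1] and diff[t] > diff[t + 1]
    ]
    # diff[t] compares shot t with shot t + 1, so the new segment starts at t + 1.
    return [t + 1 for t in peaks]


# Toy input: 12 shots over a 3-word vocabulary, in three blocks of 4 similar shots.
toy = np.repeat(np.eye(3), 4, axis=0)
print(detect_boundaries(toy, sigma=1.0))
```

On this toy input a small `sigma` places boundaries at the starts of the second and third blocks. Following the abstract, a low value of the smoothing parameter would be used for scene boundaries and a high value for chapter boundaries; suitable values depend on the data and are not given here.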

Index Terms

Information systems → Information retrieval → Document representation

Published In

CIVR '09: Proceedings of the ACM International Conference on Image and Video Retrieval

July 2009

383 pages

ISBN:9781605584805

DOI:10.1145/1646396

Conference Chairs:
Yiannis Kompatsiaris (CERTH-ITI, Greece)
Stephane Marchand-Maillet (Univ. of Geneva, Switzerland)

Program Chairs:
Yannis Avrithis (NTUA, Greece)
Noel O'Connor (DCU, Ireland)
Daniel Gatica-Perez (Idiap Research Institute, Switzerland)
Tat-Seng Chua (National University of Singapore, Singapore)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

CIVR '09

Sponsor:

SIGMM

CIVR '09: International Conference on Image and Video Retrieval

July 8 - 10, 2009

Santorini, Fira, Greece

Bibliometrics

Total Citations: 26
Total Downloads: 314
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 1

Reflects downloads up to 17 Oct 2024

Cited By

Lee O, You E, Kim J (2021). Plot Structure Decomposition in Narrative Multimedia by Analyzing Personalities of Fictional Characters. Applied Sciences 11:4 (1645). Online publication date: 11-Feb-2021.
https://doi.org/10.3390/app11041645

Dimiccoli M, Wendt H (2021). Learning Event Representations for Temporal Segmentation of Image Sequences by Dynamic Graph Embedding. IEEE Transactions on Image Processing 30 (1476-1486). Online publication date: 2021.
https://doi.org/10.1109/TIP.2020.3044448

Trojahn T, Goularte R (2021). Temporal video scene segmentation using deep-learning. Multimedia Tools and Applications. Online publication date: 8-Feb-2021.
https://doi.org/10.1007/s11042-020-10450-2

Khan U (2020). Semantic Analysis of Videos for Tags Prediction and Segmentation. Industrial Internet of Things and Cyber-Physical Systems (296-307). Online publication date: 2020.
https://doi.org/10.4018/978-1-7998-2803-7.ch014

Khan U, Martinez-Del-Amor M, Altowaijri S, Ahmed A, Rahman A, Sama N, Haseeb K, Islam N (2020). Movie Tags Prediction and Segmentation Using Deep Learning. IEEE Access 8 (6071-6086). Online publication date: 2020.
https://doi.org/10.1109/ACCESS.2019.2963535

Jiang H, Yang D, Xiao Y, Wang W (2020). Understanding a bag of words by conceptual labeling with prior weights. World Wide Web 23:4 (2429-2447). Online publication date: 14-Apr-2020.
https://doi.org/10.1007/s11280-020-00806-x

Daha F, Shah S (2020). Learning Motion Regularity for Temporal Video Segmentation and Anomaly Detection. Pattern Recognition (121-135). Online publication date: 23-Feb-2020.
https://doi.org/10.1007/978-3-030-41404-7_9

Goularte R, Trojahn T, Kishi R, dos Santos J, Muchaluat Saade D, da Graça C. Pimentel M, Macedo A (2019). Multimedia information retrieval in big data using OpenCV python. Proceedings of the 25th Brazillian Symposium on Multimedia and the Web (25-27). Online publication date: 29-Oct-2019.
https://dl.acm.org/doi/10.1145/3323503.3345030

Lee O, Jung J (2019). Modeling affective character network for story analytics. Future Generation Computer Systems 92:C (458-478). Online publication date: 1-Mar-2019.
https://dl.acm.org/doi/10.1016/j.future.2018.01.030

Kishi R, Trojahn T, Goularte R (2019). Correlation based feature fusion for the temporal video scene segmentation task. Multimedia Tools and Applications 78:11 (15623-15646). Online publication date: 1-Jun-2019.
https://dl.acm.org/doi/10.1007/s11042-018-6959-4
