research-article

Discovery and Segmentation of Activities in Video

Authors:

Vera Kettnaker

IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 22, Issue 8

Pages 844 - 851

https://doi.org/10.1109/34.868685

Published: 01 August 2000

Abstract

Hidden Markov models (HMMs) have become the workhorses of the monitoring and event recognition literature because they bring to time-series analysis the utility of density estimation and the convenience of dynamic time warping. Once trained, the internals of these models are considered opaque; there is no effort to interpret the hidden states. We show that by minimizing the entropy of the joint distribution, an HMM's internal state machine can be made to organize observed activity into meaningful states. This has uses in video monitoring and annotation, low bit-rate coding of scene activity, and detection of anomalous behavior. We demonstrate with models of office activity and outdoor traffic, showing how the framework learns principal modes of activity and patterns of activity change. We then show how this framework can be adapted to infer hidden state from extremely ambiguous images, in particular, inferring 3D body orientation and pose from sequences of low-resolution silhouettes.
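
As a rough illustration of the ideas in the abstract, the sketch below is not the authors' entropy-minimizing training procedure, which the abstract does not spell out. It only shows two standard building blocks under stated assumptions: the Shannon entropy of an HMM's transition rows, which is lower when the state machine is sparse and nearly deterministic, and Viterbi decoding, which turns an observation sequence into contiguous state segments. All function names and the discrete-output model are hypothetical choices made for this example.

```python
import math

def row_entropy(row):
    """Shannon entropy (in nats) of one discrete distribution."""
    return -sum(p * math.log(p) for p in row if p > 0.0)

def transition_entropy(trans):
    """Sum of the per-row entropies of a transition matrix."""
    return sum(row_entropy(row) for row in trans)

def viterbi(obs, start, trans, emit):
    """Most likely state path for a discrete-output HMM (log domain)."""
    def log(p):
        return math.log(p) if p > 0.0 else float("-inf")

    n = len(start)
    score = [log(start[s]) + log(emit[s][obs[0]]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score
        ptr = [max(range(n), key=lambda r: prev[r] + log(trans[r][s]))
               for s in range(n)]
        score = [prev[ptr[s]] + log(trans[ptr[s]][s]) + log(emit[s][o])
                 for s in range(n)]
        back.append(ptr)

    # Trace the best final state back to the start.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def segments(path):
    """Collapse a state path into (state, first_index, last_index) runs."""
    runs = []
    for i, s in enumerate(path):
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1], i)
        else:
            runs.append((s, i, i))
    return runs
```

For instance, a two-state transition matrix with rows (0.9, 0.1) and (0.1, 0.9) has lower `transition_entropy` than the uniform matrix with rows (0.5, 0.5). Passing a decoded path to `segments` gives the kind of activity segmentation the abstract refers to, here over toy symbols rather than video features.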



Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence Volume 22, Issue 8

August 2000

177 pages

ISSN: 0162-8828

Editor:
Kevin Bowyer
Univ. of South Florida, Tampa

Copyright © 2000 IEEE. All Rights Reserved.

Publisher

IEEE Computer Society

United States
