Discovery and Segmentation of Activities in Video

Published: 01 August 2000

Abstract

Hidden Markov models (HMMs) have become the workhorses of the monitoring and event recognition literature because they bring to time-series analysis the utility of density estimation and the convenience of dynamic time warping. Once trained, the internals of these models are considered opaque; there is no effort to interpret the hidden states. We show that by minimizing the entropy of the joint distribution, an HMM's internal state machine can be made to organize observed activity into meaningful states. This has uses in video monitoring and annotation, low bit-rate coding of scene activity, and detection of anomalous behavior. We demonstrate with models of office activity and outdoor traffic, showing how the framework learns principal modes of activity and patterns of activity change. We then show how this framework can be adapted to infer hidden state from extremely ambiguous images, in particular, inferring 3D body orientation and pose from sequences of low-resolution silhouettes.
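
As a rough illustration of why low entropy goes with interpretable structure, the sketch below compares the Shannon entropy of two hypothetical HMM transition matrices. It is not the paper's estimator, which minimizes the entropy of the joint distribution. The matrix values and the helper names `row_entropy` and `transition_entropy` are assumptions made only for this example.

```python
import math

def row_entropy(row):
    """Shannon entropy (in nats) of one discrete distribution; 0*log(0) is treated as 0."""
    return -sum(p * math.log(p) for p in row if p > 0.0)

def transition_entropy(matrix):
    """Sum of the per-row entropies of a row-stochastic transition matrix."""
    return sum(row_entropy(row) for row in matrix)

# Hypothetical 3-state transition matrices (illustrative values, not from the paper).
# Diffuse: every state is equally likely to follow every other state.
diffuse = [[1 / 3, 1 / 3, 1 / 3] for _ in range(3)]
# Sparse: each state mostly persists and occasionally hands off to one successor.
sparse = [
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 0.1],
    [0.1, 0.0, 0.9],
]

print("diffuse:", transition_entropy(diffuse))
print("sparse: ", transition_entropy(sparse))
```

The sparse matrix has the lower entropy, and its state machine is easier to read as a cycle of distinct activities. That is the kind of structure the abstract describes an entropy-minimizing HMM as organizing itself into.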

Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 22, Issue 8
August 2000
177 pages
ISSN: 0162-8828

Publisher

IEEE Computer Society

United States

Author Tags

  1. Video activity monitoring
  2. entropy minimization
  3. hidden Markov models
  4. hidden state
  5. parameter estimation

Qualifiers

  • Research-article
