DOI: 10.1145/2339530.2339606
research-article

The long and the short of it: summarising event sequences with serial episodes

Published: 12 August 2012

Abstract

An ideal outcome of pattern mining is a small set of informative patterns, containing no redundancy or noise, that identifies the key structure of the data at hand. Standard frequent pattern miners do not achieve this goal: because of the pattern explosion, they typically return very large numbers of highly redundant patterns.
We pursue this ideal for sequential data by employing pattern set mining, an approach in which results are considered as a whole instead of ranking patterns individually. Pattern set mining has been applied successfully to transactional data, but has been surprisingly understudied for sequential data.
In this paper, we employ the Minimum Description Length (MDL) principle to identify the set of sequential patterns that summarises the data best. In particular, we formalise how to encode sequential data using sets of serial episodes, and use the encoded length as a quality score. As search strategies, we propose two approaches: the first algorithm selects a good pattern set from a large candidate set, while the second is a parameter-free any-time algorithm that mines pattern sets directly from the data. Experiments on synthetic and real data demonstrate that we efficiently discover small sets of informative patterns.
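
To make the idea of using encoded length as a quality score concrete, the following is a minimal sketch of a two-part MDL score, L(model) + L(data | model), for an event sequence and a candidate pattern set. It is not the encoding formalised in the paper: it assumes patterns match contiguously (serial episodes in general allow gaps), covers the sequence greedily from left to right, and uses a deliberately crude model cost. The function names `cover` and `encoded_length` are hypothetical.

```python
from collections import Counter
from math import log2


def cover(sequence, patterns):
    """Greedily cover the sequence left to right, trying longer patterns first.

    Assumption: a pattern only matches a contiguous run of events (no gaps).
    Events that no pattern covers are emitted as singleton codes.
    """
    ordered = sorted(patterns, key=len, reverse=True)
    codes = []
    i = 0
    while i < len(sequence):
        for p in ordered:
            if tuple(sequence[i:i + len(p)]) == p:
                codes.append(p)
                i += len(p)
                break
        else:
            # No pattern starts here: fall back to a single-event code.
            codes.append((sequence[i],))
            i += 1
    return codes


def encoded_length(sequence, patterns):
    """Toy two-part score in bits: model cost plus data cost given the model."""
    codes = cover(sequence, patterns)
    usage = Counter(codes)
    total = sum(usage.values())
    # Data cost: Shannon-optimal code length, -log2(usage / total) per emitted code.
    data_bits = -sum(n * log2(n / total) for n in usage.values())
    # Model cost (crude assumption): spell out each used multi-event pattern,
    # plus one end marker, with a uniform code over the observed alphabet.
    alphabet = set(sequence)
    per_event = log2(len(alphabet)) if len(alphabet) > 1 else 1.0
    model_bits = sum((len(p) + 1) * per_event for p in usage if len(p) > 1)
    return model_bits + data_bits


if __name__ == "__main__":
    seq = list("abc" * 8)
    print(encoded_length(seq, []))                  # singletons only
    print(encoded_length(seq, [("a", "b", "c")]))   # one pattern explains the data
```

Under these assumptions, a pattern set scores better (fewer bits) when its patterns explain recurring structure in the sequence, which is the intuition behind selecting the pattern set with the shortest encoded length.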

    Published In

    KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2012
    1616 pages
    ISBN:9781450314626
    DOI:10.1145/2339530

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. event sequence
    2. pattern mining
    3. pattern set mining
    4. serial episodes

    Conference

    KDD '12

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2024) Breadth-First Search Approach for Mining Serial Episodes with Simultaneous Events. Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), pp. 36-44. DOI: 10.1145/3632410.3632445. Online publication date: 4-Jan-2024
    • (2024) Geo-SigSPM: mining geographically interesting and significant sequential patterns from trajectories. International Journal of Geographical Information Science, 38:5, pp. 879-901. DOI: 10.1080/13658816.2024.2320149. Online publication date: 29-Feb-2024
    • (2024) SWoTTeD: an extension of tensor decomposition to temporal phenotyping. Machine Learning, 113:9, pp. 5939-5980. DOI: 10.1007/s10994-024-06545-8. Online publication date: 30-Apr-2024
    • (2024) Data is Moody: Discovering Data Modification Rules from Process Event Logs. Machine Learning and Knowledge Discovery in Databases. Research Track, pp. 285-302. DOI: 10.1007/978-3-031-70344-7_17. Online publication date: 22-Aug-2024
    • (2023) Efficient Depth-First Search Approach for Mining Injective General Episodes. Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD), pp. 1-9. DOI: 10.1145/3570991.3571012. Online publication date: 4-Jan-2023
    • (2022) TacticFlow: Visual Analytics of Ever-Changing Tactics in Racket Sports. IEEE Transactions on Visualization and Computer Graphics, 28:1, pp. 835-845. DOI: 10.1109/TVCG.2021.3114832. Online publication date: 1-Jan-2022
    • (2022) A Scalable Analytical Framework for Complex Event Episode Mining With Various Domains Applications. IEEE Access, 10, pp. 130672-130685. DOI: 10.1109/ACCESS.2022.3228962. Online publication date: 2022
    • (2022) The minimum description length principle for pattern mining: a survey. Data Mining and Knowledge Discovery, 36:5, pp. 1679-1727. DOI: 10.1007/s10618-022-00846-z. Online publication date: 4-Jul-2022
    • (2022) Omen: discovering sequential patterns with reliable prediction delays. Knowledge and Information Systems, 64:4, pp. 1013-1045. DOI: 10.1007/s10115-022-01660-1. Online publication date: 5-Mar-2022
    • (2021) TQEL. Proceedings of the VLDB Endowment, 14:11, pp. 2642-2654. DOI: 10.14778/3476249.3476309. Online publication date: 27-Oct-2021
