Abstract
Classical classification methods usually assume that the pattern recognition model does not depend on when the data were collected. This assumption fails in settings where new data arrive continuously. Such situations are common in practice: in spam filtering or fraud detection, for example, the dependencies between feature values and class labels change over time. Unfortunately, most classical machine learning methods (such as decision trees) do not account for the possibility that the model itself may change as a result of so-called concept drift, and they cannot adapt to a new classification model. This paper focuses on the problem of concept drift, which is particularly important for data mining methods that use complex structures (such as decision trees) for making decisions. We propose an algorithm that co-trains decision trees using a modified NGE (Nested Generalized Exemplar) algorithm. The adaptive capability and classification quality of the proposed algorithm are evaluated in computer experiments on benchmark datasets from the UCI Machine Learning Repository.
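The NGE family of methods mentioned in the abstract classifies an example by its distance to the nearest stored hyperrectangle and grows rectangles incrementally as labeled examples arrive. The following is a minimal sketch of that general nearest-hyperrectangle idea only, not the paper's modified co-training algorithm; the class name and the simple "grow the nearest same-class box" update rule are illustrative assumptions.

```python
class NGE:
    """Minimal nearest-hyperrectangle learner (illustrative sketch only)."""

    def __init__(self):
        # Each exemplar is (lower_bounds, upper_bounds, class_label).
        self.boxes = []

    @staticmethod
    def _dist(x, lower, upper):
        # Distance is 0 inside the box; outside, it is the sum of
        # per-feature overshoots beyond the box boundaries.
        return sum(max(lo - xi, 0.0, xi - hi)
                   for xi, lo, hi in zip(x, lower, upper))

    def predict(self, x):
        # Label of the nearest hyperrectangle.
        return min(self.boxes, key=lambda b: self._dist(x, b[0], b[1]))[2]

    def fit_one(self, x, y):
        # Incremental update: extend the nearest same-class box to cover x,
        # or seed a new point-sized box if no box of class y exists yet.
        same = [b for b in self.boxes if b[2] == y]
        if same:
            lower, upper, _ = min(same, key=lambda b: self._dist(x, b[0], b[1]))
            for i, xi in enumerate(x):
                lower[i] = min(lower[i], xi)
                upper[i] = max(upper[i], xi)
        else:
            self.boxes.append((list(x), list(x), y))
```

In a drifting stream, exemplars trained this way can be updated example by example, which is what makes generalized-exemplar methods attractive as the adaptive component alongside a batch-trained decision tree.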
References
Aggarwal ChC (2009) On classification and segmentation of massive audio data streams. Knowl Inf Syst 20(2): 137–156
Aha, DW (ed) (1997) Lazy learning. Kluwer, Dordrecht
Aksela M, Laaksonen J (2007) Adaptive combination of adaptive classifiers for handwritten character recognition. Pattern Recognit Lett 28(1): 136–143
Alpaydin E (2010) Introduction to machine learning, 2nd edn. The MIT Press, London
Asuncion A, Newman DJ (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html. University of California, School of Information and Computer Science, Irvine, CA
Ben-Haim Y, Yom-Tov E (2010) A streaming parallel decision tree algorithm. J Mach Learn Res 11: 849–872
Bifet A, Gavalda R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the SIAM international conference on data mining (SDM'07)
Bishop ChM (2006) Pattern recognition and machine learning. Springer, Berlin
Black M, Hickey R (2002) Classification of customer call data in the presence of concept drift and noise. In: Proceedings of the 1st international conference on computing in an imperfect world (Soft-Ware 2002). Springer, Berlin, pp 74–87
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth & Brooks, Monterey
Brendt M (1995) Instance-based learning: nearest neighbour with generalization. Technical report, Department of Computer Science, University of Waikato, New Zealand
Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, Boston, MA, pp 71–80
Duin RPW et al (2004) PRTools4, A Matlab Toolbox for Pattern Recognition. Delft University of Technology
Friedman JH (1991) Multivariate adaptive regression splines. Ann Stat 19(1): 1–67
Gaber MM, Zaslavsky A, Krishnaswamy S (2005) Mining data streams: a review. ACM SIGMOD Record 34(1): 18–26
Gehrke J, Ganti V, Ramakrishnan R, Loh W-L (1999) BOAT: optimistic decision tree construction. In: Proceedings of the 1999 ACM SIGMOD international conference on management of data. Philadelphia, PA, pp 169–180
Holte R (1993) Very simple classification rules perform well on most commonly used datasets. Mach Learn 11: 63–91
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 97–101
Is See5/C5.0 Better Than C4.5?, RuleQuest Research Pty Ltd. http://rulequest.com/see5-comparison.html. Accessed 10 August 2010
Jin R, Agrawal G (2003) Communication and memory efficient parallel decision tree construction. In: Proceedings of the 3rd SIAM conference on data mining. pp 119–129
Jin R, Agrawal G (2003) Efficient decision tree construction on streaming data. In: Proceedings of the 9th ACM international conference on knowledge discovery and data mining (SIGKDD). Washington, D.C. pp 571–576
Jiang L, Li Ch, Cai Z (2009) Learning decision tree for ranking. Knowl Inf Syst 20(1): 123–135
Kelly M, Hand D, Adams N (1999) The impact of changing populations on classifier performance. In: Proceedings of the 5th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 367–371
Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th international joint Conference on artificial intell. San Mateo, pp 1137–1143
Kufrin R (1997) Decision trees on parallel processors. In: Geller J, Kitano H, Suttner CB (eds) Parallel processing for artificial intelligence, vol.3. Elsevier Science, Amsterdam, pp 279–306
Liu H, Lin Y, Han J Methods for mining frequent items in data streams: an overview. Knowl Inf Syst doi:10.1007/s10115-009-0267-2
Liu S, Duffy AHB, Whitfield RI, Boyle IM (2010) Integration of decision support systems to improve decision support performance. Knowl Inf Syst 22(3): 261–286
Mehta M et al (1996) SLIQ: a fast scalable classifier for data mining. In: Proceedings of the 5th international conference on extending database technology, pp 18–32
Paliouras G, Bree DS (1995) The effect of numeric features on the scalability of inductive learning programs. Lecture notes in computer science 912: 218–231
Patcha A, Park J (2007) An overview of anomaly detection techniques: existing solutions and latest technological trends. Comput Netw 51(12): 3448–3470
Quinlan JR (1986) Induction of decision trees. Mach Learn 1: 81–106
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, Los Altos
Salzberg S (1991) A nearest hyperrectangle learning method. Mach Learn 6: 251–276
Su J, Zhang H (2006) A fast decision tree learning algorithm. In: Proceedings of the twenty-first AAAI conference on artificial intelligence. Boston, Massachusetts July 16–20, pp 500–505
Shafer J et al (1996) SPRINT: a scalable parallel classifier for data mining. In: Proceedings of the 22nd VLDB conference, pp 544–555
Srivastava A et al (1999) Parallel formulations of decision tree classification algorithms. Data Min Knowl Discov 3(3): 237–261
Tsymbal A (2004) The problem of concept drift: definitions and related work. Technical report, Department of Computer Science, Trinity College Dublin, Ireland
Ulaş A, Semerci M, Yıldız OT, Alpaydın E (2009) Incremental construction of classifier and discriminant ensembles. Inf Sci 179(9): 1298–1318
Wettschereck D (1994) A hybrid nearest-neighbor and nearest-hyperrectangle algorithm. In: Proceedings of the European Conference on machine learning, pp 323–335
Wettschereck D, Dietterich TG (1995) An experimental comparison of the nearest-neighbor and nearest-hyperrectangle algorithms. Mach Learn 19: 5–27
Witten IH, Frank E (2000) Data mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann Publisher, Los Altos
Wozniak M (2009) Modification of nested hyperrectangle exemplar as a proposition of information fusion method. LNCS 5788: 687–694
Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS, Zhou Z-H, Steinbach M, Hand DJ, Steinberg D (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14(1): 1–37
Yang Ch-T, Tsai ST, Li K-Ch (2005) Decision tree construction for data mining on grid computing environments. In: Proceedings of the 19th international conference on advanced information networking and applications AINA’05. Taipei, Taiwan, pp 421–424
Yıldız OT, Dikmen O (2007) Parallel univariate decision trees. Pattern Recognit Lett 28(7): 825–832
Zhu X, Wu X, Yang Y (2006) Effective classification of noisy data streams with attribute-oriented dynamic classifier selection. Knowl Inf Syst 9(3): 339–363
Acknowledgments
This work is supported in part by the Polish State Committee for Scientific Research under a grant for the period 2010–2013.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Cite this article
Wozniak, M. A hybrid decision tree training method using data streams. Knowl Inf Syst 29, 335–347 (2011). https://doi.org/10.1007/s10115-010-0345-5