research-article

Large Scale Online Multiple Kernel Regression with Application to Time-Series Prediction

Published: 23 January 2019

Abstract

Kernel-based regression represents an important family of learning techniques for solving challenging regression tasks with non-linear patterns. Despite being studied extensively, most existing work suffers from two major drawbacks: (i) it is often designed for regression in a batch learning setting, making it not only computationally inefficient but also poorly scalable in real-world applications where data arrives sequentially; and (ii) it usually assumes that a fixed kernel function is given prior to the learning task, which can result in poor performance if the chosen kernel is inappropriate. To overcome these drawbacks, this work presents a novel scheme of Online Multiple Kernel Regression (OMKR), which sequentially learns the kernel-based regressor in an online and scalable fashion, and dynamically explores a pool of multiple diverse kernels to avoid relying on a single, possibly poor, fixed kernel, thereby remedying the drawback of manual/heuristic kernel selection. The OMKR problem is more challenging than regular kernel-based regression tasks since we have to determine, on the fly, both the optimal kernel-based regressor for each individual kernel and the best combination of the multiple kernel regressors. We propose a family of OMKR algorithms for regression and discuss their application to time-series prediction tasks, including AR, ARMA, and ARIMA time series. We develop novel approaches to make OMKR scalable to large datasets, countering the problems arising from an unbounded number of support vectors. We also explore the effect of kernel combination at the prediction level and at the representation level. Finally, we conduct extensive experiments to evaluate the empirical performance on both real-world regression and time-series prediction tasks.
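
For readers who want a concrete picture of the scheme described above, the following is a minimal sketch, not the paper's exact algorithm: each kernel in a small pool of Gaussian kernels maintains its own budgeted regressor updated by online gradient descent on the squared loss, their outputs are combined at the prediction level with Hedge-style multiplicative weights, and an AR-style lag embedding illustrates the time-series setting. All class and function names, the kernel bandwidths, and the learning-rate, discount, and budget parameters are illustrative assumptions, and the drop-the-oldest budget rule merely stands in for the paper's more refined approaches to bounding the number of support vectors.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Return a Gaussian (RBF) kernel function with bandwidth sigma."""
    return lambda x, y: np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))


class BudgetedKernelRegressor:
    """Single-kernel online regressor: online gradient descent on the squared
    loss, with a crude fixed-size support-vector budget (the oldest support
    vector is discarded once the budget is exceeded)."""

    def __init__(self, kernel, lr=0.1, budget=200):
        self.kernel, self.lr, self.budget = kernel, lr, budget
        self.sv, self.alpha = [], []          # support vectors and coefficients

    def predict(self, x):
        return sum(a * self.kernel(v, x) for v, a in zip(self.sv, self.alpha))

    def update(self, x, y):
        err = self.predict(x) - y             # gradient of 0.5 * (f(x) - y)^2 w.r.t. f(x)
        self.sv.append(x)
        self.alpha.append(-self.lr * err)
        if len(self.sv) > self.budget:        # budget maintenance: drop the oldest SV
            self.sv.pop(0)
            self.alpha.pop(0)


class OnlineMultipleKernelRegressor:
    """Prediction-level combination: one Hedge-style multiplicative weight is
    kept per kernel and discounted by that kernel's own squared loss."""

    def __init__(self, kernels, lr=0.1, beta=0.95, budget=200):
        self.learners = [BudgetedKernelRegressor(k, lr, budget) for k in kernels]
        self.weights = np.ones(len(kernels))
        self.beta = beta

    def predict(self, x):
        preds = np.array([m.predict(x) for m in self.learners])
        w = self.weights / self.weights.sum()
        return float(w @ preds), preds

    def update(self, x, y):
        _, preds = self.predict(x)
        self.weights *= self.beta ** ((preds - y) ** 2)   # penalise lossy kernels
        for m in self.learners:                           # per-kernel OGD step
            m.update(x, y)


# Time-series usage: embed a synthetic AR(2)-style series into lag vectors and
# predict each value online before updating (prequential evaluation).
rng = np.random.default_rng(0)
series = np.zeros(500)
for t in range(2, 500):
    series[t] = 0.6 * series[t - 1] - 0.3 * series[t - 2] + 0.1 * rng.standard_normal()

lag = 5
omkr = OnlineMultipleKernelRegressor([gaussian_kernel(s) for s in (0.5, 1.0, 2.0, 4.0)])
total = 0.0
for t in range(lag, len(series)):
    x, y = series[t - lag:t], series[t]
    y_hat, _ = omkr.predict(x)
    total += (y_hat - y) ** 2
    omkr.update(x, y)
print("mean squared one-step prediction error:", total / (len(series) - lag))
```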

References

[1]
Oren Anava, Elad Hazan, Shie Mannor, and Ohad Shamir. 2013. Online learning for time series prediction. In COLT, Vol. 30. 172--184.
[2]
Oren Anava, Elad Hazan, and Assaf Zeevi. 2015. Online time series prediction with missing data. In International Conference on Machine Learning. 2191--2199.
[3]
Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. 2002. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing 32, 1 (2002), 48--77.
[4]
Olivier Bousquet and Daniel J. L. Herrmann. 2003. On the complexity of learning the kernel matrix. In Advances in NIPS. 415--422.
[5]
Dominik Brugger, Wolfgang Rosenstiel, and Martin Bogdan. 2011. Online SVR training by solving the primal optimization problem. Journal of Signal Processing Systems 65, 3 (2011), 391--402.
[6]
Giovanni Cavallanti, Nicolò Cesa-Bianchi, and Claudio Gentile. 2007. Tracking the best hyperplane with a simple budget perceptron. Machine Learning 69 (2007), 143--167.
[7]
Nicolo Cesa-Bianchi and Gabor Lugosi. 2006. Prediction, Learning, and Games. Cambridge University Press.
[8]
Badong Chen, Songlin Zhao, Pingping Zhu, and José C. Príncipe. 2012. Quantized kernel least mean square algorithm. IEEE Transactions on Neural Networks and Learning Systems 23, 1 (2012), 22--32.
[9]
Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. 2006. Online passive-aggressive algorithms. Journal of Machine Learning Research 7 (Dec. 2006), 551--585.
[10]
Koby Crammer, Jaz Kandola, and Yoram Singer. 2004. Online classification on a budget. In Advances in NIPS. MIT Press, Cambridge, MA.
[11]
Ofer Dekel, Shai Shalev-Shwartz, and Yoram Singer. 2008. The forgetron: A kernel-based perceptron on a budget. SIAM Journal on Computing 37, 5 (2008), 1342--1372.
[12]
Mark Dredze, Koby Crammer, and Fernando Pereira. 2008. Confidence-weighted linear classification. In 25th International Conference on Machine Learning. ACM, 264--271.
[13]
Yoav Freund and Robert E. Schapire. 1995. A desicion-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory, Vol. 904. Springer, Berlin, 23--37.
[14]
Joao Gama, Indre Zliobaite, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. 2014. A survey on concept drift adaptation. ACM Computing Surveys 46, 4 (2014), 44.
[15]
Mehmet Gönen and Ethem Alpaydın. 2011. Multiple kernel learning algorithms. Journal of Machine Learning Research 12 (2011), 2211--2268.
[16]
Yujun He, Youchan Zhu, and Dongxing Duan. 2006. Research on hybrid ARIMA and support vector machine model in short term load forecasting. In 6th International Conference on Intelligent Systems Design and Applications (ISDA’06). IEEE, vol. 1, 804--809.
[17]
Steven C. H. Hoi, Rong Jin, and Michael R. Lyu. 2007. Learning nonparametric kernel matrices from pairwise constraints. In ICML. ACM, 361--368.
[18]
Steven C. H. Hoi, Rong Jin, Peilin Zhao, and Tianbao Yang. 2013. Online multiple kernel classification. Machine Learning 90, 2 (2013), 289--316.
[19]
Steven C. H. Hoi, Michael R. Lyu, and Edward Y. Chang. 2006. Learning the unified kernel machines for classification. In KDD. ACM, 187--196.
[20]
Steven C. H. Hoi, Doyen Sahoo, Jing Lu, and Peilin Zhao. 2018. Online learning: A comprehensive survey. arXiv:1802.02871.
[21]
Steven C. H. Hoi, Jialei Wang, and Peilin Zhao. 2014. Libol: A library for online learning algorithms. Journal of Machine Learning Research 15, 1 (2014), 495--499.
[22]
Wei-Chiang Hong and Ping-Feng Pai. 2006. Predicting engine reliability by support vector machines. International Journal of Advanced Manufacturing Technology 28, 1--2 (2006), 154--161.
[23]
Rong Jin, Steven C. H. Hoi, and Tianbao Yang. 2010. Online multiple kernel learning: Algorithms and mistake bounds. In Algorithmic Learning Theory, Vol. 6331. Springer, Berlin, 390--404.
[24]
Rudolph Emil Kalman. 1960. A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82, 1 (1960), 35--45.
[25]
Hisashi Kashima, Koji Tsuda, and Akihiro Inokuchi. 2003. Marginalized kernels between labeled graphs. In ICML, Vol. 3. 321--328.
[26]
J. Kivinen, A. J. Smola, and R. C. Williamson. 2004. Online learning with kernels. IEEE Transactions on Signal Processing 52, 8 (2004), 2165--2176.
[27]
James T. Kwok and Ivor W. Tsang. 2003. Learning with idealized kernels. In ICML. 400--407.
[28]
Gert R. G. Lanckriet, Nello Cristianini, Peter Bartlett, Laurent El Ghaoui, and Michael I. Jordan. 2004. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research 5 (Dec. 2004), 27--72.
[29]
Nick Littlestone and Manfred K. Warmuth. 1989. The weighted majority algorithm. In 30th Annual Symposium on Foundations of Computer Science. IEEE, 256--261.
[30]
Chenghao Liu, Steven C. H. Hoi, Peilin Zhao, and Jianling Sun. 2016. Online ARIMA algorithms for time series prediction. In 13th AAAI Conference on Artificial Intelligence.
[31]
Weifeng Liu, Puskal P. Pokharel, and Jose C. Principe. 2008. The kernel least-mean-square algorithm. IEEE Transactions on Signal Processing 56, 2 (2008), 543--554.
[32]
Jing Lu, Steven C. H. Hoi, Jialei Wang, Peilin Zhao, and Zhi-Yong Liu. 2016. Large scale online kernel learning. Journal of Machine Learning Research 17, 1 (2016), 1613--1655.
[33]
Jing Lu, Doyen Sahoo, Peilin Zhao, and Steven C. H. Hoi. 2018. Sparse passive-aggressive learning for bounded online kernel methods. ACM Transactions on Intelligent Systems and Technology 9, 4 (2018), 45.
[34]
Jing Lu, Peilin Zhao, and Steven C. H. Hoi. 2016. Online sparse passive aggressive learning with kernels. In 2016 SIAM International Conference on Data Mining. SIAM, 675--683.
[35]
Weizhen Lu, Wenjian Wang, Andrew Y. T. Leung, Siu-Ming Lo, Richard K. K. Yuen, Zongben Xu, and Huiyuan Fan. 2002. Air pollutant parameter forecasting using support vector machines. In 2002 International Joint Conference on Neural Networks (IJCNN’02). IEEE, vol. 1, 630--635.
[36]
André F. T. Martins, Noah A. Smith, Eric P. Xing, Pedro M. Q. Aguiar, and Mário A. T. Figueiredo. 2011. Online learning of structured predictors with multiple kernels. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. 507--515.
[37]
Edward Moroshko and Koby Crammer. 2013. A last-step regression algorithm for non-stationary online learning. In Artificial Intelligence and Statistics. 451--462.
[38]
K.-R. Müller, Alexander J. Smola, Gunnar Rätsch, Bernhard Schölkopf, Jens Kohlmorgen, and Vladimir Vapnik. 1997. Predicting time series with support vector machines. In International Conference on Artificial Neural Networks. Springer, 999--1004.
[39]
Tu Dinh Nguyen, Trung Le, Hung Bui, and Dinh Phung. 2017. Large-scale online kernel learning with random feature reparameterization. In 26th International Joint Conference on Artificial Intelligence (IJCAI’17). 2543--2549.
[40]
Motoya Ohnishi and Masahiro Yukawa. 2017. Online learning in L2 space with multiple Gaussian kernels. In 25th European Signal Processing Conference (EUSIPCO’17). IEEE, 1594--1598.
[41]
Francesco Orabona, Joseph Keshet, and Barbara Caputo. 2008. The projectron: A bounded kernel-based perceptron. In ICML. ACM, 720--727.
[42]
Sophocles J. Orfanidis. 1988. Optimum Signal Processing: An Introduction. Macmillan Publishing Company.
[43]
Ali Rahimi and Benjamin Recht. 2008. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems. 1177--1184.
[44]
Frank Rosenblatt. 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65, 6 (1958), 386.
[45]
W. Rudin. 1990. Fourier Analysis on Groups. Wiley-Interscience Publication.
[46]
Doyen Sahoo, Steven Hoi, and Peilin Zhao. 2016. Cost sensitive online multiple kernel classification. In Asian Conference on Machine Learning. 65--80.
[47]
Doyen Sahoo, Steven C. H. Hoi, and Bin Li. 2014. Online multiple kernel regression. In 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 293--302.
[48]
Doyen Sahoo, Quang Pham, Jing Lu, and Steven C. H. Hoi. 2018. Online deep learning: Learning deep neural networks on the fly. In 27th International Joint Conference on Artificial Intelligence (IJCAI’18). 2660--2666.
[49]
Nicholas Sapankevych and Ravi Sankar. 2009. Time series prediction using support vector machines: A survey. IEEE Computational Intelligence Magazine 4, 2 (2009), 24--38.
[50]
Bernhard Schölkopf and Alexander J. Smola. 2002. Learning with Kernels. MIT Press.
[51]
Li Cheng, S. V. N. Vishwanathan, Dale Schuurmans, Shaojun Wang, and Terry Caelli. 2007. Implicit online learning with kernels. In NIPS, Vol. 19. MIT Press, 249.
[52]
Shai Shalev-Shwartz. 2007. Online learning: Theory, algorithms, and applications. Ph.D. Thesis. The Hebrew University.
[53]
John Shawe-Taylor and Nello Cristianini. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press.
[54]
Alex J. Smola and Bernhard Schölkopf. 2004. A tutorial on support vector regression. Statistics and Computing 14, 3 (2004), 199--222.
[55]
Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer, and Bernhard Schölkopf. 2006. Large scale multiple kernel learning. Journal of Machine Learning Research 7 (2006), 1531--1565.
[56]
Francis E. H. Tay and Lijuan Cao. 2001. Application of support vector machines in financial time series forecasting. Omega 29, 4 (2001), 309--317.
[57]
U. Thissen, R. Van Brakel, A. P. De Weijer, W. J. Melssen, and L. M. C. Buydens. 2003. Using support vector machines for time series prediction. Chemometrics and Intelligent Laboratory Systems 69, 1 (2003), 35--49.
[58]
Steven Van Vaerenbergh and Ignacio Santamaría. 2013. A comparative study of kernel adaptive filtering algorithms. In 2013 IEEE Digital Signal Processing Workshop and IEEE Signal Processing Education. Software available at https://github.com/steven2358/kafbox/.
[59]
V. G. Vovk. 1995. A game of prediction with expert advice. In COLT. ACM, 51--60.
[60]
Bernard Widrow and Marcian E. Hoff. 1960. Adaptive switching circuits. MIT Press, Cambridge, MA.
[61]
Christopher K. I. Williams and Matthias Seeger. 2001. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems. 682--688.
[62]
Lingfei Wu, Ian E. H. Yen, Jie Chen, and Rui Yan. 2016. Revisiting random binning features: Fast convergence and strong parallelizability. In 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1265--1274.
[63]
Yue Wu, Steven C. H. Hoi, Chenghao Liu, Jing Lu, Doyen Sahoo, and Nenghai Yu. 2017. SOL: A library for scalable online learning algorithms. Neurocomputing 260 (2017), 9--12.
[64]
Zenglin Xu, Rong Jin, Irwin King, and Michael R. Lyu. 2008. An extended level method for efficient multiple kernel learning. In NIPS. 1825--1832.
[65]
Haiqin Yang, Zenglin Xu, Irwin King, and Michael R. Lyu. 2010. Online learning for group lasso. In 27th ICML. 1191--1198.
[66]
Peilin Zhao, Jialei Wang, Pengcheng Wu, Rong Jin, and Steven C. H. Hoi. 2012. Fast bounded online gradient descent algorithms for scalable kernel-based online learning. In Proceedings of the 29th International Conference on Machine Learning. Omnipress, 1075--1082.
[67]
Jinfeng Zhuang, Ivor W. Tsang, and Steven C. H. Hoi. 2011. A family of simple non-parametric kernel learning algorithms. Journal of Machine Learning Research 12 (2011), 1313--1347.
[68]
Martin Zinkevich. 2003. Online convex programming and generalized infinitesimal gradient ascent. In 20th International Conference on Machine Learning.


Published In

ACM Transactions on Knowledge Discovery from Data, Volume 13, Issue 1
February 2019
340 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3301280

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 January 2019
Accepted: 01 November 2018
Revised: 01 October 2018
Received: 01 July 2007
Published in TKDD Volume 13, Issue 1


Author Tags

  1. Online learning
  2. large-scale kernel learning
  3. multiple kernel regression
  4. time-series prediction

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • MOE project of Humanities and Social Science
  • Academic Team Building Plan for Young Scholars from Wuhan University
  • Fundamental Research Funds for the Central Universities
  • National Research Foundation Singapore under its AI Singapore
  • NRF Prime Minister’s Office, Singapore under its International Research Centres in Singapore Funding Initiative


Article Metrics

  • Downloads (Last 12 months): 58
  • Downloads (Last 6 weeks): 7
Reflects downloads up to 17 Oct 2024


Cited By

  • (2024) Prediction of Extreme Weather Using Nonparametric Regression Approach with Fourier Series Estimators. Data and Metadata 3, 319. DOI: 10.56294/dm2024319. Online publication date: 26-Jun-2024.
  • (2024) Attention-Based Interval Aided Networks for Data Modeling of Heterogeneous Sampling Sequences With Missing Values in Process Industry. IEEE Transactions on Industrial Informatics 20, 4, 5253--5262. DOI: 10.1109/TII.2023.3329684. Online publication date: Apr-2024.
  • (2024) An Online Multiple Kernel Parallelizable Learning Scheme. IEEE Signal Processing Letters 31, 121--125. DOI: 10.1109/LSP.2023.3343185. Online publication date: 2024.
  • (2024) Learning high-order fuzzy cognitive maps via multimodal artificial bee colony algorithm and nearest-better clustering: Applications on multivariate time series prediction. Knowledge-Based Systems, 111771. DOI: 10.1016/j.knosys.2024.111771. Online publication date: Apr-2024.
  • (2024) Multimodal imputation-based stacked ensemble for prediction and classification of air quality index in Indian cities. Computers and Electrical Engineering 114, 109098. DOI: 10.1016/j.compeleceng.2024.109098. Online publication date: Mar-2024.
  • (2023) Self-paced ARIMA for robust time series prediction. Knowledge-Based Systems, 110489. DOI: 10.1016/j.knosys.2023.110489. Online publication date: Mar-2023.
  • (2023) Online evolutionary neural architecture search for multivariate non-stationary time series forecasting. Applied Soft Computing 145, 110522. DOI: 10.1016/j.asoc.2023.110522. Online publication date: Sep-2023.
  • (2022) Application of Smoothing Spline in Determining the Unmanned Ground Vehicles Route Based on Ultra-Wideband Distance Measurements. Sensors 22, 21, 8334. DOI: 10.3390/s22218334. Online publication date: 30-Oct-2022.
  • (2022) SMOTEDNN: A Novel Model for Air Pollution Forecasting and AQI Classification. Computers, Materials & Continua 71, 1, 1403--1425. DOI: 10.32604/cmc.2022.021968. Online publication date: 2022.
  • (2022) A noise-resilient online learning algorithm with ramp loss for ordinal regression. Intelligent Data Analysis 26, 2, 379--405. DOI: 10.3233/IDA-205613. Online publication date: 14-Mar-2022.
  • Show More Cited By
