Abstract
The naïve Bayes approach is a simple but often satisfactory method for supervised classification. In this paper, we focus on the naïve Bayes model and propose the application of regularization techniques to learn a naïve Bayes classifier. The main contribution of the paper is a stagewise version of the selective naïve Bayes, which can be considered a regularized version of the naïve Bayes model. We call it forward stagewise naïve Bayes. For comparison's sake, we also introduce an explicitly regularized formulation of the naïve Bayes model, where conditional independence (absence of arcs) is promoted via an L1/L2 group penalty on the parameters that define the conditional probability distributions. Although already published in the literature, this idea has only been applied to continuous predictors. We extend this formulation to discrete predictors and propose a modification that yields an adaptive penalization. We show that, whereas the L1/L2 group penalty formulation only discards irrelevant predictors, the forward stagewise naïve Bayes can discard both irrelevant and redundant predictors, which are known to be harmful to the naïve Bayes classifier. Both approaches, however, usually improve the classical naïve Bayes model's accuracy.
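To illustrate the selective idea the abstract builds on, here is a minimal sketch of a selective naïve Bayes for discrete predictors, built by greedy forward selection (a simplification of the paper's stagewise procedure, not the authors' exact algorithm): at each step, the predictor that most improves training accuracy is added, so irrelevant or redundant predictors that do not help are simply never selected. The function names and the Laplace-smoothing choice are illustrative assumptions.

```python
import numpy as np

def fit_predict_nb(X_train, y_train, X_test, feats, alpha=1.0):
    """Categorical naive Bayes restricted to the predictor subset `feats`,
    with Laplace smoothing (alpha). Returns predicted class labels."""
    classes = np.unique(y_train)
    log_prior = np.log(np.array([np.mean(y_train == c) for c in classes]))
    scores = np.tile(log_prior, (len(X_test), 1))  # log-posterior per class
    for j in feats:
        values = np.unique(X_train[:, j])
        for ci, c in enumerate(classes):
            xc = X_train[y_train == c, j]
            # Smoothed conditional probabilities P(x_j = v | c)
            probs = {v: (np.sum(xc == v) + alpha) / (len(xc) + alpha * len(values))
                     for v in values}
            default = alpha / (len(xc) + alpha * len(values))  # unseen value
            scores[:, ci] += np.log([probs.get(v, default) for v in X_test[:, j]])
    return classes[np.argmax(scores, axis=1)]

def forward_selective_nb(X, y, alpha=1.0):
    """Greedy forward selection: repeatedly add the predictor that most
    improves training accuracy; stop when no addition helps."""
    selected, remaining = [], list(range(X.shape[1]))
    best = np.mean(fit_predict_nb(X, y, X, selected, alpha) == y)
    improved = True
    while improved and remaining:
        improved = False
        gains = [(np.mean(fit_predict_nb(X, y, X, selected + [j], alpha) == y), j)
                 for j in remaining]
        acc, j = max(gains)
        if acc > best:
            best, improved = acc, True
            selected.append(j)
            remaining.remove(j)
    return selected, best
```

On a toy dataset where predictor 0 determines the class and predictor 1 is noise, the loop selects predictor 0 and stops, mimicking the discarding behavior the abstract attributes to selective approaches. A production version would score candidates on held-out data rather than training accuracy to avoid overfitting.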
Vidaurre, D., Bielza, C. & Larrañaga, P. Forward stagewise naïve Bayes. Prog Artif Intell 1, 57–69 (2012). https://doi.org/10.1007/s13748-011-0001-7