
Decreasing the size of the restricted Boltzmann machine

Published: 01 April 2019

Abstract

In this letter, we propose a method to decrease the number of hidden units of a restricted Boltzmann machine while avoiding a loss in performance, as quantified by the Kullback-Leibler divergence. The algorithm is then demonstrated by numerical simulations.
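The abstract does not spell out the pruning rule, so the following Python sketch only illustrates the general idea under stated assumptions, not the authors' actual algorithm: greedily remove the hidden unit whose removal least increases KL(p_data || p_model), with the marginal p_model(v) computed exactly for a toy-sized binary RBM. The function names and the max_kl_increase budget are hypothetical.

```python
# A minimal sketch of the idea in the abstract, NOT the paper's algorithm:
# greedily prune hidden units of a tiny binary RBM, always removing the
# unit whose removal least increases KL(p_data || p_model), and stop when
# a KL budget would be exceeded. The greedy criterion, prune_hidden_units,
# and max_kl_increase are illustrative assumptions.
import itertools
import numpy as np

def rbm_marginal(W, b, c):
    """Exact marginal p(v) of a binary RBM with weights W (n_v x n_h),
    visible biases b, hidden biases c. Tractable only for small n_v."""
    n_v = len(b)
    vs = np.array(list(itertools.product([0, 1], repeat=n_v)), dtype=float)
    # Free energy: F(v) = -b.v - sum_j log(1 + exp(c_j + (v W)_j))
    free = -(vs @ b) - np.logaddexp(0.0, vs @ W + c).sum(axis=1)
    p = np.exp(-(free - free.min()))  # shift for numerical stability
    return p / p.sum()

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) of two discrete distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def prune_hidden_units(W, b, c, p_data, max_kl_increase=0.01):
    """Greedily drop hidden units while total KL degradation stays in budget."""
    base = kl(p_data, rbm_marginal(W, b, c))
    keep = list(range(W.shape[1]))
    while len(keep) > 1:
        # Try removing each remaining unit and measure the resulting KL.
        trials = []
        for j in keep:
            cols = [k for k in keep if k != j]
            trials.append((kl(p_data, rbm_marginal(W[:, cols], b, c[cols])), j))
        best_kl, unit = min(trials)
        if best_kl - base > max_kl_increase:
            break  # removing any further unit would degrade the model too much
        keep.remove(unit)
    return keep

# Toy usage: prune a random 4-visible / 8-hidden RBM against its own marginal.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, size=(4, 8))
b, c = rng.normal(size=4), rng.normal(size=8)
p_data = rbm_marginal(W, b, c)  # stand-in for an empirical data distribution
print("kept hidden units:", prune_hidden_units(W, b, c, p_data))
```

Exact enumeration of p(v) is exponential in the number of visible units, so a sketch like this is only feasible for toy models; the point is the pruning loop, which would pair with any tractable estimate of the divergence.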



Published In

Neural Computation, Volume 31, Issue 4, April 2019, 213 pages

Publisher

MIT Press, Cambridge, MA, United States
