
Minimal gated unit for recurrent neural networks

Published: 01 June 2016

Abstract

Recurrent neural networks (RNNs) have been very successful in handling sequence data. However, understanding RNNs and finding the best practices for RNN learning is a difficult task, partly because there are many competing and complex hidden units, such as the long short-term memory (LSTM) unit and the gated recurrent unit (GRU). We propose a gated unit for RNNs, named the minimal gated unit (MGU), which contains only one gate and is therefore a minimal design among all gated hidden units. The design of MGU benefits from evaluation results on LSTM and GRU in the literature. Experiments on various sequence data show that MGU achieves accuracy comparable to GRU, but with a simpler structure, fewer parameters, and faster training. Hence, MGU is well suited for RNN applications. Its simple architecture also means that it is easier to evaluate and tune, and in principle its properties should be easier to study both theoretically and empirically.
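To make the single-gate idea concrete, the following is a minimal NumPy sketch of an MGU-style cell: one forget gate controls both the candidate state and the interpolation between the old and new hidden states. The class and parameter names (MGUCell, W_f, U_f, etc.) are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MGUCell:
    """Sketch of a minimal gated unit (MGU) cell with a single (forget) gate."""

    def __init__(self, input_size, hidden_size, rng=None):
        rng = rng or np.random.default_rng(0)
        s = 1.0 / np.sqrt(hidden_size)
        # Forget-gate parameters
        self.W_f = rng.uniform(-s, s, (hidden_size, input_size))
        self.U_f = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.b_f = np.zeros(hidden_size)
        # Candidate-state parameters
        self.W_h = rng.uniform(-s, s, (hidden_size, input_size))
        self.U_h = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.b_h = np.zeros(hidden_size)

    def step(self, x_t, h_prev):
        # The only gate: a forget gate f_t in (0, 1)
        f_t = sigmoid(self.W_f @ x_t + self.U_f @ h_prev + self.b_f)
        # Candidate hidden state, computed from the gated previous state
        h_tilde = np.tanh(self.W_h @ x_t + self.U_h @ (f_t * h_prev) + self.b_h)
        # Convex combination of the old state and the candidate
        return (1.0 - f_t) * h_prev + f_t * h_tilde

# Run the cell over a short random sequence
cell = MGUCell(input_size=4, hidden_size=8)
h = np.zeros(8)
for x in np.random.default_rng(1).normal(size=(5, 4)):
    h = cell.step(x, h)
```

Compared with a GRU, this sketch drops the separate reset gate and reuses the forget gate in its place, which is where the parameter savings come from: only two weight blocks per cell instead of three.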




    Published In

    International Journal of Automation and Computing  Volume 13, Issue 3
    June 2016
    106 pages

    Publisher

    Springer-Verlag

    Berlin, Heidelberg


    Author Tags

    1. Recurrent neural network
    2. deep learning
    3. gated recurrent unit (GRU)
    4. gated unit
    5. long short-term memory (LSTM)
    6. minimal gated unit (MGU)


    Cited By

    • (2024) Power terminal anomaly monitoring technology based on autoencoder and multi-layer perceptron. In Proceedings of the 2024 3rd International Conference on Networks, Communications and Information Technology, pp. 39-43. DOI: 10.1145/3672121.3672130.
    • (2024) Enabling An Informed Contextual Multi-Armed Bandit Framework For Stock Trading With Neuroevolution. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 1924-1933. DOI: 10.1145/3638530.3664145.
    • (2024) Revolutionizing gear hobbing machine precision. Expert Systems with Applications, vol. 242, no. C. DOI: 10.1016/j.eswa.2023.122826.
    • (2024) Smart contract vulnerabilities detection with bidirectional encoder representations from transformers and control flow graph. Multimedia Systems, vol. 30, no. 4. DOI: 10.1007/s00530-024-01406-9.
    • (2023) Adaptive Modularized Recurrent Neural Networks for Electric Load Forecasting. Journal of Database Management, vol. 34, no. 1, pp. 1-18. DOI: 10.4018/JDM.323436.
    • (2023) A hybrid model for text classification using part-of-speech features. Journal of Intelligent & Fuzzy Systems, vol. 45, no. 1, pp. 1235-1249. DOI: 10.3233/JIFS-231699.
    • (2023) A Minimal "Functionally Sentient" Organism Trained With Backpropagation Through Time. Adaptive Behavior, vol. 31, no. 6, pp. 531-544. DOI: 10.1177/10597123231166416.
    • (2023) Designing and Training of Lightweight Neural Networks on Edge Devices Using Early Halting in Knowledge Distillation. IEEE Transactions on Mobile Computing, vol. 23, no. 5, pp. 4665-4677. DOI: 10.1109/TMC.2023.3297026.
    • (2023) Parking Prediction in Smart Cities: A Survey. IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 10, pp. 10302-10326. DOI: 10.1109/TITS.2023.3279024.
    • (2023) Policy gradient empowered LSTM with dynamic skips for irregular time series data. Applied Soft Computing, vol. 142, no. C. DOI: 10.1016/j.asoc.2023.110314.
