
Gated orthogonal recurrent units: On learning to forget

Published: 01 April 2019

Abstract

We present a novel recurrent neural network (RNN)-based model that combines the remembering ability of unitary evolution RNNs with the ability of gated RNNs to effectively forget redundant or irrelevant information in their memory. We achieve this by extending restricted orthogonal evolution RNNs with a gating mechanism similar to that of gated recurrent unit (GRU) RNNs, with a reset gate and an update gate. Our model outperforms long short-term memory, gated recurrent units, and vanilla unitary or orthogonal RNNs on several long-term-dependency benchmark tasks. We empirically show that both orthogonal and unitary RNNs lack the ability to forget, an ability that plays an important role in RNNs. We provide competitive results, along with an analysis of our model, on many natural sequential tasks, including question answering, speech spectrum prediction, character-level language modeling, and synthetic tasks that involve long-term dependencies, such as algorithmic, denoising, and copying tasks.
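
As a rough illustration of the mechanism the abstract describes, the sketch below implements a GRU-style cell whose recurrent transition matrix is kept orthogonal. The class name GatedOrthogonalCell and all hyperparameters are our own choices; the orthogonal parameterization (matrix exponential of a skew-symmetric parameter) and the plain ReLU nonlinearity are illustrative stand-ins rather than the paper's exact construction, and the gate placement simply follows the reset/update description above.

```python
import torch
import torch.nn as nn


class GatedOrthogonalCell(nn.Module):
    """Minimal GRU-style cell with an orthogonal recurrent matrix (illustrative sketch)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Unconstrained square parameter; its skew-symmetric part is mapped to an
        # orthogonal matrix via the matrix exponential (one of several possible
        # parameterizations, not necessarily the paper's).
        self.recurrent_param = nn.Parameter(0.01 * torch.randn(hidden_size, hidden_size))
        self.input_proj = nn.Linear(input_size, hidden_size)
        # One linear map produces both the reset gate r and the update gate z.
        self.gates = nn.Linear(input_size + hidden_size, 2 * hidden_size)

    def orthogonal_matrix(self):
        skew = self.recurrent_param - self.recurrent_param.t()
        return torch.matrix_exp(skew)  # exp of a skew-symmetric matrix is orthogonal

    def forward(self, x, h):
        r, z = torch.sigmoid(self.gates(torch.cat([x, h], dim=-1))).chunk(2, dim=-1)
        # Candidate state: reset-gated orthogonal transition plus the projected input.
        # (A modReLU-style nonlinearity is common in this literature; plain ReLU is
        # used here only for brevity.)
        h_tilde = torch.relu(self.input_proj(x) + r * (h @ self.orthogonal_matrix().t()))
        # The update gate decides how much of the old state to keep, i.e. what to forget.
        return z * h + (1.0 - z) * h_tilde


# Toy usage: run the cell over a short random sequence.
cell = GatedOrthogonalCell(input_size=8, hidden_size=32)
h = torch.zeros(1, 32)
for x in torch.randn(10, 1, 8):
    h = cell(x, h)
```

The update gate z interpolates between the previous state and the candidate state, which is what gives the otherwise norm-preserving orthogonal transition a way to discard stale information.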

Published In

Neural Computation, Volume 31, Issue 4 (April 2019), 213 pages

Publisher

MIT Press, Cambridge, MA, United States
