
Gated orthogonal recurrent units: On learning to forget

Published: 01 April 2019

Abstract

We present a novel recurrent neural network (RNN)-based model that combines the remembering ability of unitary evolution RNNs with the ability of gated RNNs to effectively forget redundant or irrelevant information in their memory. We achieve this by extending restricted orthogonal evolution RNNs with a gating mechanism similar to that of gated recurrent unit (GRU) RNNs, with a reset gate and an update gate. Our model outperforms long short-term memory, gated recurrent units, and vanilla unitary or orthogonal RNNs on several long-term-dependency benchmark tasks. We empirically show that both orthogonal and unitary RNNs lack the ability to forget, an ability that plays an important role in RNNs. We provide competitive results, along with an analysis of our model, on many natural sequential tasks, including question answering, speech spectrum prediction, character-level language modeling, and synthetic tasks that involve long-term dependencies, such as algorithmic, denoising, and copying tasks.
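
As a rough illustration of the mechanism the abstract describes, the sketch below implements a GRU-style cell whose recurrent transition matrix is kept orthogonal. The class name GatedOrthogonalCell and all hyperparameters are our own choices; the orthogonal parameterization (matrix exponential of a skew-symmetric parameter) and the plain ReLU nonlinearity are illustrative stand-ins rather than the paper's exact construction, and the gate placement simply follows the reset/update description above.

```python
import torch
import torch.nn as nn


class GatedOrthogonalCell(nn.Module):
    """Minimal GRU-style cell with an orthogonal recurrent matrix (illustrative sketch)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Unconstrained square parameter; its skew-symmetric part is mapped to an
        # orthogonal matrix via the matrix exponential (one of several possible
        # parameterizations, not necessarily the paper's).
        self.recurrent_param = nn.Parameter(0.01 * torch.randn(hidden_size, hidden_size))
        self.input_proj = nn.Linear(input_size, hidden_size)
        # One linear map produces both the reset gate r and the update gate z.
        self.gates = nn.Linear(input_size + hidden_size, 2 * hidden_size)

    def orthogonal_matrix(self):
        skew = self.recurrent_param - self.recurrent_param.t()
        return torch.matrix_exp(skew)  # exp of a skew-symmetric matrix is orthogonal

    def forward(self, x, h):
        r, z = torch.sigmoid(self.gates(torch.cat([x, h], dim=-1))).chunk(2, dim=-1)
        # Candidate state: reset-gated orthogonal transition plus the projected input.
        # (A modReLU-style nonlinearity is common in this literature; plain ReLU is
        # used here only for brevity.)
        h_tilde = torch.relu(self.input_proj(x) + r * (h @ self.orthogonal_matrix().t()))
        # The update gate decides how much of the old state to keep, i.e. what to forget.
        return z * h + (1.0 - z) * h_tilde


# Toy usage: run the cell over a short random sequence.
cell = GatedOrthogonalCell(input_size=8, hidden_size=32)
h = torch.zeros(1, 32)
for x in torch.randn(10, 1, 8):
    h = cell(x, h)
```

The update gate z interpolates between the previous state and the candidate state, which is what gives the otherwise norm-preserving orthogonal transition a way to discard stale information.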

Published In

Neural Computation, Volume 31, Issue 4 (April 2019), 213 pages

Publisher

MIT Press, Cambridge, MA, United States
