Learning to Forget: Continual Prediction with LSTM

Published: 01 October 2000

Abstract

Long short-term memory (LSTM; Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptive "forget gate" that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve continual versions of these problems. LSTM with forget gates, however, easily solves them, and in an elegant way.
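
To make the mechanism concrete, below is a minimal sketch of one step of an LSTM cell with a forget gate, written in the modern notation that descends from this paper. The forget gate f multiplies the previous cell state, so the cell can learn to drive f toward zero and reset itself on a continual input stream rather than letting the state accumulate without bound. This is an illustrative NumPy sketch under assumed names and parameter layout (lstm_step; stacked W, U, b), not the paper's exact formulation or code.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # Stacked pre-activations for the forget, input, and output gates and
    # the candidate cell input (illustrative layout: one block per gate).
    z = W @ x + U @ h_prev + b
    n = h_prev.shape[0]
    f = sigmoid(z[0*n:1*n])  # forget gate: near 0 resets the cell, near 1 keeps it
    i = sigmoid(z[1*n:2*n])  # input gate
    o = sigmoid(z[2*n:3*n])  # output gate
    g = np.tanh(z[3*n:4*n])  # candidate cell input
    # Standard LSTM (no forget gate) amounts to c = c_prev + i * g, so the
    # state can only accumulate; f lets learned resets release internal resources.
    c = f * c_prev + i * g
    h = o * np.tanh(c)       # gated cell output
    return h, c

# Toy run on an unsegmented stream with no external resets: because the
# forget gate rescales the old state multiplicatively, the cell state
# stays bounded over arbitrarily long streams.
rng = np.random.default_rng(0)
n_cells, n_in = 8, 3
W = 0.1 * rng.standard_normal((4 * n_cells, n_in))
U = 0.1 * rng.standard_normal((4 * n_cells, n_cells))
b = np.zeros(4 * n_cells)
h, c = np.zeros(n_cells), np.zeros(n_cells)
for _ in range(10000):
    h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)
print("cell-state norm after 10,000 steps:", float(np.linalg.norm(c)))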

References

[1]
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157-166.
[2]
Cleeremans, A., Servan-Schreiber, D., & McClelland, J. L. (1989). Finite-state automata and simple recurrent networks. Neural Computation, 1, 372-381.
[3]
Cummins, F., Gers, F., & Schmidhuber, J. (1999). Language identification from prosody without explicit features. In Proceedings of EUROSPEECH'99 (Vol. 1, pp. 371-374).
[4]
Darken, C. (1995). Stochastic approximation and neural network learning. In M. A. Arbib (Ed.), The handbook of brain theory and neural networks (pp. 941-944). Cambridge, MA: MIT Press.
[5]
Doya, K., & Yoshizawa, S. (1989). Adaptive neural oscillator using continuous-time backpropagation learning. Neural Networks, 2(5), 375-385.
[6]
Fahlman, S. E. (1991). The recurrent cascade-correlation learning algorithm. In R. P. Lippmann, J. E. Moody, & D. S. Touretzky (Eds.), Advances in neural information processing systems, 3 (pp. 190-196). San Mateo, CA: Morgan Kaufmann.
[7]
Gers, F. A., Schmidhuber, J., & Cummins, F. (1999). Learning to forget: Continual prediction with LSTM (Tech. Rep. No. IDSIA-01-99). Lugano, Switzerland: IDSIA.
[8]
Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Technische Universität München. Available online at www7.informatik.tu-muenchen.de/~hochreit.
[9]
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
[10]
Jordan, M. I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In Proceedings of the Eighth Annual Cognitive Science Society Conference. Hillsdale, NJ: Erlbaum.
[11]
Lin, T., Horne, B. G., Tiňo, P., & Giles, C. L. (1996). Learning long-term dependencies in NARX recurrent neural networks. IEEE Transactions on Neural Networks, 7(6), 1329-1338.
[12]
Mozer, M. C. (1989). A focused backpropagation algorithm for temporal pattern processing. Complex Systems, 3, 349-381.
[13]
Pearlmutter, B. A. (1995). Gradient calculation for dynamic recurrent neural networks: A survey. IEEE Transactions on Neural Networks, 6(5), 1212-1228.
[14]
Robinson, A. J., & Fallside, F. (1987). The utility driven dynamic error propagation network. (Tech. Rep. No. CUED/F-INFENG/TR.1). Cambridge: Cambridge University Engineering Department.
[15]
Schmidhuber, J. (1989). The neural bucket brigade: A local learning algorithm for dynamic feedforward and recurrent networks. Connection Science, 1(4), 403-412.
[16]
Schmidhuber, J. (1992). A fixed size storage O(n^3) time complexity learning algorithm for fully recurrent continually running networks. Neural Computation, 4(2), 243-248.
[17]
Schraudolph, N. (1999). A fast, compact approximation of the exponential function. Neural Computation, 11(4), 853-862.
[18]
Smith, A. W., & Zipser, D. (1989). Learning sequential structures with the real-time recurrent learning algorithm. International Journal of Neural Systems, 1(2), 125-131.
[19]
Waibel, A. (1989). Modular construction of time-delay neural networks for speech recognition. Neural Computation, 1(1), 39-46.
[20]
Werbos, P. J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1, 339-356.
[21]
Williams, R. J., & Peng, J. (1990). An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation, 2(4), 490-501.
[22]
Williams, R. J., & Zipser, D. (1992). Gradient-based learning algorithms for recurrent networks and their computational complexity. In Y. Chauvin & D. E. Rumelhart (Eds.), Back-propagation: Theory, architectures and applications. Hillsdale, NJ: Erlbaum.

Published In

Neural Computation, Volume 12, Issue 10 (October 2000), 242 pages.
MIT Press, Cambridge, MA, United States.
