DOI: 10.5555/2969442.2969583
Article

Semi-supervised Sequence Learning

Published: 07 December 2015

Abstract

We present two approaches to using unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, i.e., a conventional language model in natural language processing. The second approach is to use a sequence autoencoder, which reads the input sequence into a vector and then predicts the input sequence again. Both algorithms can be used as a "pretraining" step for a later supervised sequence learning algorithm; in other words, the parameters obtained from pretraining serve as a starting point for subsequent supervised training. In our experiments, we find that long short-term memory (LSTM) recurrent networks pretrained with the two approaches are more stable to train and generalize better. With pretraining, we were able to achieve strong performance on many classification tasks, such as text classification with IMDB and DBpedia, and image recognition with CIFAR-10.
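
The two objectives are easy to sketch in code. Below is a minimal, hypothetical PyTorch sketch (not the authors' implementation) of the sequence-autoencoder variant: an LSTM encoder reads an unlabeled sequence into its final state, an LSTM decoder is trained to reproduce the sequence, and the pretrained encoder weights then initialize the LSTM used for the supervised task. All names and sizes here are illustrative assumptions; the language-model variant simply drops the encoder and trains the same LSTM to predict the next token of the unlabeled stream.

```python
# Minimal sketch of sequence-autoencoder pretraining, assuming PyTorch.
# Every name and size below is illustrative, not the authors' code.
import torch
import torch.nn as nn

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 10_000, 128, 256  # hypothetical sizes

class SequenceAutoencoder(nn.Module):
    """Read the input sequence into a vector, then predict the input again."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.encoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.decoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, tokens):                # tokens: (batch, seq_len)
        emb = self.embed(tokens)
        _, state = self.encoder(emb)          # final state summarizes the input
        # Teacher-forced decoding: shift inputs right, zeros act as <sos>.
        dec_in = torch.cat([torch.zeros_like(emb[:, :1]), emb[:, :-1]], dim=1)
        dec_out, _ = self.decoder(dec_in, state)
        return self.out(dec_out)              # logits over the vocabulary

# Unsupervised pretraining step: reconstruct each unlabeled sequence.
model = SequenceAutoencoder()
opt = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()
unlabeled = torch.randint(0, VOCAB_SIZE, (32, 20))  # stand-in for real text
opt.zero_grad()
logits = model(unlabeled)
loss_fn(logits.reshape(-1, VOCAB_SIZE), unlabeled.reshape(-1)).backward()
opt.step()

# Supervised fine-tuning: start the classifier's LSTM from the pretrained
# encoder weights instead of a random initialization.
classifier_lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
classifier_lstm.load_state_dict(model.encoder.state_dict())
```

Either way, only the pretrained weights are carried over: the supervised model is then trained as usual on the labeled task (e.g., sentiment classification), which is what the paper reports as making LSTMs more stable to train.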


Published In

NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2
December 2015
3626 pages

Publisher

MIT Press

Cambridge, MA, United States



