Search | arXiv e-print repository

CoSQA: 20,000+ Web Queries for Code Search and Question Answering

Authors: Junjie Huang, Duyu Tang, Linjun Shou, Ming Gong, Ke Xu, Daxin Jiang, Ming Zhou, Nan Duan

Abstract: Finding code given a natural language query is beneficial to the productivity of software developers. Future progress towards better semantic matching between query and code requires richer supervised training resources. To remedy this, we introduce the CoSQA dataset. It includes 20,604 labels for pairs of natural language queries and codes, each annotated by at least 3 human annotators. We further introduce a contrastive learning method dubbed CoCLR to enhance query-code matching, which works as a data augmenter to bring more artificially generated training instances. We show that evaluated on CodeXGLUE with the same CodeBERT model, training on CoSQA improves the accuracy of code question answering by 5.1%, and incorporating CoCLR brings a further improvement of 10.5%.

Submitted 27 May, 2021; originally announced May 2021.

Comments: ACL 2021 main conference. The CoSQA data and leaderboard are available at https://github.com/microsoft/CodeXGLUE/tree/main/Text-Code/NL-code-search-WebQuery. The code is available at https://github.com/Jun-jie-Huang/CoCLR

arXiv:2105.11174 [pdf, other]

Retrieval Enhanced Model for Commonsense Generation

Authors: Han Wang, Yang Liu, Chenguang Zhu, Linjun Shou, Ming Gong, Yichong Xu, Michael Zeng

Abstract: Commonsense generation is a challenging task of generating a plausible sentence describing an everyday scenario using provided concepts. Its requirement of reasoning over commonsense knowledge and compositional generalization ability even puzzles strong pre-trained language generation models. We propose a novel framework using retrieval methods to enhance both the pre-training and fine-tuning for commonsense generation. We retrieve prototype sentence candidates by concept matching and use them as auxiliary input. For fine-tuning, we further boost its performance with a trainable sentence retriever. We demonstrate experimentally on the large-scale CommonGen benchmark that our approach achieves new state-of-the-art results.

Submitted 24 May, 2021; originally announced May 2021.

Comments: Findings of ACL-IJCNLP 2021

arXiv:2104.05480 [pdf, other]

doi 10.14778/3457390.3457401

Towards Crowd-aware Indoor Path Planning (Extended Version)

Authors: Tiantian Liu, Huan Li, Hua Lu, Muhammad Aamir Cheema, Lidan Shou

Abstract: Indoor venues accommodate many people who collectively form crowds. Such crowds in turn influence people's routing choices, e.g., people may prefer to avoid crowded rooms when walking from A to B. This paper studies two types of crowd-aware indoor path planning queries. The Indoor Crowd-Aware Fastest Path Query (FPQ) finds a path with the shortest travel time in the presence of crowds, whereas the Indoor Least Crowded Path Query (LCPQ) finds a path encountering the least objects en route. To process the queries, we design a unified framework with three major components. First, an indoor crowd model organizes indoor topology and captures object flows between rooms. Second, a time-evolving population estimator derives room populations for a future timestamp to support crowd-aware routing cost computations in query processing. Third, two exact and two approximate query processing algorithms process each type of query. All algorithms are based on graph traversal over the indoor crowd model and use the same search framework with different strategies of updating the populations during the search process. All proposals are evaluated experimentally on synthetic and real data. The experimental results demonstrate the efficiency and scalability of our framework and query processing algorithms.

Submitted 29 April, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: The extension of a VLDB'21 paper "Towards Crowd-aware Indoor Path Planning"

arXiv:2104.01767 [pdf, other]

WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach

Authors: Junjie Huang, Duyu Tang, Wanjun Zhong, Shuai Lu, Linjun Shou, Ming Gong, Daxin Jiang, Nan Duan

Abstract: Producing the embedding of a sentence in an unsupervised way is valuable to natural language matching and retrieval problems in practice. In this work, we conduct a thorough examination of pretrained model based unsupervised sentence embeddings. We study four pretrained models and conduct massive experiments on seven datasets regarding sentence semantics. We have three main findings. First, averaging all tokens is better than only using the [CLS] vector. Second, combining both top and bottom layers is better than only using top layers. Lastly, an easy whitening-based vector normalization strategy with less than 10 lines of code consistently boosts the performance.

Submitted 8 April, 2021; v1 submitted 5 April, 2021; originally announced April 2021.
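
The whitening step mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic PCA-style whitening transform, not the authors' released code: the function name `whiten`, the `eps` parameter, and the synthetic input vectors are assumptions made for illustration, and the abstract does not specify the exact normalization the paper uses.

```python
import numpy as np

def whiten(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Center row vectors and rescale them so their covariance is ~identity."""
    mu = embeddings.mean(axis=0, keepdims=True)
    centered = embeddings - mu
    # Sample covariance of the centered vectors (dimension x dimension).
    cov = centered.T @ centered / embeddings.shape[0]
    # cov = U diag(s) U^T, so W = U diag(s^-1/2) maps cov to the identity.
    u, s, _ = np.linalg.svd(cov)
    w = u @ np.diag(1.0 / np.sqrt(s + eps))
    return centered @ w

# Hypothetical stand-in for sentence embeddings: 200 correlated 4-d vectors.
rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.0, 0.0, 0.0],
                [1.0, 1.0, 0.0, 0.0],
                [0.0, 1.0, 3.0, 0.0],
                [0.0, 0.0, 1.0, 0.5]])
x = rng.normal(size=(200, 4)) @ mix
z = whiten(x)
cov_z = z.T @ z / z.shape[0]  # close to the 4x4 identity matrix
```

In practice the input would be averaged token representations from a pretrained model rather than random vectors; the transform itself is unchanged.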

arXiv:2103.16333 [pdf, ps, other]

doi 10.4208/cmr.2021-0039

Global weak solutions for compressible Navier-Stokes-Vlasov-Fokker-Planck system

Authors: Hai-Liang Li, Ling-Yun Shou

Abstract: The one-dimensional compressible Navier-Stokes-Vlasov-Fokker-Planck system with density-dependent viscosity and drag force coefficients is investigated in the present paper. The existence, uniqueness, and regularity of global weak solution to the initial value problem for general initial data are established in spatial periodic domain. Moreover, the long time behavior of the weak solution is analyzed. It is shown that as the time grows, the distribution function of the particles converges to the global Maxwellian, and both the fluid velocity and the macroscopic velocity of the particles converge to the same speed.

Submitted 6 April, 2023; v1 submitted 30 March, 2021; originally announced March 2021.

Comments: 42 pages

MSC Class: 35Q30; 35Q84; 82C40

Journal ref: Communications in Mathematical Research 39 (1), (2023), 136-172

arXiv:2102.11114 [pdf, other]

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

Authors: Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng

Abstract: Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to disfluency, filter words, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and ASR system alike will be propagated to the next task in the pipeline. In this work, we propose an ASR post-processing model that aims to transform the incorrect and noisy ASR output into a readable text for humans and downstream tasks. We leverage the Metadata Extraction (MDE) corpus to construct a task-specific dataset for our study. Since the dataset is small, we propose a novel data augmentation method and use a two-stage training strategy to fine-tune the RoBERTa pre-trained model. On the constructed test set, our model outperforms a production two-step pipeline-based post-processing method by a large margin of 13.26 on readability-aware WER (RA-WER) and 17.53 on BLEU metrics. Human evaluation also demonstrates that our method can generate more human-readable transcripts than the baseline method.

Submitted 22 February, 2021; originally announced February 2021.

Comments: Accepted in 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

arXiv:2102.06578 [pdf, other]

Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders

Authors: Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng

Abstract: Recently, universal neural machine translation (NMT) with shared encoder-decoder gained good performance on zero-shot translation. Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve universal representation across non-shared modules, each of which is for a language or language family. The non-shared architecture has the advantage of mitigating internal language competition, especially when the shared vocabulary and model parameters are restricted in their size. However, the performance of using multiple encoders and decoders on zero-shot translation still lags behind universal NMT. In this work, we study zero-shot translation using language-specific encoders-decoders. We propose to generalize the non-shared architecture and universal NMT by differentiating the Transformer layers between language-specific and interlingua. By selectively sharing parameters and applying cross-attentions, we explore maximizing the representation universality and realizing the best alignment of language-agnostic information. We also introduce a denoising auto-encoding (DAE) objective to jointly train the model with the translation task in a multi-task manner. Experiments on two public multilingual parallel datasets show that our proposed model achieves competitive or better results than universal NMT and a strong pivot baseline. Moreover, we experiment with incrementally adding a new language to the trained model by only updating the new model parameters. With this little effort, the zero-shot translation between this newly added language and existing languages achieves a comparable result with the model trained jointly from scratch on all languages.

Submitted 12 February, 2021; originally announced February 2021.

arXiv:2102.04664 [pdf, other]

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Authors: Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu

Abstract: Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.

Submitted 16 March, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

Comments: 14 pages; Revise CodeBLEU scores for all models on text-to-code task

arXiv:2012.14116 [pdf, other]

Syntax-Enhanced Pre-trained Model

Authors: Zenan Xu, Daya Guo, Duyu Tang, Qinliang Su, Linjun Shou, Ming Gong, Wanjun Zhong, Xiaojun Quan, Nan Duan, Daxin Jiang

Abstract: We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa. Existing methods utilize syntax of text either in the pre-training stage or in the fine-tuning stage, so that they suffer from discrepancy between the two stages. Such a problem would lead to the necessity of having human-annotated syntactic information, which limits the application of existing methods to broader scenarios. To address this, we present a model that utilizes the syntax of text in both pre-training and fine-tuning stages. Our model is based on Transformer with a syntax-aware attention layer that considers the dependency tree of the text. We further introduce a new pre-training task of predicting the syntactic distance among tokens in the dependency tree. We evaluate the model on three downstream tasks, including relation classification, entity typing, and question answering. Results show that our model achieves state-of-the-art performance on six public benchmark datasets. We have two major findings. First, we demonstrate that infusing automatically produced syntax of text improves pre-trained models. Second, global syntactic distances among tokens bring larger performance gains compared to local head relations between contiguous tokens.

Submitted 29 May, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

Comments: Accepted by ACL-IJCNLP 2021: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing

arXiv:2012.06048 [pdf, other]

Reinforced Multi-Teacher Selection for Knowledge Distillation

Authors: Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang

Abstract: In natural language processing (NLP) tasks, slow inference speed and huge footprints in GPU usage remain the bottleneck of applying pre-trained deep models in production. As a popular method for model compression, knowledge distillation transfers knowledge from one or multiple large (teacher) models to a small (student) model. When multiple teacher models are available in distillation, the state-of-the-art methods assign a fixed weight to a teacher model in the whole distillation. Furthermore, most of the existing methods allocate an equal weight to every teacher model. In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of student models distilled. We systematically develop a reinforced method to dynamically assign weights to teacher models for different training instances and optimize the performance of student model. Our extensive experimental results on several NLP tasks clearly verify the feasibility and effectiveness of our approach.

Submitted 13 December, 2020; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: AAAI 2021

arXiv:2012.05048 [pdf, ps, other]

Global well-posedness of one-dimensional compressible Navier-Stokes-Vlasov system

Authors: Hai-Liang Li, Ling-Yun Shou

Abstract: A fluid-particle model is investigated in the present paper, which consists of the compressible Navier-Stokes equations coupled with the Vlasov equation through a nonlinear drag force. We consider the initial value problem for the one-dimensional compressible Navier-Stokes-Vlasov system and establish the global existence and uniqueness of the weak solution for general initial data in either spatial periodic domain or spatial real line, which is shown to be a classical solution for regular initial data.

Submitted 15 September, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

Comments: 51 pages

arXiv:2011.11928 [pdf, other]

GLGE: A New General Language Generation Evaluation Benchmark

Authors: Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao, Weizhu Chen, Jie Fu, Linjun Shou, Ming Gong, Pengcheng Wang, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Ruofei Zhang, Winnie Wu, Ming Zhou, Nan Duan

Abstract: Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress of pretraining and transfer learning in Natural Language Processing (NLP). These benchmarks mostly focus on a range of Natural Language Understanding (NLU) tasks, without considering the Natural Language Generation (NLG) models. In this paper, we present the General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks. For each task, we continue to design three subtasks in terms of task difficulty (GLGE-Easy, GLGE-Medium, and GLGE-Hard). This introduces 24 subtasks to comprehensively compare model performance. To encourage research on pretraining and transfer learning on NLG models, we make GLGE publicly available and build a leaderboard with strong baselines including MASS, BART, and ProphetNet (The source code and dataset are publicly available at https://github.com/microsoft/glge).

Submitted 1 June, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

Comments: Findings of Association for Computational Linguistics. ACL 2021

arXiv:2011.11160 [pdf, other]

LINDT: Tackling Negative Federated Learning with Local Adaptation

Authors: Hong Lin, Lidan Shou, Ke Chen, Gang Chen, Sai Wu

Abstract: Federated Learning (FL) is a promising distributed learning paradigm, which allows a number of data owners (also called clients) to collaboratively learn a shared model without disclosing each client's data. However, FL may fail to proceed properly, amid a state that we call negative federated learning (NFL). This paper addresses the problem of negative federated learning. We formulate a rigorous definition of NFL and analyze its essential cause. We propose a novel framework called LINDT for tackling NFL in run-time. The framework can potentially work with any neural-network-based FL systems for NFL detection and recovery. Specifically, we introduce a metric for detecting NFL from the server. On occasion of NFL recovery, the framework makes adaptation to the federated model on each client's local data by learning a Layer-wise Intertwined Dual-model. Experiment results show that the proposed approach can significantly improve the performance of FL on local data in various scenarios of NFL.

Submitted 22 November, 2020; originally announced November 2020.

arXiv:2011.05723 [pdf, other]

CalibreNet: Calibration Networks for Multilingual Sequence Labeling

Authors: Shining Liang, Linjun Shou, Jian Pei, Ming Gong, Wanli Zuo, Daxin Jiang

Abstract: Lack of training data in low-resource languages presents huge challenges to sequence labeling tasks such as named entity recognition (NER) and machine reading comprehension (MRC). One major obstacle is the errors on the boundary of predicted answers. To tackle this problem, we propose CalibreNet, which predicts answers in two steps. In the first step, any existing sequence labeling method can be adopted as a base model to generate an initial answer. In the second step, CalibreNet refines the boundary of the initial answer. To tackle the challenge of lack of training data in low-resource languages, we dedicatedly develop a novel unsupervised phrase boundary recovery pre-training task to enhance the multilingual boundary detection capability of CalibreNet. Experiments on two cross-lingual benchmark datasets show that the proposed approach achieves SOTA results on zero-shot cross-lingual NER and MRC tasks.

Submitted 11 November, 2020; originally announced November 2020.

Comments: Long paper in WSDM 2021

arXiv:2010.14271 [pdf, other]

Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation

Authors: Junhao Liu, Linjun Shou, Jian Pei, Ming Gong, Min Yang, Daxin Jiang

Abstract: Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale annotated datasets in low-source languages, such as Arabic, Hindi, and Vietnamese. Many previous approaches use translation data by translating from a rich-source language, such as English, to low-source languages as auxiliary supervision. However, how to effectively leverage translation data and reduce the impact of noise introduced by translation remains onerous. In this paper, we tackle this challenge and enhance the cross-lingual transferring performance by a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC). A language branch is a group of passages in one single language paired with questions in all target languages. We train multiple machine reading comprehension (MRC) models proficient in individual language based on LBMRC. Then, we devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages. Combining the LBMRC and multilingual distillation can be more robust to the data noises, therefore, improving the model's cross-lingual ability. Meanwhile, the produced single multilingual model is applicable to all target languages, which saves the cost of training, inference, and maintenance for multiple models. Extensive experiments on two CLMRC benchmarks clearly show the effectiveness of our proposed method.

Submitted 27 October, 2020; originally announced October 2020.

Comments: Accepted as long paper in COLING 2020

arXiv:2010.07606

Learning Better Representation for Tables by Self-Supervised Tasks

Authors: Liang Li, Can Ma, Yinliang Yue, Linjun Shou, Dayong Hu

Abstract: Table-to-text generation aims at automatically generating natural text to help people conveniently obtain the important information in tables. Although neural models for table-to-text have achieved remarkable progress, some problems are still overlooked. The first is that the values recorded in many tables are mostly numbers in practice. The existing approaches do not treat these specially, and still regard them as words in natural language text. Secondly, the target texts in the training dataset may contain redundant information or facts that do not exist in the input tables. These may give wrong supervision signals to some methods based on content selection and planning and auxiliary supervision. To solve these problems, we propose two self-supervised tasks, Number Ordering and Significance Ordering, to help learn better table representation. The former works on the column dimension to help incorporate the size property of numbers into table representation. The latter acts on the row dimension and helps learn a significance-aware table representation. We test our methods on the widely used dataset ROTOWIRE, which consists of NBA game statistics and related news. The experimental results demonstrate that the model trained together with these two self-supervised tasks can generate text that contains more salient and well-organized facts, even without modeling context selection and planning. We also achieve state-of-the-art performance on automatic metrics.

Submitted 30 March, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

Comments: This article is messily written, and some of the experiments are inadequate, which may mislead the reader about our work

arXiv:2010.06801 [pdf, other]

A Graph Representation of Semi-structured Data for Web Question Answering

Authors: Xingyao Zhang, Linjun Shou, Jian Pei, Ming Gong, Lijie Wen, Daxin Jiang

Abstract: The abundant semi-structured data on the Web, such as HTML-based tables and lists, provide commercial search engines a rich information source for question answering (QA). Different from plain text passages in Web documents, Web tables and lists have inherent structures, which carry semantic correlations among various elements in tables and lists. Many existing studies treat tables and lists as flat documents with pieces of text and do not make good use of semantic information hidden in structures. In this paper, we propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations. We also develop pre-training and reasoning techniques on the graph model for the QA task. Extensive experiments on several real datasets collected from a commercial engine verify the effectiveness of our approach. Our method improves F1 score by 3.90 points over the state-of-the-art baselines.

Submitted 14 October, 2020; originally announced October 2020.

Comments: Accepted as long paper in COLING 2020

arXiv:2010.03910 [pdf, other]

An Experimental Analysis of Indoor Spatial Queries: Modeling, Indexing, and Processing

Authors: Tiantian Liu, Huan Li, Hua Lu, Muhammad Aamir Cheema, Lidan Shou

Abstract: Indoor location-based services (LBS), such as POI search and routing, are often built on top of typical indoor spatial queries. To support such queries and indoor LBS, multiple techniques including model/indexes and search algorithms have been proposed. In this work, we conduct an extensive experimental study on existing proposals for indoor spatial queries. We survey five model/indexes, compare their algorithmic characteristics, and analyze their space and time complexities. We also design an in-depth benchmark with real and synthetic datasets, evaluation tasks and performance metrics. Enabled by the benchmark, we obtain and report the performance results of all model/indexes under investigation. By analyzing the results, we summarize the pros and cons of all techniques and suggest the best choice for typical scenarios.

Submitted 8 October, 2020; originally announced October 2020.

Comments: An Experiment and Analysis Paper

arXiv:2009.14348 [pdf, other]

MaP: A Matrix-based Prediction Approach to Improve Span Extraction in Machine Reading Comprehension

Authors: Huaishao Luo, Yu Shi, Ming Gong, Linjun Shou, Tianrui Li

Abstract: Span extraction is an essential problem in machine reading comprehension. Most existing algorithms predict the start and end positions of an answer span in the given context by generating two probability vectors. In this paper, we propose a novel approach that extends the probability vector to a probability matrix. Such a matrix can cover more start-end position pairs. Precisely, for each possible start index, the method generates an end probability vector. In addition, we propose a sampling-based training strategy to address the computational cost and memory issues in the matrix training phase. We evaluate our method on SQuAD 1.1 and three other question answering benchmarks. Leveraging the most competitive models BERT and BiDAF as the backbone, our proposed approach obtains consistent improvements on all datasets, demonstrating the effectiveness of the proposed method.

Submitted 29 September, 2020; originally announced September 2020.

Comments: to appear at AACL-IJCNLP 2020
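
Illustrative note on the entry above: the abstract describes replacing the two start/end probability vectors with a matrix that holds one end-probability vector per start index. The following is a minimal sketch of how such a matrix could be decoded into a span, not the authors' implementation. The function names, the product scoring rule, the `end >= start` mask, and the optional `max_len` cap are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_logits, end_logits_matrix, max_len=None):
    """Pick the (start, end) pair with the highest joint score.

    start_logits: one score per token position.
    end_logits_matrix: row s holds the end scores conditioned on start s
        (hypothetical layout; the abstract only says each start index
        gets its own end probability vector).
    """
    n = len(start_logits)
    p_start = softmax(start_logits)
    best, best_score = None, -1.0
    for s in range(n):
        p_end = softmax(end_logits_matrix[s])
        for e in range(s, n):  # assumed mask: a span cannot end before it starts
            if max_len is not None and e - s + 1 > max_len:
                break
            score = p_start[s] * p_end[e]  # assumed scoring: P(s) * P(e | s)
            if score > best_score:
                best, best_score = (s, e), score
    return best, best_score


# Toy input: position 1 is the likely start, and given start 1 the likely end is 2.
toy_start = [0.0, 5.0, 0.0]
toy_end = [[0.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 0.0, 0.0]]
span, score = best_span(toy_start, toy_end)
print(span)
```

With two independent vectors the end distribution is the same for every start; with the matrix layout each row can differ, which is the extra coverage of start-end pairs the abstract refers to.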

arXiv:2009.12056 [pdf, other]

No Answer is Better Than Wrong Answer: A Reflection Model for Document Level Machine Reading Comprehension

Authors: Xuguang Wang, Linjun Shou, Ming Gong, Nan Duan, Daxin Jiang

Abstract: The Natural Questions (NQ) benchmark set brings new challenges to Machine Reading Comprehension: the answers are not only at different levels of granularity (long and short), but also of richer types (including no-answer, yes/no, single-span and multi-span). In this paper, we target this challenge and handle all answer types systematically. In particular, we propose a novel approach called Reflection Net, which leverages a two-step training procedure to identify the no-answer and wrong-answer cases. Extensive experiments are conducted to verify the effectiveness of our approach. At the time of paper writing (May 20, 2020), our approach ranked first on both the long and short answer leaderboards, with F1 scores of 77.2 and 64.1, respectively.

Submitted 29 September, 2020; v1 submitted 25 September, 2020; originally announced September 2020.

Comments: Accepted by Findings of EMNLP 2020

arXiv:2009.07406 [pdf, other]

Tag and Correct: Question aware Open Information Extraction with Two-stage Decoding

Authors: Martin Kuo, Yaobo Liang, Lei Ji, Nan Duan, Linjun Shou, Ming Gong, Peng Chen

Abstract: Question Aware Open Information Extraction (Question aware Open IE) takes a question and a passage as inputs and outputs an answer tuple that contains a subject, a predicate, and one or more arguments. Each field of the answer is a natural language word sequence extracted from the passage. Compared to a span answer, the semi-structured answer has two advantages: it is more readable and more falsifiable. There are two approaches to solving this problem. One is an extractive method, which extracts candidate answers from the passage with an Open IE model and ranks them by matching them with the question. It fully uses the passage information at the extraction step, but the extraction is independent of the question. The other is a generative method, which uses a sequence-to-sequence model to generate answers directly. It takes the question and passage together as input, but it generates the answer from scratch and does not exploit the fact that most of the answer words come from the passage. To guide the generation by the passage, we present a two-stage decoding model that contains a tagging decoder and a correction decoder. In the first stage, the tagging decoder tags keywords from the passage. In the second stage, the correction decoder generates answers based on the tagged keywords. Our model can be trained end-to-end although it has two stages. Compared to previous generative models, we generate better answers by generating from coarse to fine. We evaluate our model on WebAssertions (Yan et al., 2018), which is a Question aware Open IE dataset. Our model achieves a BLEU score of 59.32, which is better than previous generative methods.

Submitted 15 September, 2020; originally announced September 2020.

Comments: 11 pages, 1 figure, 4 tables

MSC Class: 68T50; 68T01

arXiv:2006.07581 [pdf, other]

doi 10.1145/3394486.3403343

Mining Implicit Relevance Feedback from User Behavior for Web Question Answering

Authors: Linjun Shou, Shining Bo, Feixiang Cheng, Ming Gong, Jian Pei, Daxin Jiang

Abstract: Training and refreshing a web-scale Question Answering (QA) system for a multi-lingual commercial search engine often requires a huge amount of training examples. One principled idea is to mine implicit relevance feedback from user behavior recorded in search engine logs. All previous works on mining implicit relevance feedback target the relevance of web documents rather than passages. Due to several unique characteristics of QA tasks, the existing user behavior models for web documents cannot be applied to infer passage relevance. In this paper, we present the first study to explore the correlation between user behavior and passage relevance, and propose a novel approach for mining training data for Web QA. We conduct extensive experiments on four test datasets and the results show our approach significantly improves the accuracy of passage ranking without extra human labeled data. In practice, this work has proved effective in substantially reducing the human labeling cost for the QA service in a global commercial search engine, especially for languages with low resources. Our techniques have been deployed in multi-language services.

Submitted 15 June, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

Comments: Accepted by KDD 2020

arXiv:2004.14069 [pdf, other]

Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension

Authors: Fei Yuan, Linjun Shou, Xuanyu Bai, Ming Gong, Yaobo Liang, Nan Duan, Yan Fu, Daxin Jiang

Abstract: Multilingual pre-trained models could leverage the training data from a rich source language (such as English) to improve performance on low resource languages. However, the transfer quality for multilingual Machine Reading Comprehension (MRC) is significantly worse than for sentence classification tasks, mainly due to the requirement of MRC to detect the word level answer boundary. In this paper, we propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision: (1) a mixed MRC task, which translates the question or passage to other languages and builds cross-lingual question-passage pairs; (2) a language-agnostic knowledge masking task that leverages knowledge phrases mined from the web. Extensive experiments on two cross-lingual MRC datasets show the effectiveness of our proposed approach.

Submitted 8 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

Comments: Accepted to ACL 2020

arXiv:2004.13659 [pdf, other]

LogicalFactChecker: Leveraging Logical Operations for Fact Checking with Graph Module Network

Authors: Wanjun Zhong, Duyu Tang, Zhangyin Feng, Nan Duan, Ming Zhou, Ming Gong, Linjun Shou, Daxin Jiang, Jiahai Wang, Jian Yin

Abstract: Verifying the correctness of a textual statement requires not only semantic reasoning about the meaning of words, but also symbolic reasoning about logical operations like count, superlative, aggregation, etc. In this work, we propose LogicalFactChecker, a neural network approach capable of leveraging logical operations for fact checking. It achieves the state-of-the-art performance on TABFACT, a large-scale, benchmark dataset built for verifying a textual statement with semi-structured tables. This is achieved by a graph module network built upon the Transformer-based architecture. With a textual statement and a table as the input, LogicalFactChecker automatically derives a program (a.k.a. logical form) of the statement in a semantic parsing manner. A heterogeneous graph is then constructed to capture not only the structures of the table and the program, but also the connections between inputs with different modalities. Such a graph reveals the related contexts of each word in the statement, the table and the program. The graph is used to obtain graph-enhanced contextual representations of words in Transformer-based architecture. After that, a program-driven module network is further introduced to exploit the hierarchical structure of the program, where semantic compositionality is dynamically modeled along the program structure with a set of function-specific modules. Ablation experiments suggest that both the heterogeneous graph and the module network are important to obtain strong results.

Submitted 28 April, 2020; originally announced April 2020.

Comments: 13 pages; 7 figures; Accepted by ACL 2020 as a long paper

arXiv:2004.05568 [pdf, other]

Pre-training Text Representations as Meta Learning

Authors: Shangwen Lv, Yuechen Wang, Daya Guo, Duyu Tang, Nan Duan, Fuqing Zhu, Ming Gong, Linjun Shou, Ryan Ma, Daxin Jiang, Guihong Cao, Ming Zhou, Songlin Hu

Abstract: Pre-training text representations has recently been shown to significantly improve the state-of-the-art in many natural language processing tasks. The central goal of pre-training is to learn text representations that are useful for subsequent tasks. However, existing approaches are optimized by minimizing a proxy objective, such as the negative log likelihood of language modeling. In this work, we introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks. We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps. The standard multi-task learning objective adopted in BERT is a special case of our learning algorithm where the depth of meta-train is zero. We study the problem in two settings, unsupervised pre-training and supervised pre-training with different pre-training objectives, to verify the generality of our approach. Experimental results show that our algorithm brings improvements and learns better initializations for a variety of downstream tasks.

Submitted 12 April, 2020; originally announced April 2020.

Comments: 2 figures, 3 tables

arXiv:2004.04438 [pdf, other]

Improving Readability for Automatic Speech Recognition Transcription

Authors: Junwei Liao, Sefik Emre Eskimez, Liyang Lu, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng

Abstract: Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript can still be challenging to read due to grammatical errors, disfluency, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and ASR system alike will be propagated to the next task in the pipeline. In this work, we propose a novel NLP task called ASR post-processing for readability (APR) that aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker. In addition, we describe a method to address the lack of task-specific data by synthesizing examples for the APR task using the datasets collected for Grammatical Error Correction (GEC) followed by text-to-speech (TTS) and ASR. Furthermore, we propose metrics borrowed from similar tasks to evaluate performance on the APR task. We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method. Our results suggest that fine-tuned models improve the performance on the APR task significantly, hinting at the potential benefits of using APR systems. We hope that the read, understand, and rewrite approach of our work can serve as a basis that many NLP tasks and human readers can benefit from.

Submitted 9 April, 2020; originally announced April 2020.

arXiv:2004.03070 [pdf, other]

Inferential Text Generation with Multiple Knowledge Sources and Meta-Learning

Authors: Daya Guo, Akari Asai, Duyu Tang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, Jian Yin, Ming Zhou

Abstract: We study the problem of generating inferential texts of events for a variety of commonsense relations such as if-else. Existing approaches typically use limited evidence from training examples and learn for each relation individually. In this work, we use multiple knowledge sources as fuel for the model. Existing commonsense knowledge bases like ConceptNet are dominated by taxonomic knowledge (e.g., isA and relatedTo relations) and contain only a limited amount of inferential knowledge. We use not only structured commonsense knowledge bases, but also natural language snippets from search-engine results. These sources are incorporated into a generative base model via a key-value memory network. In addition, we introduce a meta-learning based multi-task learning algorithm. For each targeted commonsense relation, we regard the learning of examples from other relations as the meta-training process, and the evaluation on examples from the targeted relation as the meta-test process. We conduct experiments on the Event2Mind and ATOMIC datasets. Results show that both the integration of multiple knowledge sources and the use of the meta-learning algorithm improve the performance.

Submitted 15 April, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

arXiv:2004.01401 [pdf, ps, other]

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

Authors: Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, Ming Zhou

Abstract: In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora and evaluate their performance across a diverse set of cross-lingual tasks. Compared to GLUE (Wang et al., 2019), which is labeled in English for natural language understanding tasks only, XGLUE has two main advantages: (1) it provides 11 diversified tasks that cover both natural language understanding and generation scenarios; (2) for each task, it provides labeled data in multiple languages. We extend a recent cross-lingual pre-trained model, Unicoder (Huang et al., 2019), to cover both understanding and generation tasks, which is evaluated on XGLUE as a strong baseline. We also evaluate the base versions (12-layer) of Multilingual BERT, XLM and XLM-R for comparison.

Submitted 22 May, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

arXiv:2002.08155 [pdf, other]

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

Authors: Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou

Abstract: We present CodeBERT, a bimodal pre-trained model for programming language (PL) and natural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search, code documentation generation, etc. We develop CodeBERT with a Transformer-based neural architecture, and train it with a hybrid objective function that incorporates the pre-training task of replaced token detection, which is to detect plausible alternatives sampled from generators. This enables us to utilize both bimodal data of NL-PL pairs and unimodal data, where the former provides input tokens for model training while the latter helps to learn better generators. We evaluate CodeBERT on two NL-PL applications by fine-tuning model parameters. Results show that CodeBERT achieves state-of-the-art performance on both natural language code search and code documentation generation tasks. Furthermore, to investigate what type of knowledge is learned in CodeBERT, we construct a dataset for NL-PL probing, and evaluate in a zero-shot setting where parameters of pre-trained models are fixed. Results show that CodeBERT performs better than previous pre-trained models on NL-PL probing.

Submitted 18 September, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: Accepted to Findings of EMNLP 2020. 12 pages

arXiv:1910.14478 [pdf, other]

doi 10.1103/PhysRevResearch.5.013065

Optimization of CNOT circuits on limited connectivity architecture

Authors: Bujiao Wu, Xiaoyu He, Shuai Yang, Lifu Shou, Guojing Tian, Jialin Zhang, Xiaoming Sun

Abstract: A CNOT circuit is the key gadget for entangling qubits in quantum computing systems. However, the qubit connectivity of noisy intermediate-scale quantum (NISQ) devices is constrained by their limited connectivity architecture. To improve the performance of CNOT circuits on NISQ devices, we investigate the optimization of the size/depth of CNOT circuits under the limited connectivity architecture. We present a method that can optimize the size of any $n$-qubit CNOT circuit to $O\left(\frac{n^2}{\log \delta}\right)$ on any connected graph with minimum degree $\delta$, and prove this bound is optimal for the regular graph. For the near-term sparsely connected structure, we additionally present a method that can optimize the size of any $n$-qubit CNOT circuit to below $2n^2$. The numerical experiment shows that our method performs better than state-of-the-art results. Specifically, we present an example to illustrate the applicability of our algorithm. For the grid structure, which is commonly used in current quantum devices, we demonstrate that the depth of any $n$-qubit CNOT circuit can be optimized to be linear in $n$ with certain ancillary qubits (ancillas). Experimental results indicate that this method has significant improvements compared with all of the existing methods. We additionally test our algorithms on the five-qubit IBMQ devices, and the experiments show that the measurement results of the optimized circuit with our algorithm are more robust to noise compared with the IBM mapping method.

Submitted 2 February, 2023; v1 submitted 31 October, 2019; originally announced October 2019.

Comments: 24 pages, 13 figures

Journal ref: Physical Review Research, 5 (2023) 013065

arXiv:1910.08381 [pdf, other]

Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System

Authors: Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang

Abstract: Deep pre-training and fine-tuning models (such as BERT and OpenAI GPT) have demonstrated excellent results in question answering areas. However, due to the sheer number of model parameters, the inference speed of these models is very slow. How to apply these complex models to real business scenarios becomes a challenging but practical problem. Previous model compression methods usually suffer from information loss during the model compression procedure, leading to inferior models compared with the original one. To tackle this challenge, we propose a Two-stage Multi-teacher Knowledge Distillation (TMKD for short) method for web Question Answering systems. We first develop a general Q&A distillation task for student model pre-training, and further fine-tune this pre-trained student model with multi-teacher knowledge distillation on downstream tasks (like the Web Q&A task and the MNLI, SNLI, and RTE tasks from GLUE), which effectively reduces the overfitting bias in individual teacher models and transfers more general knowledge to the student model. The experiment results show that our method can significantly outperform the baseline methods and even achieve comparable results with the original teacher models, along with substantial speedup of model inference.

Submitted 18 October, 2019; originally announced October 2019.

Comments: Accepted by WSDM 2020

arXiv:1909.05311 [pdf, other]

Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering

Authors: Shangwen Lv, Daya Guo, Jingjing Xu, Duyu Tang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Songlin Hu

Abstract: Commonsense question answering aims to answer questions which require background knowledge that is not explicitly expressed in the question. The key challenge is how to obtain evidence from external knowledge and make predictions based on the evidence. Recent works either learn to generate evidence from human-annotated evidence, which is expensive to collect, or extract evidence from either structured or unstructured knowledge bases, which fails to take advantage of both sources. In this work, we propose to automatically extract evidence from heterogeneous knowledge sources, and answer questions based on the extracted evidence. Specifically, we extract evidence from both a structured knowledge base (i.e., ConceptNet) and Wikipedia plain texts. We construct graphs for both sources to obtain the relational structures of evidence. Based on these graphs, we propose a graph-based approach consisting of a graph-based contextual word representation learning module and a graph-based inference module. The first module utilizes graph structural information to re-define the distance between words for learning better contextual word representations. The second module adopts a graph convolutional network to encode neighbor information into the representations of nodes, and aggregates evidence with a graph attention mechanism for predicting the final answer. Experimental results on the CommonsenseQA dataset illustrate that our graph-based approach over both knowledge sources brings improvement over strong baselines. Our approach achieves the state-of-the-art accuracy (75.3%) on the CommonsenseQA leaderboard.

Submitted 8 June, 2020; v1 submitted 9 September, 2019; originally announced September 2019.

Comments: 8 pages, 7 figures, AAAI 2020

arXiv:1909.00964 [pdf, other]

Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks

Authors: Haoyang Huang, Yaobo Liang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, Ming Zhou

Abstract: We present Unicoder, a universal language encoder that is insensitive to different languages. Given an arbitrary NLP task, a model can be trained with Unicoder using training data in one language and directly applied to inputs of the same task in other languages. Compared to similar efforts such as Multilingual BERT and XLM, three new cross-lingual pre-training tasks are proposed, including cross-lingual word recovery, cross-lingual paraphrase classification and cross-lingual masked language model. These tasks help Unicoder learn the mappings among different languages from more perspectives. We also find that fine-tuning on multiple languages together can bring further improvement. Experiments are performed on two tasks: cross-lingual natural language inference (XNLI) and cross-lingual question answering (XQA), where XLM is our baseline. On XNLI, 1.8% averaged accuracy improvement (on 15 languages) is obtained. On XQA, which is a new cross-lingual dataset built by us, 5.5% averaged accuracy improvement (on French and German) is obtained.

Submitted 4 September, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

Comments: Accepted to EMNLP 2019; 10 pages, 2 figures

arXiv:1904.09636 [pdf, other]

Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System

Authors: Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang

Abstract: Deep pre-training and fine-tuning models (like BERT, OpenAI GPT) have demonstrated excellent results in question answering areas. However, due to the sheer number of model parameters, the inference speed of these models is very slow. How to apply these complex models to real business scenarios becomes a challenging but practical problem. Previous works often leverage model compression approaches to resolve this problem. However, these methods usually induce information loss during the model compression procedure, so that the results of the compressed model are not comparable with those of the original model. To tackle this challenge, we propose a Multi-task Knowledge Distillation Model (MKDM for short) for web-scale Question Answering systems, which distills knowledge from multiple teacher models into a light-weight student model. In this way, more generalized knowledge can be transferred. The experiment results show that our method can significantly outperform the baseline methods and even achieve comparable results with the original teacher models, along with significant speedup of model inference.

Submitted 21 April, 2019; originally announced April 2019.

Comments: 9 pages, 2 figures

arXiv:1904.09535 [pdf, other]

NeuronBlocks: Building Your NLP DNN Models Like Playing Lego

Authors: Ming Gong, Linjun Shou, Wutao Lin, Zhijie Sang, Quanjia Yan, Ze Yang, Feixiang Cheng, Daxin Jiang

Abstract: Deep Neural Networks (DNN) have been widely employed in industry to address various Natural Language Processing (NLP) tasks. However, many engineers find it a big overhead when they have to choose from multiple frameworks, compare different types of models, and understand various optimization mechanisms. An NLP toolkit for DNN models with both generality and flexibility can greatly improve the productivity of engineers by saving their learning cost and guiding them to find optimal solutions to their tasks. In this paper, we introduce NeuronBlocks (code: https://github.com/Microsoft/NeuronBlocks; demo: https://youtu.be/x6cOpVSZcdo), a toolkit encapsulating a suite of neural network modules as building blocks to construct various DNN models with complex architecture. This toolkit empowers engineers to build, train, and test various NLP models through simple configuration of JSON files. The experiments on several NLP datasets such as GLUE, WikiQA and CoNLL-2003 demonstrate the effectiveness of NeuronBlocks.

Submitted 18 October, 2019; v1 submitted 20 April, 2019; originally announced April 2019.

Comments: 6 pages, 3 figures

Journal ref: EMNLP 2019

arXiv:1904.03898 [pdf, other]

Semi-Supervised Few-Shot Learning for Dual Question-Answer Extraction

Authors: Jue Wang, Ke Chen, Lidan Shou, Sai Wu, Sharad Mehrotra

Abstract: This paper addresses the problem of key phrase extraction from sentences. Existing state-of-the-art supervised methods require large amounts of annotated data to achieve good performance and generalization. Collecting labeled data is, however, often expensive. In this paper, we redefine the problem as question-answer extraction, and present SAMIE: Self-Asking Model for Information Extraction, a semi-supervised model which dually learns to ask and to answer questions by itself. Briefly, given a sentence $s$ and an answer $a$, the model needs to choose the most appropriate question $\hat q$; meanwhile, for the given sentence $s$ and the same question $\hat q$ selected in the previous step, the model will predict an answer $\hat a$. The model can support few-shot learning with very limited supervision. It can also be used to perform clustering analysis when no supervision is provided. Experimental results show that the proposed method outperforms typical supervised methods, especially when given little labeled data.

Submitted 8 April, 2019; originally announced April 2019.

Comments: 7 pages, 5 figures, submission to IJCAI19

arXiv:1807.03596 [pdf, other]

Constraints on the generalized natural inflation after Planck 2018

Authors: Nan Zhang, Ya-Bo Wu, Jun-Wang Lu, Chu-Wen Sun, Li-Jie Shou, Hai-Zhou Xu

Abstract: Based on the dynamics of single scalar field slow-roll inflation and the theory of reheating, we investigate the generalized natural inflationary (GNI) model. Concretely, we give constraints on the scalar spectral index $n_{s}$ and tensor-to-scalar ratio $r$ for the $Λ$CDM $+r$ model according to the latest data from Planck 2018 TT,TE,EE+lowE+lensing (P18) and BICEP2/Keck 2015 season (BK15), i.e., $n_{s}=0.9659\pm0.0044$ at $68\%$ confidence level (CL) and $r<0.0623$ at $95\%$ CL. We find that the GNI model is favored by P18 plus BK15 in the ranges of $\log_{10}(f/M_{p})=0.62^{+0.17}_{-0.18}$ and $m=0.35^{+0.13}_{-0.23}$ at $68\%$ CL. In addition, the corresponding predictions of the general and two-phase reheating are respectively discussed. It follows that the parameter $m$ has a significant effect on the model behaviors.

Submitted 1 July, 2020; v1 submitted 10 July, 2018; originally announced July 2018.

Comments: 15 pages, 15 figures, accepted for publication in CPC

arXiv:1510.08408 [pdf, ps, other]

Trace formulas for Schrödinger operators on star graphs

Authors: Semra Demirel-Frank, Laura Shou

Abstract: We derive trace formulas of the Buslaev-Faddeev type for quantum star graphs. One of the new ingredients is high energy asymptotics of the perturbation determinant.

Submitted 28 October, 2015; originally announced October 2015.

arXiv:1509.05279 [pdf, other]

Subcritical behavior for quasi-periodic Schrödinger cocycles with trigonometric potentials

Authors: C. A. Marx, L. H. Shou, J. L. Wellens

Abstract: We give a criterion implying subcritical behavior for quasi-periodic Schrödinger operators where the potential sampling function is given by a trigonometric polynomial. Subcritical behavior, in the sense of Avila's global theory, is known to imply purely absolutely continuous spectrum for all irrational frequencies and all phases.

Submitted 31 October, 2015; v1 submitted 17 September, 2015; originally announced September 2015.

Comments: to appear in the Journal of Spectral Theory

Showing 51–89 of 89 results for author: Shou, L