Scalable Wide and Deep Learning for Computer Assisted Coding
Marilisa Amoia | Frank Diehl | Jesus Gimenez | Joel Pinto | Raphael Schumann | Fabian Stemmer | Paul Vozila | Yi Zhang
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

In recent years the use of electronic medical records has accelerated, resulting in large volumes of medical data when a patient visits a healthcare facility. As a first step towards reimbursement, healthcare institutions need to associate ICD-10 billing codes with these documents. This is done by trained clinical coders who may use a computer-assisted solution for shortlisting codes. We present our work on building a scalable, machine-learning-based system for predicting ICD-10 codes from electronic medical records. We address data imbalance issues by implementing two system architectures using convolutional neural networks and logistic regression models. We illustrate the pros and cons of those system designs and show that the best performance can be achieved by leveraging the advantages of both using a system combination approach.


A Graphical Interface for MT Evaluation and Error Analysis
Meritxell Gonzàlez | Jesús Giménez | Lluís Màrquez
Proceedings of the ACL 2012 System Demonstrations

UNED: Improving Text Similarity Measures without Human Assessments
Enrique Amigó | Jesús Giménez | Julio Gonzalo | Felisa Verdejo
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)


Corroborating Text Evaluation Results with Heterogeneous Measures
Enrique Amigó | Julio Gonzalo | Jesús Giménez | Felisa Verdejo
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing


Combining Confidence Estimation and Reference-based Metrics for Segment-level MT Evaluation
Lucia Specia | Jesús Giménez
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

We describe an effort to improve standard reference-based metrics for Machine Translation (MT) evaluation by enriching them with Confidence Estimation (CE) features and using a learning mechanism trained on human annotations. Reference-based MT evaluation metrics compare the system output against reference translations, looking for overlaps at different levels (lexical, syntactic, and semantic). These metrics aim at comparing MT systems or analyzing the progress of a given system and are known to have reasonably good correlation with human judgments at the corpus level, but not at the segment level. CE metrics, on the other hand, target the system in use, providing a quality score to the end-user for each translated segment. They cannot rely on reference translations, and instead use information extracted from the input text, system output and possibly external corpora to train machine learning algorithms. These metrics correlate better with human judgments at the segment level. However, they are usually highly biased by the difficulty level of the input segment, and are therefore less appropriate for comparing multiple systems translating the same input segments. We show that these two classes of metrics are complementary and can be combined to provide MT evaluation metrics that achieve higher correlation with human judgments at the segment level.

Document-Level Automatic MT Evaluation based on Discourse Representations
Elisabet Comelles | Jesús Giménez | Lluís Màrquez | Irene Castellón | Victoria Arranz
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR


Empirical machine translation and its evaluation
Jesús Giménez
Proceedings of the 13th Annual conference of the European Association for Machine Translation

On the Robustness of Syntactic and Semantic Features for Automatic MT Evaluation
Jesús Giménez | Lluís Màrquez
Proceedings of the Fourth Workshop on Statistical Machine Translation

The Contribution of Linguistic Features to Automatic Machine Translation Evaluation
Enrique Amigó | Jesús Giménez | Julio Gonzalo | Felisa Verdejo
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP


A Smorgasbord of Features for Automatic MT Evaluation
Jesús Giménez | Lluís Màrquez
Proceedings of the Third Workshop on Statistical Machine Translation

Heterogeneous Automatic MT Evaluation Through Non-Parametric Metric Combinations
Jesús Giménez | Lluís Màrquez
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Towards Heterogeneous Automatic MT Error Analysis
Jesús Giménez | Lluís Màrquez
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This work studies the viability of performing heterogeneous automatic MT error analyses. Error analysis is, undoubtedly, one of the most crucial stages in the development cycle of an MT system. However, often not enough attention is paid to this process. The reason is that performing an accurate error analysis requires intensive human labor. In order to speed up the error analysis process, we suggest partially automating it by having automatic evaluation metrics play a more active role. For that purpose, we have compiled a large and heterogeneous set of features at different linguistic levels and at different levels of granularity. Through a practical case study, we show how these features provide an effective means of elaborating interpretable and detailed automatic reports of translation quality.


Context-aware Discriminative Phrase Selection for Statistical Machine Translation
Jesús Giménez | Lluís Màrquez
Proceedings of the Second Workshop on Statistical Machine Translation

Linguistic Features for Automatic Evaluation of Heterogenous MT Systems
Jesús Giménez | Lluís Màrquez
Proceedings of the Second Workshop on Statistical Machine Translation


IQMT: A Framework for Automatic Machine Translation Evaluation
Jesús Giménez | Enrique Amigó
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We present the IQMT Framework for Machine Translation Evaluation Inside QARLA. IQMT offers a common workbench in which existing evaluation metrics can be utilized and combined. It provides i) a measure to evaluate the quality of any set of similarity metrics (KING), ii) a measure to evaluate the quality of a translation using a set of similarity metrics (QUEEN), and iii) a measure to evaluate the reliability of a test set (JACK). The first release of the IQMT package is freely available for public use. The current version includes a set of 26 metrics from 7 different well-known metric families, and allows users to supply their own metrics. For future releases, we are working on the design of new metrics that are able to capture linguistic aspects of translation beyond lexical ones.

MT Evaluation: Human-Like vs. Human Acceptable
Enrique Amigó | Jesús Giménez | Julio Gonzalo | Lluís Màrquez
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Low-Cost Enrichment of Spanish WordNet with Automatically Translated Glosses: Combining General and Specialized Models
Jesús Giménez | Lluís Màrquez
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

The LDV-COMBO system for SMT
Jesús Giménez | Lluís Màrquez
Proceedings on the Workshop on Statistical Machine Translation


Machine Translation Evaluation Inside QARLA
Enrike Amigo | Jesus Gimenez | Chiori Hori
Proceedings of the Second International Workshop on Spoken Language Translation

Semantic Role Labeling as Sequential Tagging
Lluís Màrquez | Pere Comas | Jesús Giménez | Neus Català
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

Combining Linguistic Data Views for Phrase-based SMT
Jesús Giménez | Lluís Màrquez
Proceedings of the ACL Workshop on Building and Using Parallel Texts


SVMTool: A general POS Tagger Generator Based on Support Vector Machines
Jesús Giménez | Lluís Màrquez
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Bilingual Connections for Trilingual Corpora: An XML Approach
Victoria Arranz | Núria Castell | Josep Maria Crego | Jesús Giménez | Adrià de Gispert | Patrik Lambert
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)