Publications & Preprints

Mitigating Metric Bias in Minimum Bayes Risk Decoding
Geza Kovacs, Daniel Deutsch, and Markus Freitag
WMT 2024
pdf

Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data
Zhongtao Liu, Parker Riley, Daniel Deutsch, Alison Lui, Mengmeng Niu, Apu Shah, and Markus Freitag
WMT 2024
pdf

MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task
Juraj Juraska, Daniel Deutsch, Mara Finkelstein, and Markus Freitag
WMT 2024
pdf

Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy
Brian Thompson, Nitika Mathur, Daniel Deutsch, and Huda Khayrallah
WMT 2024
pdf

LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback
Wenda Xu, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Biao Zhang, Zhongtao Liu, William Yang Wang, Lei Li, and Markus Freitag
NAACL 2024
pdf

Finding Replicable Human Evaluations via Stable Ranking Probability
Parker Riley, Daniel Deutsch, George Foster, Viresh Ratnakar, Ali Dabirmoghaddam, and Markus Freitag
NAACL 2024
pdf

There's no Data Like Better Data: Using QE Metrics for MT Data Filtering
Jan-Thorsten Peter, David Vilar, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, and Markus Freitag
WMT 2023
pdf

Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration
Daniel Deutsch, George Foster, and Markus Freitag
EMNLP 2023
Outstanding Paper in Machine Translation Award
pdf

Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level
Daniel Deutsch, Juraj Juraska, Mara Finkelstein, and Markus Freitag
WMT 2023
pdf

The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
Patrick Fernandes, Daniel Deutsch, Mara Finkelstein, Parker Riley, André F. T. Martins, Graham Neubig, Ankush Garg, Jonathan H. Clark, Markus Freitag, and Orhan Firat
WMT 2023
pdf

Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection
Daniel Deutsch and Dan Roth
EACL 2023
pdf

On the Limitations of Reference-Free Evaluations of Generated Text
Daniel Deutsch, Rotem Dror, and Dan Roth
EMNLP 2022
pdf

Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code
Daniel Deutsch and Dan Roth
arXiv
pdf code

Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics
Daniel Deutsch, Rotem Dror, and Dan Roth
NAACL 2022
pdf code

Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics
Daniel Deutsch and Dan Roth
ACL Findings 2022
pdf

Understanding the Extent to which Content Quality Metrics Measure the Information Quality of Summaries
Daniel Deutsch and Dan Roth
CoNLL 2021
pdf code

A Statistical Analysis of Summarization Evaluation Metrics Using Resampling Methods
Daniel Deutsch, Rotem Dror, and Dan Roth
TACL 2021
pdf code

Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary
Daniel Deutsch, Tania Bedrax-Weiss, and Dan Roth
TACL 2021
pdf code

Is Killed More Significant than Fled? A Contextual Model for Salient Event Detection
Disha Jindal, Daniel Deutsch, and Dan Roth
COLING 2020
pdf

SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics
Daniel Deutsch and Dan Roth
NLP-OSS 2020
pdf code

A General-Purpose Algorithm for Constrained Sequential Inference
Daniel Deutsch,* Shyam Upadhyay,* and Dan Roth (* = equal contribution)
CoNLL 2019
pdf code slides bib

Summary Cloze: A New Task for Content Selection in Topic-Focused Summarization
Daniel Deutsch and Dan Roth
EMNLP 2019
pdf code data slides talk bib

A Distributional and Orthographic Aggregation Model for English Derivational Morphology
Daniel Deutsch,* John Hewitt,* and Dan Roth (* = equal contribution)
ACL 2018
pdf code data slides talk bib