Publications & Preprints
Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data
WMT 2024
Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy
WMT 2024
LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback
NAACL 2024
Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration
EMNLP 2023
Outstanding Paper in Machine Translation Award
Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level
WMT 2023
The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
WMT 2023
Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection
EACL 2023
Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code
arXiv
Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics
NAACL 2022
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics
ACL Findings 2022
Understanding the Extent to which Content Quality Metrics Measure the Information Quality of Summaries
CoNLL 2021
A Statistical Analysis of Summarization Evaluation Metrics Using Resampling Methods
TACL 2021
Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary
TACL 2021
Is Killed More Significant than Fled? A Contextual Model for Salient Event Detection
COLING 2020
SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics
NLP-OSS 2020