Speaker: Caroline Craig, Kartik Goyal, Gregory Crane, Farnoosh Shamsian and David A. Smith
Affiliation: 1, Khoury College of Computer Sciences, Northeastern University, 360 Huntington Ave, Boston, MA 02115, United States; 2, School of Interactive Computing, Georgia Institute of Technology, North Ave NW, Atlanta, GA 30332, United States; 3, School of Arts and Sciences & School of Engineering, Tufts University, 419 Boston Ave, Medford, MA 02155, United States; 4, Leipzig University, Augustuspl. 10, 04109 Leipzig, Germany
Title: Testing the Limits of Neural Sentence Alignment Models on Classical Greek and Latin Texts and Translations
Abstract: The Greek and Latin classics, like many other ancient texts, have been widely translated into a variety of languages over the past two millennia. Although many digital editions and libraries contain one or two translations for a given text, about one hundred translations of the Iliad and twenty of Herodotus, for example, exist in English alone. Aligning the corpus of classical texts and translations at the sentence and word level would provide a valuable resource for studying translation theory, digital humanities, and natural language processing (NLP). Precise and faithful sentence alignment via computational methods, however, remains a challenging problem. Current alignment methods tend to have poor coverage and recall since their primary aim is to extract single sentence pairs for training machine translation systems. This paper evaluates and examines the limits of such state-of-the-art models for cross-language sentence embedding and alignment of ancient Greek and Latin texts with translations into English, French, German, and Persian. We release evaluation data for Plato's Crito, manually annotated at the word and sentence level, and larger test datasets based on coarser structural metadata for Thucydides (Greek) and Lucretius (Latin). Testing LASER and LaBSE for sentence embedding and nearest-neighbor retrieval and Vecalign for sentence alignment, we found best results using LaBSE-Vecalign. LaBSE worked surprisingly well on ancient Greek, most probably because ancient Greek had been merged with modern Greek data in its training. Both LASER-Vecalign and LaBSE-Vecalign did best when there were many ground-truth one-to-one alignments between source and target sentences, and when the order of sentences in the source was preserved in the translation. However, these conditions are often not present in the kinds of literary and free translation we wish to study, nor in editions with multiple translations, extensive commentary, or other paratext.
We perform book-level and chapter-level error analysis to inform the development of a software pipeline that can be deployed on the vast corpus of translations of ancient texts.
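As a rough illustration of the nearest-neighbor retrieval step mentioned in the abstract, the sketch below matches each source sentence to its most similar target sentence by cosine similarity and scores the result against a gold one-to-one alignment. It is a minimal sketch under stated assumptions, not the authors' pipeline: the three-dimensional vectors are invented stand-ins for LASER or LaBSE embeddings, the function names are hypothetical, and the sentence alignment step that the abstract assigns to Vecalign is not attempted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(src_vecs, tgt_vecs):
    """For each source vector, return the index of the most similar target vector."""
    return [
        max(range(len(tgt_vecs)), key=lambda j: cosine(s, tgt_vecs[j]))
        for s in src_vecs
    ]


def retrieval_accuracy(predicted, gold):
    """Fraction of source sentences whose retrieved target matches the gold index."""
    hits = sum(1 for p, g in zip(predicted, gold) if p == g)
    return hits / len(gold)


# Toy data (assumption): invented vectors standing in for real sentence embeddings.
src = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.2, 0.1, 1.0]]
tgt = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
gold = [1, 0, 2]  # hypothetical gold one-to-one alignment: src[i] pairs with tgt[gold[i]]

pred = nearest_neighbors(src, tgt)
print(pred, retrieval_accuracy(pred, gold))
```

Retrieval of this kind treats every source sentence independently, so it cannot represent one-to-many or many-to-one alignments and ignores sentence order, which is consistent with the abstract's observation that results are best when one-to-one alignments dominate.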