Enhancing HTR of Historical Texts through Scholarly Editions: A Case Study from an Ancient Collation of the Hebrew Bible

:speech_balloon: Speakers: Luigi Bambaci and Daniel Stökl Ben Ezra

:classical_building: Affiliation: (1) Archéologie & Philologie d’Orient et d’Occident UMR 8546, École Pratique des Hautes Études, Université Paris Sciences & Lettres (EPHE, PSL), Les Patios Saint-Jacques, 4-14 Rue Ferrus, 75014 Paris, France; (2) Archéologie & Philologie d’Orient et d’Occident UMR 8546, EPHE, PSL

Title: Enhancing HTR of Historical Texts through Scholarly Editions: A Case Study from an Ancient Collation of the Hebrew Bible

Abstract: Printed critical editions of literary texts are a largely neglected source of knowledge in computational humanities. Under certain conditions, however, they hold significant potential for exploration in two directions. First, through Optical Character Recognition (OCR) of the text and its apparatus, coupled with intelligent parsing of the variant readings, it becomes possible to reconstruct comprehensive manuscript collations, which can prove invaluable for a variety of investigations, including phylogenetic analyses, redaction history studies, and linguistic inquiries. Second, by aligning the printed edition with manuscript images, a substantial amount of Handwritten Text Recognition (HTR) ground truth can be generated. This ground truth serves as valuable material for paleography and layout analysis, as well as for assessing the quality of the collation criteria adopted by the editor. The present paper focuses on the challenges addressed in OCR, apparatus parsing, text reconstruction, and alignment with the manuscript images, taking as a case study the edition of the Hebrew Bible published by Kennicott in the late eighteenth century.

:newspaper: Link to paper
