Speaker: Joris van Zundert, Andreas van Cranenburgh and Roel Smeets
Affiliation: 1, Department of Computational Literary Studies, Huygens Institute, The Netherlands; 2, Department of Language Technology, University of Groningen, The Netherlands; 3, Department of Modern Languages and Cultures, Radboud University, The Netherlands
Title: Putting Dutchcoref to the Test: Character Detection and Gender Dynamics in Contemporary Dutch Novels
Abstract: Although coreference resolution is a necessary step for a wide range of automated narratological analyses, most systems that perform this task leave much to be desired in terms of either accuracy or practical applicability in literary studies. While some coreference resolution systems perform well on annotated fragments of novels, evaluations typically do not consider performance on the full texts of novels. To optimize its output for concrete use in Dutch literary studies, we are evaluating and fine-tuning Dutchcoref, an implementation of the Stanford Multi-Pass Sieve Coreference System for Dutch. Using a “silver standard” of annotated data on 2,137 characters in 170 contemporary Dutch novels, we assess the extent to which Dutchcoref is able to identify the most prominent characters and their gender. Furthermore, we test the usability of the system by addressing a specific narratological question about the gender distribution of the characters. We find that Dutchcoref is highly accurate in detecting noun phrases, proper names, and pronouns referring to characters, and that it is accurate in establishing their gender. However, its ability to cluster coreferent mentions into a character profile, which we compare with BookNLP’s performance in this respect, is still suboptimal and deteriorates with text length. We show that, notwithstanding its current state of development, Dutchcoref can be applied for meaningful literary analysis, and we outline future prospects.