Speaker: Marijn Koolen (1, 2) and Rik Hoekstra (1, 2)
Affiliation: (1) KNAW Huygens Institute, Amsterdam, the Netherlands; (2) DHLab, KNAW Humanities Cluster, Amsterdam, the Netherlands
Title: Detecting Formulaic Language Use in Historical Administrative Corpora
Abstract: Historical administrative corpora are filled with jargon and formulaic expressions that were used consistently across many documents. Governmental decisions, notarial deeds and official charters often contain fixed expressions to ensure that the same legal aspects in different documents had the same interpretation. Such formulaic expressions can be used to identify specific elements of a document. For instance, a deed has different formulas to indicate whether it concerns the sale of property or the transfer of rights. In this paper we explore formulas as a methodological device to structure the text of an administrative corpus and make the information it contains more accessible. We use a data-driven method to detect potential formulaic expressions in historical corpora, one that is robust to spelling variation and change and to recognition errors introduced in the digitisation process. We apply this exploratory technique to a corpus of almost 300,000 eighteenth-century resolutions of the States General of the Dutch Republic and find many formulaic expressions that capture relationships between the political actors involved and the decisions that were made. A first analysis suggests that many formulas can be used to enrich individual resolutions with metadata on various elements of the proposals and decisions they contain.