Speakers: Folgert Karsdorp (1), Enrique Manjavacas (2) and Lauren Fonteyn (2)
Affiliation: (1) KNAW Meertens Institute, Amsterdam, the Netherlands; (2) Leiden University, Leiden, the Netherlands
Title: Introducing Functional Diversity: A Novel Approach to Lexical Diversity in (Historical) Corpora
Abstract: The question of how we can reliably estimate the lexical diversity of a particular text (collection) has often been asked by linguists and literary scholars alike. This short paper introduces a way of operationalizing functional diversity measurements by means of token-based embeddings, and argues that functional diversity is not only a practically advantageous but also a theoretically relevant addition to the Computational Humanities Research toolkit. By means of an experiment on the historical ARCHER corpus, we show that lexical diversity measured at the level of functional groups is less sensitive to orthographic variation and provides insight into an important and often disregarded dimension of vocabulary diversity in textual data.