Speaker: Joris J. van Zundert (1, 2), Marijn Koolen (1, 2), Julia Neugarten (3), Peter Boot (1), Willem van Hage (4) and Ole Mussmann (4)
Affiliation: (1) KNAW Huygens Institute, Amsterdam, the Netherlands; (2) DHLab, KNAW Humanities Cluster, Amsterdam, the Netherlands; (3) Radboud University Nijmegen, Nijmegen, the Netherlands; (4) eScience Center, Amsterdam, the Netherlands
Abstract: We apply Top2Vec to a corpus of 10,921 Dutch-language novels. We want to understand whether our topic model can serve as a proxy for genre. We find that topics are extremely narrowly related to an existing genre classification historically created by publishers. Interestingly, we also find that, despite careful vocabulary filtering as suggested by prior research, various other signals, such as author signal, stubbornly remain.