Speakers: Hannah Seemann and Tatjana Scheffler
Affiliation: Germanistisches Institut, Ruhr-Universität Bochum, Germany
Title: Differentiating Social Media Texts via Clustering
Abstract: We propose clustering documents by their fine-grained linguistic properties in order to capture and validate text type distinctions such as medium and register. Correlating this bottom-up, feature-driven clustering with those distinctions enables us to quantify how strongly individual author choice and medium or register conventions influence variable linguistic phenomena. Our pilot study applies the method to German particles and intensifiers in a multimedia corpus annotated for register. We show that the use of German particles and intensifiers differs across both register and medium. The clustering based on the linguistic features corresponds most closely to the medium distinction, while the stratification into registers is reflected to a lesser extent.
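
Illustration: the abstract does not name a clustering algorithm or an agreement measure, so the following is only a minimal sketch of the general idea, assuming plain k-means and cluster purity as stand-ins. The feature values (particles and intensifiers per 1,000 tokens), the media "twitter" and "blog", and the register labels are invented toy data, not results from the study.

```python
from collections import Counter
from math import dist

def kmeans(points, k, iterations=20):
    """Plain k-means; initial centroids are simply the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each document to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Recompute each centroid as the mean of its member documents.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def purity(clusters, labels):
    """Share of documents that carry the majority label of their cluster."""
    matched = 0
    for c in set(clusters):
        counts = Counter(lab for cl, lab in zip(clusters, labels) if cl == c)
        matched += counts.most_common(1)[0][1]
    return matched / len(labels)

# Hypothetical toy corpus: (particles, intensifiers) per 1,000 tokens,
# plus invented medium and register labels for each document.
docs = [
    ((12.0, 6.0), "twitter", "informal"),
    ((11.0, 5.5), "twitter", "formal"),
    ((13.0, 6.5), "twitter", "informal"),
    ((10.5, 5.0), "twitter", "formal"),
    ((4.0, 2.0), "blog", "informal"),
    ((3.0, 1.5), "blog", "formal"),
    ((4.5, 2.5), "blog", "informal"),
    ((2.5, 1.0), "blog", "formal"),
]
features = [d[0] for d in docs]
media = [d[1] for d in docs]
registers = [d[2] for d in docs]

# Cluster bottom-up on the features alone, then compare with each labelling.
clusters = kmeans(features, k=2)
medium_purity = purity(clusters, media)
register_purity = purity(clusters, registers)
print("purity vs. medium:", medium_purity)
print("purity vs. register:", register_purity)
```

In this toy setup the clusters line up with medium more closely than with register only because the invented numbers were chosen that way; with real data, a chance-corrected measure such as the adjusted Rand index would likely be a better choice than raw purity.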