I would first draw the line between literary and linguistic data.
An example is the contrast between rhyme data and cognate data: pairs of rhyming words are literary data, while cognates are linguistic data. Both kinds of dataset allow you to determine the similarity of sounds (e.g., vowel similarity in imperfect rhyme) and how these similarities changed over time, and both can be used to reconstruct pronunciation and linguistic typology:
List, J. M., Pathmanathan, J. S., Hill, N. W., Bapteste, E., & Lopez, P. (2017). Vowel purity and rhyme evidence in Old Chinese reconstruction. Lingua Sinica, 3(1), 5.
Katz, J. (2015). Hip-hop rhymes reiterate phonological typology. Lingua, 160, 54-73.
Rama, T., & List, J. M. (2019). An Automated Framework for Fast Cognate Detection and Bayesian Phylogenetic Inference in Computational Historical Linguistics. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 6225-6235).
List, J. M. (2016). Beyond cognacy: Historical relations between words and their implication for phylogenetic reconstruction. Journal of Language Evolution, 1(2), 119-136.
However, with rhyme (or literary data in general), you are dealing with a considerable amount of poetic license. Poets might expand the notion of what rhyme can be and reinforce this through additional schema consistency within the poem.
Depending on what you want to look at, this makes it hard to figure out what is licensed and what is not, or which features are marked and which are just ordinary language use.
Adam Hammond wrote an interesting article on the divide of the disciplines and how they respectively deal with their data.
Hammond, A., Brooke, J., & Hirst, G. (2013). A tale of two cultures: Bringing literary analysis and computational linguistics together. In Proceedings of the Workshop on Computational Linguistics for Literature (pp. 1-8).
He raises the point that literary scholarship might be deliberately interested in ambiguity or polysemy and thus, unlike other analytic schools, aims not to resolve ambiguity but to describe and explore it (cf. Jakobson, or Empson). I also like Fotis’ point that humanities data mainly concerns artefacts that have not only a historical but also an aesthetic dimension. The problem is then that the humanistic approach is mainly one of ‘criticism’ (determining aesthetic value or historical embedding), rather than one of finding universal laws or probing (computational) methodology.
In Computational Linguistics, by contrast, ambiguity is almost uniformly treated as a problem to be solved. The focus is on disambiguation, with the assumption that one true, correct interpretation exists. I assume this is partly grounded in computer science, which traditionally aims at deterministic systems, or at least at replicable results and consistent datasets, but also in the recognition that there is often only a finite number of ways to analyze a given linguistic phenomenon (e.g., scope underspecification in syntax).
Hammond notes that computational work in the humanities has recognized the challenge of “subjective” annotation, or tries to find aspects of texts that readers would not find particularly ambiguous, for example identifying major narrative threads or distinguishing author gender.
I wonder how this historically grown methodological divide has shaped the descriptive inventory (terminology) of the respective disciplines, and how this inventory could be expanded, e.g., by reconciling their different epistemological interests. I am also very interested in how we could determine the boundaries of (necessary and sufficient) ambiguity depending on the problem under study, how much (non-)ambiguity certain annotation workflows allow, and how models built on the resulting datasets can deal with that.
When we annotated emotions in poetry, we tried to integrate the best of both worlds.
Haider, T., Eger, S., Kim, E., Klinger, R., & Menninghaus, W. (2020). PO-EMO: Conceptualization, Annotation, and Modeling of Aesthetic Emotions in German and English Poetry. In Proceedings of LREC 2020. arXiv preprint arXiv:2003.07723.
The first batches showed seemingly conflicting annotations in some places: e.g., two annotators would label certain lines/stanzas with ‘Humor and Vitality’ (they found them animating), while a third annotator chose ‘Sadness’. Upon inspection, we found a case of Schadenfreude in this particular poem. The annotators did not recognize that Rilke might have intended a mixed emotional reaction (they were not supposed to interpret the text anyway, because interpretation is difficult and time-consuming).
In another case, we found that annotators agreed that certain lines by Georg Trakl elicited both ‘Awe/Sublime’ and ‘Uneasiness’. This reinforced our view that we need multiple labels per instance (line) to cover the emotional range of poetry, while keeping the complexity manageable (hence allowing at most two labels per line).
In the end, we had the annotators create a gold standard of 48 poems via majority voting, discussing how their different views might be reconciled. When they annotated the rest of the dataset, we instructed them to annotate according to how they felt; if they were unsure, they should annotate according to the gold standard, i.e., according to how they thought the others would annotate. That improved consistency to a point usable for computational modeling (kappa of .7).
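As a toy illustration of this workflow (not the exact setup from the paper; the label sequences are hypothetical, and this sketch assumes one label per line rather than our two-label scheme), pairwise agreement can be measured with Cohen’s kappa and a gold-standard candidate derived by majority voting:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences."""
    assert len(a) == len(b)
    n = len(a)
    # observed agreement: fraction of instances where both chose the same label
    po = sum(x == y for x, y in zip(a, b)) / n
    # chance agreement, estimated from each annotator's label distribution
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[lab] * cb[lab] for lab in ca.keys() | cb.keys()) / (n * n)
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

def majority_vote(*annotations):
    """Per-instance majority label across annotators (ties broken arbitrarily)."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*annotations)]

# Hypothetical per-line emotion labels from three annotators
a1 = ["Sadness", "Awe/Sublime", "Sadness", "Vitality"]
a2 = ["Sadness", "Awe/Sublime", "Uneasiness", "Vitality"]
a3 = ["Humor", "Awe/Sublime", "Sadness", "Vitality"]

print(round(cohen_kappa(a1, a2), 2))  # pairwise agreement
print(majority_vote(a1, a2, a3))      # gold-standard candidate
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when the label distribution is skewed (as it typically is with emotion labels).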
These are my five cents. Hope it helps.