Validation in computational humanities! Basic readings?

Hey, can anyone recommend readings on the basics of validation? (Esp. within data-intensive, but insight-driven research workflows.)

Basics on how, when, and why to check your search/model/algorithm results, building gold standards, manual annotation tips, terminology, synthetic data, etc.?

Looking for basic papers for students in an introductory data+DH course, but also to improve my own skills on this. It is one of those topics that is perhaps widely ignored, or done unsystematically, in DH papers. On the other hand, I'm not sure whether the standard NLP foundations on this give the most relevant tips. (E.g. NLP usually has a ground truth to rely on, while complex insight-driven workflows may need checking at various points in the cycle.) But maybe all the tips are the same…

Any recommendations welcome, thanks!


Although it is mainly about Bayesian statistics, I can wholeheartedly recommend McElreath’s book Statistical Rethinking. Especially for insight-driven quantitative research, this work offers many leads.


Thank you!! I also love Statistical Rethinking and should pick up the slack to finally properly work my way through it.

But indeed, I’m looking for some more general readings about when and how you should check your results when doing these computational, data-intensive humanities analyses. For statistics, these questions are indeed at the forefront.

I was hoping for a chapter about this in the upcoming Karsdorp, Kestemont and Riddell. Does that mean there isn’t going to be one? :frowning:

@ptinits Regarding NLP more specifically, there is Statistical Significance Testing for Natural Language Processing. The chapter on statistical hypothesis testing is particularly good. Perhaps it could be adapted for other types of research?
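To give a concrete sense of the kind of test that book discusses: a paired permutation (randomization) test compares two systems on the same gold-standard items by randomly swapping their per-item scores. The sketch below is a minimal illustration, not from the book itself; the two "systems" and their per-item correctness scores are made up.

```python
import random

def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided paired permutation test on per-item scores.

    Under the null hypothesis the two systems are interchangeable,
    so swapping their scores on any given item should not matter.
    """
    rng = random.Random(seed)
    observed = abs(sum(scores_a) - sum(scores_b))
    count = 0
    for _ in range(n_resamples):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:  # randomly swap this item's pair
                a, b = b, a
            diff += a - b
        if abs(diff) >= observed:
            count += 1
    return (count + 1) / (n_resamples + 1)  # smoothed p-value

# Hypothetical per-item correctness (1 = correct) for two systems
# evaluated on the same 20 gold-standard items.
system_a = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
system_b = [1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1]
print(f"p = {paired_permutation_test(system_a, system_b):.3f}")
```

The nice thing about permutation tests for humanities data is that they make no distributional assumptions, so the same recipe works for accuracy, F1 per item, or any other per-item score.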


haha :slight_smile: Well, we most certainly touch upon several aspects of validation (e.g. for topic modeling) but we don’t devote a separate chapter to it.


I have found this really short article by Epstein on modeling, ‘Why Model?’, very elucidating when trying to explain models. I agree that the ways validation works in NLP are not always as useful for humanists. I think this article helps to reframe how we view models and the role they play in the production of knowledge.