I’m putting together a submission to the Computational Humanities workshop with a small team, and we’re at the stage where we need to decide our authorship conventions. We come from a variety of backgrounds (English lit, history, physics/biology) and have been finding it difficult to settle on the right author order (or whether it even matters). Does anyone have experience with best practices for multi-author computational/digital humanities papers where the authors come from different disciplines? Do you use science-style conventions of the ‘main’ author first and most senior last, or do you just go alphabetical?
Hi! In my experience, it’s often clear who the main author is, and they go first. Sometimes the second author has done an equal amount of work, and I consider it good practice to make that known. Additionally, I like to add a brief statement about the authors’ contributions. There are many guidelines for writing such a statement; for example, here is the one from Royal Society Open Science:
All submissions, other than those with a single author, will require an Authors’ Contributions section which individually lists the specific contribution of each author. The list of authors should meet the criteria provided on our policy page. All contributors who do not meet all of these criteria should be included in the acknowledgements section.
We suggest the following format:
AB carried out the molecular lab work, participated in data analysis, carried out sequence alignments, participated in the design of the study and drafted the manuscript; CD carried out the statistical analyses and critically revised the manuscript; EF collected field data and critically revised the manuscript; GH conceived of the study, designed the study, coordinated the study and helped draft the manuscript. All authors gave final approval for publication and agree to be held accountable for the work performed therein.
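In a LaTeX manuscript, such a statement is often just an unnumbered section placed before the acknowledgements. A minimal sketch along the lines of the Royal Society format (the initials and wording are placeholders, not real authors):

```latex
% Hypothetical contributions statement -- initials and roles are placeholders.
\section*{Authors' Contributions}
A.B.\ designed the study, carried out the data analysis and drafted
the manuscript; C.D.\ carried out the statistical analyses and
critically revised the manuscript. All authors gave final approval
for publication and agree to be held accountable for the work.
```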
I like @folgert’s practical suggestions.
Also a shout-out to ‘When pigs fly: changing the author lists’ and ‘Science first, scientists later’, which suggest that we should arrange authors more like film credits, and a CH example using this approach from the Turing Institute here.
The ‘Living Machines’ paper looks great!
On a personal note, I like to credit authorship based on merit.
First, whoever wrote the most and most convincingly, and then whoever put in the most work on the experiments etc.
Not sure if I like the ‘film credit’ approach on papers that have fewer than six or seven authors. For this massive Turing Institute paper it definitely makes sense, though.
I am really against alphabetical order, or other games like that. To me, that has a whiff of deliberate obfuscation and rewards laziness.
I like the ‘film credits’ example. It highlights the fact that there is a production process going on, and whether we like it or not, a publishing industry as well. The film credits example fits well with the extras/essentials we expect with many papers now: the full dataset, the code, and maybe tutorials and other guides too (which you might already get in Computational Creativity research papers). I don’t mean to use the following term disparagingly: I think what we want from some papers is the researcher’s equivalent of ‘the director’s cut DVD boxset’ - all aspects of the project, not just the clean little camera-ready slice.
I like that metaphor. Thanks!
The film credits setup is really neat! One thing to consider, though, is how this will be indexed by sites like DBLP and Google Scholar, and how the paper should be cited. It can help if the authors themselves specify how the work should be cited (also handy for software and other non-traditional research contributions).
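One way to do this is to include a suggested-citation entry in the project repository or on the paper’s landing page, so indexers and readers don’t have to guess the author list. A hypothetical BibTeX sketch (all names, titles, and fields are placeholders):

```bibtex
% Hypothetical entry -- authors, title, and venue are placeholders.
@inproceedings{author2020example,
  author    = {Ada Author and Ben Builder and Carla Coder},
  title     = {An Example Computational Humanities Paper},
  booktitle = {Proceedings of the Computational Humanities Workshop},
  year      = {2020},
  note      = {AA and BB contributed equally to this work}
}
```

A `note` field like the one above is one place to record an equal-contribution statement so it survives in downstream citations.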
Do Google Scholar and others really parse the PDF for that, or do they get the metadata from other sources (e.g. one’s institution’s Pure, arXiv, or whatever the publisher publishes)?
Google Scholar says that a .pdf should be enough.
However, the LREC 2020 proceedings are still not indexed, so I’m not sure what’s going on.
These are really useful suggestions, thank you to everyone! It seems, if nothing else, there’s a fairly general consensus that alphabetical order is not the way to go. We’ll definitely look into the ‘film credit’ style, but it’s probably not necessary for a short paper.
I don’t know about other disciplines, but our (correspondence metadata) datasets are the product of many layers of labour, right across the 19th and 20th centuries as well as more recently. The next challenge is to figure out how to give fair credit to all of these groups!
I think they are? I can see your PO-EMO paper there (both the ACL anthology version and the lrec-conf.org one) if that’s what you’re referring to.
Yes, the paper itself is in there, you are right.
However, it already has a citation (https://arxiv.org/pdf/1912.03184.pdf) that doesn’t show up.
Weirdly enough, that particular paper already has some citations, probably because it had been on arXiv for a while.
Also, none of the papers that we cited actually got the citation.
Looking at a small sample confirmed that this is also the case for other LREC papers.
Semantic Scholar got it right, though.
Ah, weird! Google Scholar misses a lot of those. There was an issue recently with Google Scholar not parsing PDFs correctly when people used the “canonical” bibs generated by the ACL Anthology (https://github.com/acl-org/acl-anthology/issues/434#issuecomment-629826425), which was fixed recently. Perhaps the paper was indexed before the fix, and you have to wait for it to be re-indexed.
(We’re probably off-topic)