Speaker: Lyra D’Souza and David Mimno
Affiliation: 1, Department of Computer Science, Cornell University, USA; 2, Department of Information Science, Cornell University, USA
Title: The Chatbot and the Canon: Poetry Memorization in LLMs
Abstract: Large language models are able to memorize and generate long passages of text from their pretraining data. Poetry is commonly available on the web and often fits within language model context sizes. As LLMs continue to grow as a tool in literary analysis, the accessibility of poems will determine the effective canon. We assess whether we can prompt current language models to retrieve existing poems, and which methods lead to the most successful retrieval. For the highest-performing model, ChatGPT, we then evaluate which features of poets best predict memorization, and we document changes over time in ChatGPT’s ability and willingness to retrieve poetry.