In my research into cultural change, I find it important not to focus too much on a few
canonized data sets. That is why I am always looking for interesting data sets that are
new, forgotten, or relatively unknown. On my recent travels around the Internet, I collected
the following list. I call it “Corpora Obscura”:
- Auction Catalogs - Collection of an estimated 30,000 auction catalogs published from 1744 to the present by auction houses around the world.
- Amsterdam in WW II - Data set about where and when bombs were dropped on Amsterdam in the 1940-1945 period.
- Bigfoot Sightings - Full text and geocoded sighting reports from the Bigfoot Field Researchers Organization (BFRO).
- Coin Production in the Low Countries - Datasets on coin production in the Southern and Northern Low Countries (present-day Netherlands, Belgium and Luxembourg) that were compiled by various historians over the past few decades.
- Comic book route - Locations of the comic book walls in the City of Brussels (with characters and authors).
- Convict Tattoo Descriptions - Descriptions of tattoos of at least 60,000 convicts in the Old Bailey proceedings.
- Danish Folklore Fieldtrips - Historical GIS data of the Danish folklore collection of Evald Tang Kristensen (1843-1929).
- Death Row Last Statements - Dataset from the Texas (US) Department of Criminal Justice with the last statements of people on death row.
- English Jokes - A dataset of 200k English plaintext jokes.
- Pudding data - Data sets created for stories on The Pudding, open to the public.
- Red Riding Hood - A corpus of more than 400 Dutch retellings of “Little Red Riding Hood” (1780-2015). (This one is mine, actually.)
- South Park Script Data - CSV files containing script information, including season, episode, character, and line.
- The Paper Chain Letter Archive - Transcriptions of paper chain letters from a long historical period.
- The Tate Collection - Metadata for around 70,000 artworks housed by the Tate galleries.
- UFO Sightings - 80,000 UFO sighting reports spanning approximately a century.
- Vincent van Gogh: the Letters - All the surviving letters written and received by Vincent van Gogh (1853-1890) in XML.
- Witch Trials - Data set on witch trials in Europe.
- Women in Film - Data collection on how women are portrayed in film.
Maybe there is something for you here, or perhaps you have some interesting, and
preferably obscure, additions?