A Topological Data Analysis of Navigation Paths within Digital Libraries

:speech_balloon: Speakers: Bayrem Kaabachi and Simon Dumas Primbault

:classical_building: Affiliations: 1, Laboratory for the history of science and technology (LHST), Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland; 2, Biomedical Data Science Center (BDSC), Centre Hospitalier Universitaire Vaudois (CHUV), CH-1002 Lausanne, Switzerland; 3, OpenEdition (UAR 2504, CNRS/EHESS/AMU/AU), 22 rue John Maynard Keynes, 13013 Marseille, France; 4, Bibliothèque nationale de France (BnF), Quai François Mauriac, 75706 Paris, France

Abstract: The digitization of library resources and services has opened up physical informational spaces to new dimensions by allowing users to access a wealth of documents in ways that differ from browsing bookshelves traditionally organized according to the “tree of knowledge”. How do readers of a digital library orient themselves within big corpora? What landmarks do they use to navigate masses of digital documents? Taking Gallica, the digital heritage platform of the French national library, as a case study, this paper presents experimental research on the navigation practices of its users. Using methods from topological data analysis, we inferred from Gallica’s server logs an informational space as it is roamed by readers. Coupled with user interviews, this mixed-methods study allowed us to identify a set of “regimes of navigation” characterizing how readers deploy various strategies to browse the digital library’s corpus. From directed search to wandering to crawling, these regimes answer different needs and show that a single corpus can, in turn, be apprehended as a heritage collection, a database, a set of documents, and a mass of information.

:newspaper: Link to paper