Theme 4 – Transversality of digital humanities

Overview

History and prospects for development

The UMR Litt&Arts was responsible for one of the foundational projects in digital humanities: the Stendhal online manuscripts project, which was the methodological and technological precursor to many digital corpora projects.

Digital humanities are understood here as the set of data science methods and tools applied to the sciences of the arts and humanities, as well as the questions they raise. As such, the “Transversality of Digital Humanities” research theme focuses on the role of digital technology in research on the arts and humanities. 

By definition, this research theme is not tied to the physicality of sources (manuscripts, published works, library holdings, performing arts, etc.) or to literary periods. It supports the laboratory’s three research themes and draws on the researchers’ projects to inform its fundamental questions.

Its work is based on the four major steps of the data cycle:

  • Production: digital literacy, interfaces, participatory and collaborative production
  • Use: digital methods for processing literary and artistic data
  • Presentation: open research data and FAIR principles, digital publishing
  • Sustainability: long-term data preservation

Production 

Unlike fields such as language sciences, where source data is often available before a research project begins, the arts and humanities often face the tedious and technically complex task of producing data in order to conduct their research.

To help make the task less complex, production tools (transcription and annotation) must be developed for the corpus in accordance with users’ digital literacy, while also improving this literacy. This means producing accessible interfaces, while disseminating a digital culture so that researchers may gain a better understanding of data, its structure and the potential for structured data.

Participatory platforms seem to be a solution for making data production less cumbersome, but they must be studied in detail. Beyond the hype surrounding Web 2.0, we seek to qualitatively and quantitatively examine the data produced through such platforms and study the impacts of data quality on research.
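As an illustration of what a quantitative examination of participatory data might look like, the sketch below computes a character error rate between a volunteer transcription and a reference transcription. It is a minimal example under stated assumptions, not the research unit's method: the function names and the choice of character error rate as a quality measure are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, candidate: str) -> float:
    """Edit distance divided by the length of the reference transcription.

    Hypothetical quality measure: 0.0 means the candidate matches the
    reference exactly; higher values mean more divergence.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, candidate) / len(reference)
```

A measure of this kind only captures surface divergence from a reference; whether a given error rate actually affects research results is the qualitative part of the question.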

Use

Research data in the arts and humanities has not yet reached, at least within the research unit’s practices, the volumes required for deep-learning analyses. We have deliberately chosen to distance ourselves from such approaches, which are widely studied elsewhere, and to focus on symbolic and statistical approaches that produce new knowledge through traceable, interpretable mechanisms.

We analyse literary and artistic data using natural language processing methods. However, the language models used by these tools are not always suited to research data: nonstandard spelling or grammar in manuscripts, outdated forms of the French language, old or little-documented languages, etc. The work in this research theme therefore involves reconsidering, adapting and assessing tools.

Presentation

Digital humanities are based on the principle that research results must always be reproducible. To this end, we believe that compliance with the FAIR Data principles (Findable, Accessible, Interoperable, Reusable) is key to promoting our research subjects and, most importantly, to opening them up to scientific debate. But these fundamental principles must still be put into practice, not only from an ethical point of view within the scientific community, but also from a technological point of view.

The other side of presenting data is the question of promoting it. While many projects have benefited from digital editions or hybrid print-digital editions, new ways to make research subjects available—both to expert audiences and the general public—must still be invented.

Sustainability

To address the aspects mentioned above (use, experimental reproducibility and presentation), we must consider the long-term sustainability of data, meaning not only the physicality of storage media, but also the reusability of the data itself.

This issue naturally concerns the data produced (theme 1). Beyond its interoperability, the goal is to identify methods to characterise encoding in order to make this data reusable and transposable to other formats. It also relates to tools (for producing and using data), to ensure the reproducibility of observations and analyses, again in the long term. The issue also extends to innovative forms of publishing (or musealization), since they depend on both data and tools.

Work on data sustainability is being carried out through a collaboration between the TGIR Huma-Num and CINES, but the other aspects are unfortunately on hold at this time.

Submitted on 2 November 2023

Updated on 9 September 2024