Digital methods at the service of the oral history of the Obama Presidency

Since 2019, Columbia University's INCITE Institute has interviewed hundreds of people to produce the official oral history of the Obama presidency, amounting to some 1,100 hours of audio and video. To process this digital corpus and make it usable, Jean-Philippe Cointet, a researcher at the médialab, applied a range of digital methods.

The "Obama Presidency Oral History" project

Rooted in the tradition of oral history, the Obama Presidency Oral History project aims to document Barack Obama's presidency through the testimonies of over 450 people: members of the administration, politicians, activists, artists and "ordinary citizens".

The project uses multiple points of view to shift the focus away from the presidential figure and record the memories, personal stories and experiences of those affected by Obama's governance.

“So, rather than creating something like a biography of President Obama, we were really interested in power: people who wield power and people who don’t wield power, and how those things flow back and forth,” says Chris Pandza, designer and former oral history Master of Arts fellow.

Initiated in 2019, the project is being carried out by the INCITE Institute in partnership with the Obama Foundation. INCITE houses the Columbia Center for Oral History Research (CCOHR), a reference center for the practice and teaching of oral history.

With the aim of producing a presidential oral history unprecedented in its scope and approach, the project will make available the audio and video recordings it has collected, together with their transcripts and summaries, organized by a detailed thematic index.

The contribution of digital methods

The raw corpus collected by the Obama Presidency Oral History project totals over 1,100 hours of recordings, and it is rich enough that making use of it required a long analytical process.

One of the project's major challenges was to label each interview sequence with one or more of nearly forty themes (human rights, climate, Chicago, racial policy, terrorism, etc.), which together provide a key to understanding the eight years of the Obama presidency. Developing such a typology and applying it to so many documents is both a qualitative and a quantitative task, and one that calls for digital methods.

This is why Jean-Philippe Cointet, a researcher at the médialab, contributed to the project by building this ontology of themes with natural language processing (NLP) methods.

Artificial intelligence was also used to extract the named entities mentioned in each interview (mainly places, people, organizations and events) and to assign one or more themes to each segment. These themes then serve as a grid for classifying the interviews.
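
To give a concrete idea of what this kind of annotation produces, here is a minimal Python sketch. It is not the project's actual pipeline, which the article does not describe in detail: the keyword lists, the small gazetteer of entities and the `Segment` and `annotate` names are all hypothetical, and simple keyword matching stands in here for the trained NLP models a real project would use.

```python
import re
from dataclasses import dataclass, field

# Hypothetical theme keywords (assumption); the real typology has nearly forty themes.
THEME_KEYWORDS = {
    "climate": {"climate", "emissions", "energy"},
    "health": {"healthcare", "insurance", "aca"},
}

# Hypothetical gazetteer (assumption) standing in for a trained named-entity model.
GAZETTEER = {
    "Chicago": "place",
    "Barack Obama": "person",
    "Obama Foundation": "organization",
}


@dataclass
class Segment:
    """One interview sequence with its annotations."""
    text: str
    themes: list = field(default_factory=list)
    entities: list = field(default_factory=list)


def annotate(segment: Segment) -> Segment:
    """Attach themes and named entities to a segment by simple lookup."""
    tokens = set(re.findall(r"[a-z]+", segment.text.lower()))
    # A theme applies when at least one of its keywords appears in the segment.
    segment.themes = sorted(
        theme for theme, keywords in THEME_KEYWORDS.items() if tokens & keywords
    )
    # An entity is recorded when its name appears verbatim in the segment.
    segment.entities = [
        (name, kind) for name, kind in GAZETTEER.items() if name in segment.text
    ]
    return segment


seg = annotate(Segment("In Chicago we talked about insurance and climate policy."))
print(seg.themes)    # ['climate', 'health']
print(seg.entities)  # [('Chicago', 'place')]
```

A segment can carry several themes at once, which matches the "one or more themes" labeling described above. In practice, output of this kind is only a first pass that still has to be checked by hand.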

Although artificial intelligence can greatly speed up the processing of qualitative data, the results still have to be re-read and validated manually by the research team, which remains a tedious task.

Available data

The first data release took place in mid-2023 and covered interviews relating to environmental issues (the "Climate, environment and energy" theme). In early 2024, on the occasion of the 14th anniversary of the Affordable Care Act (ACA), better known as Obamacare, interviews related to public health policies were also made public.

The rest of the corpus will be released progressively until 2026.