Our data set consisted of excerpts of 10-K filings relating to “climate risk,” extracted from a subset of the Russell 3000 companies (download the full dataset). We dealt with roughly 600 companies across five industry sectors -- Oil and Gas, Electric Utilities, Insurance, Food and Agriculture, and Textiles and Apparel.

Corporate 10-K filings typically stretch to 100 pages, and not every company discloses climate-related risks, either because it has not started analyzing them or because it considers them immaterial. Isolating the passages (if any) that address climate risk within these broader reports therefore takes some clever extraction work.

According to the SEC’s interpretive guidance on climate risk, there are various sections of the 10-K report where it is appropriate to make climate-related disclosures. These include:

Jackie Cook, from CookESG Research, devised a series of rule-based algorithms that use keyword queries, applied in sequence, to find these passages in the larger reports. Each piece of extracted text (ranging from a couple of sentences to a few paragraphs) was compiled into a year-by-year corpus of climate disclosure statements for each company. We then brought the resulting .csv files into the natural language processing platform CorText to parse the texts and identify the most salient terms and phrases used to discuss climate risk in the corporate filings, find their relations to each other through co-occurrence algorithms, and then spatialize and organize these relationships using various visual strategies (network graphs, term histograms and radial diagrams). A more detailed description of the methods and data treatments used for each visualization is available on each visualization page.
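To make the two steps above more concrete, here is a minimal sketch in Python. It is not CookESG Research's actual rule set, nor CorText's implementation: the keyword patterns, the term list, the sample text and the function names `extract_passages` and `cooccurrence_counts` are all assumptions made up for illustration. It shows the general idea of keyword queries applied in sequence to isolate passages, followed by a simple count of which terms appear together in the same passage.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical keyword rules, applied in sequence. A paragraph must match the
# first pattern and then the second. These are illustrative, not the real queries.
CLIMATE_TERMS = re.compile(
    r"\b(climate change|global warming|greenhouse gas(?:es)?|carbon)\b", re.I
)
RISK_TERMS = re.compile(
    r"\b(risks?|regulations?|costs?|impacts?|liabilit(?:y|ies))\b", re.I
)


def extract_passages(filing_text):
    """Return the paragraphs that pass both keyword rules, in document order."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", filing_text) if p.strip()]
    # Rule 1: keep paragraphs that mention a climate-related term.
    candidates = [p for p in paragraphs if CLIMATE_TERMS.search(p)]
    # Rule 2: of those, keep paragraphs that also use risk-related language.
    return [p for p in candidates if RISK_TERMS.search(p)]


def cooccurrence_counts(passages, terms):
    """Count how many passages mention each pair of terms together.

    Matching is a plain lowercase substring test, so "risk" also matches "risks".
    """
    counts = Counter()
    for passage in passages:
        lowered = passage.lower()
        present = sorted(t for t in terms if t in lowered)
        counts.update(combinations(present, 2))
    return counts


# Invented sample text standing in for a (much longer) 10-K filing.
filing = (
    "We operate refineries in three states.\n\n"
    "Climate change regulation may increase our operating costs.\n\n"
    "Greenhouse gas emissions rules pose a risk to our carbon-intensive assets.\n\n"
    "Our carbon fiber products sold well this year."
)

passages = extract_passages(filing)
pairs = cooccurrence_counts(
    passages, ["climate change", "greenhouse gas", "carbon", "regulation", "risk"]
)

for (term_a, term_b), n in sorted(pairs.items()):
    print(f"{term_a} + {term_b}: {n}")
```

In this toy example the last paragraph mentions "carbon" but is dropped by the second rule because it contains no risk-related language, which is the kind of filtering a sequence of queries makes possible. Pair counts like these are the raw material for a co-occurrence network, where terms become nodes and counts become edge weights.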

If you would like more information about the methodology applied in compiling the data set, please contact If you are interested in the data set and the digital methods applied to its analysis, please contact