
The growing need for AI-ready corporate ESG reports



By Claire Walls, Insig AI ESG Product Manager

2.3.25



The recent and rapid development of Large Language Models (LLMs) has generated excitement over the many ways they can be applied to speed up existing workflows. In the world of corporate reporting, the question is whether they can be used to streamline the creation as well as the analysis of reports. The advent of these models is therefore likely to drive a fundamental change in how companies report their information.


The advantage of using AI to evaluate company ESG performance is undeniable. Done manually, the process is costly in both time and resources: Annual Reports are long, averaging 248 pages; information is spread across multiple documents; the content is often quite technical and therefore hard to read; and comparative analysis (between companies or year on year) multiplies the time needed to sift through it all.


Investors have been using ESG ratings agencies to bypass this time-consuming work, often relying on a single output number and trusting somewhat opaque methodologies. The agencies themselves have been using LLMs and other automated systems to speed up their analysis.


Who's Reading ESG Reports?

LLMs are a game changer, giving investors more direct control over how they analyze company documents. One issue remains: reports are still largely written to be read by humans, not machines. The difference is non-trivial.


Current best practice for transparency is to publish reports and policies as PDFs on company websites. This allows a report to be downloaded and shared as an email attachment. Crucially, providing a PDF version of the Annual Report allows for version control: these documents are clearly dated, and any amendment has to be published as a new PDF with a new date.


However, PDFs present their own challenges for automated analysis. Before a PDF can be run through an AI model of any kind, its text has to be extracted, sometimes using Optical Character Recognition (OCR) for difficult documents. Text extraction from PDFs is rarely perfect, and some of the content can be lost, mostly because of the formatting of “glossy” reports.
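As a rough illustration, a minimal extraction pass might look like the sketch below. It uses the open-source pdfplumber library (any comparable PDF library would work) and a placeholder file name, report.pdf; pages that are pure images come back empty and would need a separate OCR step, which is not shown here.

```python
# Minimal sketch of a PDF text-extraction pass, assuming the open-source
# pdfplumber library and a placeholder file called "report.pdf".
# Image-only (scanned) pages return no text and would need OCR instead.
import pdfplumber

def extract_report_text(path: str) -> str:
    """Concatenate the extractable text of every page in the PDF."""
    pages = []
    with pdfplumber.open(path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            text = page.extract_text() or ""
            if not text.strip():
                print(f"Page {number}: no extractable text (possibly scanned)")
            pages.append(text)
    return "\n\n".join(pages)

if __name__ == "__main__":
    print(extract_report_text("report.pdf")[:500])
```

Even when a pass like this runs cleanly, the output is a flat stream of text: reading order across columns, table structure and anything conveyed only by a chart can be distorted or dropped.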


Human vs. Machine

Examples of elements that are designed for human readers but are complicated for machines to parse:


  • Graphs and images

  • Tables

  • Columns or uneven text layouts

  • Special characters

  • Fancy fonts

Below are some examples of “glossy” sections and the text extracted from them, which illustrate some of the most common issues.

Here is a piece of financial information:

[image]

Here is the resulting extraction:

[image]

Here is a table:

[image]

Here is the resulting extraction:

[image]
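Tables illustrate why dedicated tooling helps: instead of relying on the flat text stream, a parser can attempt to recover the row-and-column structure directly. The sketch below shows one way of doing this with pdfplumber’s table extraction, again against a placeholder report.pdf; heavily styled tables often still need manual tuning.

```python
# Minimal sketch: trying to recover table structure from a PDF with
# pdfplumber. "report.pdf" is a placeholder; merged cells and decorative
# backgrounds in glossy reports can still throw the detection off.
import pdfplumber

with pdfplumber.open("report.pdf") as pdf:
    for number, page in enumerate(pdf.pages, start=1):
        for table in page.extract_tables():
            print(f"--- table found on page {number} ---")
            for row in table:
                # Each row is a list of cell values (None for empty cells).
                print([cell or "" for cell in row])
```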
As a result, there is an increasing call for more toned-down formats that can be parsed automatically with far less effort. SEC filings such as the Form 10-K, for example, are required in the US, and XBRL (eXtensible Business Reporting Language) documents are on the rise and in the pipeline for the EU’s CSRD requirements. These formats are very unappealing to the human eye but ideal for finding and extracting data from reports.
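To show why structured filings are so much easier to work with, the sketch below reads tagged facts straight out of an XBRL instance document using Python’s standard XML parser. The file name is a placeholder and the output depends entirely on the taxonomy the filer used; a production pipeline would normally use a dedicated XBRL processor rather than raw XML parsing.

```python
# Minimal sketch: listing tagged facts from an XBRL instance document.
# "filing.xbrl" is a placeholder; real filings tag their facts against
# standard taxonomies (e.g. US GAAP, ESRS), so concept names will vary.
import xml.etree.ElementTree as ET

root = ET.parse("filing.xbrl").getroot()

for element in root:
    # Facts carry their value as element text, with the reporting period
    # and unit referenced through the contextRef/unitRef attributes.
    if element.text and element.text.strip():
        concept = element.tag.rpartition("}")[2]  # strip the namespace
        print(f"{concept} = {element.text.strip()} "
              f"[context={element.get('contextRef')}, unit={element.get('unitRef')}]")
```

Because every value is explicitly tagged with a concept, a period and a unit, there is nothing to extract or guess at, which is exactly what makes these formats “machine-ready” despite being unappealing to read.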



Future-Proofing Formats

As LLMs are increasingly likely to be doing the “first pass” analysis of corporate reports, we can assume that the need for a “machine-ready” format will grow. Companies may choose simply to make their reports less stylized, or to publish two versions of the same document, much as 10-Ks sit alongside glossy reports and XBRL files alongside PDFs. This is worth considering for any company that wants its information to be evaluated to its full extent. Companies that continue to rely on highly stylized, glossy PDF reports will be dependent on text extraction and will have to accept the risk of information being lost.


