
Finding reporting gaps: essential to ESG analysis and a weakness of LLMs.

Writer: Claire Walls

Updated: Mar 3

27 January 2025



Can sycophants be good analysts?

In the past two years, a lot has been said about the limitations of large language models (LLMs). They can be inconsistent and sometimes lazy, they have shown bias, and they raise security concerns for companies looking to add them to internal systems. 

In a product development meeting this month, a developer called AI chatbots “sycophants”. This was offered as an explanation for why LLMs are often too generous in their responses when used to critically analyse corporate reports. These tools are biased towards finding an answer to the question asked, even if there isn’t one. As such, when asked about the contents of a given text, a model will sometimes give the user a wrong answer rather than admit ignorance.  

This is one facet of the infamous “AI hallucinations”. Counter-intuitively, these have worsened as models have evolved with larger training data sets. This is the result of encouraging models to be non-evasive, so that they do not turn users away. Earlier models would often decline to respond, simply stating “Sorry, I can’t help you with that”. To prevent this, models now tend to give wrong answers more often, even though those answers appear entirely sensible. It’s important to keep in mind that these models don’t know anything: they guess the most probable response based on their training data. 

As a result, LLMs are very good at reproducing well-established knowledge, such as the shopping list for a user hoping to bake an apple pie, but much weaker at analytical work that requires reasoning. 

 

 

Tackling the CSRD, one prompt at a time

Insig AI is developing a way to use AI chatbots to evaluate corporate reports against the many requirements of the CSRD. The objective is to surface all CSRD reporting gaps by automatically running company reports through an AI model, with a series of prompts relating to each section of the ESRS.

The unique problem of using AI chatbots to evaluate ESG disclosure lies in the intersection of the sycophantic nature of AI and the tendency towards omission in non-financial reporting. Anyone familiar with corporate ESG disclosure will know that companies over-emphasise their existing sustainability initiatives and evade mention of issues where they aren’t doing well enough. 
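To make that workflow concrete, here is a minimal sketch in Python. The ask_model() function is a placeholder for whichever LLM API is actually in use, and the ESRS datapoint codes and questions are illustrative rather than the production prompt set.

```python
# A minimal sketch of the gap-finding loop: one prompt per ESRS disclosure
# requirement, run against the full report text. The datapoints and questions
# are illustrative; ask_model() stands in for the underlying LLM API.

def ask_model(prompt: str) -> str:
    """Placeholder for a call to the underlying chat model."""
    raise NotImplementedError

ESRS_QUERIES = {
    "E1-1": "Summarise any transition plan for climate change mitigation.",
    "E1-3": "What current and future resources are allocated to climate change mitigation?",
    "S1-1": "Describe the policies related to the company's own workforce.",
}

def find_reporting_gaps(report_text: str) -> dict[str, str]:
    """Run every ESRS query against the report and collect the model's answers."""
    answers = {}
    for datapoint, question in ESRS_QUERIES.items():
        prompt = (
            "You are critically reviewing a corporate report against the ESRS.\n\n"
            f"Report:\n{report_text}\n\n"
            f"Question ({datapoint}): {question}\n"
            "If the report does not address this, say so explicitly."
        )
        answers[datapoint] = ask_model(prompt)
    return answers
```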


This has led to increasing discussion of “greenhushing”, as well as a rise in “comply or explain” requirements in legislation. Companies withhold information about their sustainability performance out of fear of criticism. In response, regulators have started to require companies to explain why they do not comply with a given rule. This should mean that there is disclosure even when a policy is absent. However, this isn’t yet widely in place, and non-financial corporate reports remain opaque. 


Reading between the lines

ESG analysts have, for years, read between the lines of sustainability reports, understanding that what goes unsaid can be as insightful as the text itself. This is a critical analysis skill which is difficult to automate. LLMs are unable to recognise the implied gaps in reporting and have been trained to guess an answer rather than say “there is no answer to this question”. Because they prioritise giving an answer, the response may not be as specific as the question demands. 

Below is an example of an over-generous interpretation of the question asked by the user. 


The model was provided with an Annual Report and asked to surface disclosures of current and future resources, financial or otherwise, allocated to climate change mitigation. An analyst reading the report would have answered “the company doesn’t say anything about that”. The model instead found commitments and goals that are vaguely related, and made an uneducated guess that these might represent the answer. 

Mind the gaps

To avoid an over-generous interpretation of corporate disclosures, the LLM can be primed to be critical. Prompts can be constructed to elicit more accurate answers: the query must be clear, precise and as detailed as possible, which prevents off-topic answers. Additionally, it’s possible to add systematic instructions for the model, in the form of rules appended to every prompt, which direct the answers to be more insightful. These can include rules such as:

  • “Begin with a clear summary of evidence quantity and significance”

  • “Provide evidence for your answer”

  • “Avoid repetition”

  • “Only include evidence that matches the question”

  • “If no valid evidence exists, give a negative answer”

This results in stronger answers, with evidence provided, which the user can assess. As well as providing evidence, linking back to the original source is necessary: it builds trust and allows a user to quickly double-check the answer, should any doubt remain. 
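As a sketch of how such rules can be carried by every prompt, the snippet below assembles the question, the report and the rule set into a single query. The exact rule wording and the request for page references are illustrative, not the production instructions.

```python
# Sketch: the same critical-analysis rules are appended to every prompt, and the
# model is asked to tie each piece of evidence back to its place in the report.
# The rule wording mirrors the list above and is illustrative.

RULES = """Rules:
- Begin with a clear summary of evidence quantity and significance.
- Provide evidence for your answer, quoting the report verbatim.
- Avoid repetition.
- Only include evidence that matches the question.
- If no valid evidence exists, give a negative answer.
- For each piece of evidence, cite the page or section it was taken from."""

def build_prompt(report_text: str, question: str) -> str:
    """Assemble the prompt that is sent to the model for every question."""
    return f"Report:\n{report_text}\n\nQuestion: {question}\n\n{RULES}"
```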


It’s not possible to completely avoid wrong answers, but curbing the behaviour with precise prompts improves response quality and cuts analysis time. Models can also be improved through dedicated training: given examples of good and bad answers, they learn to replicate the best ones. 
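One common way to organise that kind of training material is as pairs of good and bad answers to the same question. The sketch below writes such a pair to a JSONL file; the structure and field names are illustrative, not a description of Insig AI’s actual training pipeline.

```python
# Sketch: preference-style training examples pairing a good answer (evidence-led,
# willing to say "not disclosed") with a bad, over-generous one.
# The structure and field names are illustrative.

import json

training_examples = [
    {
        "question": "What resources are allocated to climate change mitigation?",
        "good_answer": "The report does not disclose any current or future "
                       "resources allocated to climate change mitigation.",
        "bad_answer": "The company is committed to net zero by 2050, which "
                      "suggests resources are allocated to mitigation.",
    },
]

with open("preference_pairs.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
```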

There are many reasons to be optimistic about using LLMs to evaluate corporate disclosures, but obtaining the highest-quality analysis requires an investment of time from both subject matter experts and data engineers to create an appropriately demanding prompt. 

