By Steve Cracknell, Founder and Chief Product Officer, Insig AI
and Diana Rose, Head of ESG, Insig AI

Introduction
In the rapidly evolving intersection of technology and corporate sustainability reporting, there is a growing role for specialised Large Language Models (LLMs).
Generalist LLMs like GPT-4 are ground-breaking, and the natural next step is to explore the unique value that topic-specific LLMs can bring.
While LLMs are not suitable for every task, there are exciting possibilities for using them to consolidate vast amounts of information in support of human analysis across business, finance and sustainability.
This article explains the differences between generic and specialised LLMs and the rationale and process for creating them, focussing on use cases in sustainable business and investment.
Why are LLMs a good fit for corporate sustainability?
Experts working across AI and corporate sustainability recognise that applying LLMs to corporate sustainability is not just part of the Generative AI buzz. It is practical and, while it’s early days, likely to become the norm.
Companies are under huge pressure from investors, regulators and consumers to publish ever more environmental and social sustainability disclosure alongside financial and governance reporting.
This creates an enormous burden to report comprehensively and accurately, and a further burden for stakeholders to make sense of that information in context. Disclosure should explain what difference the company’s sustainability initiatives make to its risk and its impact on people and planet, but everyone working in this space knows how hard it is to see the wood for the trees.
With supply chain (also known as value chain or Scope 3) information now included under EU and other reporting regulation, the challenge of keeping data cohesive and meaningful is only growing. Analysts are already drowning in data and overwhelmed with a massive research burden, further compounded by concerns over greenwashing.
Specialised LLMs offer:
– Expertise for nuanced outputs: A topic-centric LLM can be trained on social impact indicators, environmental technologies, risk and impact frameworks or the latest regulations. Humans can’t be experts at everything in a fast-moving landscape, and a specialised LLM’s outputs can save research time and enhance a human team’s efforts in analysis.
– Better decision-making: For businesses, navigating the increasingly complex web of corporate reporting, and incorporating supplier performance into it, requires informed, evidence-based decisions. An LLM focused on this area can synthesise vast, messy datasets into actionable insights.
– Driving innovation: The dynamic and complex field of corporate sustainability and investment is ripe for the predictive, big data capabilities of this technology. When applied responsibly and trained expertly, LLMs in this space are a great example of using AI for good.
What’s the difference between a generic and specialised LLM?
Generic LLMs
Broad knowledge base: Generic LLMs like BERT, GPT-4, Bard and Claude (to name a few) are trained on a vast array of topics. They have a wide-ranging but sometimes superficial understanding of many subjects, from everyday knowledge to specific fields.
General purpose: Designed for a variety of applications, they are versatile. Whether it’s writing content, coding, answering trivia or providing business advice, they can handle a multitude of tasks.
Flexibility: Due to their broad training, generic LLMs can pivot between topics easily, making them useful for unpredictable queries.
Training data: They are trained on diverse datasets, often encompassing a large chunk of the publicly available internet text, which includes books, websites, articles and more.
Limitations in specialisation: While capable in many areas, breadth can lead to a lack of depth. They may not always provide the most up-to-date or in-depth information on specialised topics.
Specialised LLMs
Focused expertise: These models are fine-tuned on a specific subject area, which allows them to provide more detailed and accurate responses in that area. FinBERT, ESGBERT and ClimateBERT are good examples of topic-centric models, which can form the basis of even more specialised models.
Targeted use: Their use can be better targeted. A sustainability-focussed LLM could be trained on international ESG frameworks, or tailored to handle queries relating to the particular priorities of the user, such as supplier community and human rights performance in line with international standards.
Depth over breadth: While they might not possess the same versatility across a wide range of topics, their capacity for accuracy in a particular field is significantly greater.
Customised training data: The training dataset is curated to include extensive information about the particular topic. This might involve academic articles, industry reports and high-quality disclosures.
Precision and relevance: Due to their specialised nature, these models are more likely to provide up-to-date, nuanced, and contextually relevant information in their area of specialism.
In Summary…
While a generic LLM is a ‘jack of all trades’, a topic-specific LLM is a ‘master’ of its domain, providing depth and precision in its outputs in a particular field. The choice between the two depends on the needs and goals of the user or application.
How to build a topic-specific LLM
Creating an LLM suitable for supporting sustainable business or finance involves several key steps:
1. Data gathering: The foundation for any model is a comprehensive dataset encompassing robust sources such as scientific research, industry reports, regulatory documents and reliable news in the sector.
2. Data pre-processing: This step involves cleaning, standardising, tagging and labelling the data and ensuring it’s relevant and unbiased.
3. Model training: Utilising machine learning techniques, the model is trained on this curated dataset. This phase demands significant computational resources, AI expertise and supervision.
4. Fine-tuning and ongoing evaluation: Post-training, the model undergoes fine-tuning with more specific datasets and use cases. Regular evaluations and updates are essential to maintain its accuracy and relevance.
5. Ethical alignment: It’s vital to ensure the model’s adherence to ethical standards, particularly in data sensitivity and bias mitigation.
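As a rough illustration of step 2, the sketch below cleans, de-duplicates and tags a handful of raw documents using only the Python standard library. The topic names, the keyword lists and the `preprocess` function are hypothetical choices made for this example, and plain keyword matching is a deliberately crude stand-in for the expert labelling a real project would need.

```python
import re
import unicodedata

# Hypothetical topic tags and trigger keywords (assumptions for illustration).
# A real project would derive these from its chosen ESG framework.
TOPIC_KEYWORDS = {
    "climate": ["emissions", "scope 3", "net zero"],
    "human_rights": ["forced labour", "human rights", "living wage"],
    "governance": ["board", "audit", "remuneration"],
}


def clean_text(raw: str) -> str:
    """Standardise unicode, strip HTML-like tags and collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tag_topics(text: str) -> list[str]:
    """Return the sorted topic tags whose keywords appear in the text.

    Uses simple substring matching, so it will over-match in places.
    """
    lowered = text.lower()
    return sorted(
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    )


def preprocess(documents: list[str]) -> list[dict]:
    """Clean, de-duplicate and tag raw documents for a training corpus."""
    seen = set()
    records = []
    for raw in documents:
        text = clean_text(raw)
        key = text.lower()
        if not text or key in seen:
            continue  # skip empty documents and exact duplicates
        seen.add(key)
        records.append({"text": text, "topics": tag_topics(text)})
    return records
```

Calling `preprocess` on a list of raw strings returns one record per unique, non-empty document, each carrying its cleaned text and any matching topic tags. Checking for relevance and bias, which step 2 also calls for, still needs human review and is not attempted here.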
Real-world applications of a specialised LLM
The potential use cases of a specialised LLM in sustainable business and finance are vast. Here are a few examples:
– Sustainable investment research: By incorporating large datasets, an LLM can inform sustainable investment strategies, identify risks and highlight opportunities in line with a framework such as the SDGs.
– Navigating regulatory compliance: Businesses can use a bespoke model to stay abreast of and comply with the latest sustainability regulations.
– Supply chain sustainability assessment: Companies can track the environmental and social profile and performance of suppliers to monitor and report their own sustainability footprint.
– Identifying and monitoring greenwashing: A specially trained model can help identify potential flags for corporate and financial greenwashing when mapped to a defined framework.
– Aiding public policy: Governments can use these models to inform policymaking, based on environmental data analysis and impact forecasting.
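To make the idea of flags mapped to a defined framework more concrete, here is a toy sketch of a greenwashing screen. It is rule-based, not an LLM: it flags sentences that use a vague sustainability claim without quoting any figure. The `VAGUE_CLAIMS` list, the "contains a digit" test for evidence and the `flag_sentences` function are all assumptions made for illustration. A specialised model would replace these rules with learned judgement, but the output shape (a sentence plus the reason it was flagged) would be similar.

```python
import re

# Hypothetical framework: claim terms that should be backed by evidence.
VAGUE_CLAIMS = ["eco-friendly", "green", "sustainable", "carbon neutral", "net zero"]

# Crude proxy for evidence: the sentence quotes at least one figure.
HAS_FIGURE = re.compile(r"\d")


def flag_sentences(disclosure: str) -> list[dict]:
    """Flag sentences that make a vague claim without any supporting figure."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", disclosure.strip())
    flags = []
    for sentence in sentences:
        lowered = sentence.lower()
        matched = [
            term
            for term in VAGUE_CLAIMS
            if re.search(rf"\b{re.escape(term)}\b", lowered)
        ]
        if matched and not HAS_FIGURE.search(sentence):
            flags.append({"sentence": sentence, "terms": matched})
    return flags
```

Each flag is a prompt for an analyst to look closer, not a verdict. A sentence with a number in it can still mislead, and an unquantified claim can still be true.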
Conclusion
The development of specialised LLMs is not a stand-alone ‘clever’ technological advancement to be adopted for the sake of it. When done well, it is a step towards more research-driven, evidence-based decision making in business and finance.
We believe this technology will become a driver for sustainable growth, centred on high-quality data, targeted design and ethical practice. When applied responsibly, specialised LLMs will support innovation, adaptation and improvement in the challenge of achieving our shared global goals for sustainable business and investment.