AI for ESG Data Collection

The supply chain data problem

For most ESG teams, data collection is the single biggest time sink. Not analysis. Not strategy. Not even reporting. Just getting the data.

Consider a typical Scope 3 emissions calculation. You need activity data from your supply chain — and that means sending questionnaires to hundreds of suppliers, many of whom have never reported sustainability data before. The responses come back in different formats: some in your standardised template, some in their own format, some as PDFs, some as spreadsheets, and some not at all.

Your team then manually extracts the relevant data points from each response, converts units, enters them into your consolidation spreadsheet, and follows up on gaps. For a company with 300 suppliers, this process can take 6-8 weeks of dedicated effort.

AI cannot make your suppliers respond faster. But it can compress the processing time from weeks to days and, paired with human review of flagged values, reduce the transcription and unit-conversion errors that manual entry introduces.

What is the most time-consuming part of your ESG data collection process?

Processing supplier sustainability questionnaires at scale

Here is a practical workflow for using AI to process supplier questionnaires. This is not theoretical — it is the approach that leading ESG teams are using today.

Step 1: Define your extraction template. Before you give AI a single document, define exactly what you need extracted. For a Scope 3 supplier questionnaire, this might include: annual energy consumption (kWh), fuel type breakdown, Scope 1 and 2 emissions (tCO2e), waste generated (tonnes), water consumption (cubic metres), number of employees, and revenue.

Step 2: Create a structured prompt. Tell the AI exactly what to extract, in what format, and how to handle edge cases. Here is an example:

You are an ESG data analyst processing supplier sustainability questionnaires.

Extract the following data points from the attached supplier response:
- Company name
- Reporting period
- Total energy consumption (convert to kWh if reported in other units)
- Energy breakdown by source (renewable vs non-renewable)
- Scope 1 emissions (tCO2e)
- Scope 2 emissions (tCO2e) — specify if market-based or location-based
- Total waste generated (tonnes)
- Waste diversion rate (%)
- Water consumption (cubic metres)
- Number of employees (FTE)

For each data point:
- If present, extract the exact value and the page/section where you found it
- If missing, mark as "NOT REPORTED"
- If the unit is ambiguous, flag it for human review
- If the value seems implausible (e.g., energy consumption per employee >50,000 kWh), flag it

Return the results as a structured table.

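Step 3: Batch process. Feed supplier responses through this prompt in batches that fit within the model's context window. How many fit depends on document length and the model you use; a batch of 10-15 questionnaires is a typical starting point. When you batch, ask for one row per supplier so the results can be compared across the batch.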

Step 4: Human review of flagged items. The AI's output will include clean extractions and flagged items. Your team reviews only the flags, typically 10-20% of total data points, rather than every single field.
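The flagging rule from the prompt and the review step can also be enforced in code once the AI's table has been parsed. The following is a minimal sketch, not a definitive implementation: the `SupplierRecord` fields and the function names are hypothetical, and the 50,000 kWh per employee threshold is simply the example figure from the prompt above, which energy-intensive sectors may legitimately exceed.

```python
from dataclasses import dataclass, field
from typing import Optional

# Example threshold from the prompt above; an assumption to tune per sector.
MAX_KWH_PER_EMPLOYEE = 50_000


@dataclass
class SupplierRecord:
    """One supplier's extracted values; None stands for "NOT REPORTED"."""
    company: str
    energy_kwh: Optional[float]
    employees_fte: Optional[float]
    flags: list = field(default_factory=list)


def flag_record(record: SupplierRecord) -> SupplierRecord:
    """Attach review flags for missing or implausible values."""
    if record.energy_kwh is None:
        record.flags.append("energy: NOT REPORTED")
    if record.employees_fte is None:
        record.flags.append("employees: NOT REPORTED")
    # Only compute the ratio when both values exist and FTE is non-zero.
    if record.energy_kwh is not None and record.employees_fte:
        per_employee = record.energy_kwh / record.employees_fte
        if per_employee > MAX_KWH_PER_EMPLOYEE:
            record.flags.append(
                f"energy per employee implausible: {per_employee:,.0f} kWh"
            )
    return record


def review_queue(records):
    """Return only the records that carry at least one flag."""
    return [r for r in records if r.flags]
```

Running `review_queue` over a flagged batch returns the subset your team looks at in Step 4; everything else passes through without manual handling.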

Extracting metrics from utility bills and energy reports

Utility bills are a different challenge from supplier questionnaires. The data is usually accurate (it comes from the utility provider), but the format varies wildly between providers, countries, and facility types. A single company might receive electricity bills from 15 different providers across 8 countries, each with different layouts, languages, and units.

AI handles this well because the underlying data structure is consistent even when the format is not. Every electricity bill contains: the billing period, consumption in kWh (or equivalent), cost, and usually the facility address or account number.

A practical prompt for utility bill extraction:

Extract the following from this utility bill:
- Utility type (electricity, natural gas, water, district heating)
- Provider name
- Account/meter reference
- Billing period (start and end dates)
- Total consumption (with original unit)
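- Consumption converted to standard unit (kWh for energy, cubic metres for water); if gas is billed by volume, use the calorific value stated on the bill and flag the line if none is given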
- Total cost (with currency)
- Facility/site identifier (address or account name)
- Any renewable energy certificates or green tariff indicators

If this is a summary bill covering multiple meters/sites, extract data for each meter/site separately.

The power of this approach scales with volume. Processing 10 utility bills manually takes about an hour. At that rate, processing 500 manually takes more than a working week. Processing 500 with AI takes roughly the same hands-on effort as processing 10, plus human review of flagged items.
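Unit conversion is arithmetic, so one option is to ask the AI only for the original value and unit and apply the conversion in code, where it is repeatable and auditable. A minimal sketch, assuming the unit arrives as a short text label: `to_kwh` and the `KWH_PER_UNIT` table are illustrative names, and the table covers only units with exact conversion factors (1 kWh = 3.6 MJ). Gas volumes and therms are left out deliberately because they depend on a calorific value or on which definition of the unit is in use.

```python
# Exact conversion factors to kWh, derived from 1 kWh = 3.6 MJ.
KWH_PER_UNIT = {
    "wh": 0.001,
    "kwh": 1.0,
    "mwh": 1_000.0,
    "gwh": 1_000_000.0,
    "mj": 1 / 3.6,
    "gj": 1_000 / 3.6,
}


def to_kwh(value: float, unit: str) -> float:
    """Convert an energy value to kWh, refusing units it does not know."""
    key = unit.strip().lower()
    if key not in KWH_PER_UNIT:
        # Unknown units should go to a person, not be guessed.
        raise ValueError(f"Unrecognised energy unit {unit!r}; route to human review")
    return value * KWH_PER_UNIT[key]
```

Raising an error on an unknown unit mirrors the "flag it for human review" rule: a bill that cannot be converted with certainty stops the pipeline for that line rather than producing a silent wrong number.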

How many utility bills or energy data sources does your organisation process per reporting cycle?

Standardising data across different supplier formats

One of the most frustrating aspects of ESG data collection is that every supplier reports differently. Some use your template. Some use their own sustainability report. Some send a one-page email with a few numbers. Some send a 50-page PDF where the relevant data is buried on page 37.

AI handles format variation remarkably well because it processes language, not layouts. Whether the emissions figure is in row 12 of a spreadsheet, paragraph 4 of a PDF, or embedded in a sentence of an email, AI can find it.

The key is giving AI a clear instruction about what to extract regardless of format:

The attached document is a supplier's response to our sustainability data request.
The response format may vary — it could be our standard questionnaire, the supplier's
own sustainability report, an email, or a spreadsheet.

Regardless of format, extract the following data points. For each, provide:
1. The extracted value (with original unit)
2. The standardised value (converted to our standard units)
3. The source location (page number, section, or cell reference)
4. Confidence level (HIGH if clearly stated, MEDIUM if inferred from context, LOW if estimated)

Data points to extract:
[your standard list of required metrics]

The confidence level field is critical. It tells your review team where to focus their attention. A "HIGH confidence" extraction from a clearly labelled field in a standard template can be spot-checked. A "MEDIUM confidence" value inferred from narrative text, or a "LOW confidence" estimate, needs careful human verification.
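The review rule described above can be expressed as a small routing function. This is a sketch under stated assumptions: each extraction is a dictionary with a `confidence` key holding HIGH, MEDIUM or LOW, and the 10% spot-check rate is a hypothetical default, not a recommendation from any standard.

```python
import random


def triage(extractions, spot_check_rate=0.1, seed=0):
    """Split extractions into a full-review list and a spot-check sample.

    Every MEDIUM or LOW item goes to full review. A random sample of
    HIGH items is drawn for spot-checking (at least one if any exist).
    """
    full_review = [e for e in extractions if e["confidence"] != "HIGH"]
    high = [e for e in extractions if e["confidence"] == "HIGH"]
    if high:
        sample_size = min(len(high), max(1, round(len(high) * spot_check_rate)))
    else:
        sample_size = 0
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    spot_check = rng.sample(high, sample_size)
    return full_review, spot_check
```

Fixing the seed means the same batch always yields the same spot-check sample, which helps if a reviewer needs to show later which values were checked.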

Gap analysis — identifying missing data before deadlines

Missing data is the hidden risk in ESG reporting. You can process every questionnaire that comes back, but if 30% of your suppliers have not responded, you have a gap that threatens your disclosure completeness and accuracy.

AI can help with gap analysis at two levels:

Supplier-level gaps: After processing all received responses, AI generates a completeness matrix — which suppliers have responded, which have not, and what percentage of your Scope 3 emissions (by category) is covered. This tells you whether your data coverage is sufficient for disclosure, or whether you need to escalate follow-ups.

Field-level gaps: Even among suppliers who responded, many leave fields blank or provide partial data. AI can generate a field-by-field completeness report: "87% of suppliers reported Scope 1 emissions, but only 54% reported Scope 2 with market-based vs location-based breakdown." This tells you which data points you can confidently disclose and which need more collection effort.
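Once extractions are stored as one record per supplier, the field-level report is a simple count. A minimal sketch, assuming each response is a dictionary in which a missing value is either absent, `None`, or the string "NOT REPORTED"; the field names are hypothetical.

```python
# Hypothetical field names; replace with your own list of required metrics.
REQUIRED_FIELDS = ["scope1_tco2e", "scope2_tco2e", "energy_kwh"]


def field_completeness(responses, required_fields=REQUIRED_FIELDS):
    """Return {field: percentage of responding suppliers that reported it}."""
    total = len(responses)
    if total == 0:
        return {f: 0.0 for f in required_fields}
    return {
        f: 100.0
        * sum(1 for r in responses if r.get(f) not in (None, "NOT REPORTED"))
        / total
        for f in required_fields
    }
```

The percentages are relative to suppliers who responded, not to all suppliers in the reporting boundary, so they answer the field-level question only.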

A practical prompt for gap analysis:

I have processed [X] supplier sustainability questionnaires out of [Y] total suppliers
in our reporting boundary.

Based on the extracted data, generate a gap analysis report:

1. SUPPLIER COVERAGE: List non-responsive suppliers, ordered by their estimated
   contribution to our Scope 3 emissions (based on procurement spend as a proxy)
2. FIELD COMPLETENESS: For each required data point, what percentage of responding
   suppliers provided it?
3. CRITICAL GAPS: Which missing data points would prevent us from completing our
   [CSRD/TCFD/SEC] disclosure requirements?
4. RECOMMENDED ACTIONS: Which suppliers should we prioritise for follow-up, and
   which gaps can be addressed with estimation methodologies?

This gap analysis, run mid-cycle, gives you weeks of lead time to chase critical responses before your reporting deadline.
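The supplier coverage part of this report does not need an AI model at all if procurement spend is available. A minimal sketch, assuming spend is held in a dictionary keyed by supplier name and used as the proxy the prompt describes; `supplier_coverage` is an illustrative name.

```python
def supplier_coverage(spend_by_supplier, responded):
    """Return (spend coverage in %, non-responders ordered by spend).

    spend_by_supplier: {supplier name: procurement spend}
    responded: set of supplier names whose responses were processed
    """
    total = sum(spend_by_supplier.values())
    covered = sum(
        spend for name, spend in spend_by_supplier.items() if name in responded
    )
    coverage_pct = 100.0 * covered / total if total else 0.0
    # Largest spend first, so the most material gaps are chased first.
    missing = sorted(
        (name for name in spend_by_supplier if name not in responded),
        key=lambda name: spend_by_supplier[name],
        reverse=True,
    )
    return coverage_pct, missing
```

Computing the list deterministically and then passing it to the AI as input keeps the ordering verifiable, and leaves the model to do what it is better suited to: judging critical gaps and drafting recommended actions.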

Automating follow-up with non-responsive suppliers

Non-responsive suppliers are a universal problem. In any given data collection cycle, 20-40% of suppliers will miss the initial deadline. Chasing them is tedious, repetitive, and critical.

AI can automate the follow-up workflow:

Tiered escalation drafting. Based on your gap analysis, AI can draft follow-up communications at different urgency levels:

- First reminder (2 weeks past deadline): polite, referencing the specific data points needed
- Second reminder (4 weeks): firmer, noting the regulatory requirement driving the request
- Escalation (6 weeks): addressed to a more senior contact, referencing contractual sustainability obligations
- Final notice: flagging that non-response will result in estimation using industry averages, which may not represent the supplier favourably
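Deciding which tier applies to a given supplier is a date calculation that can sit outside the AI, which then only drafts the message for the chosen tier. A minimal sketch: the 2, 4 and 6 week thresholds follow the list above, while the 8 week trigger for the final notice is an assumption, since no timing is given for that step.

```python
from datetime import date
from typing import Optional

# (minimum whole weeks past deadline, tier), checked from latest to earliest.
# The 8-week final-notice threshold is an assumption; adjust to your policy.
TIERS = [
    (8, "final_notice"),
    (6, "escalation"),
    (4, "second_reminder"),
    (2, "first_reminder"),
]


def follow_up_tier(deadline: date, today: date) -> Optional[str]:
    """Return the follow-up tier due for a non-responsive supplier, if any."""
    weeks_late = (today - deadline).days // 7
    for min_weeks, tier in TIERS:
        if weeks_late >= min_weeks:
            return tier
    return None  # not yet late enough for a first reminder
```

The tier name can then be passed into the drafting prompt together with the specific data points still missing for that supplier.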

Prioritisation. Not all non-responsive suppliers are equally important. AI can rank them by materiality — using procurement spend, emissions intensity of their sector, or their contribution to your highest-risk Scope 3 categories — so your team focuses chase efforts where they matter most.
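One simple way to combine two of these signals is to multiply procurement spend by an emissions intensity for the supplier's sector and rank on the result. This is a sketch of that idea only: the function name and record keys are hypothetical, and the intensity figures must come from an emission factor source you trust, since none are supplied here.

```python
def rank_by_materiality(suppliers, intensity_by_sector):
    """Order suppliers by estimated emissions, largest first.

    suppliers: list of dicts with 'name', 'spend' and 'sector' keys
    intensity_by_sector: {sector: kgCO2e per unit of spend}, supplied by you
    """
    def estimated_kg(supplier):
        return supplier["spend"] * intensity_by_sector[supplier["sector"]]

    return sorted(suppliers, key=estimated_kg, reverse=True)
```

A supplier with modest spend in a carbon-intensive sector can outrank a larger supplier in a low-intensity one, which is the reason to rank on more than spend alone.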

Simplified response options. For suppliers who are struggling with the full questionnaire, AI can draft a simplified data request covering only the most critical fields. Getting 60% of the data from a struggling supplier is better than getting nothing while waiting for 100%.

The goal is not to remove humans from supplier relationships — it is to give your team ready-to-send communications and a clear priority list, rather than starting from scratch each time.

What is your current supplier response rate for ESG data requests?

Module 3 — Final Assessment

What is the first step before using AI to process supplier sustainability questionnaires?

Why is the 'confidence level' field important in AI-extracted ESG data?

What is the primary value of running an AI-assisted gap analysis mid-cycle?

How should AI-assisted supplier follow-up be prioritised?