Estimating greenhouse gas emissions with machine learning

With the SEC’s recent announcement of a proposal to require companies to disclose their greenhouse gas (GHG) emissions, the sustainable finance industry could soon gain much more transparency into how much companies emit and how well they are progressing on their commitments to cut these emissions.

However, despite efforts by large multinationals in developed economies to disclose some of this information, reported emissions data is currently scarce and varies considerably between countries and sectors. At the time of publication, Bloomberg had found Scope 1 or Scope 2 emissions reported for the 2020 fiscal year by just under 4,000 of the 11,800 companies for which it captures ESG data, only a fraction of companies worldwide.

Estimates help financial market participants fill this gap, but the way they are modeled determines their quality. Investment firms need GHG emissions data to assess how well their investments and portfolios align with sustainability objectives and regulatory requirements. If companies’ GHG emissions are underestimated or overestimated, a material rebalancing of the portfolio may later be required, or portfolio emissions reduction targets may be missed.

There are relatively simple approaches to deriving estimates, such as taking industry averages and adjusting them by factors such as number of employees, assets and sales. However, this fails to account for any characteristics of a company other than its sector classification and size. Other approaches build on this by adding more data fields to improve performance, but they still treat a complex problem with relative simplicity.
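For comparison, the simple industry-average approach can be written in a few lines. The sector intensities and company figures are hypothetical, and averaging the three size-based estimates equally is an assumption; the sketch only shows that sector and size are the sole inputs.

```python
"""Naive industry-average emissions estimate (all figures hypothetical)."""

# Hypothetical sector-average intensities, in tonnes CO2e per unit of each size measure.
SECTOR_INTENSITY = {
    "software": {"employees": 2.0, "assets_musd": 5.0, "sales_musd": 8.0},
}


def industry_average_estimate(sector, employees, assets_musd, sales_musd):
    """Scale each sector intensity by the company's size, then average the three results.

    Only the sector label and size enter the calculation, so nothing else about the
    company (energy mix, efficiency, location) can move the estimate.
    """
    k = SECTOR_INTENSITY[sector]
    by_measure = [
        k["employees"] * employees,
        k["assets_musd"] * assets_musd,
        k["sales_musd"] * sales_musd,
    ]
    return sum(by_measure) / len(by_measure)


small = industry_average_estimate("software", 10_000, 3_000, 2_000)
double = industry_average_estimate("software", 20_000, 6_000, 4_000)
print(small, double)  # doubling every size measure exactly doubles the estimate
```

Two companies in the same sector with the same size always receive the same estimate, however differently they source their energy.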

At Bloomberg, we’re tackling this problem with machine learning, which can identify complex relationships among large numbers of variables and handle cases where key pieces of data are missing. This article discusses the benefits of applying machine learning techniques to modeling emissions estimates.

Handling non-linear relationships

To understand why this approach can be superior, we must first examine some of the pitfalls of modeling GHG emissions. First, the relationship between the fields used in a model (such as revenue and number of employees) and emissions is not necessarily linear. It is equally important to model not just individual relationships but also how correlations among different fields affect a company’s GHG emissions. The gradient boosted trees model that Bloomberg uses can learn both individual non-linear relationships and correlated non-linear relationships between fields, or groups of fields, and GHG emissions, producing a more robust estimate.
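To make the point about interacting fields concrete, here is a minimal, from-scratch sketch of gradient boosted regression trees (squared-error loss, depth-2 trees) compared with ordinary least squares. It is not Bloomberg’s model: the two features (energy use and a 0/1 flag for fully renewable electricity), the target formula and the hyperparameters are hypothetical, chosen so that emissions depend on an interaction between fields rather than on either one alone.

```python
"""Gradient boosted trees vs. a linear model on an interaction (illustrative sketch, synthetic data)."""
from statistics import mean


def fit_tree(X, r, depth):
    """Greedily grow a regression tree on residuals r, minimizing squared error at each split."""
    if depth == 0 or len(r) < 2:
        return mean(r)  # leaf: predict the mean residual
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = mean(r[i] for i in left), mean(r[i] for i in right)
            sse = sum((r[i] - lm) ** 2 for i in left) + sum((r[i] - rm) ** 2 for i in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:
        return mean(r)  # no valid split: all feature values identical
    _, j, t, left, right = best
    return (j, t,
            fit_tree([X[i] for i in left], [r[i] for i in left], depth - 1),
            fit_tree([X[i] for i in right], [r[i] for i in right], depth - 1))


def predict_tree(node, x):
    """Follow internal nodes (tuples) down to a leaf value."""
    while isinstance(node, tuple):
        j, t, lo, hi = node
        node = lo if x[j] <= t else hi
    return node


def fit_gbt(X, y, n_trees=100, depth=2, lr=0.3):
    """Squared-error boosting: each new tree is fit to the current residuals."""
    base = mean(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        tree = fit_tree(X, [yi - pi for yi, pi in zip(y, pred)], depth)
        trees.append(tree)
        pred = [pi + lr * predict_tree(tree, x) for pi, x in zip(pred, X)]
    return base, lr, trees


def predict_gbt(model, x):
    base, lr, trees = model
    return base + lr * sum(predict_tree(t, x) for t in trees)


def fit_linear(X, y):
    """Ordinary least squares via the normal equations, solved by Gauss-Jordan elimination."""
    A = [[1.0, *row] for row in X]  # prepend an intercept column
    k = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(M[row][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for row in range(k):
            if row != c:
                f = M[row][c] / M[c][c]
                M[row] = [a - f * b for a, b in zip(M[row], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict_linear(coef, x):
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))


def mse(pred, y):
    return mean((p - yi) ** 2 for p, yi in zip(pred, y))


# Hypothetical data: energy use 1..10 and a 0/1 "fully renewable electricity" flag.
# Emissions come only from non-renewable energy, an interaction rather than a sum.
X = [(e, r) for e in range(1, 11) for r in (0, 1)]
y = [10.0 * e * (1 - r) for e, r in X]

coef = fit_linear(X, y)
gbt = fit_gbt(X, y)
linear_mse = mse([predict_linear(coef, x) for x in X], y)
gbt_mse = mse([predict_gbt(gbt, x) for x in X], y)
print(f"linear MSE: {linear_mse:.2f}, boosted trees MSE: {gbt_mse:.2f}")
```

Tree depth matters in this sketch: boosted stumps (depth 1) would give an additive model that cannot represent the interaction either, whereas depth-2 trees let a split on energy use depend on the renewable flag.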

Take Intuit, a financial software company, as an example. Between 2015 and 2020, its revenue increased by 83%, its assets by 115% and its number of employees by 38%, according to Bloomberg data. A linear model using these fields would predict growth in emissions of a similar magnitude. In reality, these fields were disconnected from emissions: the company reported that its combined Scope 1 and 2 emissions fell by 82% between 2015 and 2020, owing to lower energy consumption (down 60% between 2017 and 2020) and a shift to purchasing electricity from renewable sources (up from 76% to 100% of all electricity purchased between 2018 and 2020). The Bloomberg model has access to fields like these, processing around 800 in total. By learning how these fields relate to emissions in combination, it can produce more accurate estimates for companies that do not report emissions. A linear model, on the other hand, struggles to make sense of such contradictory relationships.
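The size of that disconnect can be checked with simple arithmetic on the Intuit figures quoted above. Letting emissions grow in proportion to the simple average growth of the three size fields is an assumption made purely for illustration, not how any particular linear model is specified.

```python
"""Naive proportional extrapolation vs. Intuit's reported change (figures from the text, 2015-2020)."""

growth = {"revenue": 0.83, "assets": 1.15, "employees": 0.38}
reported_change = -0.82  # combined Scope 1 and 2 emissions, as reported

# Assumption (illustrative only): emissions scale with the average growth of the size fields.
avg_growth = sum(growth.values()) / len(growth)

base = 100.0  # index 2015 emissions to 100
naive_2020 = base * (1 + avg_growth)
reported_2020 = base * (1 + reported_change)

print(f"naive estimate: {naive_2020:.1f}, reported: {reported_2020:.1f}, "
      f"overestimate factor: {naive_2020 / reported_2020:.1f}x")
```

Under this assumption the size-driven extrapolation lands at roughly ten times the reported level.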

Handling missing data

Another potential weakness of emissions estimates is their sensitivity to the availability of key pieces of data. Electricity consumption, for example, is strongly correlated with Scope 2 emissions. However, companies that report emissions also tend to report electricity consumption, while those that do not report emissions tend not to report it either. This makes it easy to fall into the trap of building a model that predicts emissions very well for companies that report, but loses predictive power for companies that do not, which are precisely the ones that need estimates. Machine learning models can work around this by detecting predictive relationships in whatever data is available, which makes them well suited to applications such as GHG emissions estimation, where handling missing data well is essential to a strong estimate.
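One common way tree-based models cope with gaps, used for example by XGBoost’s sparsity-aware split finding and by scikit-learn’s histogram-based gradient boosting, is to learn at each split which branch missing values should follow. The article does not say which mechanism Bloomberg’s model uses, so treat this as a generic sketch; the electricity and emissions figures are hypothetical.

```python
"""Choosing a split threshold *and* a default branch for missing values (illustrative sketch)."""


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split_with_missing(x, y):
    """Return (sse, threshold, missing_goes_left) for one feature that may contain None."""
    present = [(xi, yi) for xi, yi in zip(x, y) if xi is not None]
    missing = [yi for xi, yi in zip(x, y) if xi is None]
    best = None
    for t in sorted({xi for xi, _ in present}):
        left = [yi for xi, yi in present if xi <= t]
        right = [yi for xi, yi in present if xi > t]
        # Try sending the missing rows each way and keep whichever fits better.
        for missing_left in (True, False):
            lhs = left + missing if missing_left else left
            rhs = right if missing_left else right + missing
            if not lhs or not rhs:
                continue
            score = sse(lhs) + sse(rhs)
            if best is None or score < best[0]:
                best = (score, t, missing_left)
    return best


# Hypothetical data: electricity use (GWh), None where the company does not report it.
electricity = [10, 20, 30, 200, 250, None, None]
emissions = [5, 8, 11, 90, 110, 100, 95]  # hypothetical Scope 2 emissions, kt CO2e

score, threshold, missing_left = best_split_with_missing(electricity, emissions)
print(threshold, "missing ->", "left" if missing_left else "right", score)
```

In this made-up sample the non-reporting companies resemble the high-emitting reporters, so the learned rule routes them right; with different data the same procedure could route them left. Either way, the company stays in the model instead of being dropped for lack of a field.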

Robust handling of missing data also means that the model can estimate emissions for companies that report few of the fields it uses, giving it broad coverage.

Estimate confidence and distribution

Most GHG emissions estimators produce a single number for each company and year, with no supporting data points. Machine learning models such as Bloomberg’s, by contrast, can produce estimates at different levels of certainty, along with confidence scores. The confidence score tells users how well the model can predict emissions for a particular company, on a scale from 1 (lowest confidence) to 10 (highest confidence).

To illustrate the estimates at different levels of certainty: if a company’s 75th percentile Scope 1 estimate is 10,000 metric tonnes of CO2 equivalent, the model is 75% sure that the company’s Scope 1 emissions are below that number. The two concepts are linked: the higher the confidence score, the narrower the spread between the estimates at different levels of certainty.
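If the model’s uncertainty about one company is represented as a set of sampled estimates, the percentiles can be read straight off that sample. The sketch assumes a lognormal spread with hypothetical parameters purely for illustration; it does not describe how Bloomberg derives its distribution. It also shows how a tighter spread, which would go with a higher confidence score, narrows the gap between percentiles.

```python
"""Reading percentile estimates off a sampled distribution (hypothetical lognormal spread)."""
import math
import random
import statistics


def percentile_estimates(samples):
    """Return the 50th, 75th and 95th percentiles of the sampled emissions."""
    cuts = statistics.quantiles(samples, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return {50: cuts[9], 75: cuts[14], 95: cuts[18]}


rng = random.Random(42)
median_t = 8_000.0  # hypothetical median Scope 1 estimate, tonnes CO2e

# Hypothetical wide (low-confidence) and narrow (high-confidence) spreads around the same median.
wide = [rng.lognormvariate(math.log(median_t), 0.6) for _ in range(10_000)]
narrow = [rng.lognormvariate(math.log(median_t), 0.15) for _ in range(10_000)]

for label, samples in (("wide", wide), ("narrow", narrow)):
    p = percentile_estimates(samples)
    share_below_p75 = sum(s < p[75] for s in samples) / len(samples)
    print(f"{label}: P50={p[50]:,.0f} P75={p[75]:,.0f} P95={p[95]:,.0f} "
          f"share below P75={share_below_p75:.2f}")
```

In both cases about three quarters of the sampled values fall below the 75th percentile, which is exactly the "75% sure" reading above.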

There are many benefits to having access to such distributions. First, they make it possible to apply the precautionary principle, a fundamental principle of EU policy on the environment, among other areas, which holds that caution should be exercised in the absence of certainty. Applied to estimated emissions data, this means financial market participants can choose the estimate at 75% or 95% certainty, so that the company’s reported emissions would likely be lower if it started reporting tomorrow. Not only does this penalize non-reporting companies, but it also gives them an incentive to start reporting emissions.
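A portfolio-level calculation that applies the precautionary principle might use reported figures where they exist and fall back to a chosen percentile otherwise. The companies, figures and the simple unweighted sum are hypothetical simplifications for illustration.

```python
"""Precautionary choice of estimate: reported figure if available, else a chosen percentile."""

# Hypothetical holdings: reported Scope 1+2 (tonnes CO2e) or None, plus percentile estimates.
holdings = {
    "CompanyA": {"reported": 12_000.0, "estimates": None},
    "CompanyB": {"reported": None, "estimates": {50: 8_000.0, 75: 10_000.0, 95: 14_000.0}},
    "CompanyC": {"reported": None, "estimates": {50: 3_000.0, 75: 3_600.0, 95: 4_500.0}},
}


def emissions_for(company, percentile):
    """Prefer reported emissions; otherwise use the estimate at the requested percentile."""
    if company["reported"] is not None:
        return company["reported"]
    return company["estimates"][percentile]


def portfolio_total(holdings, percentile=95):
    """Unweighted sum across holdings (a simplification for illustration)."""
    return sum(emissions_for(c, percentile) for c in holdings.values())


for pct in (50, 75, 95):
    print(pct, portfolio_total(holdings, pct))
```

Raising the percentile only inflates the non-reporting holdings, which is the penalty, and the incentive, described above.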

Second, more companies are starting to report GHG emissions in line with TCFD, SFDR and other local requirements, or even voluntarily. Using estimates with a distribution gives portfolio managers a way to mitigate the risk that company-reported emissions will turn out to be very different from the estimates they have been using. Finally, such estimates can be used for Paris Aligned Benchmark (PAB) indexes, which specifically require applying the precautionary principle, meaning that estimates that overestimate emissions are preferred to those that underestimate them.

The lack of company-reported emissions is set to remain a major hurdle for financial market participants until regulatory reporting is enforced. In the meantime, machine learning techniques provide more robust estimates than simpler linear techniques, and the data points that accompany each estimate allow it to be used with more confidence.

About Bloomberg’s greenhouse gas estimates

Bloomberg provides as-reported and estimated carbon emissions for over 50,000 public and private companies globally, going back to 2010. For companies that do not report, Bloomberg has developed a proprietary machine learning model that incorporates more than 800 data points from different databases to estimate Scope 1 and 2 emissions. Bloomberg also provides several percentiles of the probability distribution for each estimate, so that investors can decide which estimate to use in their models. In addition, a confidence score gives insight into the amount and consistency of the data used to produce each estimate.

More information is available here: Bloomberg’s Greenhouse Gas Emissions Estimates Model.
