Bloomberg’s Greenhouse Gas Emissions Estimates model: A summary of challenges and modeling solutions
Governments, citizens, and companies around the world are increasingly taking action to reduce greenhouse gas (GHG) emissions. For investors, monitoring the GHG emissions of their portfolio companies is becoming an important part of the investment process. However, the availability of reported GHG emissions data varies tremendously across countries and business sectors, and many companies do not report their emissions at all.
To bridge this gap, Bloomberg developed a machine learning-based model to estimate the GHG emissions of companies. The Bloomberg GHG Model estimates direct (scope 1) and indirect (scope 2 and scope 3) emissions for companies with a sufficient amount of available data.
Reported Carbon Data 2010-2020

What companies are covered by the scope 1 & 2 model?
Since 2010, the model has covered over 50,000 companies globally. Of these, roughly 39,000 are publicly traded and the remaining 11,000 are private. The following tables show coverage broken down by index, region and country.
Scope 1 & 2 coverage per index

What companies are covered by the scope 3 model?
Scope 3 GHG emissions estimates cover over 4,000 companies globally. Around 90% of these companies are publicly listed and the rest are private.
Scope 3 coverage per index

How did we decide on the right model to estimate GHG emissions?
Estimating the carbon footprint of companies is a complex task in itself; additionally, the data required to perform the estimation is noisy and often missing for the companies that must be estimated. Linear models offer a high degree of explainability, but they struggle when the underlying data contains interdependent relationships, missing values and categorical information — precisely the issues faced when producing GHG estimates. More intricate machine learning models, such as regression trees, can naturally learn complex relationships in the data, handle missing values, process categorical data, and model the inherent noise of GHG emissions.
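To make the missing-value point concrete, the sketch below fits a one-split regression tree (a "stump") in which rows with a missing feature are routed to whichever side of the split lowers the squared error. Learning such a default direction is one common way tree methods cope with missing data (gradient-boosting libraries such as XGBoost use a similar idea); it is shown here as an illustration, not as Bloomberg's implementation. The names `Stump` and `fit_stump` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Stump:
    threshold: float
    missing_goes_left: bool  # learned default direction for missing values
    left_value: float
    right_value: float

    def predict(self, x: Optional[float]) -> float:
        go_left = self.missing_goes_left if x is None else x <= self.threshold
        return self.left_value if go_left else self.right_value


def _sse(values: List[float]) -> float:
    """Sum of squared errors around the group mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(x: Sequence[Optional[float]], y: Sequence[float]) -> Stump:
    """Fit a single split; rows with x=None go to whichever side lowers total SSE."""
    observed = sorted({v for v in x if v is not None})
    if len(observed) < 2:
        raise ValueError("need at least two distinct observed values")
    best = None
    for lo, hi in zip(observed, observed[1:]):
        threshold = (lo + hi) / 2  # midpoint between neighbouring observed values
        for missing_left in (True, False):
            left: List[float] = []
            right: List[float] = []
            for xi, yi in zip(x, y):
                go_left = missing_left if xi is None else xi <= threshold
                (left if go_left else right).append(yi)
            err = _sse(left) + _sse(right)
            if best is None or err < best[0]:
                best = (err, Stump(threshold, missing_left,
                                   sum(left) / len(left), sum(right) / len(right)))
    return best[1]


# Hypothetical toy data: one size-like feature (two values missing) and emissions.
stump = fit_stump([1.0, 2.0, None, 10.0, 11.0, None], [1, 1, 10, 10, 10, 10])
```

Here the two companies with a missing feature have high emissions, so the stump learns to send missing values to the high-emission branch instead of requiring them to be imputed first.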
What data goes into the model?
The quality of GHG emissions estimates depends heavily on the quality of the data used to generate them, and this is where Bloomberg excels. The Bloomberg GHG model draws on multiple datasets: company location and size; financial and environmental, social and governance (ESG) data; the breakdown of revenue by industry sector; and industry-specific company data. Examples of industry-specific data are the energy sources (e.g., fossil fuels, solar, wind) that utilities use to generate electricity, and production data for cement, steel and oil & gas companies. In total, the model uses over 800 individual features.
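A minimal sketch of how several per-company datasets might be joined into one wide feature row per company, assuming each dataset is a dictionary keyed by company ID. Features a company lacks are left as `None` rather than imputed, which suits tree-based models that handle missing values directly. The function name, feature names and values are hypothetical.

```python
from typing import Any, Dict, Iterable, Optional

FeatureRow = Dict[str, Optional[Any]]


def build_feature_rows(company_ids: Iterable[str],
                       *datasets: Dict[str, Dict[str, Any]]) -> Dict[str, FeatureRow]:
    """Join per-company datasets into one wide feature row per company.

    Every feature seen in any dataset becomes a column; features a company
    lacks stay None instead of being imputed. If two datasets share a feature
    name, the later dataset's value wins.
    """
    columns = sorted({name for ds in datasets for feats in ds.values() for name in feats})
    rows: Dict[str, FeatureRow] = {}
    for cid in company_ids:
        row: FeatureRow = {name: None for name in columns}
        for ds in datasets:
            row.update(ds.get(cid, {}))
        rows[cid] = row
    return rows


# Hypothetical example inputs (names and values are illustrative only).
location = {"A": {"country": "US"}, "B": {"country": "DE"}}
financials = {"A": {"revenue_musd": 120.0}, "B": {"revenue_musd": 8.5}}
industry_specific = {"B": {"cement_output_kt": 300.0}}

rows = build_feature_rows(["A", "B"], location, financials, industry_specific)
```

Company A has no industry-specific production data, so its `cement_output_kt` column stays empty; categorical features such as `country` are kept as-is for the model to process.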
What does the model output?
The model produces scope 1 and 2 estimates for all industries, and scope 3 estimates for the oil & gas and mining sectors. Each estimate comes with its own distribution, derived from comparable companies. Users can therefore select a different percentile of that distribution to obtain a more aggressive or more conservative estimate than the mean.
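The sketch below shows what reading different percentiles off an estimate's distribution looks like, assuming the distribution is available as a list of samples in tonnes CO2e. The function name `summarize_estimate` and the sample values are hypothetical; it uses only Python's standard `statistics` module.

```python
import statistics
from typing import Dict, Sequence


def summarize_estimate(samples: Sequence[float],
                       percentiles: Sequence[int] = (10, 50, 90)) -> Dict[str, float]:
    """Mean plus selected percentiles (1..99) of an estimate's distribution."""
    # quantiles(n=100) returns 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    summary = {"mean": statistics.fmean(samples)}
    for p in percentiles:
        summary[f"p{p}"] = cuts[p - 1]
    return summary


# Hypothetical, right-skewed sample of emission estimates (tonnes CO2e).
samples = [800.0, 900.0, 950.0, 1_000.0, 1_050.0, 1_100.0, 1_300.0, 2_500.0]
summary = summarize_estimate(samples)
```

Because emissions distributions are often skewed, the mean and median can differ noticeably; a user wary of understating emissions might, for example, report the 90th percentile instead of the mean.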
Another element of the Bloomberg solution is the GHG Confidence Score, which measures the depth and relevance of the data points available for estimating a particular company's greenhouse gas emissions. The score compares the data points available for a given company with the data features that are most relevant across all companies in the same industry.
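The exact formula behind the GHG Confidence Score is not described here, so the following is only a minimal sketch under an assumed definition: take an industry's most important features (by some importance weight), and score a company by the share of that importance for which it has data. The function name, the `top_k` parameter and the importance values are all hypothetical.

```python
from typing import Dict, Set


def confidence_score(available: Set[str],
                     industry_importance: Dict[str, float],
                     top_k: int = 5) -> float:
    """ASSUMED definition: share of the industry's top-k feature importance
    that the company actually has data for, between 0 and 1."""
    top = sorted(industry_importance, key=industry_importance.__getitem__,
                 reverse=True)[:top_k]
    total = sum(industry_importance[f] for f in top)
    if total == 0:
        return 0.0
    covered = sum(industry_importance[f] for f in top if f in available)
    return covered / total


# Hypothetical importances for one industry (illustrative values).
importance = {"production_volume": 4.0, "revenue": 3.0, "employees": 2.0, "country": 1.0}
score = confidence_score({"revenue", "employees", "country"}, importance, top_k=3)
```

In this example the company lacks the single most relevant feature (`production_volume`), so its score is 5/9 even though it reports three of the four features.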
How does the model come up with its estimates?
We train the model to learn the relationship between a company's data features and the distribution of GHG emissions for companies with similar sets of features. Training applies several machine learning techniques that can cope with the complexity and challenges in the data and that produce distributions rather than single values. The model then applies these learned relationships to estimate emissions for other companies.
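To illustrate what "a distribution from companies with similar features" means, the sketch below swaps in a simple k-nearest-neighbours lookup. This is not the tree-based method described above; it is a stand-in chosen for readability. Distances use only the features both companies report, rescaled to the full dimension. Function names and data are hypothetical, and real features would need standardising first.

```python
import math
from typing import List, Optional, Sequence, Tuple

Features = Sequence[Optional[float]]


def distance(a: Features, b: Features) -> float:
    """Euclidean distance over features present in both rows, rescaled to all dimensions."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def comparable_distribution(target: Features,
                            train: List[Tuple[Features, float]],
                            k: int = 3) -> List[float]:
    """Emissions of the k most similar training companies, sorted ascending."""
    ranked = sorted(train, key=lambda item: distance(target, item[0]))
    return sorted(emission for _, emission in ranked[:k])


# Hypothetical training set: (features, reported emissions).
train = [
    ((1.0, 1.0), 10.0),
    ((1.1, 0.9), 12.0),
    ((0.9, None), 11.0),
    ((5.0, 5.0), 100.0),
]
dist = comparable_distribution((1.0, 1.0), train, k=3)
```

The returned list is an empirical distribution for the target company, from which a mean or any percentile can be read.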
Scope 3 model
At present, scope 3 coverage is limited to oil & gas and mining companies, but it will expand as new industries are added in the near future.
The scope 3 model for these companies combines a bottom-up model with a top-down machine learning model.
The bottom-up model combines companies’ sales and production figures for oil, gas, natural gas liquids, coal, iron ore and other products with carbon emission factors, i.e., the amount of CO2 equivalent emitted per unit of product. From these, it calculates the indirect emissions produced when those products are used or processed. The top-down machine learning model sits on top of the bottom-up model and estimates carbon emissions by learning the relationship between the calculated scope 3 emissions, revenue per industry and other key factors.
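A minimal sketch of the two layers, under stated assumptions. The bottom-up step multiplies each product volume by an emission factor and sums the results; the factors below are placeholders, not Bloomberg's or any official values. For the top-down step, a one-variable log-linear regression on revenue stands in for the machine learning model, which in reality also uses revenue per industry and other factors. Applying it to a company with revenue but no production data is an assumed use case. All names and numbers are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# PLACEHOLDER emission factors in tonnes CO2e per unit sold. These numbers are
# illustrative assumptions, not Bloomberg's factors or any official values.
EMISSION_FACTORS: Dict[str, float] = {"oil_bbl": 0.4, "gas_mcf": 0.05, "coal_t": 2.0}


def bottom_up_scope3(volumes: Dict[str, float],
                     factors: Dict[str, float] = EMISSION_FACTORS) -> float:
    """Downstream emissions from product use: sum of volume x emission factor."""
    return sum(qty * factors[product] for product, qty in volumes.items())


def fit_log_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of log(y) = a + b * log(x); returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    return my - b * mx, b


def predict_scope3(revenue: float, coeffs: Tuple[float, float]) -> float:
    """Top-down estimate from revenue using the fitted log-linear relationship."""
    a, b = coeffs
    return math.exp(a + b * math.log(revenue))


# Hypothetical peers: (product volumes, revenue). Values are made up.
peers = [
    ({"oil_bbl": 1_000.0}, 50.0),
    ({"oil_bbl": 4_000.0, "gas_mcf": 10_000.0}, 210.0),
    ({"coal_t": 500.0}, 40.0),
]
scope3 = [bottom_up_scope3(volumes) for volumes, _ in peers]  # bottom-up labels
coeffs = fit_log_linear([rev for _, rev in peers], scope3)    # top-down fit
estimate = predict_scope3(100.0, coeffs)  # assumed: company with revenue only
```

The design point is that the bottom-up calculation supplies the training targets, while the top-down layer learns how those targets relate to broader company attributes.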
Using sales and production metrics for these two sectors works well because the most significant contribution to scope 3 emissions for oil & gas and mining companies comes from the downstream processing and use of their products.