Ground Control to Super-Major Tom: Does Your Organization Have a Multiple Sources of Truth Problem?

At Validere, we work with many companies to help them organize their emissions and operations data. This data can include lab composition, production volumes, and emission measurements and estimates.

There are two primary themes we encounter with our customers:

  1. Disparate data: Data comes from many different sources, in many different formats. This forces staff to embark on a game of telephone in order to find the data they need.
  2. Data lake: Data has been aggregated into one location from multiple sources.

Most often, even with larger companies, we see the first scenario. And while the second scenario may sound ideal, it does present its own set of challenges.

Scenario 1: Disparate data

In the first scenario, the biggest problem companies face is version control. The issue arises because departments usually use their own software, or perform calculations separately from the original data source, to get their work done.

For example, production accounting may be interested in proper allocation and reconciliation of the books, while the environmental team may need to estimate and report emissions, often as a function of production volumes. Additionally, the marketing team needs to understand composition and volumes to find the best netbacks.  

In each of these cases, the data is often manipulated and kept in separate silos within each department. Since each department needs the same data, but in different formats, this can lead to a drawn-out game of telephone, where each step risks misinterpretation of the data.

Scenario 2: Data lake 

On the surface, the second scenario seems like a solution to the first. Data lakes are typically created by IT teams, which unite the different data streams into a single repository.

However, these solutions are usually implemented without the industry expertise needed to know what data is relevant, when to apply unit conversions, and how the data should be organized for specific tasks. As a result, separate departments pull the data they need and manipulate it to complete their work, but rarely send the results back into the central repository.
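To illustrate the unit-conversion point, one option is to normalize volumes to a single unit as they enter the repository, so that each department does not convert them separately. The following is a minimal Python sketch under stated assumptions: the record fields, unit labels, and the function name normalize_volume are hypothetical and do not describe any particular product.

```python
# Hypothetical sketch: normalize volume readings to cubic metres on ingest.
# Field names and unit labels are illustrative assumptions.

# Conversion factors to cubic metres (1 US oil barrel = 0.158987294928 m3).
TO_CUBIC_METRES = {
    "m3": 1.0,
    "bbl": 0.158987294928,
    "L": 0.001,
}


def normalize_volume(value, unit):
    """Return the volume in cubic metres, or raise if the unit is unknown."""
    try:
        factor = TO_CUBIC_METRES[unit]
    except KeyError:
        # Refuse to guess: an unrecognized unit should be reviewed by a person.
        raise ValueError(f"unknown volume unit: {unit!r}")
    return value * factor


readings = [
    {"site": "A", "volume": 100.0, "unit": "bbl"},
    {"site": "B", "volume": 15.0, "unit": "m3"},
]

# Every record leaves this step in the same unit, whatever it arrived in.
normalized = [
    {"site": r["site"], "volume_m3": normalize_volume(r["volume"], r["unit"])}
    for r in readings
]
```

Rejecting unknown units, rather than passing them through, is a deliberate choice in this sketch: a silent pass-through is one way mismatched versions of the same number could end up in different departments.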

In the end, the data lake scenario has the same problem as the first scenario: version control.

In both scenarios, poor version control is inefficient and costly

In our experience working with client data, we have found that 2–4% of data is incorrectly keyed in and 7% of field data is lost to disorganization. This means roughly 10% of the data you are working from is inaccurate or completely lost.
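The combined figure follows from simple addition. The short Python sketch below works it through, assuming (as a simplification not stated in the figures themselves) that both percentages apply to the same pool of records and that no record is counted twice.

```python
# Illustrative arithmetic only. Assumes both problems affect the same pool
# of records and never the same record twice (both are assumptions).
keyed_in_wrong = (0.02, 0.04)  # 2-4% incorrectly keyed in
lost = 0.07                    # 7% lost to disorganization

low = keyed_in_wrong[0] + lost
high = keyed_in_wrong[1] + lost

print(f"Affected share: {low:.0%} to {high:.0%}")
# prints "Affected share: 9% to 11%"
```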

These errors can be extremely expensive: the largest single erroneous measurement we’ve seen led to a $294,000 impact each month.

Unnecessary measurements are not the solution; they just add confusion

Some organizations think that the solution to the version control problem is to take more measurements. The reasoning is that if operations needs a piece of data, or the ESG team needs an emissions measurement, taking a measurement just for that purpose will solve the problem. For example, a measurement may be taken at a remote site, but because that data is not updated quickly, another piece of equipment is installed to measure the same thing at a terminal or processing facility.

However, the drawbacks of this approach are twofold. Redundant equipment costs time and money to install and operate. More importantly, the two data sources may not always agree, and the disagreement takes time and energy to reconcile. Harmonizing conflicting data adds stress and uncertainty to all corners of the business, from operations to marketing to emissions reporting.

While more data can be a good thing, gaining full value from additional measurements requires a process for understanding why multiple data points are needed and for reconciling them.

The answer: minimize risk with a single source of truth

The best solution is one where data is organized and validated within the same platform, enabling all the functions of a company and eliminating the need for each team to independently manipulate the data. 

A more robust solution is more than just a data lake or repository. It serves as a single source of truth, where operational insights and emissions mitigation opportunities are surfaced transparently and are visible across the organization. With a completely unified platform, version control is a problem of the past.

A platform with powerful data validation helps reconcile data from different sources, either by flagging equipment that is not taking accurate measurements or by describing the relationship between the two measurements. When multiple data sources measure the same thing, strong validation is a must-have.
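As a rough illustration of both ideas, the Python sketch below compares paired readings of the same quantity from two meters. It reports the average ratio between them (one simple way to describe their relationship) and flags the readings where they disagree by more than a tolerance. The function name reconcile, the 2% tolerance, and the sample numbers are all hypothetical; a production validation engine would likely be considerably more sophisticated.

```python
from statistics import mean


def reconcile(field, terminal, tolerance=0.02):
    """Compare paired readings of the same quantity from two meters.

    Returns the mean terminal/field ratio and the indices of readings
    where the meters differ by more than `tolerance`, relative to the
    field reading. Assumes field readings are non-zero.
    """
    if len(field) != len(terminal) or not field:
        raise ValueError("need two equal-length, non-empty series")
    ratios = [t / f for f, t in zip(field, terminal)]
    flagged = [i for i, r in enumerate(ratios) if abs(r - 1.0) > tolerance]
    return {"mean_ratio": mean(ratios), "flagged": flagged}


# Made-up sample data: the third pair disagrees by about 12%.
field_meter = [100.0, 102.0, 98.0, 101.0]
terminal_meter = [100.5, 101.0, 110.0, 100.0]

result = reconcile(field_meter, terminal_meter)
print(result["flagged"])  # prints [2]
```

A consistently offset ratio would suggest a systematic relationship between the two meters, such as a calibration difference, while isolated flagged readings point to individual measurements worth checking.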

At Validere, we believe the time is now for data science and engineering to transform the energy industry by eliminating the problems with version control. With cloud technology and the widespread availability of broadband, data that was understandably siloed in remote locations like West Texas or Northern Alberta can now be validated and used across an organization instantly, without costly misinterpretation or mistranscription.  

Is version control a problem for your energy business? Contact us.

Trevor Cross