Data Quality Gate at Sopht
How to empower data stewards with dedicated data quality reporting
Published on: May 27, 2025
| Last edited on: May 27, 2025
| 7-minute read

Lirav DUVSHANI
Series : How to build a Scalable Open Source Data Platform - Sopht - Project Plan
About Sopht
Sopht is a French start-up building a unique Green ITOps data-driven solution to lead environmental and financial performance with automated decarbonization recommendations & guided actionability.
The context
Sopht was designing a new data architecture to ensure the scalability of its data platform. To learn more about the complete project, you can have a look at our dedicated article.
Once the new data architecture was implemented, data quality became the top priority.
When Data Analysts spend more time validating data than analyzing it
The data was available, but Data Analysts were spending a lot of time on day-to-day data validation, whether due to poor data quality in the source data provided by the customer, issues in the internal calculation layer, or new cases to take into account in the different internal calculations.
All these issues, even when they stem from poor quality in the source data, decrease users' trust in your data application. Sopht therefore needed the ability to explain, in an automated way, the various issues linked to the data.
In this context, and with the aim of streamlining data quality checks and enabling Data Analysts to focus on higher-value tasks, a new data quality reporting tool was built to empower data stewards.
A Data Quality Gate for the Data Stewards
The goal of the Data Quality Gate was to give data stewards, product teams and engineers a new tool to detect, assign and track data quality issues over time.
To reach this goal, the Data Quality Gate had to meet the following requirements:
- Detection of data quality issues: Identify anomalies, inconsistencies, missing values, and other data integrity issues.
- UI exposing the automated detection: All data quality checks should be available through a visualisation that provides actionable insights.
- Assignment of issues to the appropriate teams: Issues can originate from various teams, so a proper assignment process should be planned.
- Customer-specific scopes: Ensure the independence of each customer's data and flow execution.
- Tracking of quality KPIs over time: Enable the monitoring of data quality with a specific KPI that provides a validity score across the checks.
- Export of the data at its lowest granularity: Enable deep dives into row-level issues.
- Integration into the existing data platform: The solution should integrate seamlessly with the existing data infrastructure and tools.
How was the Data Quality Gate made relevant and easy to use?
Detailed view & Assignment to responsible teams
Data stewards need a detailed view to analyze the issues detected by the automated processes.
Below is a sample view from the UI listing the manufacturers that are unknown to the internal referential and that the Product team needs to check.
In this example, the second column shows several different types of issues:
- "OKI DATA" seems to be a real manufacturer, so it might be missing from the internal referential
- "manufacturer not shared by client" is clearly a wrong manufacturer name, and should be assigned to the "Customer" team
- "Z" is also probably a wrong manufacturer name
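A first-pass triage like the one above could be partly automated. The following is only a minimal sketch, not Sopht's actual implementation: the referential contents, the placeholder patterns, the team labels and the name `suggest_team` are all hypothetical, and the sketch assumes that placeholder-like or one-character values should go to the "Customer" team.

```python
import re
from typing import Optional

# Hypothetical internal referential (assumption: names are stored upper-cased).
KNOWN_MANUFACTURERS = {"DELL", "HP", "LENOVO"}

# Hypothetical patterns for values that are clearly not manufacturer names.
PLACEHOLDER_PATTERN = re.compile(r"not shared|unknown", re.IGNORECASE)


def suggest_team(raw_value: str) -> Optional[str]:
    """Suggest which team should review a manufacturer value.

    Returns None when the value is already in the referential.
    """
    value = raw_value.strip()
    if value.upper() in KNOWN_MANUFACTURERS:
        return None
    # Placeholders or one-character values most likely come from bad source data.
    if len(value) < 2 or PLACEHOLDER_PATTERN.search(value):
        return "Customer"
    # Anything else may be a real manufacturer missing from the referential.
    return "Product"
```

A suggestion of this kind would only pre-fill the assignment; the data steward still confirms it in the detailed view.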
Data Freshness report
To ensure the data is available, a view was provided to identify at a glance which processes were not executed correctly, e.g. all the processes whose last update is more than 1 day old.
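The "more than 1 day since last update" rule can be illustrated with a short sketch. The function name `find_stale_processes`, the input shape (a mapping from process name to last update time) and the process names in the usage note are assumptions made for illustration, not part of the Sopht platform.

```python
from datetime import datetime, timedelta, timezone


def find_stale_processes(last_updates, now=None, max_age=timedelta(days=1)):
    """Return the sorted names of processes whose last update is older than max_age.

    last_updates maps a process name to the timezone-aware datetime
    of its last successful update.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        name
        for name, updated_at in last_updates.items()
        if now - updated_at > max_age
    )
```

For example, with a hypothetical process `compute_emissions` last updated two days ago and `ingest_assets` updated three hours ago, only `compute_emissions` would be reported as stale.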
Data Quality Health Tracking - Validity Score
The Sopht data platform serves different customers, and each has a different level of quality in its data. Having a way to know the level of data quality per customer enables an overall data quality comparison.
We created the notion of a validity score to determine the level of quality based on the checks performed. This was applicable only to some checks.
On these checks, we applied a status to each calculation, which enabled us to determine a standard score out of 100 (100 meaning all the data is perfectly "clean").
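The article does not give the exact formula, so the following sketch assumes the simplest one: the percentage of calculations whose status is "valid". The status labels and the function name `validity_score` are hypothetical.

```python
from collections import Counter


def validity_score(statuses):
    """Compute a score out of 100 from a list of per-calculation statuses.

    Assumption: a calculation counts as clean only when its status is "valid".
    """
    if not statuses:
        # Assumption: with nothing to check, nothing is considered wrong.
        return 100.0
    counts = Counter(statuses)
    return round(100 * counts["valid"] / len(statuses), 2)
```

Under this assumption, 99 valid calculations and 1 in error give a score of 99.0. A weighted variant (for example, counting warnings as half valid) would be an equally plausible design.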
For some data quality checks, it was also relevant to track the improvement of data quality over time using this health KPI.
Below, you can see the evolution of the validity score (the green line) and the evolution of the customer warnings detected (the orange histogram).
We can see that about 1% of the data provided by the customer is considered to be in error starting on January 9th, 2025, and that the validity score remains very stable after the data quality check was activated.
How it works: The architecture behind the Data Quality Gate
To match the medallion architecture already in place and the use of PostgreSQL for the Gold and Platinum layers, the following choices were made:
- The user interface is developed using the Python Streamlit library
- The data storage is in PostgreSQL within a dedicated schema named "data_quality"
- The data pipelines that execute the verifications are implemented within the Python data pipeline project
- The scheduling of execution is managed by Kestra.
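To make the storage choice more concrete, here is a hedged sketch of how check results could be written to and read from a schema named `data_quality`, scoped per customer. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server; the table name `check_results`, its columns and the function names are assumptions, not the actual Sopht data model.

```python
import sqlite3
from datetime import date


def open_store():
    """Open an in-memory SQLite database with a schema named data_quality."""
    conn = sqlite3.connect(":memory:")
    # ATTACH gives SQLite a named schema, mimicking a PostgreSQL schema.
    conn.execute("ATTACH DATABASE ':memory:' AS data_quality")
    conn.execute(
        """
        CREATE TABLE data_quality.check_results (
            customer_id    TEXT NOT NULL,
            check_name     TEXT NOT NULL,
            run_date       TEXT NOT NULL,
            validity_score REAL NOT NULL,
            PRIMARY KEY (customer_id, check_name, run_date)
        )
        """
    )
    return conn


def record_result(conn, customer_id, check_name, run_date, score):
    """Insert or replace one check result for a customer and run date."""
    conn.execute(
        "INSERT OR REPLACE INTO data_quality.check_results VALUES (?, ?, ?, ?)",
        (customer_id, check_name, run_date.isoformat(), score),
    )
    conn.commit()


def scores_for_run(conn, customer_id, run_date):
    """Return {check_name: score} for one customer on one run date."""
    rows = conn.execute(
        """
        SELECT check_name, validity_score
        FROM data_quality.check_results
        WHERE customer_id = ? AND run_date = ?
        """,
        (customer_id, run_date.isoformat()),
    ).fetchall()
    return dict(rows)
```

In this sketch, a scheduled pipeline run would call `record_result` once per check and customer, and the reporting UI would read the stored scores with a query like `scores_for_run`. Filtering every query by `customer_id` is one simple way to keep customer scopes independent.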
Adoption & Results
The DQG wasn’t just a tool — it drove process and culture change within Sopht:
- A weekly product review meeting was created to assess open data quality issues and assign responsibility.
- Data stewards were designated by domain, creating ownership and accountability.
- For every new client onboarded, automated DQG checks provided early insight into potential data issues — preventing future escalations.
If you want your data platform to scale with confidence, don’t wait for users to flag broken data. Build a system that detects, explains, and tracks data quality from day one.
At Sopht, the Data Quality Gate made that shift possible — and it’s now a cornerstone of the data operations.
This article is part of a series showcasing the design and implementation of a scalable open source Data Platform for Sopht, a French Green ITOps start-up.