Home > Resources > Data Architecture Medallion

The augmented medallion architecture - Silver Layer

"Silver" layer

Published on : April 15, 2025

| Lastly edited on : April 15, 2025

| 11 minutes read

Lirav DUVSHANI

Series : The augmented medallion architecture

The Silver layer - Where the data is cleaned/curated

Silver Overview

Purpose of the layer

The main purpose of this layer is to provide standardized cleaned and structured data, optimized for analytical use.

The secondary purposes of this layer are to :

Schema enforcement : Enforce schema to ensure consistency with expected format
Deduplication & Filtering : Ensure data is accurate and relevant for analytical use

Characteristics of the layer

Storage should be optimized for analytical use
- The data should be stored in a query-friendly structures, including primary key definition, partitionning and indexing
Allowed Data Structures
- Structured, can be converted from semi-structured and unstructured data
Granularity of data
- The data should be detailed, at the same level of granularity as the source data
Useful metadata information
- All information available in bronze layer
  - The extraction timestamp
  - The load in bronze layer timestamp (also called ingestion timestamp)
  - (if available) The last update of records in the source system
- The load in silver layer timestamp

Good practices

In this layer, the transformation applied are mostly cleaning and technical formatting of data (like semi-structured data flattening). For all these transformations, it is very useful to keep inside the Silver layer structure both the original and cleaned data.

This ensures that during data analysis, not only the result of cleaning is available but also the original value prior to cleaning.

As in all the other layers, tracking as much metadata information as possible is very beneficial for data analysis purposes and for monitoring reporting. Among these metadata information, the most important ones would be the source extraction timestamp and the ingestion timestamp.

Standard transformations applied in this layer

Structure-level Selection and transformation
- Data flattening : Transform semi-structured data in a more user and query friendly format
- Last Available Data selection : Selection of the last version of any record
Column-level transformations
- Cleaning of the source data
  - Technical cleaning : Empty values, which should not be allowed
  - Functional cleaning : Incoherent values should be cleaned, like invalid dates or any type of incoherent entries
- Standardization of referentials
  - Transposition of source data into custom referential : When multiple sources are present

Transformation Category	Transformation	Use Case	Before Cleaning	After Cleaning
Technical Cleaning	Empty values	Fill empty metric values with 0	NULL 1	0 1
Functional Cleaning	Valid Date	Default to NULL if date not coherent	2024-13-13 2024-02-12	NULL 2024-02-12
Standardisation of referential	Unify country names into code	Transpose the country names to ISO 3166 alpha-2	France Fr. Germ. GERMANY	FR FR GE GE
Data flattening	Empty values	Fill empty metric values with 0	{"sector": "Retail", "earnings_range": "<10M"}	Into 2 columns sector = Retail earnings_range = <10M

Transformation Examples

All the examples are provided considering a storage in a database, e.g. in tables.

Data Flattening

Flattening of a field can be done in multiple way :

Stored inside a single table
Stored in multiple tables

Flattening inside a single table

Here is an example of a customer table with a key named "Customer ID".

Bronze Table - Customer

Customer ID	Customer Name	Customer Tags	Ingestion Timestamp
1	Customer A	{"sector": "Industry", "earnings_range": ">100M"}	2025-01-02T03:55:02.728Z
2	Customer B	{"sector": "IT", "earnings_range": "10-100M"}	2025-01-02T03:55:02.728Z
3	Customer C	{"sector": "Retail", "earnings_range": "<10M"}	2025-01-02T03:55:02.728Z
1	Customer A	{"sector": "Industry", "earnings_range": ">100M"}	2025-01-03T03:52:03.127Z
3	Customer C	{"sector": "Retail", "earnings_range": "<100M" , "billing_currency": "EUR"}	2025-01-04T04:02:42.009Z

Silver Table - Customer

Customer ID	Customer Name	Sector	Earnings Range	Billing Currency	Ingestion Timestamp	Silver Load Timestamp
1	Customer A	Industry	>100M	∅	2025-01-02T03:55:02.728Z	2025-01-05T05:12:02.000Z
2	Customer B	IT	10-100M	∅	2025-01-02T03:55:02.728Z	2025-01-05T05:12:02.000Z
3	Customer C	Retail	<10M	∅	2025-01-02T03:55:02.728Z	2025-01-05T05:12:02.000Z
1	Customer A	Industry	>100M	∅	2025-01-03T03:52:03.127Z	2025-01-05T05:12:02.000Z
3	Customer C	Retail	<100M	EUR	2025-01-04T04:02:42.009Z	2025-01-05T05:12:02.000Z

All the tags have been transformed as columns providing information about a customer.

Flattening in multiple tables

Here is an example of a customer table with a key named "Customer ID".

Bronze Table - Customer

Customer ID	Customer Name	Customer Tags	Ingestion Timestamp
1	Customer A	{"sector": "Industry", "earnings_range": ">100M"}	2025-01-02T03:55:02.728Z
2	Customer B	{"sector": "IT", "earnings_range": "10-100M"}	2025-01-02T03:55:02.728Z
3	Customer C	{"sector": "Retail", "earnings_range": "<10M"}	2025-01-02T03:55:02.728Z
1	Customer A	{"sector": "Industry", "earnings_range": ">100M"}	2025-01-03T03:52:03.127Z
3	Customer C	{"sector": "Retail", "earnings_range": "<100M" , "billing_currency": "EUR"}	2025-01-04T04:02:42.009Z

Silver Table - Customer (Main)

Customer ID	Customer Name	Ingestion Timestamp	Silver Load Timestamp
1	Customer A	2025-01-02T03:55:02.728Z	2025-01-05T05:12:02.000Z
2	Customer B	2025-01-02T03:55:02.728Z	2025-01-05T05:12:02.000Z
3	Customer C	2025-01-02T03:55:02.728Z	2025-01-05T05:12:02.000Z
1	Customer A	2025-01-03T03:52:03.127Z	2025-01-05T05:12:02.000Z
3	Customer C	2025-01-04T04:02:42.009Z	2025-01-05T05:12:02.000Z

Silver Table - Customer (Tags)

Customer ID	Sector	Earnings Range	Billing Currency	Ingestion Timestamp	Silver Load Timestamp
1	Industry	>100M	∅	2025-01-02T03:55:02.728Z	2025-01-05T05:12:02.000Z
2	IT	10-100M	∅	2025-01-02T03:55:02.728Z	2025-01-05T05:12:02.000Z
3	Retail	<10M	∅	2025-01-02T03:55:02.728Z	2025-01-05T05:12:02.000Z
1	Industry	>100M	∅	2025-01-03T03:52:03.127Z	2025-01-05T05:12:02.000Z
3	Retail	<100M	EUR	2025-01-04T04:02:42.009Z	2025-01-05T05:12:02.000Z

All the tags have been transformed as columns providing information about a customer.

Last Available Data selection

The last available data selection transformation provides a Silver table which contains only the last available relevant information. Thus, the Silver table could be identical to the table in the source system, with a single record per key.

Here is an example of a product table with a key named "Product ID".

Bronze Table - Customer

Product ID	Product Name	Product Category	Ingestion Timestamp
1	Product A	Wood	2025-01-02T03:55:02.728Z
2	Product B	IT Service	2025-01-02T03:55:02.728Z
3	Product C	Clothes	2025-01-02T03:55:02.728Z
1	Product A	Floors - Wood	2025-01-03T03:52:03.127Z
3	Product C	Women's clothes	2025-01-04T04:02:42.009Z

Silver Table - Customer

Product ID	Product Name	Product Category	Ingestion Timestamp	Silver Load Timestamp
2	Product B	IT Service	2025-01-02T03:55:02.728Z	2025-01-05T05:12:02.000Z
1	Product A	Floors - Wood	2025-01-03T03:52:03.127Z	2025-01-05T05:12:02.000Z
3	Product C	Women's clothes	2025-01-04T04:02:42.009Z	2025-01-05T05:12:02.000Z

Only the last record for "Product ID" 1 and 3, based on the ingestion timestamp, have been selected and loaded into the silver table

Data Cleaning

Whatever the format of the source data in Bronze, in the Silver layer, the data should be clean and ready to used to analysis, reporting and any kind of analysis.

Here is an example of a purchase order table with a key named "Purchase Order ID".

This presents all the steps from Bronze, including the selection of the latest available data.

Bronze Table - Order

Purchase Order ID	Customer Name	Product Name	Order Status	Order Unit Of Measure	Order Quantity	Billing Currency	Billed Amount	Ingestion Timestamp
PO-1	Customer 1	Product A	Invoiced	PC	3	EUR	10,02	2025-01-02T03:55:02.728Z
PO-2	Customer 1	Product B	Ordered	PC	1002	USD	35,99	2025-01-02T03:55:02.728Z
PO-3	Customer 2	Product C	Planned	∅	30	GBP	20,00	2025-01-02T03:55:02.728Z
PO-4	Customer 1	Product A	Planned	∅	∅	EUR	∅	2025-01-02T03:55:02.728Z
PO-2	Customer 2	Product C	Received	PC	1002	USD	35,99	2025-01-04T04:02:42.009Z

In the load of the Silver table, here we'll consider that only the last available data will be stored in Silver. Thus, a first step will only select the latest information for each order.

Silver Table - Orders (Latest Data) before cleaning

Purchase Order ID	Customer Name	Product Name	Order Status	Order Unit Of Measure	Order Quantity	Billing Currency	Billed Amount	Ingestion Timestamp
PO-1	Customer 1	Product A	Invoiced	PC	3	EUR	10,02	2025-01-02T03:55:02.728Z
PO-2	Customer 2	Product C	Received	PC	1002	USD	35,99	2025-01-04T04:02:42.009Z
PO-3	Customer 2	Product C	Planned	∅	30	GBP	20,00	2025-01-02T03:55:02.728Z
PO-4	Customer 1	Product A	Planned	∅	∅	EUR	∅	2025-01-02T03:55:02.728Z

Silver Table - Orders (Latest Data) after cleaning

The cleaning of the orders presented here is as follows :

All Unit of measure and currency columns should be defaulted to #, for analytical purposes
All quantity and amounts should be defaulted to 0, for analytical purposes
The order quantity is defaulted when no order unit of measure available

Purchase Order ID	Customer Name	Product Name	Order Status	Order Unit Of Measure	Order Quantity	Billing Currency	Billed Amount
PO-1	Customer 1	Product A	Invoiced	PC	3	EUR	10,02
PO-2	Customer 2	Product C	Received	PC	1002	USD	35,99
PO-3	Customer 2	Product C	Planned	# (defaulted from ∅)	0 (Cleaned from 30)	GBP	20,00
PO-4	Customer 1	Product A	Planned	# (defaulted from ∅)	0 (defaulted from ∅)	EUR	0 (defaulted from ∅)

All the cleaning done here can also be linked to the data quality layer to provide direct feedback to the data stewards and owners

Referential Transposition

The source data can be very heteregenous even on similar concepts where a similar referential data (like product, customer, ...) is maintained in multiple operational tools.

In the Silver layer, data should be aligned to enable usage of the data regardless of its source.

The goal here is to provide standard axis of analysis which match a general definition of an axis for all the data stored in Silver.

Here is an example of purchase orders linked to a delivery country which is provided not normalized at all in the data source. A specific algorithm is thus implemented to clean this data into a more normalized referential.

This presents all the steps from Bronze, including the selection of the latest available data.

Bronze Table - Purchase Orders

Purchase Order ID	Delivery Country	Ingestion Timestamp
PO-1	FR	2025-01-02T03:55:02.728Z
PO-2	FranCe	2025-01-02T03:55:02.728Z
PO-3	FR.	2025-01-02T03:55:02.728Z
PO-4	GE	2025-01-02T03:55:02.728Z

Parameter Table - Country transposition table

Depending on the use case, either an intelligent algorithm can be develop to clean the data, or a transposition can be coded to clean the data from the known delivery country filled in the source to the standard referential.

In this example, we consider using a parameter table as follows :

Delivery Country (original)	Delivery Country (cleaned)
FR	France
FranCe	France
FR.	France
GE	Germany

Silver Table - Orders after cleaning

Purchase Order ID	Delivery Country (original)	Delivery Country (cleaned)	Ingestion Timestamp
PO-1	FR	France	2025-01-02T03:55:02.728Z
PO-2	FranCe	France	2025-01-02T03:55:02.728Z
PO-3	FR.	France	2025-01-02T03:55:02.728Z
PO-4	GE	Germany	2025-01-02T03:55:02.728Z

All the cleaning done here can also be linked to the data quality layer to provide direct feedback to the data stewards and owners

This article is part of a series of article describing the augmented medallion architecture.