The augmented medallion architecture - Bronze Layer

"Bronze" layer

Published on : April 15, 2025

|   Lastly edited on : April 15, 2025

|   5 minutes read

Lirav DUVSHANI

Lirav DUVSHANI


The Bronze layer - Where all the source data is stored

Bronze Overview

Purpose of the layer

The main purpose of this layer is to provide a similar structure for all data, whatever the data source, in a similar format enabling data analysis on raw data. While also being the source for all the data pipelines to be built in the Silver and Gold layers.


The secondary purposes of this layer are to :

  • Complete data retention : Ensure the complete history of the data is available
  • Reproductibility for Silver and Gold layers : With the complete data retention, any change in rules in the Silver and Gold layers can easily be reproduced in the past, thus allowing reprocessing of the data based on the Bronze layer

Although this layer should be complete, it might be required to anonymize the data or to keep only a specific scope of data.

This should be done in the Bronze data layer to ensure no data should be present in the platform that should not be.


Characteristics of the layer

  • Storage should be immutable
    • The data should never be deleted, and the load should be append-only, to preserve the complete history of the data
  • Allowed Data Structures
    • Structured
    • Semi-Structured (JSON, XML, YAML, ...)
    • Unstructured (text files, images, videos)
  • Granularity of data
    • The data should be detailed, at the same level of granularity as the source data
  • Useful metadata information
    • The extraction timestamp
    • The load in bronze layer timestamp (also called ingestion timestamp)
    • (if available) The last update of records in the source system

Good practices

In this layer, it is a good practice to structure this layer in a standard way for all types of data structures.

Also, tracking as much metadata information as possible is very beneficial for data analysis purposes and for monitoring reporting. Such information are the source extraction timestamp or the ingestion timestamp.

Examples

Structured data

Structure data can be stored as :

  • Inside an object storage
    • As a file format for flatten data (like csv, parquet or avro)
    • As a compressed file containing flatten data (like gzip, zip or any other compressed format)
  • Inside a database
    • Inside a table

The data should be structured to enable easy access to the metadata information, like timestamp of load in bronze layer


Considering a bronze layer in an object storage, the files should be structured as follows :


  • Sample csv storage
    • 2025-01-02T03:55:02.728Z
      • data.csv
    • 2025-01-03T03:52:03.127Z
      • data.csv
    • 2025-01-04T04:02:42.009Z
      • data.csv
  • Sample gzip storage
    • 2025-01-02T12:55:02.728Z
      • data.gzip
    • 2025-01-03T02:52:03.127Z
      • data.gzip
    • 2025-01-04T08:02:42.009Z
      • data.gzip

Considering a bronze layer in a database, the data should be structured as follows :


Customer IDCustomer NameIngestion Timestamp
1Customer A2025-01-02T03:55:02.728Z
2Customer B2025-01-02T03:55:02.728Z
3Customer C2025-01-02T03:55:02.728Z
1Customer A. Corp2025-01-03T03:52:03.127Z
3Customer C. Corp2025-01-04T04:02:42.009Z

Semi-structured data

Semi-Structure data can be stored as :

  • Inside an object storage
    • As a file format for flatten data (like json, yaml, ...)
    • As a compressed file containing flatten data (like gzip, zip or any other compressed format)
  • Inside a database
    • Inside a table

The data should be structured to enable easy access to the metadata information, like timestamp of load in bronze layer


Considering a bronze layer in an object storage, the files should be structured as follows :


  • Sample json storage
    • 2025-01-02T03:55:02.728Z
      • data.json
    • 2025-01-03T03:52:03.127Z
      • data.json
    • 2025-01-04T04:02:42.009Z
      • data.json
  • Sample gzip storage
    • 2025-01-02T12:55:02.728Z
      • data.gzip
    • 2025-01-03T02:52:03.127Z
      • data.gzip
    • 2025-01-04T08:02:42.009Z
      • data.gzip

Considering a bronze layer in a database, the data should be structured as follows :


Customer IDCustomer NameCustomer TagsIngestion Timestamp
1Customer A{"sector": "Industry", "earnings_range": ">100M"}2025-01-02T03:55:02.728Z
2Customer B{"sector": "IT", "earnings_range": "10-100M"}2025-01-02T03:55:02.728Z
3Customer C{"sector": "Retail", "earnings_range": "<10M"}2025-01-02T03:55:02.728Z
1Customer A{"sector": "Industry", "earnings_range": ">100M"}2025-01-03T03:52:03.127Z
3Customer C{"sector": "Retail", "earnings_range": "<100M"
, "billing_currency": "EUR"}
2025-01-04T04:02:42.009Z

Non structured data

Non-structured data can be stored as :

  • Inside an object storage
    • As a file
    • As a compressed file

Storage for non-structured data can be simpler than for structured and semi-structured data, as this data is mostly loaded into this bronze layer for storage and direct usage.




This article is part of a series of article describing the augmented medallion architecture.