Data Lakes & Warehouses
While data historians excel at storing time-series sensor data, a digital oilfield produces many other types of data - well reports, geological models, maintenance logs, contracts, and more. Data lakes and data warehouses provide the broader storage infrastructure to handle this variety.
Data Lake vs Data Warehouse
Data Lake
Stores raw data in its original format - structured, semi-structured, or unstructured. Schema is applied when data is read (schema-on-read).
- Stores everything: sensor data, PDFs, images, logs
- Flexible and cost-effective for large volumes
- Ideal for data science and exploration
- Risk of becoming a "data swamp" without governance
Example: An Azure Data Lake stores 10 TB of raw well test reports, seismic files, and sensor exports for a data science team.
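Schema-on-read can be sketched in a few lines: raw records land in the lake exactly as produced, and structure is imposed only when a consumer reads them. This is an illustrative stand-alone sketch (the well names and fields are hypothetical), not a specific lake platform's API.

```python
import json

# Raw records landed in the lake exactly as the source produced them --
# inconsistent types, extra fields, no schema enforced at write time.
raw_records = [
    '{"well": "W-101", "oil_rate": "850.5", "date": "2024-03-01"}',
    '{"well": "W-102", "oil_rate": 920, "extra_field": "ignored"}',
]

def read_with_schema(lines):
    """Schema-on-read: coerce each record to the shape the reader expects."""
    for line in lines:
        rec = json.loads(line)
        yield {
            "well": str(rec["well"]),
            "oil_rate": float(rec["oil_rate"]),  # coerces both "850.5" and 920
        }

parsed = list(read_with_schema(raw_records))
```

Note that the string `"850.5"` and the integer `920` both survive ingestion untouched; only the reader reconciles them, which is exactly why ungoverned lakes drift into "data swamps".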
Data Warehouse
Stores cleaned, structured, and organised data optimised for querying and reporting. Schema is applied when data is written (schema-on-write).
- Highly structured with defined schemas
- Fast query performance for dashboards
- Ideal for business reporting and KPIs
- Requires ETL pipelines to load data
Example: A Snowflake data warehouse holds curated production, deferment, and cost data used by Power BI dashboards.
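Schema-on-write is the mirror image: the table's structure is declared up front, and non-conforming rows are rejected at load time. A minimal sketch using SQLite in place of a real warehouse (the table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema-on-write: columns, types, and constraints are fixed before any load.
conn.execute("""
    CREATE TABLE production (
        well         TEXT NOT NULL,
        report_date  TEXT NOT NULL,
        oil_rate_bpd REAL NOT NULL CHECK (oil_rate_bpd >= 0)
    )
""")

# A conforming row loads cleanly.
conn.execute("INSERT INTO production VALUES ('W-101', '2024-03-01', 850.5)")

# A row violating the schema is rejected at write time, not at read time.
try:
    conn.execute("INSERT INTO production VALUES ('W-102', '2024-03-01', -10)")
except sqlite3.IntegrityError:
    pass  # negative rate fails the CHECK constraint; nothing is stored

row_count = conn.execute("SELECT COUNT(*) FROM production").fetchone()[0]
```

This up-front validation is what makes warehouse queries fast and dashboards trustworthy, at the cost of needing a pipeline to shape the data before it arrives.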
The Data Lakehouse
The data lakehouse is a newer architecture that combines the flexibility of a data lake with the performance and structure of a data warehouse. Technologies like Databricks Delta Lake and Apache Iceberg enable this by adding ACID transactions and schema enforcement on top of data lake storage.
Many oil and gas companies are adopting the lakehouse pattern because it eliminates the need to maintain separate lake and warehouse systems. Raw sensor data, well reports, and curated production tables all live in the same platform - reducing data duplication and simplifying the architecture.
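The two guarantees the lakehouse layer adds on top of plain object storage, schema enforcement and atomic (all-or-nothing) commits, can be illustrated with a toy table. This is a conceptual sketch only; Delta Lake and Iceberg implement the same ideas with transaction logs over cloud storage.

```python
class ToyLakehouseTable:
    """Toy illustration of two lakehouse guarantees: schema enforcement
    and atomic appends. Not a real Delta Lake / Iceberg API."""

    def __init__(self, schema):
        self.schema = schema  # {column_name: expected_type}
        self.rows = []

    def append(self, batch):
        # Validate the entire batch first; commit only if every row passes.
        for row in batch:
            if set(row) != set(self.schema):
                raise ValueError(f"schema mismatch: {row}")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise TypeError(f"{col} must be {typ.__name__}")
        self.rows.extend(batch)  # the "commit" happens only after validation

table = ToyLakehouseTable({"well": str, "oil_rate": float})
table.append([{"well": "W-101", "oil_rate": 850.5}])

try:
    table.append([
        {"well": "W-102", "oil_rate": "bad value"},  # violates the schema
        {"well": "W-103", "oil_rate": 700.0},        # valid, but same batch
    ])
except TypeError:
    pass  # the whole batch is rejected; W-103 is never partially written
```

The second append fails as a unit: because validation precedes the commit, readers never observe a half-written batch, which is the essence of an ACID append.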
ETL / ELT Pipelines
Data does not magically appear in a lake or warehouse. ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) pipelines move data from source systems, clean and transform it, and load it into the target storage. The difference is ordering: ETL transforms data before loading it into the target, while ELT loads raw data first and transforms it inside the target platform.
Extract
Pull data from historians, SCADA, ERP, maintenance systems
Transform
Clean, validate, convert units, join related datasets
Load
Write to data lake, warehouse, or lakehouse tables
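The three steps above can be sketched end to end in a few lines. The source rows, tag names, and the SQLite target are all hypothetical stand-ins for a historian export and a warehouse table; the unit conversion uses 1 m³ ≈ 6.2898 bbl.

```python
import sqlite3

# Extract: rows as they might arrive from a historian export (names hypothetical).
source_rows = [
    {"tag": "W101.OIL_RATE", "value": 135.2, "unit": "m3/d"},
    {"tag": "W102.OIL_RATE", "value": 850.0, "unit": "bbl/d"},
]

def transform(rows):
    """Transform: standardise units to bbl/d and derive the well identifier."""
    out = []
    for r in rows:
        rate = r["value"] * 6.2898 if r["unit"] == "m3/d" else r["value"]
        out.append({"well": r["tag"].split(".")[0],
                    "oil_rate_bpd": round(rate, 1)})
    return out

# Load: write the curated rows into a warehouse-style table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE production (well TEXT, oil_rate_bpd REAL)")
conn.executemany(
    "INSERT INTO production VALUES (:well, :oil_rate_bpd)",
    transform(source_rows),
)
loaded = conn.execute("SELECT COUNT(*) FROM production").fetchone()[0]
```

In an ELT variant, `source_rows` would be loaded verbatim into a staging table first, and `transform` would run as SQL inside the target platform instead of in the pipeline.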
