Data Warehouse
A data warehouse (DWH) is a centralised repository optimised for analytical queries across multiple source systems — ERP, CRM, e-commerce, marketing, finance — rather than the transactional patterns of operational databases. Data warehousing has been the foundation of business intelligence since the 1990s, with cloud-native warehouses (Snowflake, Google BigQuery, Amazon Redshift, Databricks) dominating new deployments since the late 2010s.
Architecture and modelling
Two dominant modelling paradigms. Kimball dimensional modelling: facts (measurements like sales, order quantities) plus dimensions (descriptive attributes like time, customer, product) organised in star schemas. Optimised for query performance and business-user readability. Inmon corporate information factory: normalised data warehouse with data marts derived for specific analytical needs. More rigid but cleaner enterprise data structure. Modern cloud-warehouse practice often combines both: raw layer (operational source data preserved), staging layer (cleaned and conformed), presentation layer (dimensional for end-user consumption). Tools like dbt (data build tool) have standardised the SQL-based transformation patterns that build these layers.
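The star-schema pattern above can be sketched with stdlib SQLite standing in for a warehouse. A minimal, illustrative example: one fact table, two dimensions, and the kind of grouped query a BI tool would issue. All table and column names are assumptions for illustration, not from any specific product.

```python
import sqlite3

# Minimal star schema: a sales fact plus customer and date dimensions.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
    CREATE TABLE fact_sales   (customer_key INTEGER, date_key INTEGER, amount REAL);
""")
con.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                [(1, "Alpha GmbH", "DACH"), (2, "Beta AG", "DACH")])
con.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(20240101, "2024-01-01", "2024-01"), (20240102, "2024-01-02", "2024-01")])
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 20240101, 100.0), (2, 20240101, 250.0), (1, 20240102, 50.0)])

# Typical dimensional query: measures from the fact table,
# grouped by descriptive attributes from the dimensions.
rows = con.execute("""
    SELECT d.month, c.region, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_date d     ON d.date_key     = f.date_key
    GROUP BY d.month, c.region
""").fetchall()
print(rows)  # one row per (month, region)
```

The fact table stays narrow (keys plus measures); readability for business users comes from joining out to the dimensions, which is what makes star schemas friendly to BI tools.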
Cloud data warehouses
Snowflake — cloud-native, multi-cloud (AWS, Azure, Google), strong DACH presence in mid-market and enterprise. Google BigQuery — serverless, pay-per-query, deep ML and AI integration. Amazon Redshift — AWS-native, more traditional warehouse model. Microsoft Azure Synapse Analytics (formerly SQL Data Warehouse) and Microsoft Fabric — integrated with the Microsoft data stack, common in DACH organisations on Microsoft 365. Databricks — data-lakehouse platform combining warehouse and data-lake capabilities. SAP Datasphere (formerly Data Warehouse Cloud) — SAP-native with direct S/4HANA integration. Open source: Apache Druid, ClickHouse, DuckDB (single-machine analytics). For DACH mid-market organisations starting fresh, Snowflake and Microsoft Fabric are the most commonly evaluated options.
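Pay-per-query billing (as in BigQuery's on-demand model) charges by bytes scanned, which makes rough cost estimation simple arithmetic. A sketch, where the per-TiB rate is a placeholder assumption, not a current price — check vendor pricing before relying on it:

```python
# Illustrative cost model for a pay-per-query warehouse billed by bytes
# scanned. PRICE_PER_TIB_USD is an assumed rate for illustration only.
PRICE_PER_TIB_USD = 6.25

def query_cost_usd(bytes_scanned: int) -> float:
    """Estimate on-demand query cost from bytes scanned."""
    tib = bytes_scanned / (1024 ** 4)
    return round(tib * PRICE_PER_TIB_USD, 4)

# A query scanning 200 GiB under the assumed rate:
print(query_cost_usd(200 * 1024 ** 3))
```

The practical implication: partitioning and clustering tables so queries scan less data directly reduces cost under this billing model.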
Data lake versus lakehouse
Adjacent categories worth distinguishing. Data lake: raw-data storage in flexible formats (Parquet, JSON, CSV) on object storage (S3, ADLS, GCS), supporting unstructured and semi-structured data. Lower cost than a warehouse, less optimised for SQL queries. Data lakehouse: combines lake-style storage with warehouse-style ACID transactions and SQL performance. Apache Iceberg, Delta Lake and Apache Hudi are the open table formats enabling lakehouse architectures; Databricks and Snowflake both support lakehouse patterns. For ERP-centric analytics in the DACH mid-market, the classical data-warehouse pattern suits most use cases; lakehouse becomes valuable when unstructured data (logs, IoT, documents) joins the analytics scope.
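The lake's raw layer is typically just partitioned files. A minimal sketch of landing semi-structured records under Hive-style `key=value` partition paths — the local filesystem and JSON here stand in for object storage and Parquet, and all names are illustrative:

```python
import json
import tempfile
from pathlib import Path

# Stand-in for an object-storage bucket (S3, ADLS, GCS).
lake_root = Path(tempfile.mkdtemp())

def land_records(source: str, load_date: str, records: list[dict]) -> Path:
    """Write one batch of raw records into a partitioned lake path."""
    partition = lake_root / f"source={source}" / f"load_date={load_date}"
    partition.mkdir(parents=True, exist_ok=True)
    out = partition / "part-000.json"
    # Newline-delimited JSON: records are preserved as-is, schema-on-read.
    out.write_text("\n".join(json.dumps(r) for r in records))
    return out

path = land_records("erp_orders", "2024-01-01",
                    [{"order_id": 1, "amount": 99.5}, {"order_id": 2, "amount": 12.0}])
print(path.relative_to(lake_root))
# e.g. source=erp_orders/load_date=2024-01-01/part-000.json
```

Open table formats like Iceberg and Delta Lake add a metadata layer on top of exactly this kind of file layout, which is what upgrades a lake to a lakehouse with ACID guarantees.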
ERP-to-DWH integration
ERP data flows into the warehouse through ETL or ELT pipelines (see ETL). Common patterns. (1) Replication via CDC: tools like SLT (SAP), Debezium or Fivetran capture every ERP database change and stream to the warehouse with minute-level latency. (2) Scheduled batch extracts: nightly or hourly batch pulls of changed records via OData or vendor APIs. (3) Vendor data hubs: SAP Datasphere, Oracle Autonomous Data Warehouse, Microsoft Dataverse provide ERP-vendor-managed data layers that simplify integration. Mid-market in DACH typically combines tools: Fivetran or Airbyte for SaaS sources, vendor connectors for SAP and Oracle, custom Python for legacy on-premises ERPs. Downstream BI tools (Power BI, Tableau, Qlik) consume the warehouse for analytics and dashboards.
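Pattern (2), the scheduled batch extract, usually tracks a watermark: each run pulls only rows changed since the last successful run. A stdlib sketch with an in-memory stand-in for the ERP table — field names are illustrative, and a real pipeline would query the ERP via OData or a vendor API and load the result into the warehouse:

```python
from datetime import datetime

# Stand-in for ERP rows with a last-modified timestamp column.
erp_rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 8, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 1, 23, 30)},
    {"id": 3, "updated_at": datetime(2024, 1, 2, 6, 15)},
]

def extract_changed(rows, watermark):
    """Return rows modified after the watermark, plus the advanced watermark."""
    changed = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_watermark

# Hourly run: the previous extract's watermark was midday on 1 January.
changed, wm = extract_changed(erp_rows, datetime(2024, 1, 1, 12, 0))
print([r["id"] for r in changed], wm)  # rows 2 and 3; watermark advances
```

Persisting the watermark only after a successful warehouse load makes the extract safely re-runnable: a failed run repeats the same window rather than skipping records.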
Frequently Asked Questions
Snowflake or Microsoft Fabric for DACH mid-market?
Microsoft Fabric for organisations standardised on Microsoft 365 and Power BI — the bundled licensing and tight integration are compelling. Snowflake for organisations with heterogeneous data sources, multi-cloud strategy, or specific Snowflake-only features (data sharing, marketplace). Both are credible mid-market choices.
Do we need a separate data warehouse if our ERP has BI built in?
For ERP-only analytics, often no. Modern ERP BI (SAP Analytics Cloud embedded in S/4HANA, Microsoft Dataverse plus Power BI for Dynamics 365, Oracle Cloud Reporting) handles common needs. A separate warehouse becomes valuable when analytics span multiple source systems beyond ERP, or when historical data needed for trend analysis exceeds ERP retention limits.
How fresh does ERP data need to be in the warehouse?
For management reporting and trend analysis: daily refresh is typically sufficient. For operational analytics (live inventory dashboards, real-time order pipeline): 5-15 minute refresh via CDC. True real-time (sub-second) is rarely justified for analytical use cases; operational integration handles that need.
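The tiers above translate naturally into freshness SLAs that a pipeline can check against its last-load timestamp. A minimal sketch — the thresholds mirror the guidance above, while the tier names and function are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Freshness SLAs per analytical tier (names are illustrative).
FRESHNESS_SLA = {
    "management_reporting": timedelta(days=1),       # daily batch refresh
    "operational_dashboard": timedelta(minutes=15),  # CDC-fed, 5-15 min
}

def is_fresh(use_case: str, last_loaded: datetime, now: datetime) -> bool:
    """True if the last warehouse load is within the SLA for this use case."""
    return now - last_loaded <= FRESHNESS_SLA[use_case]

now = datetime(2024, 1, 2, 9, 0)
print(is_fresh("management_reporting", datetime(2024, 1, 2, 2, 0), now))    # nightly load: fresh
print(is_fresh("operational_dashboard", datetime(2024, 1, 2, 8, 30), now))  # 30 min old: stale
```

Alerting on SLA breaches like this (as dbt's source-freshness checks do) is usually cheaper and more robust than pushing every pipeline toward real-time.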
