Database Materialization: Advantages, Disadvantages, and Practical Use Cases

Introduction

Materialization in databases refers to the process of precomputing and physically storing the results of a query or view, rather than recomputing them each time they are accessed. Instead of executing expensive joins, aggregations, or transformations at query time, the result set is persisted on disk and served directly.

This is a fundamental trade-off: spend storage and write overhead upfront to gain faster reads later.

Two databases that take very different but equally practical approaches to materialization are PostgreSQL and Snowflake. Understanding both helps you choose the right tool for your workload.

Core Concepts

Regular Views vs. Materialized Views

A regular view is a saved SQL query. Every time you query it, the underlying SQL executes from scratch.

-- Regular view: recomputes on every access
CREATE VIEW monthly_revenue AS
SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(amount) AS total_revenue
FROM orders
GROUP BY 1;

A materialized view stores the result of that query physically on disk.

-- Materialized view: result is stored, not recomputed on access
CREATE MATERIALIZED VIEW monthly_revenue_mv AS
SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(amount) AS total_revenue
FROM orders
GROUP BY 1;

The distinction matters enormously for large datasets where aggregations over millions of rows take seconds or minutes.
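
Because the stored result does not update itself in PostgreSQL, the freshness difference is easy to demonstrate. A minimal sketch reusing the two views above (the inserted row is illustrative):

-- The regular view reflects new rows immediately; the materialized view does not
INSERT INTO orders (order_date, amount) VALUES (CURRENT_DATE, 99.00);

SELECT * FROM monthly_revenue;     -- includes the new order
SELECT * FROM monthly_revenue_mv;  -- stale until refreshed

REFRESH MATERIALIZED VIEW monthly_revenue_mv;  -- brings the stored result up to date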

Materialization in PostgreSQL

PostgreSQL supports materialized views natively. Once created, they must be refreshed manually or via a scheduled job.

Creating and Refreshing

-- Create the materialized view
CREATE MATERIALIZED VIEW product_sales_summary AS
SELECT
    p.product_id,
    p.name,
    p.category,
    COUNT(oi.order_id)   AS total_orders,
    SUM(oi.quantity)     AS total_units_sold,
    SUM(oi.unit_price * oi.quantity) AS total_revenue
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.status = 'completed'
GROUP BY p.product_id, p.name, p.category;

-- Refresh without blocking reads (PostgreSQL 9.4+)
REFRESH MATERIALIZED VIEW CONCURRENTLY product_sales_summary;

CONCURRENTLY allows reads to continue during the refresh. It requires a unique index on the view.

-- Required for CONCURRENTLY refresh
CREATE UNIQUE INDEX ON product_sales_summary (product_id);

Querying the Materialized View

-- Fast: reads from stored data, no recomputation
SELECT category, SUM(total_revenue) AS category_revenue
FROM product_sales_summary
GROUP BY category
ORDER BY category_revenue DESC;

Indexing Materialized Views

You can add indexes to materialized views just like regular tables, further accelerating lookups.

CREATE INDEX ON product_sales_summary (category);
CREATE INDEX ON product_sales_summary (total_revenue DESC);

Automating Refresh with pg_cron

For regular refresh schedules, use pg_cron (available on managed Postgres services like RDS and Cloud SQL):

-- Refresh every hour at minute 0
SELECT cron.schedule(
    'refresh-product-sales',
    '0 * * * *',
    'REFRESH MATERIALIZED VIEW CONCURRENTLY product_sales_summary'
);
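
Once scheduled, the job can be inspected or removed through pg_cron's catalog (a sketch; unscheduling by name requires a recent pg_cron version):

-- List scheduled jobs
SELECT jobid, jobname, schedule, active FROM cron.job;

-- Remove the schedule when it is no longer needed
SELECT cron.unschedule('refresh-product-sales');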

Materialization in Snowflake

Snowflake’s approach to materialization is more automated. Snowflake maintains and refreshes materialized views automatically in the background when the underlying table data changes.

Creating a Materialized View

-- Snowflake: automatic incremental maintenance
-- Note: Snowflake materialized views reject DISTINCT inside aggregates,
-- so APPROX_COUNT_DISTINCT stands in for COUNT(DISTINCT user_id)
CREATE OR REPLACE MATERIALIZED VIEW daily_active_users AS
SELECT
    DATE_TRUNC('day', event_time)  AS event_date,
    APPROX_COUNT_DISTINCT(user_id) AS active_users,
    COUNT(*)                       AS total_events
FROM user_events
WHERE event_type IN ('login', 'purchase', 'search')
GROUP BY 1;

Snowflake handles incremental refresh automatically: only the rows affected by DML operations on the base table are recomputed.
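
Maintenance happens in the background, but it can be observed and paused. A sketch using Snowflake's standard commands:

-- Inspect recent refresh activity for the view
SELECT *
FROM TABLE(INFORMATION_SCHEMA.MATERIALIZED_VIEW_REFRESH_HISTORY(
    MATERIALIZED_VIEW_NAME => 'DAILY_ACTIVE_USERS'));

-- Pause and resume automatic maintenance
ALTER MATERIALIZED VIEW daily_active_users SUSPEND;
ALTER MATERIALIZED VIEW daily_active_users RESUME;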

Dynamic Tables (Snowflake’s Modern Alternative)

Snowflake’s Dynamic Tables extend materialization with configurable lag targets and pipeline-aware refresh semantics.

CREATE OR REPLACE DYNAMIC TABLE customer_lifetime_value
    TARGET_LAG = '1 hour'
    WAREHOUSE = compute_wh
AS
SELECT
    c.customer_id,
    c.email,
    c.signup_date,
    COUNT(DISTINCT o.order_id)        AS total_orders,
    SUM(o.total_amount)               AS lifetime_value,
    AVG(o.total_amount)               AS avg_order_value,
    MAX(o.order_date)                 AS last_order_date,
    DATEDIFF('day', MAX(o.order_date), CURRENT_DATE) AS days_since_last_order
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.email, c.signup_date;

TARGET_LAG defines the maximum acceptable staleness. Snowflake automatically schedules refreshes to meet that SLA, charging compute credits only when refresh actually runs.
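
The lag target and refresh behavior can be changed after creation. A sketch of the relevant ALTER commands:

-- Tighten or loosen the freshness target
ALTER DYNAMIC TABLE customer_lifetime_value SET TARGET_LAG = '15 minutes';

-- Pause and resume automatic refreshes
ALTER DYNAMIC TABLE customer_lifetime_value SUSPEND;
ALTER DYNAMIC TABLE customer_lifetime_value RESUME;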

Querying in Snowflake

-- Reads the precomputed result directly: no joins or aggregation at query time
SELECT *
FROM customer_lifetime_value
WHERE lifetime_value > 1000
ORDER BY lifetime_value DESC
LIMIT 100;

Snowflake’s query optimizer can also rewrite queries against base tables to use matching materialized views transparently, without the caller needing to reference the view directly.
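
For example, a query written against the raw events table may be answered from daily_active_users when the shapes match. A sketch (whether the rewrite fires depends on edition and optimizer decisions):

-- Written against the base table...
SELECT DATE_TRUNC('day', event_time) AS event_date, COUNT(*) AS total_events
FROM user_events
WHERE event_type IN ('login', 'purchase', 'search')
GROUP BY 1;
-- ...but can be served from the daily_active_users materialized view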

Advantages of Materialization

1. Dramatically Faster Read Performance

The most significant benefit. Complex aggregations, multi-table joins, and window functions that take minutes on raw data complete in milliseconds from a materialized result.

A query that scans 100 million rows and aggregates them into 1,000 summary rows benefits enormously: reads go from full table scans to small index lookups.

2. Reduced Compute Costs

In Snowflake and other cloud data warehouses billed by compute usage, materialized views and dynamic tables reduce the number of expensive queries executed against large base tables. A dashboard refreshing every 30 seconds against a raw table can be replaced by a single periodic materialized view refresh.

3. Simplified Application Queries

Applications and BI tools query the materialized view directly using simple SELECT statements, instead of embedding complex SQL with multiple joins and aggregations. This improves maintainability and reduces the chance of query mistakes across different clients.

4. Offloads Repeated Computation

When multiple users or services run the same expensive query independently, materialization centralizes that computation into one refresh and serves many readers from the cached result.

5. Indexes on Precomputed Results

In PostgreSQL, you can add arbitrary indexes to materialized views. This enables efficient filtering and sorting on derived columns that would otherwise require full scans of joined tables.
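
Partial indexes extend this further by targeting only the hot slice of a query pattern. A hypothetical example on the earlier summary view:

-- Hypothetical: accelerate "top earners by category" lookups,
-- skipping products that never sold
CREATE INDEX ON product_sales_summary (category, total_revenue DESC)
WHERE total_units_sold > 0;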

Disadvantages of Materialization

1. Data Staleness

Materialized views hold a snapshot of data at the time of the last refresh. In PostgreSQL, stale data persists until REFRESH MATERIALIZED VIEW is explicitly called. In Snowflake, staleness is bounded by the TARGET_LAG but not eliminated.

For use cases requiring real-time consistency (e.g., inventory levels, account balances), materialization introduces unacceptable lag.

2. Storage Overhead

Materialized views duplicate data. A summary over a 500 GB table may itself consume tens of gigabytes. In cloud environments with per-GB storage costs, this accumulates.

3. Refresh Complexity and Cost

In PostgreSQL, developers must manage refresh scheduling, concurrency, and failure handling themselves. A failed or skipped refresh leaves stale data indefinitely.

In Snowflake, automatic maintenance consumes compute credits on every cycle (serverless compute for materialized views, the assigned warehouse for Dynamic Tables), which can be costly for high-frequency updates on large tables.

4. Restrictions on Supported SQL and Refresh

Both PostgreSQL and Snowflake place restrictions on materialized views, though of different kinds.

In PostgreSQL, the limits are mostly operational:

  • No native incremental refresh: every refresh recomputes the full query
  • The defining query cannot reference temporary tables
  • REFRESH ... CONCURRENTLY requires a unique index on the view and cannot run on a view that has never been populated

Snowflake materialized views do not support:

  • Joins between multiple tables
  • Window functions (use Dynamic Tables instead)
  • DISTINCT combined with aggregate functions (e.g., COUNT(DISTINCT ...))
  • Non-deterministic functions like CURRENT_TIMESTAMP

5. Refresh Contention in PostgreSQL

REFRESH MATERIALIZED VIEW without CONCURRENTLY acquires an exclusive lock, blocking all reads for the duration of the refresh. For large views refreshed frequently, this can cause significant application-level timeouts.
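
The two refresh modes trade lock behavior for extra work, as a side-by-side sketch shows:

-- Takes an ACCESS EXCLUSIVE lock: readers block until the refresh finishes
REFRESH MATERIALIZED VIEW product_sales_summary;

-- Readers keep seeing the old snapshot, at the cost of a unique index
-- and extra work to diff the new result against the old one
REFRESH MATERIALIZED VIEW CONCURRENTLY product_sales_summary;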

6. Cascading Refresh Dependencies

When materialized views depend on other materialized views, a change to the base table can require a chain of refreshes in the correct order. Managing these dependency graphs manually in PostgreSQL is error-prone.
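
A sketch with hypothetical view names: if weekly_rollup is defined over daily_rollup, refreshing them out of order silently propagates stale data.

-- Must run in dependency order; nothing in PostgreSQL enforces this
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_rollup;
REFRESH MATERIALIZED VIEW CONCURRENTLY weekly_rollup;  -- reads from daily_rollup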

Practical Use Cases

Use Case 1: Analytics Dashboard Aggregations (Snowflake)

Business intelligence dashboards querying sales, revenue, or user metrics typically do not require second-level freshness. A 15-minute lag is acceptable, and the same aggregation is queried by dozens of users simultaneously.

-- Dynamic Table powering a sales dashboard
CREATE OR REPLACE DYNAMIC TABLE sales_dashboard_metrics
    TARGET_LAG = '15 minutes'
    WAREHOUSE = reporting_wh
AS
SELECT
    r.region_name,
    p.product_category,
    DATE_TRUNC('day', o.order_date)   AS order_day,
    COUNT(DISTINCT o.order_id)         AS order_count,
    COUNT(DISTINCT o.customer_id)      AS unique_customers,
    SUM(oi.quantity * oi.unit_price)   AS gross_revenue,
    SUM(oi.quantity * oi.unit_price)
        - SUM(oi.quantity * p.cost_price) AS gross_profit
FROM orders o
JOIN order_items oi  ON o.order_id = oi.order_id
JOIN products p      ON oi.product_id = p.product_id
JOIN regions r       ON o.region_id = r.region_id
GROUP BY 1, 2, 3;

Dashboard queries against this table execute in under a second regardless of underlying table size.
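
A typical dashboard query against it might look like this (illustrative):

-- Last 30 days of revenue and profit by region, read from the precomputed table
SELECT
    region_name,
    SUM(gross_revenue) AS revenue,
    SUM(gross_profit)  AS profit
FROM sales_dashboard_metrics
WHERE order_day >= DATEADD('day', -30, CURRENT_DATE)
GROUP BY region_name
ORDER BY revenue DESC;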

Use Case 2: Search and Filter Acceleration (PostgreSQL)

An e-commerce platform needs to filter products by category, price range, and availability in under 100ms. The underlying data requires joining products, inventory, and pricing tables.

-- Materialized view for product search
CREATE MATERIALIZED VIEW product_search_index AS
SELECT
    p.product_id,
    p.name,
    p.slug,
    p.category,
    p.brand,
    p.description,
    pr.current_price,
    pr.discount_percent,
    i.available_stock,
    (i.available_stock > 0) AS in_stock,
    p.created_at
FROM products p
JOIN pricing pr   ON p.product_id = pr.product_id AND pr.is_active = true
JOIN inventory i  ON p.product_id = i.product_id
WHERE p.is_published = true;

CREATE INDEX ON product_search_index (category, in_stock, current_price);
CREATE INDEX ON product_search_index USING GIN (to_tsvector('english', name || ' ' || COALESCE(description, '')));
CREATE UNIQUE INDEX ON product_search_index (product_id);

Search queries become fast index scans instead of multi-table joins:

-- Fast product search using the materialized view
SELECT product_id, name, current_price, available_stock
FROM product_search_index
WHERE category = 'electronics'
  AND in_stock = true
  AND current_price BETWEEN 50 AND 300
ORDER BY discount_percent DESC
LIMIT 20;

Refresh runs every 5 minutes via pg_cron, which is acceptable for a product catalog that changes infrequently.
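
The corresponding schedule, as a sketch (the job name is illustrative):

-- Refresh the search index every 5 minutes
SELECT cron.schedule(
    'refresh-product-search',
    '*/5 * * * *',
    'REFRESH MATERIALIZED VIEW CONCURRENTLY product_search_index'
);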

Use Case 3: Pre-Aggregated Time-Series Reporting (PostgreSQL)

Monitoring systems storing millions of metric data points per day need fast rollup queries for daily and weekly views.

-- Hourly rollups from raw metrics
CREATE MATERIALIZED VIEW metrics_hourly AS
SELECT
    metric_name,
    host,
    DATE_TRUNC('hour', recorded_at) AS hour,
    AVG(value)   AS avg_value,
    MAX(value)   AS max_value,
    MIN(value)   AS min_value,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY value) AS p95_value,
    COUNT(*)     AS sample_count
FROM raw_metrics
GROUP BY metric_name, host, DATE_TRUNC('hour', recorded_at);

CREATE INDEX ON metrics_hourly (metric_name, host, hour DESC);
CREATE UNIQUE INDEX ON metrics_hourly (metric_name, host, hour);

A weekly summary query then reads from the hourly rollup rather than raw data:

-- Week-over-week comparison reading from materialized hourly data
SELECT
    metric_name,
    DATE_TRUNC('week', hour) AS week,
    AVG(avg_value)           AS weekly_avg,
    MAX(max_value)           AS weekly_max
FROM metrics_hourly
WHERE hour >= NOW() - INTERVAL '14 days'
GROUP BY 1, 2
ORDER BY 1, 2;

Use Case 4: Customer Segmentation Pipeline (Snowflake)

A marketing platform needs to segment customers daily based on behavioral data for campaign targeting. The pipeline reads from multiple event streams.

-- Stage 1: Aggregate raw events
CREATE OR REPLACE DYNAMIC TABLE user_event_summary
    TARGET_LAG = '1 hour'
    WAREHOUSE = pipeline_wh
AS
SELECT
    user_id,
    COUNT_IF(event_type = 'purchase')          AS purchase_count,
    COUNT_IF(event_type = 'page_view')         AS page_view_count,
    SUM(IFF(event_type = 'purchase', amount, 0)) AS total_spent,
    MAX(event_time)                            AS last_seen,
    MIN(event_time)                            AS first_seen,
    DATEDIFF('day', MIN(event_time), MAX(event_time)) AS active_days
FROM user_events
WHERE event_time >= DATEADD('day', -90, CURRENT_DATE)
GROUP BY user_id;

-- Stage 2: Segment classification built on Stage 1
CREATE OR REPLACE DYNAMIC TABLE customer_segments
    TARGET_LAG = '2 hours'
    WAREHOUSE = pipeline_wh
AS
SELECT
    u.user_id,
    u.email,
    s.purchase_count,
    s.total_spent,
    s.last_seen,
    CASE
        WHEN s.purchase_count >= 10 AND s.total_spent >= 500 THEN 'vip'
        WHEN s.purchase_count >= 3  AND s.active_days >= 30   THEN 'loyal'
        WHEN s.last_seen >= DATEADD('day', -7, CURRENT_DATE)  THEN 'active'
        WHEN s.last_seen < DATEADD('day', -30, CURRENT_DATE)  THEN 'at_risk'
        ELSE 'new'
    END AS segment
FROM users u
JOIN user_event_summary s ON u.user_id = s.user_id;

Snowflake automatically manages the refresh order: user_event_summary refreshes before customer_segments, which depends on it.
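
If the intermediate table exists only to feed downstream tables, its lag can even be delegated to its consumers. A sketch using Snowflake's DOWNSTREAM setting:

-- Refresh user_event_summary only when tables that depend on it need fresh data
ALTER DYNAMIC TABLE user_event_summary SET TARGET_LAG = DOWNSTREAM;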

Use Case 5: Report Caching for Multi-Tenant SaaS (PostgreSQL)

A SaaS application needs per-tenant usage reports that are expensive to compute on demand but only need to be current to the previous day.

-- Per-tenant daily usage summary
CREATE MATERIALIZED VIEW tenant_daily_usage AS
SELECT
    tenant_id,
    DATE_TRUNC('day', created_at) AS usage_date,
    resource_type,
    SUM(units_consumed)           AS total_units,
    COUNT(DISTINCT user_id)       AS active_users,
    COUNT(*)                      AS total_requests
FROM usage_events
GROUP BY tenant_id, DATE_TRUNC('day', created_at), resource_type;

CREATE INDEX ON tenant_daily_usage (tenant_id, usage_date DESC);
CREATE UNIQUE INDEX ON tenant_daily_usage (tenant_id, usage_date, resource_type);

The report endpoint queries directly:

-- Tenant report: fast index scan, no aggregation at query time
SELECT usage_date, resource_type, total_units, active_users
FROM tenant_daily_usage
WHERE tenant_id = $1
  AND usage_date >= NOW() - INTERVAL '30 days'
ORDER BY usage_date DESC, resource_type;

Choosing Between PostgreSQL and Snowflake for Materialization

Factor                 | PostgreSQL                  | Snowflake
Refresh control        | Manual / pg_cron            | Automatic (TARGET_LAG)
Incremental refresh    | Not native (full refresh)   | Native incremental
Join support in views  | Full SQL support            | Single table only (use Dynamic Tables for joins)
Storage cost           | Server storage              | Cloud object storage (low cost)
Compute cost           | CPU on your server          | Virtual warehouse credits per refresh
Staleness guarantee    | No bound without scheduling | Bounded by TARGET_LAG
Dependency management  | Manual                      | Automatic (Dynamic Tables)
Best fit               | OLTP apps, web backends     | Analytics, data pipelines, BI

When Not to Use Materialization

Materialization is not appropriate for every scenario:

  • Real-time data requirements: If accuracy matters at the second level (fraud detection, live inventory), read from the base table directly or use streaming materialization (e.g., Kafka Streams, Flink).
  • Rapidly changing data: High-frequency insert/update workloads make materialized views expensive to maintain and often stale by the time they are refreshed.
  • Simple queries: If the base query runs in under 10ms, the overhead of managing a materialized view is not justified.
  • Highly normalized OLTP schemas: Materialization is most effective in analytical workloads with wide aggregations. OLTP queries on indexed primary keys rarely benefit.

Conclusion

Materialization is a deliberate engineering trade-off: accept storage overhead and controlled data staleness in exchange for predictable, fast query performance. It is one of the most effective performance tools available for analytical and reporting workloads.

PostgreSQL gives you full SQL flexibility and fine-grained index control with manual refresh management. Snowflake automates maintenance and provides incremental refresh and declarative lag guarantees, making it well-suited for data pipeline layers and continuously updated analytical tables.

The right approach depends on your freshness requirements, query patterns, and the volume of concurrent readers. Start by identifying the queries that are slow, repeated, and tolerate some staleness. Those are the best candidates for materialization.
