Why Apache Iceberg is the Future of Data Lakehouses
Discover how Apache Iceberg solves the challenges of traditional data lakes with ACID transactions, schema evolution, time travel, and hidden partitioning.
1. The Problems with Traditional Data Lakes
Data lakes promised cheap, scalable storage for all your data. But anyone who's built one knows the reality: they often become "data swamps" — unreliable, hard to maintain, and slow to query.
No ACID Transactions
Concurrent writes corrupt data. Failed jobs leave partial results. No rollback capability.
Schema Chaos
Adding columns breaks readers. No safe way to evolve schemas. Data and metadata drift apart.
Partition Hell
Changing partition schemes requires full data rewrite. Users must know partitions to query efficiently.
Slow Queries
Full file listings on every query. No statistics for pruning. Small files kill performance.
These problems led to the rise of the Data Lakehouse — an architecture that combines the flexibility of data lakes with the reliability of data warehouses. Apache Iceberg is the leading open table format making this possible.
2. What is Apache Iceberg?
Apache Iceberg is an open table format for huge analytic datasets. Originally developed at Netflix to solve their petabyte-scale data lake challenges, it's now an Apache top-level project used by Apple, LinkedIn, Adobe, and thousands of companies worldwide.
Table Format vs Storage Format
Iceberg is a table format, not a storage format. Your data is still stored as Parquet, ORC, or Avro files. Iceberg adds a metadata layer that tracks which files belong to a table, their statistics, and how to read them.
Iceberg Architecture
- Catalog (AWS Glue / Hive / Nessie): points to the current metadata location
- Metadata layer:
  - metadata.json: table schema and partition spec
  - Manifest list: maps a snapshot to its manifest files
  - Manifest files: data file paths plus column statistics
- Data files (S3): Parquet / ORC / Avro files organized by partition
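You can see this metadata layer directly: Iceberg exposes it through metadata tables that you query like any other table. A minimal Spark SQL sketch, assuming the analytics.events table created later in this post:
-- Commit history: one row per snapshot with its timestamp and parent
SELECT * FROM analytics.events.history;
-- Manifest files referenced by the current snapshot
SELECT * FROM analytics.events.manifests;
-- Data files with the per-file statistics used for pruning
SELECT file_path, record_count, file_size_in_bytes
FROM analytics.events.files;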
3. Key Features That Set Iceberg Apart
ACID Transactions
Full serializable isolation. Concurrent writes are safe. Failed jobs don't corrupt data. Atomic commits across thousands of files.
Time Travel
Query any previous version of your data. Audit changes. Reproduce ML experiments. Rollback bad writes instantly.
Schema Evolution
Add, rename, drop, or reorder columns safely. No data rewrite required. Full compatibility tracking.
Hidden Partitioning
Users write queries without knowing partitions. Iceberg handles partition pruning automatically. Change partition schemes without rewriting data (see the query sketch below).
Fast Query Planning
Column-level statistics in metadata. Prune files without opening them. O(1) file listing instead of O(n) directory scans.
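To make hidden partitioning and statistics-based pruning concrete, here is a query sketch against the analytics.events table defined in the next snippet, which is partitioned by days(event_timestamp) and event_type. The query filters only on raw columns; Iceberg maps the timestamp predicate to day partitions and skips files using manifest statistics.
-- No partition columns in the query; Iceberg prunes day partitions
-- and skips files using manifest-level column statistics
SELECT event_type, COUNT(*) AS events
FROM analytics.events
WHERE event_timestamp >= TIMESTAMP '2025-01-01 00:00:00'
  AND event_timestamp < TIMESTAMP '2025-01-02 00:00:00'
GROUP BY event_type;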
Creating an Iceberg Table
-- Create an Iceberg table in AWS Glue Catalog
CREATE TABLE glue_catalog.analytics.events (
event_id STRING,
user_id INT,
event_type STRING,
event_timestamp TIMESTAMP,
properties MAP<STRING, STRING>,
created_at DATE
)
USING iceberg
PARTITIONED BY (days(event_timestamp), event_type)
LOCATION 's3://my-data-lake/analytics/events'
TBLPROPERTIES (
'format-version' = '2',
'write.metadata.compression-codec' = 'gzip'
);

4. Time Travel & Snapshot Isolation
Every write to an Iceberg table creates a new snapshot. Snapshots are immutable — they point to a set of data files that represent the table at that moment. This enables powerful capabilities:
Audit & Compliance
Query exactly what the data looked like at any point in time
Reproducibility
Recreate ML training datasets or report outputs exactly
Instant Rollback
Undo bad writes without restoring from backup
-- Query data as it existed at a specific point in time
SELECT * FROM analytics.events
FOR SYSTEM_TIME AS OF TIMESTAMP '2025-01-01 00:00:00';
-- Query data at a specific snapshot
SELECT * FROM analytics.events
FOR SYSTEM_VERSION AS OF 1234567890;
-- View snapshot history
SELECT * FROM analytics.events.snapshots;
-- Rollback to a previous snapshot
CALL glue_catalog.system.rollback_to_snapshot(
'analytics.events',
1234567890
);

Snapshot Retention
Snapshots consume storage. Set up a retention policy to expire old snapshots (we recommend keeping 5-7 days for most use cases). Expired snapshots can't be queried via time travel.
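One way to encode that policy is through Iceberg's snapshot-expiration table properties. A minimal sketch with illustrative values; these limits take effect when the expire_snapshots procedure shown in the best-practices section runs:
-- Keep roughly 7 days of snapshots, but never fewer than the 10 most recent
ALTER TABLE analytics.events SET TBLPROPERTIES (
  'history.expire.max-snapshot-age-ms' = '604800000',
  'history.expire.min-snapshots-to-keep' = '10'
);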
5. Schema Evolution Without Pain
Traditional data lakes make schema changes painful. Add a column? Update every reader. Rename a field? Risk breaking downstream jobs. Iceberg solves this with full schema evolution:
| Operation | Traditional Lake | Iceberg |
|---|---|---|
| Add column | Breaks old readers | ✓ Safe, null for old data |
| Rename column | Full rewrite or alias hacks | ✓ Metadata only |
| Drop column | Data still present | ✓ Metadata only, files unchanged |
| Widen type | Full rewrite | ✓ Automatic promotion |
| Reorder columns | Not possible | ✓ Metadata only |
-- Add new columns (safe, backward compatible)
ALTER TABLE analytics.events
ADD COLUMNS (
session_id STRING,
device_type STRING
);
-- Rename columns (metadata only; columns are tracked by ID, so readers are unaffected)
ALTER TABLE analytics.events
RENAME COLUMN properties TO event_properties;
-- Change column types (safe widening only, e.g. INT to BIGINT or FLOAT to DOUBLE)
ALTER TABLE analytics.events
ALTER COLUMN user_id TYPE BIGINT;
-- Drop columns (metadata only, no rewrite)
ALTER TABLE analytics.events
DROP COLUMN deprecated_field;

Partition Evolution
Iceberg's killer feature: change your partitioning strategy without rewriting data. Old data stays in old partitions, new data uses new partitions, and queries work seamlessly across both.
-- Original partitioning by day
CREATE TABLE events (...)
PARTITIONED BY (days(event_timestamp));
-- Evolve to hourly partitioning for recent data
-- (No data rewrite required!)
ALTER TABLE analytics.events
ADD PARTITION FIELD hours(event_timestamp);
-- Iceberg handles both partition schemes automatically
-- Old data: daily partitions
-- New data: hourly partitions
-- Queries work seamlessly across both

6. Iceberg vs Delta Lake vs Hudi
Three open table formats dominate the lakehouse space. Here's how they compare:
| Feature | Apache Iceberg | Delta Lake | Apache Hudi |
|---|---|---|---|
| Origin | Netflix | Databricks | Uber |
| Hidden Partitioning | ✓ Yes | ✗ No | ✗ No |
| Partition Evolution | ✓ Yes | ✗ No | ✗ No |
| Schema Evolution | ✓ Full | Partial | Partial |
| Multi-Engine Support | ✓ Excellent | Spark-centric | Good |
| AWS Native Support | ✓ Athena, EMR, Glue | Limited | ✓ EMR, Glue |
| Streaming Ingestion | ✓ Yes | ✓ Yes | ✓ Excellent |
| Vendor Lock-in | None | Databricks features | None |
Our Recommendation
For AWS-centric architectures, Iceberg is the clear winner. Native support in Athena, EMR, and Glue, plus features like hidden partitioning and partition evolution, make it the most flexible choice. Choose Hudi if you need record-level streaming updates, or Delta if you're already heavily invested in Databricks.
7. AWS Integration (Glue, Athena, EMR)
AWS provides first-class support for Iceberg across its analytics services:
Amazon Athena
Query Iceberg tables with standard SQL. Time travel, schema evolution, and MERGE INTO are all supported (see the Athena sketch below).
Athena v3+

AWS Glue
Native Iceberg connector for ETL jobs. Use Glue Catalog as your Iceberg catalog.
Glue 4.0+

Amazon EMR
Run Spark, Trino, or Presto with Iceberg. Full feature support including maintenance procedures.
EMR 6.5+
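Athena uses its own Iceberg DDL and time-travel dialect. A minimal sketch, assuming Athena engine version 3 and the same analytics database (the events_athena table name is illustrative):
-- Create an Iceberg table from Athena (engine v3)
CREATE TABLE analytics.events_athena (
  event_id string,
  event_type string,
  event_timestamp timestamp
)
PARTITIONED BY (day(event_timestamp))
LOCATION 's3://my-data-lake/analytics/events_athena/'
TBLPROPERTIES ('table_type' = 'ICEBERG');

-- Time travel with Athena syntax
SELECT * FROM analytics.events_athena
FOR TIMESTAMP AS OF TIMESTAMP '2025-01-01 00:00:00 UTC';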
Reading Iceberg with PySpark on EMR
from pyspark.sql import SparkSession
# Configure Spark with Iceberg
spark = SparkSession.builder \
.appName("IcebergAnalytics") \
.config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog") \
.config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-data-lake/") \
.config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") \
.config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
.getOrCreate()
# Read Iceberg table
df = spark.read.format("iceberg").load("glue_catalog.analytics.events")
# Time travel query (as-of-timestamp takes epoch milliseconds; 1735689600000 is 2025-01-01 00:00:00 UTC)
df_historical = spark.read \
.option("as-of-timestamp", "1735689600000") \
.format("iceberg") \
.load("glue_catalog.analytics.events")
# Incremental read (changes between two snapshots)
df_changes = spark.read \
.format("iceberg") \
.option("start-snapshot-id", "123456") \
.option("end-snapshot-id", "789012") \
.load("glue_catalog.analytics.events")MERGE INTO for Upserts
Iceberg supports full DML operations — INSERT, UPDATE, DELETE, and MERGE. This enables CDC patterns and incremental processing:
-- Upsert pattern with MERGE INTO
MERGE INTO analytics.events AS target
USING staging.new_events AS source
ON target.event_id = source.event_id
WHEN MATCHED THEN
UPDATE SET *
WHEN NOT MATCHED THEN
INSERT *;
-- Delete with conditions
DELETE FROM analytics.events
WHERE event_timestamp < TIMESTAMP '2024-01-01'
AND event_type = 'debug';
-- Update specific records
UPDATE analytics.events
SET properties = map('status', 'processed')
WHERE event_id = 'evt_123';

8. Production Best Practices
Table Maintenance
Iceberg tables require regular maintenance for optimal performance. Set up scheduled jobs for these operations:
-- Expire snapshots older than the cutoff timestamp, retaining at least the 10 most recent
CALL glue_catalog.system.expire_snapshots(
table => 'analytics.events',
older_than => TIMESTAMP '2025-01-10 00:00:00',
retain_last => 10
);
-- Remove orphan files
CALL glue_catalog.system.remove_orphan_files(
table => 'analytics.events',
older_than => TIMESTAMP '2025-01-01 00:00:00'
);
-- Compact small files
CALL glue_catalog.system.rewrite_data_files(
table => 'analytics.events',
options => map(
'target-file-size-bytes', '134217728', -- 128MB
'min-input-files', '5'
)
);
-- Rewrite manifests for faster planning
CALL glue_catalog.system.rewrite_manifests(
'analytics.events'
);

Recommended Maintenance Schedule
| Operation | Frequency | Purpose |
|---|---|---|
| Expire Snapshots | Daily | Remove old metadata, enable file cleanup |
| Remove Orphan Files | Weekly | Delete unreferenced data files |
| Compact Data Files | Daily/Weekly | Merge small files, improve query performance |
| Rewrite Manifests | Weekly | Optimize query planning |
Additional Best Practices
- Target 128MB-512MB file sizes for optimal query performance
- Use sort orders to cluster related data and improve compression (see the sketch below)
- Enable write.metadata.delete-after-commit.enabled for automatic cleanup
- Monitor table size growth and adjust partition granularity as needed
- Use branches/tags for testing schema changes before production
- Set up CloudWatch alarms for maintenance job failures
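A Spark SQL sketch covering the sort-order, metadata-cleanup, and branching items above. The branch and tag names are illustrative, and branch/tag DDL assumes a recent Iceberg release with the Spark SQL extensions enabled:
-- Cluster data on write to improve pruning and compression
ALTER TABLE analytics.events WRITE ORDERED BY (event_type, event_timestamp);

-- Automatically remove old metadata.json files after each commit
ALTER TABLE analytics.events SET TBLPROPERTIES (
  'write.metadata.delete-after-commit.enabled' = 'true',
  'write.metadata.previous-versions-max' = '50'
);

-- Branch for testing schema changes; tag to pin a known-good state
ALTER TABLE analytics.events CREATE BRANCH schema_test;
ALTER TABLE analytics.events CREATE TAG q1_baseline;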
Conclusion
Apache Iceberg transforms data lakes from fragile file collections into reliable, feature-rich data platforms. With ACID transactions, time travel, schema evolution, and hidden partitioning, it brings data warehouse reliability to data lake economics.
For AWS users, Iceberg is now the recommended table format — native support in Athena, Glue, and EMR means you can adopt it without additional infrastructure. Start with a single table, prove the value, then migrate your lakehouse.
Ready to Build Your Data Lakehouse?
PATHSDATA specializes in modern data platforms on AWS. Let us help you design and implement your Iceberg-based lakehouse.
PATHSDATA Team
AWS Select Tier Consulting Partner
We help enterprises build modern data platforms on AWS. Specializing in data lakehouses, real-time pipelines, and analytics infrastructure.
