Why Apache Iceberg is the Future of Data Lakehouses
Discover how Apache Iceberg solves the challenges of traditional data lakes with ACID transactions, schema evolution, time travel, and hidden partitioning.
1. The Problems with Traditional Data Lakes
Data lakes promised cheap, scalable storage for all your data. But anyone who's built one knows the reality: they often become "data swamps" — unreliable, hard to maintain, and slow to query.
No ACID Transactions
Concurrent writes corrupt data. Failed jobs leave partial results. No rollback capability.
Schema Chaos
Adding columns breaks readers. No safe way to evolve schemas. Data and metadata drift apart.
Partition Hell
Changing partition schemes requires full data rewrite. Users must know partitions to query efficiently.
Slow Queries
Full file listings on every query. No statistics for pruning. Small files kill performance.
These problems led to the rise of the Data Lakehouse — an architecture that combines the flexibility of data lakes with the reliability of data warehouses. Apache Iceberg is the leading open table format making this possible.
2. What is Apache Iceberg?
Apache Iceberg is an open table format for huge analytic datasets. Originally developed at Netflix to solve their petabyte-scale data lake challenges, it's now an Apache top-level project used by Apple, LinkedIn, Adobe, and thousands of companies worldwide.
Table Format vs Storage Format
Iceberg is a table format, not a storage format. Your data is still stored as Parquet, ORC, or Avro files. Iceberg adds a metadata layer that tracks which files belong to a table, their statistics, and how to read them.
Iceberg Architecture
- Catalog (AWS Glue / Hive / Nessie): points to the current metadata location
- Metadata layer:
  - metadata.json: table schema and partition spec
  - Manifest list: maps a snapshot to its manifest files
  - Manifest files: data file paths plus column statistics
- Data files (S3): Parquet / ORC / Avro files organized by partition
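You can see this metadata layer directly: Iceberg exposes it through metadata tables that you query like any other table. A minimal Spark SQL sketch, assuming the analytics.events table created later in this post:
-- Commit history: one row per snapshot with its timestamp and parent
SELECT * FROM analytics.events.history;
-- Manifest files referenced by the current snapshot
SELECT * FROM analytics.events.manifests;
-- Data files with the per-file statistics used for pruning
SELECT file_path, record_count, file_size_in_bytes
FROM analytics.events.files;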
3. Key Features That Set Iceberg Apart
ACID Transactions
Full serializable isolation. Concurrent writes are safe. Failed jobs don't corrupt data. Atomic commits across thousands of files.
Time Travel
Query any previous version of your data. Audit changes. Reproduce ML experiments. Rollback bad writes instantly.
Schema Evolution
Add, rename, drop, or reorder columns safely. No data rewrite required. Full compatibility tracking.
Hidden Partitioning
Users write queries without knowing partitions. Iceberg handles partition pruning automatically. Change partition schemes without rewriting data (see the query sketch below).
Fast Query Planning
Column-level statistics in metadata. Prune files without opening them. O(1) file listing instead of O(n) directory scans.
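To make hidden partitioning and statistics-based pruning concrete, here is a query sketch against the analytics.events table defined in the next snippet, which is partitioned by days(event_timestamp) and event_type. The query filters only on raw columns; Iceberg maps the timestamp predicate to day partitions and skips files using manifest statistics.
-- No partition columns in the query; Iceberg prunes day partitions
-- and skips files using manifest-level column statistics
SELECT event_type, COUNT(*) AS events
FROM analytics.events
WHERE event_timestamp >= TIMESTAMP '2025-01-01 00:00:00'
  AND event_timestamp < TIMESTAMP '2025-01-02 00:00:00'
GROUP BY event_type;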
Creating an Iceberg Table
-- Create an Iceberg table in AWS Glue Catalog
CREATE TABLE glue_catalog.analytics.events (
event_id STRING,
user_id INT,
event_type STRING,
event_timestamp TIMESTAMP,
properties MAP<STRING, STRING>,
created_at DATE
)
USING iceberg
PARTITIONED BY (days(event_timestamp), event_type)
LOCATION 's3://my-data-lake/analytics/events'
TBLPROPERTIES (
'format-version' = '2',
'write.metadata.compression-codec' = 'gzip'
);

4. Time Travel & Snapshot Isolation
Every write to an Iceberg table creates a new snapshot. Snapshots are immutable — they point to a set of data files that represent the table at that moment. This enables powerful capabilities:
Audit & Compliance
Query exactly what the data looked like at any point in time
Reproducibility
Recreate ML training datasets or report outputs exactly
Instant Rollback
Undo bad writes without restoring from backup
-- Query data as it existed at a specific point in time
SELECT * FROM analytics.events
FOR SYSTEM_TIME AS OF TIMESTAMP '2025-01-01 00:00:00';
-- Query data at a specific snapshot
SELECT * FROM analytics.events
FOR SYSTEM_VERSION AS OF 1234567890;
-- View snapshot history
SELECT * FROM analytics.events.snapshots;
-- Rollback to a previous snapshot
CALL glue_catalog.system.rollback_to_snapshot(
'analytics.events',
1234567890
);

Snapshot Retention
Snapshots consume storage. Set up a retention policy to expire old snapshots (we recommend keeping 5-7 days for most use cases). Expired snapshots can't be queried via time travel.
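One way to encode that policy is through Iceberg's snapshot-expiration table properties. A minimal sketch with illustrative values; these limits take effect when the expire_snapshots procedure shown in the best-practices section runs:
-- Keep roughly 7 days of snapshots, but never fewer than the 10 most recent
ALTER TABLE analytics.events SET TBLPROPERTIES (
  'history.expire.max-snapshot-age-ms' = '604800000',
  'history.expire.min-snapshots-to-keep' = '10'
);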
5. Schema Evolution Without Pain
Traditional data lakes make schema changes painful. Add a column? Update every reader. Rename a field? Risk breaking downstream jobs. Iceberg solves this with full schema evolution:
| Operation | Traditional Lake | Iceberg |
|---|---|---|
| Add column | Breaks old readers | ✓ Safe, null for old data |
| Rename column | Full rewrite or alias hacks | ✓ Metadata only |
| Drop column | Data still present | ✓ Metadata only, files unchanged |
| Widen type | Full rewrite | ✓ Automatic promotion |
| Reorder columns | Not possible | ✓ Metadata only |
-- Add new columns (safe, backward compatible)
ALTER TABLE analytics.events
ADD COLUMNS (
session_id STRING,
device_type STRING
);
-- Rename columns (metadata only; columns are tracked by ID, so readers are unaffected)
ALTER TABLE analytics.events
RENAME COLUMN properties TO event_properties;
-- Change column types (safe widening only, e.g. INT to BIGINT or FLOAT to DOUBLE)
ALTER TABLE analytics.events
ALTER COLUMN user_id TYPE BIGINT;
-- Drop columns (metadata only, no rewrite)
ALTER TABLE analytics.events
DROP COLUMN deprecated_field;

Partition Evolution
Iceberg's killer feature: change your partitioning strategy without rewriting data. Old data stays in old partitions, new data uses new partitions, and queries work seamlessly across both.
-- Original partitioning by day
CREATE TABLE events (...)
PARTITIONED BY (days(event_timestamp));
-- Evolve to hourly partitioning for recent data
-- (No data rewrite required!)
ALTER TABLE analytics.events
ADD PARTITION FIELD hours(event_timestamp);
-- Iceberg handles both partition schemes automatically
-- Old data: daily partitions
-- New data: hourly partitions
-- Queries work seamlessly across both

6. Iceberg vs Delta Lake vs Hudi
Three open table formats dominate the lakehouse space. Here's how they compare:
| Feature | Apache Iceberg | Delta Lake | Apache Hudi |
|---|---|---|---|
| Origin | Netflix | Databricks | Uber |
| Hidden Partitioning | ✓ Yes | ✗ No | ✗ No |
| Partition Evolution | ✓ Yes | ✗ No | ✗ No |
| Schema Evolution | ✓ Full | Partial | Partial |
| Multi-Engine Support | ✓ Excellent | Spark-centric | Good |
| AWS Native Support | ✓ Athena, EMR, Glue | Limited | ✓ EMR, Glue |
| Streaming Ingestion | ✓ Yes | ✓ Yes | ✓ Excellent |
| Vendor Lock-in | None | Databricks features | None |
Our Recommendation
For AWS-centric architectures, Iceberg is the clear winner. Native support in Athena, EMR, and Glue, plus features like hidden partitioning and partition evolution, make it the most flexible choice. Choose Hudi if you need record-level streaming updates, or Delta if you're already heavily invested in Databricks.
7. AWS Integration (Glue, Athena, EMR)
AWS provides first-class support for Iceberg across its analytics services:
Amazon Athena
Query Iceberg tables with standard SQL. Time travel, schema evolution, and MERGE INTO are all supported (see the Athena sketch below).
Athena v3+

AWS Glue
Native Iceberg connector for ETL jobs. Use Glue Catalog as your Iceberg catalog.
Glue 4.0+

Amazon EMR
Run Spark, Trino, or Presto with Iceberg. Full feature support including maintenance procedures.
EMR 6.5+
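Athena uses its own Iceberg DDL and time-travel dialect. A minimal sketch, assuming Athena engine version 3 and the same analytics database (the events_athena table name is illustrative):
-- Create an Iceberg table from Athena (engine v3)
CREATE TABLE analytics.events_athena (
  event_id string,
  event_type string,
  event_timestamp timestamp
)
PARTITIONED BY (day(event_timestamp))
LOCATION 's3://my-data-lake/analytics/events_athena/'
TBLPROPERTIES ('table_type' = 'ICEBERG');

-- Time travel with Athena syntax
SELECT * FROM analytics.events_athena
FOR TIMESTAMP AS OF TIMESTAMP '2025-01-01 00:00:00 UTC';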
Reading Iceberg with PySpark on EMR
from pyspark.sql import SparkSession
# Configure Spark with Iceberg
spark = SparkSession.builder \
.appName("IcebergAnalytics") \
.config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog") \
.config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-data-lake/") \
.config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") \
.config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
.getOrCreate()
# Read Iceberg table
df = spark.read.format("iceberg").load("glue_catalog.analytics.events")
# Time travel query (as-of-timestamp takes epoch milliseconds; 1735689600000 is 2025-01-01 00:00:00 UTC)
df_historical = spark.read \
.option("as-of-timestamp", "1735689600000") \
.format("iceberg") \
.load("glue_catalog.analytics.events")
# Incremental read (changes between two snapshots)
df_changes = spark.read \
.format("iceberg") \
.option("start-snapshot-id", "123456") \
.option("end-snapshot-id", "789012") \
.load("glue_catalog.analytics.events")MERGE INTO for Upserts
Iceberg supports full DML operations — INSERT, UPDATE, DELETE, and MERGE. This enables CDC patterns and incremental processing:
-- Upsert pattern with MERGE INTO
MERGE INTO analytics.events AS target
USING staging.new_events AS source
ON target.event_id = source.event_id
WHEN MATCHED THEN
UPDATE SET *
WHEN NOT MATCHED THEN
INSERT *;
-- Delete with conditions
DELETE FROM analytics.events
WHERE event_timestamp < TIMESTAMP '2024-01-01'
AND event_type = 'debug';
-- Update specific records
UPDATE analytics.events
SET properties = map('status', 'processed')
WHERE event_id = 'evt_123';

8. Production Best Practices
Table Maintenance
Iceberg tables require regular maintenance for optimal performance. Set up scheduled jobs for these operations:
-- Expire snapshots older than the cutoff timestamp, retaining at least the 10 most recent
CALL glue_catalog.system.expire_snapshots(
table => 'analytics.events',
older_than => TIMESTAMP '2025-01-10 00:00:00',
retain_last => 10
);
-- Remove orphan files
CALL glue_catalog.system.remove_orphan_files(
table => 'analytics.events',
older_than => TIMESTAMP '2025-01-01 00:00:00'
);
-- Compact small files
CALL glue_catalog.system.rewrite_data_files(
table => 'analytics.events',
options => map(
'target-file-size-bytes', '134217728', -- 128MB
'min-input-files', '5'
)
);
-- Rewrite manifests for faster planning
CALL glue_catalog.system.rewrite_manifests(
'analytics.events'
);

Recommended Maintenance Schedule
| Operation | Frequency | Purpose |
|---|---|---|
| Expire Snapshots | Daily | Remove old metadata, enable file cleanup |
| Remove Orphan Files | Weekly | Delete unreferenced data files |
| Compact Data Files | Daily/Weekly | Merge small files, improve query performance |
| Rewrite Manifests | Weekly | Optimize query planning |
Additional Best Practices
- Target 128MB-512MB file sizes for optimal query performance
- Use sort orders to cluster related data and improve compression (see the sketch below)
- Enable write.metadata.delete-after-commit.enabled for automatic cleanup
- Monitor table size growth and adjust partition granularity as needed
- Use branches/tags for testing schema changes before production
- Set up CloudWatch alarms for maintenance job failures
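A Spark SQL sketch covering the sort-order, metadata-cleanup, and branching items above. The branch and tag names are illustrative, and branch/tag DDL assumes a recent Iceberg release with the Spark SQL extensions enabled:
-- Cluster data on write to improve pruning and compression
ALTER TABLE analytics.events WRITE ORDERED BY (event_type, event_timestamp);

-- Automatically remove old metadata.json files after each commit
ALTER TABLE analytics.events SET TBLPROPERTIES (
  'write.metadata.delete-after-commit.enabled' = 'true',
  'write.metadata.previous-versions-max' = '50'
);

-- Branch for testing schema changes; tag to pin a known-good state
ALTER TABLE analytics.events CREATE BRANCH schema_test;
ALTER TABLE analytics.events CREATE TAG q1_baseline;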
Conclusion
Apache Iceberg transforms data lakes from fragile file collections into reliable, feature-rich data platforms. With ACID transactions, time travel, schema evolution, and hidden partitioning, it brings data warehouse reliability to data lake economics.
For AWS users, Iceberg is now the recommended table format — native support in Athena, Glue, and EMR means you can adopt it without additional infrastructure. Start with a single table, prove the value, then migrate your lakehouse.
Ready to Build Your Data Lakehouse?
PATHSDATA specializes in modern data platforms on AWS. Let us help you design and implement your Iceberg-based lakehouse.
PATHSDATA Team
AWS Select Tier Consulting Partner
We help enterprises build modern data platforms on AWS. Specializing in data lakehouses, real-time pipelines, and analytics infrastructure.
