AWS Glue Adds Auto Compaction for Faster Apache Iceberg Queries
Amazon Web Services has introduced an automatic compaction feature in AWS Glue to improve query performance on Apache Iceberg tables. This addresses challenges with small files generated by real-time data ingestion.

Amazon Web Services (AWS) has released an automatic compaction capability for AWS Glue, aimed at speeding up queries on Apache Iceberg tables in data lakes.
Apache Iceberg is an open table format that brings ACID transactions and table-level data management to data lakes, which traditional data lake architectures lack. The auto-compaction feature tackles a common problem: real-time streaming and ingestion tend to write many small files, and each additional file adds metadata and file-open overhead to every query that scans the table. Previously, managing these small files required complex Extract, Transform, Load (ETL) processes or custom-built solutions.
By automating the compaction of small files into larger, more efficient ones, AWS Glue helps optimize table performance. This can lead to faster data retrieval, reduced query costs, and improved overall efficiency for analytics workloads. The feature is part of AWS's ongoing efforts to simplify data lake management and enhance data processing capabilities.
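To make the idea of compaction concrete, the following is a minimal illustrative sketch of how small files might be grouped into larger rewrite targets. It is not AWS Glue's implementation: the function name `plan_compaction`, the greedy grouping strategy, and the 512 MB target size are all assumptions chosen for illustration.

```python
from typing import Dict, List

# Assumed target size for a compacted file; not a documented AWS Glue setting.
TARGET_BYTES = 512 * 1024 * 1024


def plan_compaction(file_sizes: Dict[str, int],
                    target_bytes: int = TARGET_BYTES) -> List[List[str]]:
    """Greedily group small files so each group totals at most target_bytes.

    file_sizes maps a (hypothetical) data file name to its size in bytes.
    Files already at or above the target are left alone, and groups that
    contain a single file are dropped because rewriting them gains nothing.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0

    # Visit files from smallest to largest so tiny files are merged first.
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1]):
        if size >= target_bytes:
            continue  # already large enough; no need to rewrite
        if current and current_bytes + size > target_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size

    if current:
        groups.append(current)

    # Only groups that actually merge two or more files are worth rewriting.
    return [group for group in groups if len(group) > 1]
```

Each returned group represents a set of small files that a compaction job could rewrite as one larger file. A managed service would additionally have to commit the rewrite atomically as a new table snapshot, which this sketch does not attempt.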
This enhancement is particularly beneficial for organizations that use data lakes for diverse use cases, including real-time analytics and application synchronization. The automated optimization also supports data quality and governance requirements in demanding environments while reducing operational overhead and complexity.