Skip to content

Latest commit

 

History

History
 
 

presto-iceberg

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 

This plugin allows Presto to interact with Iceberg tables.

Status

Currently, this plugin supports create, CTAS, drop, rename, and reading from Iceberg tables. It also supports adding, dropping, and renaming columns.

Configuration

Iceberg supports the same metastore configuration properties as the Hive connector. At a minimum, hive.metastore.uri must be configured:

connector.name=iceberg
hive.metastore.uri=thrift://localhost:9083

Unpartitioned Tables

CREATE TABLE iceberg.testdb.sample (
    i int, 
    s varchar
)

Partitioned Tables

Iceberg supports partitioning by specifying transforms over the table columns. A partition is created for each unique tuple value produced by the transforms. Identity transforms are simply the column name. Other transforms:

  • year(ts)
  • month(ts)
  • day(ts)
  • hour(ts)
  • bucket(x, 512)
  • truncate(s, 16)
CREATE TABLE iceberg.testdb.sample_partitioned (
    b boolean,
    dateint integer,
    l bigint,
    f real,
    d double,
    de decimal(12,2),
    dt date,
    ts timestamp,
    s varchar,
    bi varbinary
 )
WITH (partitioning = ARRAY['dateint', 's'])

Migrating existing tables

The connector can read from or write to Hive tables that have been migrated to Iceberg. Currently, there is no Presto support to migrate Hive tables to Presto, so you will need to use either the Icerberg API or Spark.

System Tables and Columns

The connector supports table$partitions as a substitute for Hive's SHOW PARTITIONS. The differences are that it returns some partition metrics for each partition value and you can also use it for unpartitioned tables.

Iceberg supports $snapshot_id and $snapshot_timestamp_ms as hidden columns. These columns allow users to query an old version of the table. Think of this as a time travel feature which lets you query your table's snapshot at a given time.

Rolling back to a previous Snapshot

The connector provides a system snapshots table for each Iceberg table. Snapshots are identified by BIGINT snapshot ids. You can find the latest snapshot id for table foo by incanting:

SELECT snapshot_id FROM 'foo$snapshots' ORDER BY committed_at DESC LIMIT 1

An SQL procedure allows the caller to roll back the state of the table to a previous snapshot id, thusly:

CALL system.rollback_to_snapshot(schema_name, table_name, snapshot_id)

TODO

  • Update the README to reflect the current status, and convert it to proper connector documentation before announcing the connector as ready for use.
  • Add tests for truncate on numeric types.
  • Test Iceberg table names like my_table@456 and my_table@456$partitions.
  • Test partition transforms on structured types.
  • Add tests for non-Iceberg tables
    • listing tables in a schema
    • listing columns in a schema
      • filtered for an Iceberg table
      • filtered for a non-Iceberg table
      • no filters
    • describing a table
    • selecting from a table
  • Add procedures for migrate table and rollback table to snapshot.