presto/presto-iceberg at master · bourne1116/presto

Name	Name	Last commit message	Last commit date
parent directory ..
src	src
README.md	README.md
pom.xml	pom.xml

This plugin allows Presto to interact with Iceberg tables.

Status

Currently, this plugin supports create, CTAS, drop, rename, and reading from Iceberg tables. It also supports adding, dropping, and renaming columns.

Configuration

Iceberg supports the same metastore configuration properties as the Hive connector. At a minimum, hive.metastore.uri must be configured:

connector.name=iceberg
hive.metastore.uri=thrift://localhost:9083

Unpartitioned Tables

CREATE TABLE iceberg.testdb.sample (
    i int, 
    s varchar
)

Partitioned Tables

Iceberg supports partitioning by specifying transforms over the table columns. A partition is created for each unique tuple value produced by the transforms. Identity transforms are simply the column name. Other transforms:

year(ts)
month(ts)
day(ts)
hour(ts)
bucket(x, 512)
truncate(s, 16)

CREATE TABLE iceberg.testdb.sample_partitioned (
    b boolean,
    dateint integer,
    l bigint,
    f real,
    d double,
    de decimal(12,2),
    dt date,
    ts timestamp,
    s varchar,
    bi varbinary
 )
WITH (partitioning = ARRAY['dateint', 's'])

Migrating existing tables

The connector can read from or write to Hive tables that have been migrated to Iceberg. Currently, there is no Presto support to migrate Hive tables to Presto, so you will need to use either the Icerberg API or Spark.

System Tables and Columns

The connector supports table$partitions as a substitute for Hive's SHOW PARTITIONS. The differences are that it returns some partition metrics for each partition value and you can also use it for unpartitioned tables.

Iceberg supports $snapshot_id and $snapshot_timestamp_ms as hidden columns. These columns allow users to query an old version of the table. Think of this as a time travel feature which lets you query your table's snapshot at a given time.

Rolling back to a previous Snapshot

The connector provides a system snapshots table for each Iceberg table. Snapshots are identified by BIGINT snapshot ids. You can find the latest snapshot id for table foo by incanting:

SELECT snapshot_id FROM 'foo$snapshots' ORDER BY committed_at DESC LIMIT 1

An SQL procedure allows the caller to roll back the state of the table to a previous snapshot id, thusly:

CALL system.rollback_to_snapshot(schema_name, table_name, snapshot_id)

TODO

Update the README to reflect the current status, and convert it to proper connector documentation before announcing the connector as ready for use.
Add tests for truncate on numeric types.
Test Iceberg table names like my_table@456 and my_table@456$partitions.
Test partition transforms on structured types.
Add tests for non-Iceberg tables
- listing tables in a schema
- listing columns in a schema
  - filtered for an Iceberg table
  - filtered for a non-Iceberg table
  - no filters
- describing a table
- selecting from a table
Add procedures for migrate table and rollback table to snapshot.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

presto-iceberg

presto-iceberg

README.md

Status

Configuration

Unpartitioned Tables

Partitioned Tables

Migrating existing tables

System Tables and Columns

Rolling back to a previous Snapshot

TODO

Files

presto-iceberg

Directory actions

More options

Directory actions

More options

Latest commit

History

presto-iceberg

Folders and files

parent directory

README.md

Status

Configuration

Unpartitioned Tables

Partitioned Tables

Migrating existing tables

System Tables and Columns

Rolling back to a previous Snapshot

TODO