delta-ruby

Delta Lake for Ruby

Supports local files and Amazon S3

Installation

Add this line to your application’s Gemfile:

gem "deltalake-rb"

It can take 5-10 minutes to compile the gem.

Getting Started

Write data

df = Polars::DataFrame.new({"id" => [1, 2], "value" => [3.0, 4.0]})
DeltaLake.write("./events", df)

Load a table

dt = DeltaLake::Table.new("./events")
df = dt.to_polars

Get a lazy frame

lf = dt.to_polars(eager: false)

Append rows

DeltaLake.write("./events", df, mode: "append")

Overwrite a table

DeltaLake.write("./events", df, mode: "overwrite")
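Before choosing between a first write, an append, or an overwrite, a script may want to check whether a local path already holds a table. The following is a minimal sketch for local paths only. It relies on the Delta Lake format keeping its transaction log in a `_delta_log` directory inside the table directory. `local_delta_table?` is a hypothetical helper and not part of this gem.

```ruby
# Hypothetical helper (not part of the gem): true if `path` is a local
# directory that contains a Delta transaction log (`_delta_log`).
# This does not work for S3 URIs.
def local_delta_table?(path)
  Dir.exist?(File.join(path, "_delta_log"))
end

# Possible usage (requires the gem and a Polars data frame `df`):
#   mode = local_delta_table?("./events") ? "append" : "overwrite"
#   DeltaLake.write("./events", df, mode: mode)
```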

Add a constraint

dt.alter.add_constraint({"id_gt_0" => "id > 0"})

Drop a constraint

dt.alter.drop_constraint("id_gt_0")

Delete rows

dt.delete("id > 1")
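The predicate is passed as a string, so values that come from variables need to be rendered as literals first. The sketch below shows one way to build a simple comparison predicate, assuming the predicate syntax accepts standard SQL literals (single-quoted strings with embedded quotes doubled). `sql_literal` and `comparison_predicate` are hypothetical helpers, not part of this gem.

```ruby
# Hypothetical helpers (not part of the gem) for building predicate strings.
ALLOWED_OPERATORS = %w[= != < <= > >=].freeze

# Render a Ruby value as a SQL-style literal.
# Assumption: strings use single quotes, with embedded quotes doubled.
def sql_literal(value)
  case value
  when Integer, Float then value.to_s
  when String then "'" + value.gsub("'", "''") + "'"
  else raise ArgumentError, "unsupported value: #{value.inspect}"
  end
end

# Build "<column> <operator> <literal>", rejecting anything that is not a
# plain identifier or a known comparison operator.
def comparison_predicate(column, operator, value)
  unless column.match?(/\A[A-Za-z_][A-Za-z0-9_]*\z/)
    raise ArgumentError, "invalid column name: #{column.inspect}"
  end
  unless ALLOWED_OPERATORS.include?(operator)
    raise ArgumentError, "invalid operator: #{operator.inspect}"
  end
  "#{column} #{operator} #{sql_literal(value)}"
end

# Possible usage (requires the gem):
#   dt.delete(comparison_predicate("id", ">", 1))
```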

Vacuum

dt.vacuum(dry_run: false)
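Since vacuum deletes files, it can be useful to preview the result first. The sketch below assumes that `vacuum(dry_run: true)` returns the list of files that would be removed without deleting them, mirroring the Python API. That return value is an assumption here, and `vacuum_with_preview` and its `max_files` limit are hypothetical.

```ruby
# Hypothetical helper (not part of the gem).
# Assumption: `table.vacuum(dry_run: true)` returns an array of the files
# that would be deleted, without deleting anything.
def vacuum_with_preview(table, max_files: 1_000)
  candidates = table.vacuum(dry_run: true)
  if candidates.size > max_files
    raise "refusing to delete #{candidates.size} files (limit #{max_files})"
  end
  # Only run the real vacuum when there is something to remove.
  table.vacuum(dry_run: false) unless candidates.empty?
  candidates
end

# Possible usage (requires the gem):
#   removed = vacuum_with_preview(dt, max_files: 500)
```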

Perform small file compaction

dt.optimize.compact

Colocate similar data in the same files

dt.optimize.z_order(["category"])

Load a previous version of a table

dt = DeltaLake::Table.new("./events", version: 1)
# or
dt.load_as_version(1)
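Loading specific versions makes it possible to compare a table over time. As a rough sketch, the helper below collects the row count for each version. It takes a block that opens the table at a given version, so it does not depend on the gem directly. `row_counts_by_version` is a hypothetical name, `height` is the Polars data frame row count, and `dt.version` in the usage comment is assumed to return the current version number. Note that this reads every version in full, which can be slow for large tables.

```ruby
# Hypothetical helper (not part of the gem).
# Yields each version number to the block, which should return a table
# object responding to `to_polars`. Returns { version => row count }.
def row_counts_by_version(versions)
  versions.to_h do |version|
    table = yield(version)
    [version, table.to_polars.height]
  end
end

# Possible usage (requires the gem; `dt.version` is an assumption):
#   counts = row_counts_by_version(0..dt.version) do |v|
#     DeltaLake::Table.new("./events", version: v)
#   end
```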

Get the schema

dt.schema

Get metadata

dt.metadata

Get history

dt.history

API

This library follows the Delta Lake Python API (with a few changes to make it more Ruby-like). You can follow Python tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
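As an illustration of converting Python code, the sketch below pairs two Python calls (shown in comments) with the Ruby calls from the examples above. It removes rows that violate an expression and then adds that expression as a constraint. `enforce_constraint` is a hypothetical helper, and the sketch assumes the predicate syntax supports `NOT (...)`. Rows where the expression evaluates to NULL are not handled here.

```ruby
# Hypothetical helper (not part of the gem).
#
# Python equivalent:
#   dt.delete("NOT (id > 0)")
#   dt.alter.add_constraint({"id_gt_0": "id > 0"})
def enforce_constraint(dt, name, expression)
  # Assumption: the predicate syntax supports NOT (...).
  dt.delete("NOT (#{expression})")
  # Python dict literals become Ruby hashes; no parentheses-only calls needed.
  dt.alter.add_constraint({name => expression})
end

# Possible usage (requires the gem):
#   enforce_constraint(dt, "id_gt_0", "id > 0")
```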

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/delta-ruby.git
cd delta-ruby
bundle install
bundle exec rake compile
bundle exec rake test
