addressed all current feedback including: removed unused images, corrected missing TMs, removed unnecessary newline changes, deleted outdated FAQ entry
Showing 23 changed files with 47 additions and 438 deletions.
# Building the project and running tests

1. Use Java 11 for building the project. If you are using some other Java version, you can use [jenv](https://github.com/jenv/jenv) to manage multiple Java versions locally.
2. Build the project using `mvn clean package`. Use `mvn clean package -DskipTests` to skip tests while building.
4. Similarly, use `mvn clean verify` or `mvn verify` to run integration tests.
# Style guide

1. We use the [Maven Spotless plugin](https://github.com/diffplug/spotless/tree/main/plugin-maven) and [Google Java Format](https://github.com/google/google-java-format) for code style.
2. Use `mvn spotless:check` to find code style violations and `mvn spotless:apply` to fix them. The code style check is tied to the compile phase by default, so code style violations will lead to build failures.
# Running the bundled jar

1. Get a pre-built bundled jar or create the jar with `mvn install -DskipTests`.
2. Create a yaml file that follows the format below:

```yaml
sourceFormat: HUDI
targetFormats:
  # ...
datasets:
  # ...
  - tableBasePath: abfs://[email protected]/multi-partition-dataset
    tableName: multi_partition_dataset
```
- `sourceFormat` is the format of the source table that you want to convert
- `targetFormats` is a list of formats you want to create from your source tables
- `tableBasePath` is the basePath of the table
- `tableDataPath` is an optional field specifying the path to the data files. If not specified, the tableBasePath will be used. For Iceberg source tables, you will need to specify the `/data` path.
- `namespace` is an optional field specifying the namespace of the table and will be used when syncing to a catalog.
- `partitionSpec` is a spec that allows us to infer partition values. This is only required for Hudi source tables. If the table is not partitioned, leave it blank. If it is partitioned, you can specify a spec with a comma separated list
  - `HOUR`: same as `YEAR` but with hour granularity
  - `format`: if your partition type is `YEAR`, `MONTH`, `DAY`, or `HOUR`, specify the format for the date string as it appears in your file paths
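Assembled from the fields above, a minimal end-to-end config might look like the following sketch. The bucket, table, and partition column names are illustrative, and the `field:type:format` shape of `partitionSpec` is an assumption based on the description above, not a confirmed syntax:

```yaml
sourceFormat: HUDI
targetFormats:
  - DELTA
  - ICEBERG
datasets:
  - tableBasePath: s3://example-bucket/events   # illustrative path
    tableName: events
    partitionSpec: created_at:DAY:yyyy-MM-dd    # assumed field:type:format shape
```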
3. The default implementations of table format clients can be replaced with custom implementations by specifying a client configs yaml file in the format below:

```yaml
# sourceClientProviderClass: The class name of a table format's client factory, where the client is
# used for reading from a table of this format. All user configurations, including hadoop config
# ...
# used for writing to a table of this format.
# configuration: A map of configuration values specific to this client.
tableFormatsClients:
  HUDI:
    sourceClientProviderClass: io.onetable.hudi.HudiSourceClientProvider
  DELTA:
    targetClientProviderClass: io.onetable.delta.DeltaClient
    configuration:
      spark.master: local[2]
      spark.app.name: onetableclient
```
4. A catalog can be used when reading and updating Iceberg tables. The catalog can be specified in a yaml file and passed in with the `--icebergCatalogConfig` option. The format of the catalog config file is:

```yaml
catalogImpl: io.my.CatalogImpl
catalogName: name
catalogOptions: # all other options are passed through in a map
  key1: value1
  key2: value2
```
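For instance, to point at Iceberg's built-in Hadoop catalog, the file could look like the sketch below. This assumes the option map is passed through to the catalog as-is; the catalog name and warehouse path are illustrative:

```yaml
catalogImpl: org.apache.iceberg.hadoop.HadoopCatalog
catalogName: local
catalogOptions:
  warehouse: /tmp/iceberg-warehouse   # illustrative warehouse location
```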
5. Run with `java -jar utilities/target/utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig my_config.yaml [--hadoopConfig hdfs-site.xml] [--clientsConfig clients.yaml] [--icebergCatalogConfig catalog.yaml]`. The bundled jar includes hadoop dependencies for AWS, Azure, and GCP. Authentication for AWS is done with `com.amazonaws.auth.DefaultAWSCredentialsProviderChain`. To override this setting, specify a different implementation with the `--awsCredentialsProvider` option.
# Contributing

## Setup

For setting up the repo in IntelliJ, open the project and change the Java version to Java 11 in File -> Project Structure.

![img.png](style/IDE.png)

Have you found a bug, or do you have a cool idea that you want to contribute to the project? Please file a GitHub issue [here](https://github.com/onetable-io/onetable/issues).
## Adding a new target format

Adding a new target format requires a developer to implement [TargetClient](./api/src/main/java/io/onetable/spi/sync/TargetClient.java). Once you have implemented that interface, you can integrate it into the [OneTableClient](./core/src/main/java/io/onetable/client/OneTableClient.java). If you think others may find that target useful, please raise a Pull Request to add it to the project.
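As a rough sketch, a new target is a class implementing the `TargetClient` interface. The real interface lives in the repo at the path above; the interface and method names below are simplified stand-ins to show the shape of such an integration, not the actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for io.onetable.spi.sync.TargetClient; the real
// interface's methods are defined in api/src/main/java/io/onetable/spi/sync.
interface TargetClient {
    void syncSchema(String schema);            // hypothetical signature
    void syncFilesForSnapshot(List<String> files); // hypothetical signature
    void completeSync();                       // hypothetical signature
}

// A skeleton target that records each sync call instead of writing table metadata.
class LoggingTargetClient implements TargetClient {
    final List<String> events = new ArrayList<>();

    @Override
    public void syncSchema(String schema) {
        events.add("schema:" + schema);
    }

    @Override
    public void syncFilesForSnapshot(List<String> files) {
        // A real target would translate these into its format's file metadata.
        files.forEach(f -> events.add("file:" + f));
    }

    @Override
    public void completeSync() {
        // A real target would commit the accumulated metadata here.
        events.add("commit");
    }
}

public class TargetClientSketch {
    public static void main(String[] args) {
        LoggingTargetClient client = new LoggingTargetClient();
        client.syncSchema("id:long,name:string");
        client.syncFilesForSnapshot(List.of("part-0.parquet"));
        client.completeSync();
        System.out.println(client.events);
    }
}
```

The skeleton only accumulates events, but it shows the overall flow a target follows: receive the schema, receive the data files for a snapshot, then commit.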
## Overview of the sync process

![img.png](assets/images/sync_flow.jpg)