[iceberg] support migrating iceberg table which had suffered schema evolution #5083
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Purpose
Linked issue: close #xxx
In pr #4639 and #4878, we had supported migrating iceberg table managed by hadoop-catalog or hive-catalog to paimon. This pr aims to support migrating the iceberg table which had suffered once or several times schema evolution.
Paimon stores the schema-id in each DataFileMeta for reading data files which had suffered schema evolution, so we extract the schema-id used by each iceberg data file and record it in the corresponding paimon DataFileMeta, and this makes paimon can handle the schema evolution case.
Tests
IcebergMigrateTest#testDeleteColumn
IcebergMigrateTest#testRenameColumn
IcebergMigrateTest#testAddColumn
IcebergMigrateTest#testReorderColumn
IcebergMigrateTest#testUpdateColumn
IcebergMigrateTest#testMigrateWithRandomIcebergEvolution
API and Format
Documentation