Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

[Requires Arrow spec] Added support for decimal 32 and 64 #896

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jorgecarleitao
Copy link
Owner

@jorgecarleitao jorgecarleitao commented Mar 8, 2022

This is being worked over the arrow mailing list and apache/arrow#8578 . It is marked as draft until it is officially supported / abandoned if abandoned by the official implementation.

Note that this PR is backward incompatible: DataType::Decimal(usize,usize) needs a new argument to differentiate between different physical types backing the (logical) decimal type.

Before:

use arrow2::datatypes::DataType;

let precision = 5;
let scale = 5;
let data_type = DataType::Decimal(precision, scale);

After:

use arrow2::datatypes::{DataType, DecimalType};

let precision = 5;
let scale = 5;
let data_type = DataType::Decimal(DecimalType::Int128, precision, scale);

where DecimalType is an enum with 3 different variants (Int32, Int64, Int128).

Because the other variants have an existing physical representation (i32 and i64 respectively), most of this crate works out of the box (growable, filter, take, comparison (including simd), etc.). There is still some work for the aggregations (decimal is not yet supported), cast, and arithmetics (arithmetics for decimals work a bit different due to the precision/scale constraints).

We also need to think about how to support reading from and writing to parquet (i.e. we may be able to read a parquet decimal i32 to a decimal i32 instead of i128 as we do it now).

@jorgecarleitao jorgecarleitao added the investigation Issues or PRs that are investigations. Prs may or may not be merged. label Mar 8, 2022
@codecov
Copy link

codecov bot commented Mar 8, 2022

Codecov Report

Merging #896 (cfb7ff5) into main (735218b) will decrease coverage by 13.43%.
The diff coverage is 65.34%.

❗ Current head cfb7ff5 differs from pull request most recent head fd66c1f. Consider uploading reports for the commit fd66c1f to get more accurate results

@@             Coverage Diff             @@
##             main     #896       +/-   ##
===========================================
- Coverage   83.67%   70.24%   -13.44%     
===========================================
  Files         365      347       -18     
  Lines       35865    18925    -16940     
===========================================
- Hits        30009    13293    -16716     
+ Misses       5856     5632      -224     
Impacted Files Coverage Δ
src/compute/aggregate/min_max.rs 65.16% <0.00%> (-15.39%) ⬇️
src/compute/arithmetics/decimal/mod.rs 100.00% <ø> (+1.33%) ⬆️
src/compute/arithmetics/mod.rs 69.81% <ø> (-4.30%) ⬇️
src/compute/comparison/mod.rs 55.78% <ø> (+1.04%) ⬆️
src/compute/take/mod.rs 65.85% <ø> (-23.04%) ⬇️
src/io/avro/write/schema.rs 64.86% <0.00%> (-12.22%) ⬇️
src/io/json_integration/write/schema.rs 0.00% <0.00%> (ø)
src/io/odbc/read/schema.rs 0.00% <ø> (ø)
src/io/parquet/read/schema/convert.rs 74.34% <0.00%> (-19.88%) ⬇️
src/types/mod.rs 0.00% <0.00%> (-100.00%) ⬇️
... and 395 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 735218b...fd66c1f. Read the comment docs.

@jorgecarleitao jorgecarleitao marked this pull request as draft March 8, 2022 21:20
@jorgecarleitao jorgecarleitao force-pushed the decimal branch 2 times, most recently from 9518c6c to cfb7ff5 Compare March 12, 2022 21:38
@jorgecarleitao jorgecarleitao changed the title Added support for decimal 32 and 64 [Requires Arrow spec] Added support for decimal 32 and 64 May 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
investigation Issues or PRs that are investigations. Prs may or may not be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant