cast of float to pl.Decimal silently fails but also changes float values #12775

Closed
2 tasks done
Julian-J-S opened this issue Nov 29, 2023 · 1 comment · Fixed by #20999
Labels
A-dtype-decimal Area: decimal data type bug Something isn't working P-medium Priority: medium python Related to Python Polars

Comments

@Julian-J-S
Contributor

Julian-J-S commented Nov 29, 2023

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

pl.DataFrame({"x": [1.5]}).with_columns(
    x2=pl.col("x").cast(pl.Decimal),  # bug: dtype stays f64, but the value changes to 1.0
    x3=pl.col("x").cast(pl.Utf8).cast(pl.Decimal),
)

shape: (1, 3)
┌─────┬─────┬──────────────┐
│ x   ┆ x2  ┆ x3           │
│ --- ┆ --- ┆ ---          │
│ f64 ┆ f64 ┆ decimal[*,1] │
╞═════╪═════╪══════════════╡
│ 1.5 ┆ 1.0 ┆ 1.5          │
└─────┴─────┴──────────────┘

Log output

No response

Issue description

Casting a float to a decimal silently fails: the dtype does NOT change, but the data somehow does (1.5 -> 1.0).

Expected behavior

option 1: it should just work

option 2: raise an error stating that only strings can be converted to decimal, and tell the user to cast to Utf8 first

Installed versions

0.19.17
@Julian-J-S Julian-J-S added bug Something isn't working python Related to Python Polars labels Nov 29, 2023
@cgevans
Contributor

cgevans commented Jan 9, 2024

There is a combination of two problems here:

  • You don't have decimals enabled (pl.Config(activate_decimals=True)). It seems that quite a lot of decimal functionality now leaks through even when that setting is False, which results in the column still being f64. It looks like the cast actually converts to decimal and then converts back.
  • You are not specifying a scale for the decimal. With strings, casting to a decimal without specifying scale infers the minimum necessary scale, in this case, 1. However, the default value for scale in pl.Decimal is 0, rather than being an int | None, and 0 is a valid scale (essentially, just an i128). It appears that, here, the lack of inference means that scale=0 is used, with smaller parts being truncated/rounded. If you specify a scale, the code works as intended:
In [5]:  pl.DataFrame({"x": [1.5]}).with_columns(
   ...: x2=pl.col("x").cast(pl.Decimal(scale=4)),
   ...: x3=pl.col("x").cast(pl.Utf8).cast(pl.Decimal(scale=4)),
   ...: )
Out[5]: 
shape: (1, 3)
┌─────┬───────────────┬──────────────┐
│ x   ┆ x2            ┆ x3           │
│ --- ┆ ---           ┆ ---          │
│ f64 ┆ decimal[38,4] ┆ decimal[*,4] │
╞═════╪═══════════════╪══════════════╡
│ 1.5 ┆ 1.5000        ┆ 1.5000       │
└─────┴───────────────┴──────────────┘
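
The string-side scale inference described above can be imitated with Python's standard `decimal` module. This is only an illustrative sketch, not Polars' implementation; `infer_scale` is a hypothetical helper name, and it assumes finite numeric strings:

```python
from decimal import Decimal


def infer_scale(values: list[str]) -> int:
    """Smallest scale (digits after the decimal point) that holds every value exactly.

    Hypothetical helper for illustration; assumes finite numeric strings.
    """
    scales = []
    for text in values:
        exponent = Decimal(text).as_tuple().exponent
        # A negative exponent counts fractional digits; integers need scale 0.
        scales.append(max(0, -exponent))
    return max(scales, default=0)


inferred = infer_scale(["1.5"])
print(inferred)
```

For the single value "1.5" this yields a scale of 1, which matches the `decimal[*,1]` dtype of the string-derived column in the original report.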

Scale inference for floats would likely be problematic or infeasible: after all, if there was a perfect way to encode decimal numbers in floats, there wouldn't be much of a need for a decimal type! I think the most reasonable option here would be to not allow casts from floats to decimals without a specified scale, and truncate or round at that scale. This would not be consistent with string to decimal casts (infer scale if not specified, fail if number can't be exactly represented at a specified scale), but would make more sense for floats, and would make reliable casts viable in many circumstances.
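
To see why inferring a scale from a float is problematic, and what rounding at an explicit scale could look like, here is a small sketch that uses Python's standard `decimal` module rather than Polars. `float_to_decimal` is a hypothetical helper, and the half-even rounding mode is an assumption, since the comment above leaves truncate-versus-round open:

```python
from decimal import Decimal, ROUND_HALF_EVEN


def float_to_decimal(value: float, scale: int) -> Decimal:
    """Round a float to `scale` fractional digits (hypothetical helper)."""
    quantum = Decimal(10) ** -scale  # e.g. scale=4 gives a quantum of 0.0001
    # Decimal(value) is the exact binary value of the float; quantize rounds it.
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


# The float 0.1 is stored as a nearby binary fraction, so its exact decimal
# expansion has far more than one fractional digit.
exact_digits = -Decimal(0.1).as_tuple().exponent

rounded = float_to_decimal(0.1, scale=4)
print(exact_digits, rounded)
```

An inferred scale would have to cover all of those exact digits, whereas an explicit scale gives the cast a well-defined place to round.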

The problem with doing this, however, is that with the current arrangement there doesn't appear to be a way to distinguish between Decimal(scale=0) being a specified scale, meaning the user wants rounding to integers, and it simply being the default.
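
One way to picture this ambiguity, and the `int | None` alternative mentioned earlier, is a plain-Python sketch. The class names below are hypothetical and are not Polars types:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DecimalWithZeroDefault:
    """Mimics the arrangement described above: scale defaults to 0."""

    scale: int = 0


@dataclass(frozen=True)
class DecimalWithOptionalScale:
    """Alternative: None means "not specified", 0 means "round to integers"."""

    scale: int | None = None

    def scale_was_specified(self) -> bool:
        return self.scale is not None


# With a default of 0, both spellings compare equal, so the caller's intent is lost.
ambiguous = DecimalWithZeroDefault() == DecimalWithZeroDefault(scale=0)

# With None as the default, an explicit 0 stays distinguishable.
unspecified = DecimalWithOptionalScale()
explicit_zero = DecimalWithOptionalScale(scale=0)
print(ambiguous, unspecified.scale_was_specified(), explicit_zero.scale_was_specified())
```

With a `None` default, a float cast could reject the unspecified case while still honouring an explicit `scale=0`.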
