[backport 2.3.x] BUG: fix fill value for grouped sum in case of unobserved categories for string dtype (empty string instead of 0) (#61909) #61963

jorisvandenbossche · 2025-07-26T12:24:19Z

Backport of #61909

jorisvandenbossche · 2025-08-20T08:17:29Z

pandas/core/groupby/groupby.py

+            if using_string_dtype() and method == "sum":
+                if isinstance(output, Series) and isinstance(output.dtype, StringDtype):
+                    d["fill_value"] = ""
+                    return output.reindex(**d)  # type: ignore[arg-type]
+                elif isinstance(output, DataFrame) and any(
+                    isinstance(dtype, StringDtype) for dtype in output.dtypes
+                ):
+                    orig_dtypes = output.dtypes
+                    indices = np.nonzero(output.dtypes == "string")[0]
+                    for idx in indices:
+                        output.isetitem(idx, output.iloc[:, idx].astype(object))
+                    output = output.reindex(**d)
+                    for idx in indices:
+                        col = output.iloc[:, idx]
+                        output.isetitem(
+                            idx, col.mask(col == 0, "").astype(orig_dtypes.iloc[idx])
+                        )
+                    return output
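To try the DataFrame branch of the diff outside of groupby, here is a minimal standalone sketch of the same technique. It casts the string columns to object, reindexes with a single fill_value=0, then replaces the 0 fill with "" and restores the original dtype. The helper name reindex_fill_strings and the example frame are hypothetical. Unlike the diff, this sketch detects string columns with isinstance(dtype, pd.StringDtype) and copies its input instead of modifying it in place.

```python
import pandas as pd


def reindex_fill_strings(output: pd.DataFrame, new_index) -> pd.DataFrame:
    """Hypothetical helper mirroring the DataFrame branch of the diff.

    Rows added by the reindex get 0 in non-string columns and "" in
    StringDtype columns.
    """
    orig_dtypes = output.dtypes
    # Positional indices of the string columns (positional, like the diff,
    # so duplicate column labels are not a problem).
    indices = [
        i for i, dtype in enumerate(orig_dtypes) if isinstance(dtype, pd.StringDtype)
    ]
    output = output.copy()  # do not mutate the caller's frame
    # Cast string columns to object so that fill_value=0 is accepted everywhere.
    for idx in indices:
        output.isetitem(idx, output.iloc[:, idx].astype(object))
    output = output.reindex(new_index, fill_value=0)
    # In the former string columns, replace the 0 fill by "" and restore the dtype.
    for idx in indices:
        col = output.iloc[:, idx]
        output.isetitem(idx, col.mask(col == 0, "").astype(orig_dtypes.iloc[idx]))
    return output


df = pd.DataFrame(
    {"n": [1, 2], "s": pd.array(["x", "y"], dtype="string")},
    index=["a", "b"],
)
out = reindex_fill_strings(df, ["a", "b", "c"])
print(out)
```

The round trip through object is needed because a single fill_value=0 cannot be stored directly in a string-dtype column.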


The original fix from the PR does not actually work for the 2.3.x branch, because here we still have the _reindex_output helper method to deal with the observed=False case.
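The following minimal sketch illustrates the observed=False case that _reindex_output handles. It uses a categorical group key with an unobserved category, and the frame and column names are made up for illustration. For a numeric column the empty group sums to 0. That 0 is the fill value this fix replaces with "" for string columns.

```python
import pandas as pd

# Categorical group key with an unobserved category "c" (no rows).
df = pd.DataFrame(
    {
        "key": pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"]),
        "value": [1, 2, 3],
    }
)

# With observed=False every category appears in the result; the empty
# group "c" sums to 0 for this numeric column.
result = df.groupby("key", observed=False)["value"].sum()
print(result)
```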

There is also the problem that output.reindex(..) does not work if you want to fill with multiple fill values (0 for numerical columns, "" for string columns). The above is a somewhat ugly solution, but I don't know of a cleaner way to do this with the current pandas APIs. Given that this is only for 2.3.x, is not meant to be kept forever, and only applies behind the option flag, maybe it's OK?
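DataFrame.reindex takes a single scalar fill_value for all columns. One possible alternative, which is not what this PR does, is to reindex column by column and concatenate the results. The helper below is a hypothetical sketch. Because pd.concat rebuilds the column index from the Series names, column-index metadata such as its name is not preserved.

```python
import pandas as pd


def reindex_with_dtype_fill(df: pd.DataFrame, new_index) -> pd.DataFrame:
    """Hypothetical helper: reindex each column with its own fill value.

    StringDtype columns are filled with "", all other columns with 0.
    """
    pieces = []
    for _, col in df.items():
        fill = "" if isinstance(col.dtype, pd.StringDtype) else 0
        pieces.append(col.reindex(new_index, fill_value=fill))
    # Reassemble the reindexed columns side by side.
    return pd.concat(pieces, axis=1)


df = pd.DataFrame(
    {"n": [1, 2], "s": pd.array(["x", "y"], dtype="string")},
    index=["a", "b"],
)
out = reindex_with_dtype_fill(df, ["a", "b", "c"])
print(out)
```

This sketch avoids the object round trip, but it creates one intermediate Series per column.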

Yeah, I don't see any nice solution here either; I'm good with the above. The only remaining issue is the failing CI.

BUG: fix fill value for grouped sum in case of unobserved categories for string dtype (empty string instead of 0) (pandas-dev#61909)

7f6206c

jorisvandenbossche added this to the 2.3.2 milestone Jul 26, 2025

jorisvandenbossche requested review from rhshadrach and WillAyd as code owners July 26, 2025 12:24

jorisvandenbossche mentioned this pull request Jul 26, 2025

BUG: fix fill value for grouped sum in case of unobserved categories for string dtype (empty string instead of 0) #61909

Merged

rhshadrach and others added 2 commits August 18, 2025 16:11

Merge branch '2.3.x' into backport-61909

66caaae

fix reindex to work for string dtype

b193cd5

jorisvandenbossche commented Aug 20, 2025

jorisvandenbossche requested a review from mroeschke August 20, 2025 17:19

jorisvandenbossche modified the milestones: 2.3.2, 2.3.3 Aug 21, 2025

rhshadrach Aug 24, 2025

Uh oh!

Uh oh!

Uh oh!

[backport 2.3.x] BUG: fix fill value for gouped sum in case of unobserved categories for string dtype (empty string instead of 0) (#61909) #61963

Are you sure you want to change the base?

[backport 2.3.x] BUG: fix fill value for gouped sum in case of unobserved categories for string dtype (empty string instead of 0) (#61909) #61963

Uh oh!

Conversation

jorisvandenbossche commented Jul 26, 2025

Uh oh!

jorisvandenbossche Aug 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rhshadrach Aug 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jorisvandenbossche Aug 20, 2025 •

edited

Loading