Consistent Naming Standard When Using Dict in GroupBy .agg #21806

WillAyd · 2018-07-07T23:03:05Z

closes #21790 (API: inconsistencies between grouped and grouped[['col']] in groupby)
[] tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This is a WIP: all of the tests in the groupby directory pass, but some tests outside of it still need touch-ups. This is a relatively aggressive change, so I figured I'd get buy-in before modifying tests outside that directory.

This not only addresses the referenced issue but also sets consistent expectations for the columns returned when a dict argument is passed to .agg on a DataFrameGroupBy: the returned object will have MultiIndex columns where the top level is the column name and the second level is the aggregation performed.
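For readers who want to see the column shape being described, here is a minimal sketch. It does not use the code from this PR; it only relies on the list-valued dict form, which already returns MultiIndex columns in released pandas. The sample frame and the `as_list_spec` helper are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Toy data, invented for this example only.
df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 2, 3], "C": [4, 5, 6]})


def as_list_spec(spec):
    """Hypothetical helper: wrap each scalar aggregation in a list."""
    return {
        col: funcs if isinstance(funcs, list) else [funcs]
        for col, funcs in spec.items()
    }


# Scalar values in the dict: released pandas returns flat columns.
flat = df.groupby("A").agg({"B": "sum", "C": "min"})

# List values in the dict: columns are a MultiIndex of
# (column name, aggregation), the shape proposed here for every dict call.
nested = df.groupby("A").agg(as_list_spec({"B": "sum", "C": "min"}))

print(list(flat.columns))
print(list(nested.columns))
```

The second result is the labeling this PR would standardize on, so a caller would select a single aggregation with `nested[("B", "sum")]` instead of `flat["B"]`.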

To play devil's advocate, this may be considered undesirable because it now returns a MultiIndex in some cases where the result was previously flat. However, I'd counter that this change:

Makes behavior more consistent and easier to communicate
Provides a return value whose column labeling is less ambiguous, AND
Simplifies the code base, giving us a clear path forward for whatever the renaming solution actually SHOULD be

Curious to hear others' feedback @jreback @TomAugspurger @jorisvandenbossche

pep8speaks · 2018-07-07T23:03:08Z

Hello @WillAyd! Thanks for submitting the PR.

In the file pandas/tests/groupby/aggregate/test_other.py, following are the PEP8 issues :

Line 210:1: E302 expected 2 blank lines, found 1

WillAyd · 2018-07-07T23:03:44Z

pandas/tests/groupby/test_groupby.py

+    {'B': ['sum'], 'C': ['min']},  # Lists
+    {'B': {'sum': 'sum'}, 'C': {'min': 'min'}}  # deprecated call
+])
+def test_agg_dict_naming_consistency(select_columns, agg_argument):


FYI, I plan to move this test to the aggregate sub-directory of tests

WillAyd · 2018-08-17T17:17:10Z

Closing, as this should be part of a larger discussion

WillAyd added 5 commits July 7, 2018 14:46

Added failing test

e90a050

Standardized return value of dict argument to .agg

56aeab0

Fixed most broken tests

9304f32

Merge remote-tracking branch 'upstream/master' into consistent-grp-names

83bb247

Removed xfail and test hacks

ee7b72c

gfyoung added Groupby API Design labels Jul 12, 2018

WillAyd closed this Aug 17, 2018

WillAyd deleted the consistent-grp-names branch February 28, 2019 07:23
