Skip to content

Consistent Naming Standard When Using Dict in GroupBy .agg #21806

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Jul 7, 2018

This is a WIP as all of the tests in the groupby directory pass, but there are some outside of that which need touch ups. This is a relatively aggressive change so I figured I'd get buy in before going outside of that to modify tests.

This not only addresses the referenced issue but provides consistent expectations around columns returned when using a dict argument in a DataFrameGroupBy, namely that the returned object will have a MultiIndex where the top level is the column name and the subsequent level is the aggregation performed.

To play devil's advocate, this may be considered undesirable as it now returns a MultiIndex in some cases where that was flat before. However, I'd counter that this:

  • Behavior is more consistent and easier to communicate
  • Provides a return value whose column labeling is less ambiguous AND
  • Simplifies the code base, providing us a clear path forward for whatever the renaming solution actually SHOULD be

Curious to hear other's feedback @jreback @TomAugspurger @jorisvandenbossche

@pep8speaks
Copy link

Hello @WillAyd! Thanks for submitting the PR.

Line 210:1: E302 expected 2 blank lines, found 1

{'B': ['sum'], 'C': ['min']}, # Lists
{'B': {'sum': 'sum'}, 'C': {'min': 'min'}} # deprecated call
])
def test_agg_dict_naming_consistency(select_columns, agg_argument):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI plan to move this to the aggregate sub-directory of tests

@WillAyd
Copy link
Member Author

WillAyd commented Aug 17, 2018

Closing as this should be part of larger discussion

@WillAyd WillAyd closed this Aug 17, 2018
@WillAyd WillAyd deleted the consistent-grp-names branch February 28, 2019 07:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: inconsistencies between grouped and grouped[['col']] in groupby
3 participants