Problem Description
Currently, pandas.DataFrame.groupby() requires users to explicitly specify both the grouping columns and the aggregation functions. This can be repetitive and inefficient, especially during exploratory data analysis on large DataFrames with many columns. A common use case like “group by all categorical columns and compute the mean of numeric columns” requires verbose, manual setup.
Feature Description
Add a new method to DataFrame called smart_groupby(), which intelligently infers grouping and aggregation behavior based on the column types of the DataFrame.
Proposed behavior:
If no parameters are passed:
Group by all columns of type object, category, or bool
Aggregate all remaining numeric columns using the mean
Optional keyword parameters:
by: specify grouping columns explicitly
agg: specify aggregation function(s) (default is "mean")
exclude: exclude specific columns from grouping or aggregation
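To make the proposed defaults concrete, here is a minimal sketch written as a standalone helper rather than a DataFrame method. The function name and the by, agg, and exclude parameters come from the proposal above; everything else is an assumption for illustration, in particular treating string-dtype columns as grouping columns, passing observed=True, and raising when no grouping column is found. This is not an existing pandas API.

```python
import pandas as pd
from pandas.api import types as pdt


def _is_grouping_dtype(dtype) -> bool:
    # Proposal: object, category, or bool. String dtypes are included
    # here as an assumption, since they play the same role as object.
    return (
        isinstance(dtype, pd.CategoricalDtype)
        or pdt.is_bool_dtype(dtype)
        or pdt.is_object_dtype(dtype)
        or pdt.is_string_dtype(dtype)
    )


def smart_groupby(df, by=None, agg="mean", exclude=None):
    """Hypothetical helper sketching the proposed behavior."""
    excluded = set(exclude or [])
    candidates = [c for c in df.columns if c not in excluded]

    if by is None:
        # Infer grouping columns from dtypes.
        by = [c for c in candidates if _is_grouping_dtype(df[c].dtype)]
    elif isinstance(by, str):
        by = [by]
    else:
        by = list(by)
    if not by:
        raise ValueError("no grouping columns found; pass `by` explicitly")

    # Aggregate the remaining numeric columns (bool counts as a grouping
    # dtype above, so it is left out of the aggregation).
    value_cols = [
        c for c in candidates
        if c not in by
        and pdt.is_numeric_dtype(df[c].dtype)
        and not pdt.is_bool_dtype(df[c].dtype)
    ]
    # observed=True keeps only category combinations present in the data.
    return df.groupby(by, observed=True)[value_cols].agg(agg)


df = pd.DataFrame(
    {
        "region": pd.Categorical(["a", "a", "b"]),
        "flag": [True, True, False],
        "x": [1.0, 3.0, 5.0],
        "y": [10, 20, 30],
    }
)
default_result = smart_groupby(df)  # groups by region and flag
sum_result = smart_groupby(df, by="region", agg="sum", exclude=["flag"])
```

With no arguments the helper groups by region and flag and averages x and y; passing by, agg, or exclude overrides the inferred behavior.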
Alternative Solutions
Currently, users must write verbose code to accomplish the same:
import pandas as pd

# Group by the categorical columns and average every numeric column.
group_cols = [col for col in df.columns if df[col].dtype == 'category']
agg_cols = [col for col in df.columns if pd.api.types.is_numeric_dtype(df[col])]
df.groupby(group_cols)[agg_cols].mean()
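For reference, the same manual approach can be run end to end on a toy DataFrame. This sketch swaps the list comprehensions for DataFrame.select_dtypes, which is an alternative spelling rather than what the snippet above uses; the column names and values are made up for illustration.

```python
import pandas as pd

# Toy data: one categorical column and two numeric columns.
df = pd.DataFrame(
    {
        "city": pd.Categorical(["Oslo", "Oslo", "Rome"]),
        "temp": [1.0, 3.0, 20.0],
        "rain": [2, 4, 6],
    }
)

# Pick grouping and aggregation columns by dtype.
group_cols = df.select_dtypes(include="category").columns.tolist()
agg_cols = df.select_dtypes(include="number").columns.tolist()

# observed=True restricts the result to categories present in the data.
result = df.groupby(group_cols, observed=True)[agg_cols].mean()
```

Here result has one row per city, with the mean of temp and rain in each row.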
Additional Context
No response
Thanks for the suggestion, but I would be -1 on including this in pandas. pandas is moving toward explicit and less automatic behaviors, and the snippet you posted is short enough to be wrapped in a custom helper function.