[SPARK-40579][PS] `GroupBy.first` should skip NULLs

### What changes were proposed in this pull request?
Make `GroupBy.first` skip nulls.

### Why are the changes needed?
To fix the behavior difference from pandas: pandas returns the first non-null value in each group, while pandas-on-Spark returned the first value even when it was null.

```
In [1]:
   ...: import pandas as pd
   ...: import numpy as np
   ...: import pyspark.pandas as ps
   ...:
   ...: pdf = pd.DataFrame({"A": [1, 2, 1, 2],"B": [-1.5, np.nan, -3.2, 0.1],})
   ...: psdf = ps.from_pandas(pdf)
   ...:

In [2]: pdf.groupby("A").first()
Out[2]:
     B
A
1 -1.5
2  0.1

In [3]: psdf.groupby("A").first()
     B
A
1 -1.5
2  NaN
```

### Does this PR introduce _any_ user-facing change?
Yes, the updated `GroupBy.first` will skip NULLs.

### How was this patch tested?
Added UT.

Closes apache#38017 from zhengruifeng/ps_first_skip_na.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
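
The semantics described above can be illustrated without Spark. The following is a minimal pure-Python sketch, not the code from this patch: `is_null` and `group_first` are hypothetical helper names, and rows are assumed to be `(key, value)` pairs in their original order. It contrasts "first value per group" with "first non-null value per group" on the same data as the example.

```python
import math


def is_null(value):
    """Treat None and float NaN as null (an assumption made for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def group_first(rows, skip_nulls=True):
    """Return {key: first value} for (key, value) pairs, keeping key order.

    With skip_nulls=True, a null first value is replaced by the first
    non-null value seen later for the same key. A group that contains
    only nulls keeps its null.
    """
    result = {}
    for key, value in rows:
        if key not in result:
            # First time this key appears: record whatever value it has.
            result[key] = value
        elif skip_nulls and is_null(result[key]) and not is_null(value):
            # Only nulls were stored so far: take the first real value.
            result[key] = value
    return result


# Same data as the example: columns A (key) and B (value).
rows = [(1, -1.5), (2, float("nan")), (1, -3.2), (2, 0.1)]

skipping = group_first(rows, skip_nulls=True)
keeping = group_first(rows, skip_nulls=False)

print(skipping)  # {1: -1.5, 2: 0.1}
print(keeping)   # group 2 keeps its leading NaN
```

With `skip_nulls=True` the result matches the pandas output for group 2 (`0.1`); with `skip_nulls=False` it reproduces the old pandas-on-Spark output (`NaN`).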