
ENH: Default negative location in pandas insert #49496


Closed
Eschivo opened this issue Nov 3, 2022 · 11 comments
Labels: API Design, Closing Candidate, DataFrame, Enhancement


@Eschivo

Eschivo commented Nov 3, 2022

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

In DataFrame.insert, why not change the loc argument to a keyword argument with a default value of -1? In many cases I don't care where the column is positioned. My idea is that if loc is negative (as with the proposed default), the column is inserted as the last column of the DataFrame, i.e. if loc < 0: loc = len(columns). It would then also be possible to remove the constraint 0 <= loc.

Feature Description

In the insert definition in pandas/core/frame.py:

def insert(
    self,
    column: Hashable,
    value: Scalar | AnyArrayLike,
    loc: int = -1,
    allow_duplicates: bool | lib.NoDefault = lib.no_default,
) -> None:
    ...
    if not isinstance(loc, int):
        raise TypeError("loc must be int")

    # Proposed: any negative loc means "append after the last column"
    if loc < 0:
        loc = len(self.columns)

    value = self._sanitize_column(value)
    ...
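For illustration, the proposed semantics can be approximated today without changing pandas, as a standalone helper built on the existing DataFrame.insert(loc, column, value, allow_duplicates=...) signature. The name insert_with_default is hypothetical; this is a sketch of the idea above, not pandas API.

```python
import pandas as pd


def insert_with_default(df, column, value, loc=-1, allow_duplicates=False):
    """Hypothetical helper mirroring the proposal: any negative loc
    (including the default -1) appends the new column after the last one.
    Mutates df in place, like DataFrame.insert."""
    if not isinstance(loc, int):
        raise TypeError("loc must be int")
    if loc < 0:
        # Proposed semantics: every negative value means "at the end".
        loc = len(df.columns)
    df.insert(loc, column, value, allow_duplicates=allow_duplicates)


df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
insert_with_default(df, "newcol", [99, 99])
print(df)
```

Passing an explicit non-negative loc (e.g. loc=0) still behaves exactly like the current DataFrame.insert.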

Alternative Solutions

I don't see any existing functionality or third-party package that can solve this issue.

Additional Context

Usage example after the implementation:

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
   col1  col2
0     1     3
1     2     4
>>> df.insert("newcol", [99, 99])
>>> df
   col1  col2  newcol
0     1     3      99
1     2     4      99

If you find this feature useful I can try to implement it.

Eschivo added the Enhancement and Needs Triage labels on Nov 3, 2022
@jenniferclarsson

take

@topper-123
Contributor

I think this would be a better API, but it is backward incompatible, unfortunately.

This could be done by wrapping the method in a deprecate_nonkeyword_arguments decorator in pandas 2.x and then enforcing the new signature in pandas 3.0.

topper-123 added the API Design and DataFrame labels and removed the Needs Triage label on May 11, 2023
@Eschivo
Author

Eschivo commented May 15, 2023

Thank you for replying! I don't know how to handle the deprecation cycle you mentioned. If you think this change is worthwhile, I can try to implement it (with a bit of guidance).

@topper-123
Contributor

topper-123 commented May 15, 2023

If we were to do this, we could do it like this in pandas 2.x:

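# allowed_args names the arguments that may still be passed positionally; "self" must be included
@deprecate_nonkeyword_arguments(version=None, allowed_args=["self"])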
def insert(
    self,
    loc: int,
    column: Hashable,
    value: Scalar | AnyArrayLike,
    allow_duplicates: bool | lib.NoDefault = lib.no_default,
) -> None:
...

Then in pandas 3.0 change the signature to:

def insert(
    self,
    column: Hashable,
    value: Scalar | AnyArrayLike,
    *,
    loc: int = -1,
    allow_duplicates: bool | lib.NoDefault = lib.no_default,
) -> None:
...

This would be somewhat disruptive during the 2.x cycle, so I'm not sure whether other core devs would prefer to keep the current signature.
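As a rough illustration of what the 2.x deprecation step would do, the decorator below is a simplified, hypothetical stand-in (deprecate_positional is not a pandas name). It assumes pandas' internal deprecate_nonkeyword_arguments works by comparing the number of positional arguments against the allowed names; Frame is a toy class, not DataFrame.

```python
import functools
import warnings


def deprecate_positional(allowed_args):
    """Hypothetical, simplified stand-in for pandas' internal
    deprecate_nonkeyword_arguments decorator.

    Warns when a call passes more positional arguments than there are
    names in allowed_args (for a method, "self" counts as one)."""
    num_allowed = len(allowed_args)

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > num_allowed:
                warnings.warn(
                    f"Passing arguments to {func.__name__} positionally is "
                    "deprecated; pass them as keyword arguments instead.",
                    FutureWarning,
                    stacklevel=2,
                )
            return func(*args, **kwargs)

        return wrapper

    return decorate


class Frame:
    """Toy stand-in for DataFrame that only tracks column order."""

    def __init__(self, columns):
        self.columns = list(columns)

    # 2.x-style signature: loc is still first and required, but any
    # positional use after self triggers the deprecation warning.
    @deprecate_positional(allowed_args=["self"])
    def insert(self, loc, column, value=None):
        self.columns.insert(loc, column)
```

With this sketch, f.insert(0, "a") emits a FutureWarning while f.insert(loc=0, column="a") does not, so code already written in the keyword form would keep working once the signature is reordered.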

@topper-123
Contributor

@pandas-dev/pandas-core.

@bashtage
Contributor

I am -1 on this for two reasons:

  • insert is very similar to list.insert aside from the extra column parameter.
  • The canonical way to add a new column is just df["new_column_name"] = value. Having df.insert("new_column_name", value) do the same thing by default seems redundant. The point of insert is for when you want columns in a particular order, e.g., for exporting to a CSV file with a particular structure.
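To make the contrast in these two points concrete, here is a small sketch using only existing pandas API (the column names and values are made up for illustration): setitem appends a column at the end, while insert is used when the position matters, such as the column order of a CSV export.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Canonical way to add a column: setitem appends it after the existing columns.
df["newcol"] = [99, 99]

# insert is for when position matters, e.g. putting an id column first
# so the exported CSV has a specific layout.
df.insert(0, "id", [10, 11])

csv_text = df.to_csv(index=False)
print(csv_text)
```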

@rhshadrach
Member

Agreed with @bashtage; I don't see the value of changing this.

@Dr-Irv
Contributor

Dr-Irv commented May 16, 2023

I'm -1 as well. Based on the example, the easy way to accomplish this using DataFrame.insert() is with:

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
   col1  col2
0     1     3
1     2     4
>>> df.insert(len(df.columns), "newcol", [99, 99])
>>> df
   col1  col2  newcol
0     1     3      99
1     2     4      99

@topper-123
Contributor

OK, there's probably no support for this change, but I'll leave it open a while longer to see if it prompts more discussion.

topper-123 added the Closing Candidate label on May 16, 2023
@rhshadrach
Member

rhshadrach commented May 17, 2023

I didn't read the OP carefully enough; I would support negative indices for loc (just as Python's list.insert does), but not giving loc a default value.

If we do this, then negative indices should behave the same way Python treats them, rather than as proposed in the OP, e.g.:

x = [1, 2, 3]
x.insert(-2, 4)
print(x)
# [1, 4, 2, 3]
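A minimal sketch of the Python-style normalization, assuming pandas would mirror list.insert exactly (normalize_loc is a hypothetical helper, not pandas API). Under these semantics loc=-1 inserts before the last column rather than after it, so it would not reproduce the OP's "append at the end" behaviour.

```python
def normalize_loc(loc: int, n: int) -> int:
    """Hypothetical helper: map a possibly negative insert position onto
    the range 0..n the same way list.insert does for a sequence of length n."""
    if loc < 0:
        loc += n          # count from the end
        if loc < 0:
            loc = 0       # too negative: clamp to the front
    elif loc > n:
        loc = n           # too large: clamp to the end
    return loc


cols = ["col1", "col2", "col3"]
cols.insert(normalize_loc(-1, len(cols)), "new")
# cols is now ["col1", "col2", "new", "col3"]
```

The normalized value could then be passed to the existing method, e.g. df.insert(normalize_loc(loc, len(df.columns)), column, value).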

@mroeschke
Member

Since there appears to be overall negative sentiment about this feature, as well as inactivity, closing.
