Merged
46 changes: 46 additions & 0 deletions doc/source/user_guide/indexing.rst
Original file line number Diff line number Diff line change
@@ -1732,3 +1732,49 @@ Why does assignment fail when using chained indexing?
This means that chained indexing will never work.
See :ref:`this section <copy_on_write_chained_assignment>`
for more context.

.. _indexing.series_assignment:

Series Assignment and Index Alignment
-------------------------------------

When assigning a Series to a DataFrame column, pandas performs automatic alignment
based on index labels. This is a fundamental behavior that can be surprising to
new users who might expect positional assignment.

Key Points:
~~~~~~~~~~~

* Series values are matched to DataFrame rows by index label
* The position or order of values in the Series doesn't matter
* DataFrame index labels that are missing from the Series get NaN; Series labels
  that are absent from the DataFrame index are dropped
* This behavior is consistent across ``df[col] = series`` and ``df.loc[:, col] = series``
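
The last point can be checked directly. The snippet below is a minimal sketch, not part of the PR: the names ``via_getitem`` and ``via_loc`` are illustrative, and it assumes a new column is being created on a DataFrame with a unique index.

```python
import pandas as pd

df = pd.DataFrame({"values": [1, 2, 3]}, index=["x", "y", "z"])
# Series in a different order than df, and with no value for label "y"
s = pd.Series([100, 200], index=["z", "x"])

# Assign the same Series through both spellings, each on its own copy
via_getitem = df.copy()
via_getitem["new"] = s

via_loc = df.copy()
via_loc.loc[:, "new"] = s

# Both paths align on labels: x -> 200.0, y -> NaN, z -> 100.0
print(via_getitem)
print(via_getitem.equals(via_loc))
```

Because the missing label introduces NaN, the new column is stored as float in both cases.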

Examples:

.. ipython:: python

    import pandas as pd

    # Create a DataFrame
    df = pd.DataFrame({'values': [1, 2, 3]}, index=['x', 'y', 'z'])

    # Series with matching indices (different order)
    s1 = pd.Series([10, 20, 30], index=['z', 'x', 'y'])
    df['aligned'] = s1  # Aligns by index, not position
    print(df)

    # Series with partial index match
    s2 = pd.Series([100, 200], index=['x', 'z'])
    df['partial'] = s2  # Missing 'y' gets NaN
    print(df)

    # Series with non-matching indices
    s3 = pd.Series([1000, 2000], index=['a', 'b'])
    df['nomatch'] = s3  # All values become NaN
    print(df)

    # Avoiding confusion:
    # To assign by position instead of by index label, drop the Series
    # index by converting to a NumPy array (lengths must match).
    # Note that s1.reindex(df.index) would still align by label.
    df['s1_values'] = s1.to_numpy()
    print(df)
83 changes: 83 additions & 0 deletions pandas/core/frame.py
@@ -4213,6 +4213,89 @@ def isetitem(self, loc, value) -> None:
self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)

def __setitem__(self, key, value) -> None:
"""
Set item(s) in DataFrame by key.

This method allows you to set the values of one or more columns in the
DataFrame using a key. If the key does not exist, a new
column will be created.

Parameters
----------
key : label, list of labels, or tuple
Column label(s) to set. Can be a single column name, list of column names,
or tuple for MultiIndex columns.
value : scalar, array-like, Series, or DataFrame
Value(s) to set for the specified key(s).

Returns
-------
None
This method does not return a value.

See Also
--------
DataFrame.loc : Access and set values by label-based indexing.
DataFrame.iloc : Access and set values by position-based indexing.
DataFrame.assign : Assign new columns to a DataFrame.

Notes
-----
When assigning a Series to a DataFrame column, pandas aligns the Series
by index labels, not by position. This means:

* Values from the Series are matched to DataFrame rows by index label
* If a Series index label doesn't exist in the DataFrame index, it's ignored
Member

I don't follow the difference between this and the line directly following it with the distinction of ignored versus NaN - can you help me understand?

Contributor Author

  1. Series index labels NOT in DataFrame → IGNORED
df = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
s = pd.Series([10, 20, 30, 40, 50], index=['x', 'y', 'a', 'b', 'z'])
df['B'] = s
df
   A   B
x  1  10
y  2  20
z  3  50
# Values for 'a' and 'b' are completely ignored!
  2. DataFrame index labels NOT in Series → NaN
df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=['x', 'y', 'z', 'w'])
s = pd.Series([100, 300], index=['x', 'z'])  # Missing 'y' and 'w'
df['B'] = s
df
   A      B
x  1  100.0
y  2    NaN
z  3  300.0
w  4    NaN
# Rows 'y' and 'w' get NaN because they're missing from Series

Added the ignored example to documentation.
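
Both cases from the reply above can be seen in a single assignment. This is a minimal sketch rather than code from the PR: it assumes a unique DataFrame index, and the comparison with `Series.reindex` is only an illustration of the observable result, not a claim about how `__setitem__` is implemented.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]}, index=["x", "y", "z"])
# "a" is not in df.index (will be ignored); "y" has no value (will be NaN)
s = pd.Series([10, 20, 30], index=["x", "a", "z"])

df["B"] = s

# For a unique index, the result matches reindexing the Series to df.index
expected = s.reindex(df.index)
pd.testing.assert_series_equal(df["B"], expected, check_names=False)

print(df)
```

The label "a" never appears in the result, while row "y" is filled with NaN, which is the ignored-versus-NaN distinction the review comment asks about.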

* If a DataFrame index label doesn't exist in the Series index, NaN is assigned
* The order of values in the Series doesn't matter; only the index labels matter

Examples
--------
Basic column assignment:

>>> df = pd.DataFrame({"A": [1, 2, 3]})
>>> df["B"] = [4, 5, 6] # Assigns by position
>>> df
   A  B
0  1  4
1  2  5
2  3  6

Series assignment with index alignment:

>>> df = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2])
>>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df
>>> df["B"] = s # Assigns by index label, not position
>>> df
   A     B
0  1   NaN
1  2  10.0
2  3   NaN

Series assignment with partial index match:

>>> df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])
>>> s = pd.Series([100, 200], index=["b", "d"])
>>> df["B"] = s
>>> df
   A      B
a  1    NaN
b  2  100.0
c  3    NaN
d  4  200.0

Series index labels NOT in DataFrame, ignored:

>>> df = pd.DataFrame({"A": [1, 2, 3]}, index=["x", "y", "z"])
>>> s = pd.Series([10, 20, 30, 40, 50], index=["x", "y", "a", "b", "z"])
>>> df["B"] = s
>>> df
   A   B
x  1  10
y  2  20
z  3  50

The values for 'a' and 'b' are ignored because those labels are not in the
DataFrame index.
"""
if not PYPY:
if sys.getrefcount(self) <= 3:
warnings.warn(
16 changes: 16 additions & 0 deletions pandas/core/indexing.py
@@ -609,6 +609,22 @@ def loc(self) -> _LocIndexer:

Please see the :ref:`user guide<advanced.advanced_hierarchical>`
for more details and explanations of advanced indexing.

**Assignment with Series**

When assigning a Series to .loc[row_indexer, col_indexer], pandas aligns
the Series by index labels, not by order or position.

Series assignment with .loc and index alignment:

>>> df = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2])
>>> s = pd.Series([10, 20], index=[1, 0]) # Note reversed order
>>> df.loc[:, "B"] = s # Aligns by index, not order
>>> df
   A     B
0  1  20.0
1  2  10.0
2  3   NaN
"""
return _LocIndexer("loc", self)
