⚡️ Speed up function _facet_grid by 11% #118

Open: wants to merge 1 commit into base: main

@codeflash-ai codeflash-ai bot commented May 24, 2025

📄 11% (0.11x) speedup for _facet_grid in plotly/figure_factory/_facet_grid.py

⏱️ Runtime: 1.40 seconds → 1.25 seconds (best of 7 runs)

📝 Explanation and details

Here are optimized versions of the functions, based on the profiling data. The functions with the highest impact on runtime are make_subplots, _facet_grid, _annotation_dict, _make_trace_for_scatter and, to a much lesser degree, _return_label. The key bottlenecks are the call to make_subplots (especially if it is not cached), inefficient iteration over DataFrames, repeated dict allocation in inner loops, and unnecessary computation.

Optimizations below:

  • Pre-allocate and reuse objects when possible.
  • Avoid repeated DataFrame lookups (e.g., avoid repeated groupby and unique(), instead reuse the results).
  • Use list comprehensions and bulk operations when possible.
  • Minimize string formatting/creation within tight loops.
  • Pre-calculate constants used in many places within a function.
  • Use tuple unpacking and fast indexing.
  • Inline simple dictionary lookups.
  • Avoid rebuilding markers and dicts on every loop iteration if the content does not depend on loop variables.
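
The last bullet (not rebuilding dicts whose content does not depend on loop variables) can be illustrated with a minimal sketch. The function names and the `groups` structure are hypothetical and are not taken from `_facet_grid.py`; the sketch only shows the general hoisting pattern.

```python
def traces_rebuilding(groups, marker_color, line_width):
    """Baseline: builds an identical marker dict on every iteration."""
    traces = []
    for name, (xs, ys) in groups.items():
        marker = {"color": marker_color, "line": {"width": line_width}}
        traces.append({"name": name, "x": xs, "y": ys, "marker": marker})
    return traces


def traces_hoisted(groups, marker_color, line_width):
    """Hoisted: the loop-invariant marker dict is built once and shared."""
    marker = {"color": marker_color, "line": {"width": line_width}}
    return [
        {"name": name, "x": xs, "y": ys, "marker": marker}
        for name, (xs, ys) in groups.items()
    ]


# Hypothetical input: group name -> (x values, y values)
groups = {"A": ([1, 2], [10, 20]), "B": ([3, 4], [30, 40])}
baseline = traces_rebuilding(groups, "#123456", 1)
hoisted = traces_hoisted(groups, "#123456", 1)
```

Both versions produce equal output, but in the hoisted version every trace references the same marker object. That is only safe if nothing downstream mutates the marker of a single trace in place.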

Note: Changes are only made for performance - all signatures, basic logic, and original comments are preserved.


Key details and changes:

  • All marker/trace dicts are only constructed once when possible, and reused.
  • DataFrame columns are extracted with .values, which returns NumPy arrays, and unique values are taken from Index objects.
  • Grouping is always performed once, and group lookups are done using dicts.
  • String formatting in _return_label uses fast f-strings.
  • In _annotation_dict, all constant expressions are at the top and each branch is as small as possible.
  • The plotting assembly (append_trace) is unaltered but should be the only high cost left.
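
As a rough illustration of the "group once, look up by dict" idea from the list above, here is a minimal sketch using pandas. The helper names `index_groups` and `cell_xy` are hypothetical, and returning `[None]` for a missing facet combination is an assumption based on the wording of the generated tests, not a statement about the optimized code itself.

```python
import pandas as pd


def index_groups(df, facet_row, facet_col):
    """Run groupby once and index the sub-frames by (row_value, col_value)."""
    return {key: sub for key, sub in df.groupby([facet_row, facet_col], sort=False)}


def cell_xy(groups, key, x, y):
    """Return the x/y data for one grid cell via a plain dict lookup."""
    sub = groups.get(key)
    if sub is None:
        # Assumption: a missing combination is represented by a single None.
        return [None], [None]
    # .values returns the underlying NumPy arrays for the two columns.
    return sub[x].values, sub[y].values


df = pd.DataFrame({
    "x": [1, 2, 3],
    "y": [4, 5, 6],
    "cat1": ["a", "a", "b"],
    "cat2": ["c", "d", "c"],
})
groups = index_groups(df, "cat1", "cat2")
xs, ys = cell_xy(groups, ("a", "c"), "x", "y")
missing = cell_xy(groups, ("b", "d"), "x", "y")
```

Note that `groupby` drops rows whose key is NaN by default, so a dict built this way has no entry for NaN facet values.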

This approach preserves compatibility and the existing interface. The measured overall speedup is 11%, and the gain should be larger for larger datasets and grids with more subplots. If you need even more performance (such as heavy annotation/label generation for massive grids), let me know and further vectorization approaches (possibly outside of Plotly itself) can be considered.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 31 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests Details
# imports
import pandas as pd
import pytest  # test runner for these unit tests
from plotly.figure_factory._facet_grid import _facet_grid

# function under test: _facet_grid (imported above)


# Helper function to create a simple DataFrame for tests
def make_simple_df():
    return pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "group": ["A", "A", "B", "B"],
        "col": ["X", "Y", "X", "Y"]
    })

# Helper for marker/trace kwargs
def default_kwargs():
    return {
        "trace_type": "scatter",
        "flipped_rows": True,
        "flipped_cols": True,
        "show_boxes": False,
        "SUBPLOT_SPACING": 0.05,
        "marker_color": "#123456",
        "kwargs_trace": {},
        "kwargs_marker": {"line": {"color": "#000", "width": 1}}
    }

# =========================
# 1. Basic Test Cases
# =========================

def test_basic_no_facets():
    """Test with no facet_row or facet_col - should yield a single trace in (1,1)"""
    df = make_simple_df()
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", None, None, 1, 1, None, None, **args
    )

def test_basic_facet_row():
    """Test with facet_row only - should yield two traces in two rows"""
    df = make_simple_df()
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", "group", None, 2, 1, None, None, **args
    )
    # Check annotation text matches group names
    ann_texts = [a["text"] for a in annotations]

def test_basic_facet_col():
    """Test with facet_col only - should yield two traces in two columns"""
    df = make_simple_df()
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", None, "col", 1, 2, None, None, **args
    )
    # Check annotation text matches col names
    ann_texts = [a["text"] for a in annotations]

def test_basic_facet_row_col():
    """Test with both facet_row and facet_col - 2x2 grid, 4 traces"""
    df = make_simple_df()
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", "group", "col", 2, 2, None, None, **args
    )
    ann_texts = [a["text"] for a in annotations]

def test_basic_with_labels_dict():
    """Test with facet_row_labels and facet_col_labels as dicts"""
    df = make_simple_df()
    args = default_kwargs()
    facet_row_labels = {"A": "Alpha", "B": "Beta"}
    facet_col_labels = {"X": "Ex", "Y": "Why"}
    fig, annotations = _facet_grid(
        df, "x", "y", "group", "col", 2, 2, facet_row_labels, facet_col_labels, **args
    )
    ann_texts = [a["text"] for a in annotations]

def test_basic_with_labels_str():
    """Test with facet_row_labels and facet_col_labels as string (should prefix)"""
    df = make_simple_df()
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", "group", "col", 2, 2, "label", "label", **args
    )
    ann_texts = [a["text"] for a in annotations]

# =========================
# 2. Edge Test Cases
# =========================

def test_empty_dataframe():
    """Test with an empty DataFrame - should yield one trace with no data"""
    df = pd.DataFrame(columns=["x", "y"])
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", None, None, 1, 1, None, None, **args
    )
    # The trace should have empty x and y
    trace = fig.data[0]

def test_missing_facet_value():
    """Test with some facet row/col combinations missing (should fill with None)"""
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [10, 20],
        "group": ["A", "B"],
        "col": ["X", "Y"]
    })
    # Remove one combination: only (A,X) and (B,Y)
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", "group", "col", 2, 2, None, None, **args
    )
    # At least one trace should have only None values
    empty_traces = [t for t in fig.data if all(v is None for v in t["x"]) and all(v is None for v in t["y"])]

def test_single_row_facet():
    """Test with only one unique value in facet_row"""
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "group": ["A", "A", "A"]
    })
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", "group", None, 1, 1, None, None, **args
    )

def test_single_col_facet():
    """Test with only one unique value in facet_col"""
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "col": ["Z", "Z", "Z"]
    })
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", None, "col", 1, 1, None, None, **args
    )

def test_nan_in_facet_col():
    """Test with NaN values in facet_col"""
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "col": ["A", None, "B"]
    })
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", None, "col", 1, 3, None, None, **args
    )

def test_nan_in_facet_row():
    """Test with NaN values in facet_row"""
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "row": ["A", None, "B"]
    })
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", "row", None, 3, 1, None, None, **args
    )

def test_no_x_or_y():
    """Test with no x or y column (should not fail, but trace will be empty)"""
    df = pd.DataFrame({"foo": [1, 2], "bar": [3, 4]})
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, None, None, None, None, 1, 1, None, None, **args
    )
    trace = fig.data[0]

def test_custom_marker_kwargs():
    """Test that marker kwargs are passed through correctly"""
    df = make_simple_df()
    args = default_kwargs()
    args["kwargs_marker"] = {"line": {"color": "red", "width": 2}, "size": 10}
    fig, _ = _facet_grid(
        df, "x", "y", None, None, 1, 1, None, None, **args
    )
    trace = fig.data[0]

def test_non_string_facet_labels():
    """Test facet labels that are not strings (e.g. integers, tuples)"""
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "group": [1, 1, 2, 2],
        "col": [(1, 'a'), (1, 'a'), (2, 'b'), (2, 'b')]
    })
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", "group", "col", 2, 2, None, None, **args
    )
    # Should be 4 traces, annotation texts should include 1, 2, (1, 'a'), (2, 'b')
    ann_texts = [str(a["text"]) for a in annotations]

# =========================
# 3. Large Scale Test Cases
# =========================

def test_large_number_of_facets():
    """Test with a large number of facet_row and facet_col values"""
    n = 20  # 20x20 = 400 subplots
    df = pd.DataFrame({
        "x": [i for i in range(n * n)],
        "y": [i * 2 for i in range(n * n)],
        "row": [i // n for i in range(n * n)],
        "col": [i % n for i in range(n * n)]
    })
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", "row", "col", n, n, None, None, **args
    )

def test_large_dataframe_no_facets():
    """Test with a large DataFrame but no facets"""
    df = pd.DataFrame({
        "x": list(range(1000)),
        "y": list(range(1000))
    })
    args = default_kwargs()
    fig, annotations = _facet_grid(
        df, "x", "y", None, None, 1, 1, None, None, **args
    )
    # Should have all 1000 points in x and y
    trace = fig.data[0]





# imports
import pandas as pd
import pytest
from plotly.figure_factory._facet_grid import _facet_grid

# function under test: _facet_grid (imported above)

# --------------------
# Basic Test Cases
# --------------------

def test_basic_no_facets():
    # Basic: No facet_row or facet_col, just a scatter plot
    df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
    fig, annotations = _facet_grid(
        df=df,
        x='x',
        y='y',
        facet_row=None,
        facet_col=None,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type='scatter',
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color='red',
        kwargs_trace={},
        kwargs_marker={'line': {'width': 1, 'color': 'black'}}
    )

def test_basic_facet_row():
    # Basic: Facet by row
    df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [5, 6, 7, 8], 'cat': ['a', 'a', 'b', 'b']})
    fig, annotations = _facet_grid(
        df=df,
        x='x',
        y='y',
        facet_row='cat',
        facet_col=None,
        num_of_rows=2,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type='scatter',
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color='blue',
        kwargs_trace={},
        kwargs_marker={'line': {'width': 2, 'color': 'gray'}}
    )
    # Each trace should have the correct data
    xs = [list(trace['x']) for trace in fig.data]
    ys = [list(trace['y']) for trace in fig.data]
    # Annotation text should contain 'a' and 'b'
    texts = [a['text'] for a in annotations]

def test_basic_facet_col():
    # Basic: Facet by column
    df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [5, 6, 7, 8], 'cat': ['a', 'a', 'b', 'b']})
    fig, annotations = _facet_grid(
        df=df,
        x='x',
        y='y',
        facet_row=None,
        facet_col='cat',
        num_of_rows=1,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type='scatter',
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color='green',
        kwargs_trace={},
        kwargs_marker={'line': {'width': 2, 'color': 'gray'}}
    )
    xs = [list(trace['x']) for trace in fig.data]
    ys = [list(trace['y']) for trace in fig.data]
    texts = [a['text'] for a in annotations]

def test_basic_facet_row_and_col():
    # Basic: Facet by row and col
    df = pd.DataFrame({
        'x': [1, 2, 3, 4],
        'y': [5, 6, 7, 8],
        'cat1': ['a', 'a', 'b', 'b'],
        'cat2': ['c', 'd', 'c', 'd']
    })
    fig, annotations = _facet_grid(
        df=df,
        x='x',
        y='y',
        facet_row='cat1',
        facet_col='cat2',
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type='scatter',
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color='purple',
        kwargs_trace={},
        kwargs_marker={'line': {'width': 1, 'color': 'black'}}
    )
    texts = [a['text'] for a in annotations]

def test_basic_trace_type_scattergl():
    # Basic: scattergl trace type
    df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})
    fig, annotations = _facet_grid(
        df=df,
        x='x',
        y='y',
        facet_row=None,
        facet_col=None,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type='scattergl',
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color='orange',
        kwargs_trace={},
        kwargs_marker={'line': {'width': 1, 'color': 'black'}}
    )

# --------------------
# Edge Test Cases
# --------------------

def test_edge_empty_dataframe():
    # Edge: Empty dataframe
    df = pd.DataFrame({'x': [], 'y': []})
    fig, annotations = _facet_grid(
        df=df,
        x='x',
        y='y',
        facet_row=None,
        facet_col=None,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type='scatter',
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color='red',
        kwargs_trace={},
        kwargs_marker={'line': {'width': 1, 'color': 'black'}}
    )

def test_edge_missing_facet_values():
    # Edge: Some facet combinations missing
    df = pd.DataFrame({
        'x': [1, 2, 3],
        'y': [4, 5, 6],
        'cat1': ['a', 'a', 'b'],
        'cat2': ['c', 'd', 'c']
    })
    fig, annotations = _facet_grid(
        df=df,
        x='x',
        y='y',
        facet_row='cat1',
        facet_col='cat2',
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type='scatter',
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color='red',
        kwargs_trace={},
        kwargs_marker={'line': {'width': 1, 'color': 'black'}}
    )
    # At least one trace should have only NaN values
    found_nan = False
    for trace in fig.data:
        if all(pd.isna(val) for val in trace['x']):
            found_nan = True

def test_edge_facet_labels_dict_and_str():
    # Edge: facet_row_labels as dict, facet_col_labels as str
    df = pd.DataFrame({
        'x': [1, 2, 3, 4],
        'y': [5, 6, 7, 8],
        'cat1': ['a', 'a', 'b', 'b'],
        'cat2': ['c', 'd', 'c', 'd']
    })
    facet_row_labels = {'a': 'Alpha', 'b': 'Beta'}
    facet_col_labels = 'label'
    fig, annotations = _facet_grid(
        df=df,
        x='x',
        y='y',
        facet_row='cat1',
        facet_col='cat2',
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type='scatter',
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color='red',
        kwargs_trace={},
        kwargs_marker={'line': {'width': 1, 'color': 'black'}}
    )
    texts = [a['text'] for a in annotations]

def test_edge_flipped_rows_and_cols():
    # Edge: flipped_rows and flipped_cols True
    df = pd.DataFrame({
        'x': [1, 2, 3, 4],
        'y': [5, 6, 7, 8],
        'cat1': ['a', 'a', 'b', 'b'],
        'cat2': ['c', 'd', 'c', 'd']
    })
    fig, annotations = _facet_grid(
        df=df,
        x='x',
        y='y',
        facet_row='cat1',
        facet_col='cat2',
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type='scatter',
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color='red',
        kwargs_trace={},
        kwargs_marker={'line': {'width': 1, 'color': 'black'}}
    )
    # Annotations should have yanchor or xanchor set as per flipped logic
    for a in annotations:
        pass

def test_edge_no_x_or_y():
    # Edge: x or y is None
    df = pd.DataFrame({'foo': [1, 2, 3]})
    fig, annotations = _facet_grid(
        df=df,
        x=None,
        y=None,
        facet_row=None,
        facet_col=None,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type='scatter',
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color='red',
        kwargs_trace={},
        kwargs_marker={'line': {'width': 1, 'color': 'black'}}
    )

def test_edge_kwargs_marker_line_missing():
    # Edge: kwargs_marker without 'line' key
    df = pd.DataFrame({'x': [1], 'y': [2]})
    # Should raise KeyError if 'line' not in kwargs_marker
    with pytest.raises(KeyError):
        _facet_grid(
            df=df,
            x='x',
            y='y',
            facet_row=None,
            facet_col=None,
            num_of_rows=1,
            num_of_cols=1,
            facet_row_labels=None,
            facet_col_labels=None,
            trace_type='scatter',
            flipped_rows=False,
            flipped_cols=False,
            show_boxes=False,
            SUBPLOT_SPACING=0.1,
            marker_color='red',
            kwargs_trace={},
            kwargs_marker={}  # missing 'line'
        )

# --------------------
# Large Scale Test Cases
# --------------------

def test_large_many_facets():
    # Large: 10x10 facets (100 subplots)
    n = 10
    rows = []
    for i in range(n):
        for j in range(n):
            rows.append({'x': i, 'y': j, 'row': f'row{i}', 'col': f'col{j}'})
    df = pd.DataFrame(rows)
    fig, annotations = _facet_grid(
        df=df,
        x='x',
        y='y',
        facet_row='row',
        facet_col='col',
        num_of_rows=n,
        num_of_cols=n,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type='scatter',
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color='blue',
        kwargs_trace={},
        kwargs_marker={'line': {'width': 1, 'color': 'black'}}
    )

def test_large_many_points_per_facet():
    # Large: 10 facets, each with 50 points
    n_facets = 10
    n_points = 50
    rows = []
    for i in range(n_facets):
        for j in range(n_points):
            rows.append({'x': j, 'y': j*2, 'facet': f'f{i}'})
    df = pd.DataFrame(rows)
    fig, annotations = _facet_grid(
        df=df,
        x='x',
        y='y',
        facet_row=None,
        facet_col='facet',
        num_of_rows=1,
        num_of_cols=n_facets,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type='scatter',
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color='green',
        kwargs_trace={},
        kwargs_marker={'line': {'width': 1, 'color': 'black'}}
    )
    for trace in fig.data:
        pass

def test_large_performance_under_1000_elements():
    # Large: 999 elements, single facet
    n = 999
    df = pd.DataFrame({'x': range(n), 'y': range(n)})
    fig, annotations = _facet_grid(
        df=df,
        x='x',
        y='y',
        facet_row=None,
        facet_col=None,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type='scatter',
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color='black',
        kwargs_trace={},
        kwargs_marker={'line': {'width': 1, 'color': 'black'}}
    )
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
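
The closing comment describes checking that the original and optimized code produce the same output. A minimal sketch of such an equivalence check follows. All names here are hypothetical stand-ins and this is not Codeflash's actual harness; the two label functions merely imitate a `.format` to f-string rewrite.

```python
def capture(fn, args, kwargs):
    """Run fn and record either its return value or the exception type raised."""
    try:
        return ("returned", fn(*args, **kwargs))
    except Exception as exc:
        return ("raised", type(exc))


def same_behavior(original, optimized, cases):
    """True if both callables return equal values or raise the same error type."""
    return all(
        capture(original, args, kwargs) == capture(optimized, args, kwargs)
        for args, kwargs in cases
    )


# Hypothetical stand-ins for an original and an optimized implementation.
def label_format(facet_var, value):
    return "{}: {}".format(facet_var, value)


def label_fstring(facet_var, value):
    return f"{facet_var}: {value}"


cases = [
    (("group", "A"), {}),
    (("col", 1), {}),
    ((), {}),  # missing arguments: both versions should raise TypeError
]
equivalent = same_behavior(label_format, label_fstring, cases)
```

Comparing exception types as well as return values means that error cases, such as the `KeyError` expected by `test_edge_kwargs_marker_line_missing`, are also covered by the check.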

To edit these changes, run git checkout codeflash/optimize-_facet_grid-mb2f798b and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label May 24, 2025
@codeflash-ai codeflash-ai bot requested a review from misrasaurabh1 May 24, 2025 16:04