
TYP: Reverse operators with pandas returning wrong types #29486

@Dr-Irv

Describe the issue:

In pandas-stubs, we are trying to properly type the reverse operators, e.g., __radd__(). When a numpy array is added to a pandas Series, the result is a Series at runtime, regardless of operand order. However, the way numpy is typed causes static type checkers to report the result of ndarray + Series as an ndarray.

import pandas as pd
import numpy as np
s = pd.Series([1,2,3])
na = np.array([4,5,6])
r1 = s + na  # Is a Series
r2 = na + s  # Also a Series

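What is happening is that the declaration of __add__() in numpy's typing accepts the Series operand, so the type checker uses its return type and never considers Series.__radd__(). As a result, r2 is revealed as a numpy type.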
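The mismatch can be reproduced without numpy or pandas. The following is a minimal sketch with hypothetical stand-in classes (`Left` for the array, `Right` for the Series, and a made-up `handles_left` opt-in flag). It is not how numpy dispatches internally, since numpy goes through `__array_ufunc__`. It only shows that a permissive `__add__` annotation on the left operand decides the static type, even when the reflected method produces the runtime value.

```python
from __future__ import annotations


class Left:
    """Hypothetical stand-in for ndarray: annotated to accept any operand."""

    def __add__(self, other: object) -> Left:
        # At runtime, defer to operands that opt in to handling the operation.
        # `handles_left` is an illustrative flag, not a numpy convention.
        if hasattr(other, "handles_left"):
            return NotImplemented
        return Left()


class Right:
    """Hypothetical stand-in for Series: implements the reflected operator."""

    handles_left = True

    def __radd__(self, other: Left) -> Right:
        return Right()


result = Left() + Right()
# Runtime: Left.__add__ returns NotImplemented, so Python calls Right.__radd__.
# Static: Left.__add__ accepts `object`, so a type checker infers `Left`.
print(type(result).__name__)  # prints "Right"
```

A type checker evaluates the left operand's `__add__` first and only falls back to the right operand's `__radd__` when no `__add__` signature matches, which is why the annotation on `Left` wins here.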

I created a simple (and definitely incomplete) example that illustrates the behavior.

Reproduce the code example:

from __future__ import annotations
from typing import Any, reveal_type

import numpy as np


class MySeries:
    def __init__(self, arr: list[int]):
        self._arr = arr

    def __array__(self, dtype=...):
        return np.array(self._arr)

    def __array_ufunc__(
        self, ufunc: np.ufunc, method: str, *inputs: Any, **kwargs: Any
    ):
        if ufunc == np.add:
            i1 = inputs[0]
            i2 = inputs[1]
            assert isinstance(i1, np.ndarray)
            assert isinstance(i2, MySeries)
            return i2.__radd__(i1)
        else:
            raise RuntimeError("Not supported")

    def __add__(self, other: MySeries) -> MySeries:
        if len(self._arr) != len(other._arr):
            raise RuntimeError("Must be same length")
        result = [x + y for (x, y) in zip(self._arr, other._arr)]
        return MySeries(result)

    def __radd__(self, other: np.ndarray | list[int]) -> MySeries:
        result = [val + other[i] for i, val in enumerate(self._arr)]
        return MySeries(result)

    def __repr__(self) -> str:
        return "MySeries: " + self._arr.__repr__()


x = MySeries([1, 2, 3])
y = [4, 5, 6]
result = y + x
print(type(result), result)
reveal_type(result)

z = np.array(y)
result2 = z + x
print(type(result2), result2)
reveal_type(result2)

Error message:

For `result2`, `pyright` reports: Type of "result2" is "ndarray[tuple[int, ...], dtype[numpy.bool[builtins.bool]]]"

Python and NumPy Versions:

python 3.12, numpy 2.2.1

Type-checker version and settings:

pyright 1.1.403, default settings

Additional typing packages:

None that are relevant.

More on this:
At runtime, result2 will be MySeries.

So the question is whether the static typing in numpy can reflect that __radd__() is called when the other operand implements __array__() and __array_ufunc__(), as pandas objects do.

This might be a case where we need "negative typing", i.e., we don't want numpy's __add__() to match if the other operand implements both __array__() and __array_ufunc__().
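Python's type system has no way to say "anything except types with this method", but the idea can be approximated with overload ordering. The sketch below is only an illustration under stated assumptions: `SupportsArrayUfunc`, `FakeArray`, and `FakeSeries` are hypothetical names, and nothing here reflects numpy's actual stubs. A first overload catches operands that define `__array_ufunc__` and returns `Any`, so the checker at least stops claiming the result is an array.

```python
from __future__ import annotations

from typing import Any, Protocol, overload, runtime_checkable


@runtime_checkable
class SupportsArrayUfunc(Protocol):
    """Hypothetical protocol: any object that defines __array_ufunc__."""

    def __array_ufunc__(
        self, ufunc: Any, method: str, *inputs: Any, **kwargs: Any
    ) -> Any: ...


class FakeArray:
    """Hypothetical stand-in for ndarray; not numpy's real stub."""

    @overload
    def __add__(self, other: SupportsArrayUfunc) -> Any: ...
    @overload
    def __add__(self, other: int) -> FakeArray: ...
    def __add__(self, other: object) -> Any:
        # Defer at runtime to operands that define __array_ufunc__.
        if isinstance(other, SupportsArrayUfunc):
            return NotImplemented
        return FakeArray()


class FakeSeries:
    """Hypothetical stand-in for a pandas Series."""

    def __array_ufunc__(
        self, ufunc: Any, method: str, *inputs: Any, **kwargs: Any
    ) -> Any:
        return NotImplemented

    def __radd__(self, other: FakeArray) -> FakeSeries:
        return FakeSeries()


combined = FakeArray() + FakeSeries()
# Static: the first overload matches, so the inferred type is Any, not FakeArray.
# Runtime: __add__ defers and FakeSeries.__radd__ produces the value.
print(type(combined).__name__)  # prints "FakeSeries"
```

This approach has clear limits. The result is `Any` rather than the precise reflected type, and a real ndarray also defines `__array_ufunc__`, so a protocol this broad would match array-on-array addition as well. Getting the precise type would still require the left operand's `__add__` to not match at all, which is the "negative typing" gap described above.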
