Description
Describe the issue:
In `pandas-stubs`, we are trying to properly type the reverse operators, e.g., `__radd__()`. If a user has code that adds a `numpy` array to a pandas `Series`, the result is a `Series`. But the way that `numpy` is typed causes the static typing to report the result as an `ndarray`.
import pandas as pd
import numpy as np
s = pd.Series([1,2,3])
na = np.array([4,5,6])
r1 = s + na # Is a Series
r2 = na + s # Also a Series
What is happening is that the declaration of `__add__()` in the `numpy` typing is causing `r2` to be revealed as a numpy type.
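The mismatch can be shown without `numpy` at all. The following is a minimal sketch, not numpy's actual implementation: `Arr` and `Ser` are hypothetical stand-ins for `ndarray` and `Series`, and the `handles_arr_add` marker is an invented simplification that replaces numpy's `__array_ufunc__` dispatch with a plain `NotImplemented` return. A type checker looks at the left operand's `__add__()` first, and because its annotation accepts anything, the declared return type wins even though the runtime result comes from `__radd__()`.

```python
from __future__ import annotations

from typing import Any


class Arr:
    """Hypothetical stand-in for an array type with a very permissive __add__."""

    def __init__(self, data: list[int]) -> None:
        self.data = data

    def __add__(self, other: Any) -> Arr:
        # The annotation accepts any operand, so a static checker stops here
        # and reports Arr. At runtime we defer to operands that opt in.
        if hasattr(other, "handles_arr_add"):
            return NotImplemented
        return Arr([x + other for x in self.data])


class Ser:
    """Hypothetical stand-in for a Series-like type."""

    handles_arr_add = True  # invented marker, assumption for this sketch only

    def __init__(self, data: list[int]) -> None:
        self.data = data

    def __radd__(self, other: Arr) -> Ser:
        # Reached at runtime because Arr.__add__ returned NotImplemented.
        return Ser([x + y for x, y in zip(other.data, self.data)])


a = Arr([4, 5, 6])
s = Ser([1, 2, 3])
r = a + s
print(type(r).__name__, r.data)  # Ser [5, 7, 9]
```

Statically, `r` would be expected to be inferred as `Arr`, since the checker never needs to consider `Ser.__radd__()`.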
I created a simple (and definitely incomplete) example that illustrates the behavior.
Reproduce the code example:
from __future__ import annotations

from typing import Any, reveal_type

import numpy as np


class MySeries:
    def __init__(self, arr: list[int]):
        self._arr = arr

    def __array__(self, dtype=...):
        return np.array(self._arr)

    def __array_ufunc__(
        self, ufunc: np.ufunc, method: str, *inputs: Any, **kwargs: Any
    ):
        # Only handles the `ndarray + MySeries` case needed for this example.
        if ufunc == np.add:
            i1 = inputs[0]
            i2 = inputs[1]
            assert isinstance(i1, np.ndarray)
            assert isinstance(i2, MySeries)
            return i2.__radd__(i1)
        else:
            raise RuntimeError("Not supported")

    def __add__(self, other: MySeries) -> MySeries:
        if len(self._arr) != len(other._arr):
            raise RuntimeError("Must be same length")
        result = [x + y for (x, y) in zip(self._arr, other._arr)]
        return MySeries(result)

    def __radd__(self, other: np.ndarray | list[int]) -> MySeries:
        result = [val + other[i] for i, val in enumerate(self._arr)]
        return MySeries(result)

    def __repr__(self) -> str:
        return "MySeries: " + self._arr.__repr__()


# list + MySeries: list.__add__ fails, so Python falls back to MySeries.__radd__
x = MySeries([1, 2, 3])
y = [4, 5, 6]
result = y + x
print(type(result), result)
reveal_type(result)

# ndarray + MySeries: np.add dispatches to MySeries.__array_ufunc__
z = np.array(y)
result2 = z + x
print(type(result2), result2)
reveal_type(result2)
Error message:
For `result2`, `pyright` reveals the type as: Type of "result2" is "ndarray[tuple[int, ...], dtype[numpy.bool[builtins.bool]]]"
Python and NumPy Versions:
python 3.12, numpy 2.2.1
Type-checker version and settings:
pyright 1.1.403, default settings
Additional typing packages.
None that are relevant.
More on this:
At runtime, `result2` will be a `MySeries`.
So the question is whether the static typing in `numpy` can reflect that `__radd__()` is called when `__array__()` and `__array_ufunc__()` are implemented (as they are in pandas).
This might be a case where we need "negative typing", i.e., we don't want numpy's `__add__()` to match if both `__array__()` and `__array_ufunc__()` are implemented.
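Since Python's type system has no way to say "anything except X", one possible approximation is to rely on overload ordering instead: put a more specific overload first that matches objects implementing `__array_ufunc__()` and `__radd__()`, and return whatever their `__radd__()` returns. The sketch below is only an illustration of that idea, under the assumption that such an overload could coexist with the real ones. `Arr`, `Ser`, and the `DefersToRAdd` protocol are hypothetical names, not part of numpy or its stubs, and whether this would scale to numpy's actual `__add__()` overloads is untested.

```python
from __future__ import annotations

from typing import Any, Protocol, TypeVar, overload

T = TypeVar("T")
T_co = TypeVar("T_co", covariant=True)


class DefersToRAdd(Protocol[T_co]):
    """Hypothetical protocol: objects that take over `array + obj`."""

    def __array_ufunc__(
        self, ufunc: Any, method: str, *inputs: Any, **kwargs: Any
    ) -> Any: ...

    def __radd__(self, other: Any, /) -> T_co: ...


class Arr:
    """Hypothetical stand-in for an array type."""

    def __init__(self, data: list[int]) -> None:
        self.data = data

    # Listed first so that it is tried before the more general overload.
    @overload
    def __add__(self, other: DefersToRAdd[T]) -> T: ...
    @overload
    def __add__(self, other: int) -> Arr: ...
    def __add__(self, other: Any) -> Any:
        # Runtime behavior simplified for the sketch: hand over directly
        # instead of going through a ufunc machinery.
        if hasattr(other, "__array_ufunc__"):
            return other.__radd__(self)
        return Arr([x + other for x in self.data])


class Ser:
    """Hypothetical stand-in for a Series-like type."""

    def __init__(self, data: list[int]) -> None:
        self.data = data

    def __array_ufunc__(
        self, ufunc: Any, method: str, *inputs: Any, **kwargs: Any
    ) -> Any:
        raise NotImplementedError  # not exercised in this sketch

    def __radd__(self, other: Arr) -> Ser:
        return Ser([x + y for x, y in zip(other.data, self.data)])


r = Arr([4, 5, 6]) + Ser([1, 2, 3])
print(type(r).__name__, r.data)
```

With this ordering, a checker would be expected to pick the first overload for `Arr + Ser` and infer `Ser`, while `Arr + int` still resolves to `Arr`.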