Skip to content

BUG: random: Fix handling of very small p in Generator.binomial. #29522

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 6, 2025

Conversation

WarrenWeckesser
Copy link
Member

Use log1p to compute an expression of 1 - p with more precision. To not change the legacy interface, a copy of the unaltered function random_binomial_inversion is used in the legacy distribution code.


This example is from the unit test added in the PR. Before this fix, or when using the legacy interface, the values in x will all be 0:

In [28]: n = 50000000000
    ...: p = 5e-17
    ...: sample_size = 20000000

In [29]: x = np.random.binomial(n, p, size=sample_size)  # Legacy interface

In [30]: x.max()  # Values are all 0
Out[30]: np.int64(0)

With the fixed Generator.binomial method:

In [31]: rng = np.random.default_rng()

In [32]: x = rng.binomial(n, p, size=sample_size)  # Fixed binomial method

In [33]: x.mean()
Out[33]: np.float64(2.25e-06)

In [34]: n*p  # Expected mean
Out[34]: 2.4999999999999998e-06

Use `log1p` to compute an expression of `1 - p` with more precision.
To not change the legacy interface, a copy of the unaltered function
`random_binomial_inversion` is used in the legacy distributions code.
@WarrenWeckesser
Copy link
Member Author

WarrenWeckesser commented Aug 6, 2025

Hmmm... why are we flagging an expression such as 6*sigma as incorrectly formatted? That's PEP 8 compliant. See https://peps.python.org/pep-0008/#other-recommendations; in particular, the example hypot2 = x*x + y*y. Just below that, the line hypot2 = x * x + y * y is given as an example of what to not do.


FWIW, chatgpt gets it right:

Question:

According to PEP 8, how should the expression 'abs(x-a)<3*b' be formatted in Python?

Answer:

According to PEP 8, the expression:

[... yada yada ...]

Final answer:

abs(x - a) < 3*b

@WarrenWeckesser
Copy link
Member Author

ruff rule E226 isn't smart enough to understand the nuance of PEP 8's whitespace recommendations. Any objections to disabling it?

@ngoldbaum
Copy link
Member

ngoldbaum commented Aug 6, 2025

ruff rule E226 isn't smart enough to understand the nuance of PEP 8's whitespace recommendations. Any objections to disabling it?

IMO the version ruff suggests is clearer.

I agree we're being more strict than PEP 8 but IMO that isn't a reason not to format stuff in a uniform way. I also sometimes am annoyed by formatting choices in codebases but then I remember how much time I'm saving not having nitpicky arguments about formatting in code review or getting confused by different formatting conventions in different parts of a codebase.

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the underlying statistics well enough to review the math but the implementation looks fine to me.

@WarrenWeckesser
Copy link
Member Author

WarrenWeckesser commented Aug 6, 2025

I'm fine with following guidelines that I don't necessarily agree with that have been adopted by the community, but I guess I'm not sure the current behavior was explicitly agreed to by the core devs (my activity waxes and wanes, so I might have missed an essential discussion), or is something that we ended up with accidentally, because it just happened to be the ruff default. If everyone really prefers "whitespace around all the operators"--e.g. 2 * i + 1 rather than (clearer-to-me and PEP 8-recommended) 2*i + 1--then so be it.

(B u t t o m e , i t i s j u s t t o o m u c h w h i t e s p a c e !)

@charris
Copy link
Member

charris commented Aug 6, 2025

IIRC, it came in when I committed the reformatting :) We are slowly working our way towards automatic formatting compatibility. If the original formatting is clearly better than the replacement, exceptions can be added.

@ngoldbaum
Copy link
Member

I don't mind too much either way - happy to defer to Warren's tastes on formatting math expressions :)

@WarrenWeckesser would you mind opening a separate PR for the ruff config change? And maybe ping @jorenham and @DimitriPapadopoulos since they did the bulk of the formatting PRs.

@DimitriPapadopoulos
Copy link
Contributor

DimitriPapadopoulos commented Aug 6, 2025

I'm OK with it. While I like this rule in the context of smaller projects with fewer calculations, I tend to agree it might not fit projects such as NumPy. I just used the default rule set here, and probably shouldn't have.

@WarrenWeckesser
Copy link
Member Author

Another hmmm...

Perhaps someday--maybe someday soon!--we'll use a clever LLM for linting, and it will "understand" the readability talked about in PEP 8 well enough to know that, say, 2*i + 1 is preferable to 2 * i + 1, and x**2 + y**2 is preferable to x ** 2 + y ** 2, but an expression such as np.euler_gamma*longvarname[2*i-1]/np.pi*np.arctan2(x,x**2+1) really needs spaces around * and / to break up that long stream of characters, so it would recommend something like np.euler_gamma * longvarname[2*i - 1] / np.pi * np.arctan2(x, x**2 + 1). Until that time, if we want automated, standardized, mindless (and I mean that in a good way) formatting, I guess "spaces around all the arithmetic operators" is the best we can do. Sorry for the noise if that's what y'all figured out long ago.

So I will conform, and not lead the revolt against E226.

@charris charris merged commit 0653d08 into numpy:main Aug 6, 2025
86 of 87 checks passed
@charris
Copy link
Member

charris commented Aug 6, 2025

Thanks Warren.

@WarrenWeckesser WarrenWeckesser deleted the binomial-tiny branch August 6, 2025 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants