Skip to content

Conversation

swtaarrs
Copy link
Member

@swtaarrs swtaarrs commented May 7, 2024

drop_gil() assumes that its caller is attached, which means that the current thread holds the GIL if and only if the GIL is enabled, and the enabled-state of the GIL won't change. This isn't true, though, because detach_thread() calls _PyEval_ReleaseLock() after detaching and _PyThreadState_DeleteCurrent() calls it after removing the current thread from consideration for stop-the-world requests (effectively detaching it).

Fix this by remembering whether or not a thread acquired the GIL when it last attached, in PyThreadState._status.holds_gil, and check this in drop_gil() instead of gil->enabled. As part of this, add an explicit thread_dying argument to drop_gil(), separating that concept from whether or not it's safe to dereference tstate.

This fixes a crash in test_multiprocessing_pool_circular_import(), so I've reenabled it.

@swtaarrs swtaarrs added skip news interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading labels May 7, 2024
@swtaarrs swtaarrs requested a review from colesbury May 7, 2024 23:19
@swtaarrs swtaarrs marked this pull request as ready for review May 7, 2024 23:30
@swtaarrs swtaarrs added the 3.13 bugs and security fixes label May 7, 2024
@swtaarrs swtaarrs removed the 3.13 bugs and security fixes label May 7, 2024
@swtaarrs
Copy link
Member Author

swtaarrs commented May 8, 2024

I realized while writing up an explanation of my changes that the commit I just pushed isn't quite right. I changed drop_gil() to always take a non-NULL tstate by adding an explicit thread_dying argument. But in this case, tstate can be a dangling pointer so it's actually important to not dereference it. I'll come up with a different approach.

@swtaarrs swtaarrs marked this pull request as draft May 8, 2024 19:25
@swtaarrs swtaarrs marked this pull request as ready for review May 8, 2024 20:58
@swtaarrs
Copy link
Member Author

swtaarrs commented May 8, 2024

This should be ready now - I updated the description to reflect the changes.

@swtaarrs swtaarrs changed the title gh-118727: Don't drop the GIL during thread deletion unless the current thread holds it gh-118727: Don't drop the GIL in drop_gil() unless the current thread holds it May 8, 2024
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this LGTM. I think it's worth having tstate->_status.holds_gil in the default build as well.

@swtaarrs swtaarrs added the needs backport to 3.13 bugs and security fixes label May 9, 2024
@swtaarrs
Copy link
Member Author

swtaarrs commented May 9, 2024

(moved from the issue where I accidentally posted this)

test_importlib got stuck in the tsan build again, which this should've fixed. I'll see what I can figure out.

@ericsnowcurrently
Copy link
Member

Does applying @suppress_immortalization() make a difference. I'm not suggesting that as a solution (there's probably a deeper issue to solve), but whether or not the decorator makes a difference might be helpful to know.

@colesbury
Copy link
Contributor

@swtaarrs - I observed a hang when running pool_in_threads.py locally with this PR. (It may or may not be the same as the TSan hang).

I think there's an issue with FORCE_SWITCHING and the dynamic enabling of the GIL:

  • A thread disables (and releases) the GIL via _PyEval_DisableGIL. If _PY_GIL_DROP_REQUEST_BIT is set, it waits in the FORCE_SWITCHING section until another thread acquires the GIL
  • A second thread acquires the GIL. However, it sees that the GIL is now disabled and releases the GIL without notifying the first thread

@swtaarrs
Copy link
Member Author

I think there's an issue with FORCE_SWITCHING and the dynamic enabling of the GIL

Thanks, good catch. I should probably pull drop_gil_impl() back out for use by _PyEval_DisableGIL() then. The FORCE_SWITCHING code doesn't apply once the GIL is disabled, and nothing else in drop_gil() applies to _PyEval_DisableGIL() either. I'll also take another look at everything at the end of take_gil() that's skipped in this case.

@swtaarrs
Copy link
Member Author

The commit I just pushed should address the FORCE_SWITCHING issue by reverting my drop_gil_impl() change. I also added a line to clear _PY_GIL_DROP_REQUEST_BIT when disabling the GIL, so that thread doesn't keep hitting its eval_breaker once it resumes executing.

I'm running the tsan tests in a loop locally to see if it hangs again.

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@colesbury
Copy link
Contributor

@ericsnowcurrently - would you like to re-review this or should I go ahead and merge it?

@ericsnowcurrently
Copy link
Member

I'll take a quick look tomorrow.

@swtaarrs
Copy link
Member Author

@ericsnowcurrently will you have time to take a look today?

@ericsnowcurrently
Copy link
Member

Sorry for the delay. I don't have any objections. If you both are confident about the solution then I'm on board. 😄

@colesbury
Copy link
Contributor

Great! As a heads up, there is a different bug that affects the same test case (#119369).

@colesbury colesbury merged commit be1dfcc into python:main May 23, 2024
@miss-islington-app
Copy link

Thanks @swtaarrs for the PR, and @colesbury for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 23, 2024
…t thread holds it (pythonGH-118745)

`drop_gil()` assumes that its caller is attached, which means that the current
thread holds the GIL if and only if the GIL is enabled, and the enabled-state
of the GIL won't change. This isn't true, though, because `detach_thread()`
calls `_PyEval_ReleaseLock()` after detaching and
`_PyThreadState_DeleteCurrent()` calls it after removing the current thread
from consideration for stop-the-world requests (effectively detaching it).

Fix this by remembering whether or not a thread acquired the GIL when it last
attached, in `PyThreadState._status.holds_gil`, and check this in `drop_gil()`
instead of `gil->enabled`.

This fixes a crash in `test_multiprocessing_pool_circular_import()`, so I've
reenabled it.
(cherry picked from commit be1dfcc)

Co-authored-by: Brett Simmers <[email protected]>
@bedevere-app
Copy link

bedevere-app bot commented May 23, 2024

GH-119474 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label May 23, 2024
colesbury pushed a commit that referenced this pull request May 23, 2024
…nt thread holds it (GH-118745) (#119474)

`drop_gil()` assumes that its caller is attached, which means that the current
thread holds the GIL if and only if the GIL is enabled, and the enabled-state
of the GIL won't change. This isn't true, though, because `detach_thread()`
calls `_PyEval_ReleaseLock()` after detaching and
`_PyThreadState_DeleteCurrent()` calls it after removing the current thread
from consideration for stop-the-world requests (effectively detaching it).

Fix this by remembering whether or not a thread acquired the GIL when it last
attached, in `PyThreadState._status.holds_gil`, and check this in `drop_gil()`
instead of `gil->enabled`.

This fixes a crash in `test_multiprocessing_pool_circular_import()`, so I've
reenabled it.
(cherry picked from commit be1dfcc)

Co-authored-by: Brett Simmers <[email protected]>
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
…t thread holds it (python#118745)

`drop_gil()` assumes that its caller is attached, which means that the current
thread holds the GIL if and only if the GIL is enabled, and the enabled-state
of the GIL won't change. This isn't true, though, because `detach_thread()`
calls `_PyEval_ReleaseLock()` after detaching and
`_PyThreadState_DeleteCurrent()` calls it after removing the current thread
from consideration for stop-the-world requests (effectively detaching it).

Fix this by remembering whether or not a thread acquired the GIL when it last
attached, in `PyThreadState._status.holds_gil`, and check this in `drop_gil()`
instead of `gil->enabled`.

This fixes a crash in `test_multiprocessing_pool_circular_import()`, so I've
reenabled it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) skip news topic-free-threading
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants