kei01234kei
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR resolves the issue "Confusing use of TooManyRequests error for eviction".

Which issue(s) this PR is related to:

Fixes #106286

Special notes for your reviewer:

#106286 (comment)

Does this PR introduce a user-facing change?

Eviction now returns an accurate error message when it is blocked by the failSafe mechanism of the DisruptionController.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 21, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 21, 2025
@k8s-ci-robot
Contributor

Hi @kei01234kei. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jul 21, 2025
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 21, 2025
@kei01234kei
Contributor Author

/sig apps

@k8s-ci-robot k8s-ci-robot added the sig/apps Categorizes an issue or PR as relevant to SIG Apps. label Jul 21, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Apps Jul 21, 2025
@kei01234kei
Contributor Author

kei01234kei commented Jul 22, 2025

I'm also requesting reviews from you, since you took part in the issue discussion.
/cc @tallclair @liggitt @atiratree

@janetkuo
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 30, 2025
err := errors.NewTooManyRequests("Cannot evict pod as it would violate the pod's disruption budget.", 0)
err.ErrStatus.Details.Causes = append(err.ErrStatus.Details.Causes, metav1.StatusCause{Type: policyv1.DisruptionBudgetCause, Message: fmt.Sprintf("The disruption budget %s needs %d healthy pods and has %d currently", pdb.Name, pdb.Status.DesiredHealthy, pdb.Status.CurrentHealthy)})
condition := meta.FindStatusCondition(pdb.Status.Conditions, policyv1.DisruptionAllowedCondition)
if condition.Status == metav1.ConditionFalse {
Member

Accessing condition.Status will panic if condition is nil, which FindStatusCondition returns when the condition is not present.
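A minimal, self-contained sketch of the failure mode and the guard. Condition and findStatusCondition are hypothetical stand-ins for metav1.Condition and meta.FindStatusCondition (which likewise returns nil when no condition of the requested type exists); the point is only that the nil check has to come before any field access.

```go
package main

import "fmt"

// Condition is a hypothetical stand-in for metav1.Condition, trimmed to the
// fields discussed in this thread.
type Condition struct {
	Type    string
	Status  string // "True", "False" or "Unknown"
	Reason  string
	Message string
}

// findStatusCondition mimics meta.FindStatusCondition: it returns a pointer to
// the condition with the given type, or nil when no such condition exists.
func findStatusCondition(conditions []Condition, conditionType string) *Condition {
	for i := range conditions {
		if conditions[i].Type == conditionType {
			return &conditions[i]
		}
	}
	return nil
}

// disruptionDisallowed checks for nil first, so a PDB whose status has no
// DisruptionAllowed condition yet does not trigger a nil-pointer panic.
func disruptionDisallowed(conditions []Condition) bool {
	cond := findStatusCondition(conditions, "DisruptionAllowed")
	return cond != nil && cond.Status == "False"
}

func main() {
	fmt.Println(disruptionDisallowed(nil)) // false: no condition, and no panic
	fmt.Println(disruptionDisallowed([]Condition{
		{Type: "DisruptionAllowed", Status: "False", Reason: "SyncFailed"},
	})) // true
}
```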

I'd suggest customizing how we construct the message based on CurrentHealthy / DesiredHealthy and on the presence of a SyncFailedReason or other False condition, with a sensible generic fallback so the message is never confusing. I'd also suggest keeping the existing message as-is when CurrentHealthy <= DesiredHealthy, since that case is not confusing.

condition := meta.FindStatusCondition(pdb.Status.Conditions, policyv1.DisruptionAllowedCondition)

var msg string
switch {
case pdb.Status.CurrentHealthy <= pdb.Status.DesiredHealthy:
  msg = fmt.Sprintf("The disruption budget %s needs %d healthy pods and has %d currently", pdb.Name, pdb.Status.DesiredHealthy, pdb.Status.CurrentHealthy)
case condition != nil && condition.Status == metav1.ConditionFalse && len(condition.Message) > 0 && condition.Reason == policy.SyncFailedReason:
  msg = fmt.Sprintf("The disruption budget %s does not allow evicting pods currently because it failed sync: %v", pdb.Name, condition.Message)
case condition != nil && condition.Status == metav1.ConditionFalse && len(condition.Message) > 0:
  msg = fmt.Sprintf("The disruption budget %s does not allow evicting pods currently: %v", pdb.Name, condition.Message)
default:
  msg = fmt.Sprintf("The disruption budget %s does not allow evicting pods currently", pdb.Name)
}

err.ErrStatus.Details.Causes = append(err.ErrStatus.Details.Causes, metav1.StatusCause{Type: policyv1.DisruptionBudgetCause, Message: msg})
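The suggested switch can be exercised on its own. The sketch below keeps the same case order and message formats, but replaces the Kubernetes types with hypothetical local stand-ins (Condition, syncFailedReason standing in for the "SyncFailed" reason constant, conditionFalse for metav1.ConditionFalse) so it runs without the Kubernetes modules; the sync error text in main is made up for illustration.

```go
package main

import "fmt"

// Hypothetical stand-ins for the SyncFailed reason constant and metav1.ConditionFalse.
const (
	syncFailedReason = "SyncFailed"
	conditionFalse   = "False"
)

// Condition is a hypothetical stand-in for metav1.Condition.
type Condition struct {
	Status  string
	Reason  string
	Message string
}

// disruptionBudgetMessage follows the suggested order: the original wording
// when the PDB really lacks healthy pods, then a sync-failure message, then any
// other False condition that carries a message, then a generic fallback.
func disruptionBudgetMessage(pdbName string, desiredHealthy, currentHealthy int32, cond *Condition) string {
	switch {
	case currentHealthy <= desiredHealthy:
		return fmt.Sprintf("The disruption budget %s needs %d healthy pods and has %d currently", pdbName, desiredHealthy, currentHealthy)
	case cond != nil && cond.Status == conditionFalse && len(cond.Message) > 0 && cond.Reason == syncFailedReason:
		return fmt.Sprintf("The disruption budget %s does not allow evicting pods currently because it failed sync: %v", pdbName, cond.Message)
	case cond != nil && cond.Status == conditionFalse && len(cond.Message) > 0:
		return fmt.Sprintf("The disruption budget %s does not allow evicting pods currently: %v", pdbName, cond.Message)
	default:
		return fmt.Sprintf("The disruption budget %s does not allow evicting pods currently", pdbName)
	}
}

func main() {
	// Enough healthy pods, but DisruptionAllowed=False because the PDB failed sync.
	cond := &Condition{Status: conditionFalse, Reason: syncFailedReason, Message: "example sync error"}
	fmt.Println(disruptionBudgetMessage("my-pdb", 2, 3, cond))
}
```

Checking CurrentHealthy <= DesiredHealthy first means the "needs N healthy pods" text only appears when it is actually true, which is exactly the confusing case the issue describes.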

Member

+1 for the above flow

Perhaps this case could also handle conditions that have no condition.Message, and print the condition.Reason as well:

switch {
...
case condition != nil && condition.Status == metav1.ConditionFalse && len(condition.Message) > 0:
  msg = fmt.Sprintf("The disruption budget %s does not allow evicting pods currently: %v", pdb.Name, condition.Message)
...
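One way to read this suggestion, as a hypothetical sketch: a helper (falseConditionDetail, an illustrative name) that turns a False condition into a message suffix containing its Reason and, when present, its Message, and returns an empty string otherwise so the caller can fall back to the generic text. The "(reason: ...)" format is an assumption, not the wording that was eventually merged.

```go
package main

import "fmt"

// Condition is a hypothetical stand-in for metav1.Condition.
type Condition struct {
	Status  string
	Reason  string
	Message string
}

// falseConditionDetail describes a False condition using its Reason and, if
// set, its Message. It returns "" when there is nothing to report.
// The output format is illustrative only.
func falseConditionDetail(cond *Condition) string {
	if cond == nil || cond.Status != "False" {
		return ""
	}
	switch {
	case cond.Reason != "" && cond.Message != "":
		return fmt.Sprintf(" (reason: %s): %s", cond.Reason, cond.Message)
	case cond.Reason != "":
		return fmt.Sprintf(" (reason: %s)", cond.Reason)
	case cond.Message != "":
		return ": " + cond.Message
	default:
		return ""
	}
}

func main() {
	base := "The disruption budget my-pdb does not allow evicting pods currently"
	// A False condition without a Message still yields a useful reason.
	fmt.Println(base + falseConditionDetail(&Condition{Status: "False", Reason: "InsufficientPods"}))
	// No condition at all: only the generic message remains.
	fmt.Println(base + falseConditionDetail(nil))
}
```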

Contributor Author

@liggitt
Thank you for the helpful advice. I modified the code accordingly.

@atiratree
I changed the code to also output condition.Reason in the condition != nil && condition.Status == metav1.ConditionFalse && len(condition.Message) > 0 case.

Resolve confusing use of TooManyRequests error for eviction

Member

@atiratree atiratree left a comment

@kei01234kei It would be great if we could test the new errors. I think we can add new cases to this unit test:

func TestEvictionIgnorePDB(t *testing.T) {

By the way, the name TestEvictionIgnorePDB no longer describes all of its test cases well, because not all of the cases ignore PDBs. IMO, the easiest way to fix this is:

  • s/TestEviction/TestEvictionWithETCD
  • s/TestEvictionIgnorePDB/TestEviction
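New test cases for these errors would likely share one check: the error is an HTTP 429 with reason TooManyRequests (what errors.NewTooManyRequests produces) and carries a cause of type DisruptionBudget (the value of policyv1.DisruptionBudgetCause) with the expected message. A hypothetical helper along those lines, using local stand-in types instead of the real apimachinery ones:

```go
package main

import (
	"fmt"
	"net/http"
)

// Hypothetical stand-ins for metav1.StatusCause and the parts of
// apierrors.StatusError that a test would inspect.
type StatusCause struct {
	Type    string
	Message string
}

type StatusError struct {
	Code   int32
	Reason string
	Causes []StatusCause
}

// checkEvictionBlocked returns nil if err looks like a PDB-blocked eviction:
// HTTP 429, reason TooManyRequests, and a DisruptionBudget cause with wantMsg.
func checkEvictionBlocked(err *StatusError, wantMsg string) error {
	if err == nil {
		return fmt.Errorf("expected an error, got nil")
	}
	if err.Code != http.StatusTooManyRequests || err.Reason != "TooManyRequests" {
		return fmt.Errorf("expected 429 TooManyRequests, got %d %s", err.Code, err.Reason)
	}
	for _, c := range err.Causes {
		if c.Type == "DisruptionBudget" {
			if c.Message != wantMsg {
				return fmt.Errorf("cause message = %q, want %q", c.Message, wantMsg)
			}
			return nil
		}
	}
	return fmt.Errorf("no DisruptionBudget cause in %v", err.Causes)
}

func main() {
	want := "The disruption budget pdb does not allow evicting pods currently"
	err := &StatusError{
		Code:   http.StatusTooManyRequests,
		Reason: "TooManyRequests",
		Causes: []StatusCause{{Type: "DisruptionBudget", Message: want}},
	}
	fmt.Println(checkEvictionBlocked(err, want))                    // <nil>
	fmt.Println(checkEvictionBlocked(err, "something else") != nil) // true
}
```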

@kei01234kei kei01234kei force-pushed the resolve_confusiong_use_of_toomanerequests_error_for_eviction branch from ef05923 to ef94aed Compare August 30, 2025 06:33
@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge
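Taken together, the rules above form a single predicate. A hypothetical Go sketch (PR and shouldRetest are illustrative names, not the bot's actual code), reading the first rule as "no do-not-merge/* labels":

```go
package main

import (
	"fmt"
	"strings"
)

// PR is a hypothetical summary of the pull-request state the rules refer to.
type PR struct {
	Labels               []string
	FailingRequiredTests bool
}

func (p PR) has(label string) bool {
	for _, l := range p.Labels {
		if l == label {
			return true
		}
	}
	return false
}

// shouldRetest encodes the five listed rules as one boolean check.
func shouldRetest(p PR) bool {
	for _, l := range p.Labels {
		if strings.HasPrefix(l, "do-not-merge/") {
			return false // no do-not-merge/* labels allowed
		}
	}
	return !p.has("needs-ok-to-test") && // verified as safe to test
		!p.has("needs-rebase") && // mergeable
		p.has("cncf-cla: yes") && p.has("lgtm") && p.has("approved") && // approved
		p.FailingRequiredTests // only failing PRs are retested
}

func main() {
	pr := PR{
		Labels:               []string{"cncf-cla: yes", "lgtm", "approved", "ok-to-test"},
		FailingRequiredTests: true,
	}
	fmt.Println(shouldRetest(pr)) // true
	pr.Labels = append(pr.Labels, "do-not-merge/hold")
	fmt.Println(shouldRetest(pr)) // false
}
```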

You can:

/retest

@kei01234kei
Contributor Author

/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Sep 4, 2025
@k8s-ci-robot k8s-ci-robot requested a review from liggitt September 4, 2025 04:45
@kei01234kei
Contributor Author

/label tide/merge-method-squash
/unhold

@k8s-ci-robot k8s-ci-robot added tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Sep 4, 2025
@kei01234kei kei01234kei force-pushed the resolve_confusiong_use_of_toomanerequests_error_for_eviction branch from ac72fae to aff1940 Compare September 4, 2025 05:07
@kei01234kei
Contributor Author

unit test looks related

Yes. I modified the test.
modify test "the error includes the reason when the condition.Status …

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 4, 2025
@kei01234kei kei01234kei force-pushed the resolve_confusiong_use_of_toomanerequests_error_for_eviction branch from e9085b3 to aff1940 Compare September 4, 2025 07:55
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 4, 2025
@liggitt
Member

liggitt commented Sep 4, 2025

thanks, please squash to a single commit

modify test "the error includes the reason when the condition.Status is False"
@kei01234kei kei01234kei force-pushed the resolve_confusiong_use_of_toomanerequests_error_for_eviction branch from aff1940 to d014398 Compare September 4, 2025 14:43
@kei01234kei
Contributor Author

kei01234kei commented Sep 4, 2025

thanks, please squash to a single commit

Done.
(Should I still squash the commits myself when the tide/merge-method-squash label is set?)

@kei01234kei
Contributor Author

/retest

@liggitt
Member

liggitt commented Sep 4, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 4, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 7c5e5d13cb5cebc0e93e7dba9e956eaec5e9b5a9

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kei01234kei, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kei01234kei
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit ddb015f into kubernetes:master Sep 4, 2025
13 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.35 milestone Sep 4, 2025
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in SIG Apps Sep 4, 2025
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Confusing use of TooManyRequests error for eviction
6 participants