

@rosecodym (Contributor) commented Jun 4, 2025

Description:

We have several metrics that capture job ID as a dimension. Strictly speaking, this is "wrong" because job ID has unbounded cardinality (every new job adds new time series), but our job counts have been low enough that we've gotten away with it. However, recent exploration of a new distributed job technique has substantially increased job counts, and this extra, "incorrect" dimension is now harder to justify.

We've recently seen trouble with some of our Prometheus scrapes. I haven't drawn a direct connection between that trouble and the increased cardinality, but I do know that we don't use the job ID dimension anywhere: we always sum it away! So this PR removes it. We shouldn't be recording it, we're not using it, and it might be causing the problems we're seeing.
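To illustrate why an unbounded label matters, here is a minimal, stdlib-only Go sketch. It is not code from this PR: the sample type and its sourceType and jobID fields are hypothetical stand-ins for a metric's labels. It counts how many distinct time series (unique label combinations) a metric would expose with and without the job ID label.

```go
package main

import "fmt"

// sample is a hypothetical observation carrying the labels a metric might use.
// sourceType is a bounded label; jobID grows with every new job (unbounded).
type sample struct {
	sourceType string
	jobID      int
}

// countSeries returns the number of distinct label combinations, i.e. the
// number of time series Prometheus would have to scrape and store.
func countSeries(samples []sample, includeJobID bool) int {
	seen := make(map[string]struct{})
	for _, s := range samples {
		key := s.sourceType
		if includeJobID {
			// Each job ID creates a separate series for every source type.
			key = fmt.Sprintf("%s|%d", s.sourceType, s.jobID)
		}
		seen[key] = struct{}{}
	}
	return len(seen)
}

func main() {
	sourceTypes := []string{"git", "s3", "filesystem"}
	var samples []sample
	// Simulate 1,000 jobs, each reporting for every source type.
	for job := 0; job < 1000; job++ {
		for _, st := range sourceTypes {
			samples = append(samples, sample{sourceType: st, jobID: job})
		}
	}
	fmt.Println("series with job ID:", countSeries(samples, true))     // prints "series with job ID: 3000"
	fmt.Println("series without job ID:", countSeries(samples, false)) // prints "series without job ID: 3"
}
```

With the job ID label, series count scales with the number of jobs; without it, series count is bounded by the other labels. If every query sums the job ID away anyway, dropping the label removes that growth without changing any query result.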

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?

@rosecodym requested review from a team as code owners June 4, 2025 22:29
@rosecodym merged commit 5581f08 into main Jun 5, 2025
13 checks passed
@rosecodym deleted the remove-job-id-from-scan-rate-metrics branch June 5, 2025 16:46


3 participants