Skip to content

MQE: Count samples saved on the CSE #12169

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Konstantinov-Innokentii
Copy link
Contributor

@Konstantinov-Innokentii Konstantinov-Innokentii commented Jul 23, 2025

What this PR does

When detecting a duplicate expression, set the sampleCountMultiplicator on the Vector/Matrix selector in the duplicated path. It's needed to preserve "original" number of samples processed, as there were no CSE applied. The multiplication is set on a planning stage - code
Note that during this work, I found an edge case, when MQE returns 0 samples processed when CSE is applied - plan is to tackle it separately. Here is a skipped test covering it.

Checklist

  • Tests updated.
  • [N/A] Documentation added.
  • [N/A] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • [N/A] about-versioning.md updated with experimental features.

@Konstantinov-Innokentii Konstantinov-Innokentii changed the title Count samples saved on the CSE. MQE: Count samples saved on the CSE Jul 23, 2025
@Konstantinov-Innokentii Konstantinov-Innokentii added the changelog-not-needed PRs that don't need a CHANGELOG.md entry label Jul 23, 2025
When detecring a duplicate expression, set the sampleCountMultiplicator on the Vector/Matrix selector in the duplicated path.

Signed-off-by: Innokentii Konstantinov <[email protected]>
Signed-off-by: Innokentii Konstantinov <[email protected]>
Signed-off-by: Innokentii Konstantinov <[email protected]>
Signed-off-by: Innokentii Konstantinov <[email protected]>
Signed-off-by: Innokentii Konstantinov <[email protected]>
@Konstantinov-Innokentii Konstantinov-Innokentii marked this pull request as ready for review July 24, 2025 07:25
@Konstantinov-Innokentii Konstantinov-Innokentii requested a review from a team as a code owner July 24, 2025 07:25
Copy link
Contributor

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -42,6 +42,20 @@ type Selector struct {
series *seriesList

seriesIdx int

// Sample count factor for stats counting when applying CSE.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Sample count factor for stats counting when applying CSE.
// Sample count factor for stats counting at each step when applying common sub-expression elimination.

if testCase.skipCompareWithPrometheus == "" {
require.Equal(t, testCase.expectedTotalSamples, prometheusSamplesStats.TotalSamples, "invalid test case: expected total samples does not match value from Prometheus' engine")
require.Equal(t, testCase.expectedTotalSamplesPerStep, prometheusSamplesStats.TotalSamplesPerStep, "invalid test case: expected per stepsamples does not match value from Prometheus' engine")
}
Copy link
Contributor

@56quarters 56quarters Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log the reason for skipping here?

} else {
    t.Log(testCase.skipCompareWithPrometheus)
}

Copy link
Contributor

@lamida lamida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall almost looks good to me. But I left some suggestions anyway.

@@ -375,3 +377,23 @@ func (p *path) String() string {
b.WriteRune(']')
return b.String()
}

// setSampleCountFactor sets SampleCountFactor on the VectorSelector and MatrixSelector nodes.
// It sets it only if multipilcator was not set before to avoid overwriting it for nested duplicate expressions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits. I think multiplier is a more common word for this.

Suggested change
// It sets it only if multipilcator was not set before to avoid overwriting it for nested duplicate expressions.
// It sets it only if multiplier was not set before to avoid overwriting it for nested duplicate expressions.

Comment on lines +82 to +84
if m.SampleCountFactor != 0 {
selector.SetSampleCountFactor(m.SampleCountFactor)
}
Copy link
Contributor

@lamida lamida Jul 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we just push the non zero check != 0 inside SetSampleCountFactor method itself? With that we can remove this if branch and also the one in vector_selector.go.

Comment on lines +50 to +52
func (s *Selector) SetSampleCountFactor(factor uint32) {
s.sampleCountFactor = factor
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in another comment, since we have setter method here, just add logic to set only for non-zero factor. Hence we can remove non-zero check in the caller.

Suggested change
func (s *Selector) SetSampleCountFactor(factor uint32) {
s.sampleCountFactor = factor
}
func (s *Selector) SetSampleCountFactor(factor uint32) {
if factor != 0 {
s.sampleCountFactor = factor
}
}

return q.Stats().Samples
}

testCases := map[string]struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the testCases below are for instant query. Should we have testCase for range query too? So that we can see the case where totalSamples is not equal with totalSamplesPerStep.

@@ -103,7 +103,7 @@ func (m *RangeVectorSelector) NextStepSamples() (*types.RangeVectorStepData, err
m.stepData.RangeStart = rangeStart
m.stepData.RangeEnd = rangeEnd

m.Stats.IncrementSamplesAtTimestamp(m.stepData.StepT, int64(m.stepData.Floats.Count())+m.stepData.Histograms.EquivalentFloatSampleCount())
m.Stats.IncrementSamplesAtTimestamp(m.stepData.StepT, (int64(m.stepData.Floats.Count())+m.stepData.Histograms.EquivalentFloatSampleCount())*int64(m.Selector.GetSampleCountFactor()))
Copy link
Contributor

@lamida lamida Jul 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With some castings, it takes sometime to see what is being computed. What if we extract some of the operation to local variables to make it clearer?

Suggested change
m.Stats.IncrementSamplesAtTimestamp(m.stepData.StepT, (int64(m.stepData.Floats.Count())+m.stepData.Histograms.EquivalentFloatSampleCount())*int64(m.Selector.GetSampleCountFactor()))
floatCount := int64(m.stepData.Floats.Count())
histogramCount := m.stepData.Histograms.EquivalentFloatSampleCount()
sampleFactor := int64(m.Selector.GetSampleCountFactor())
totalSamples := (floatCount + histogramCount) * sampleFactor
m.Stats.IncrementSamplesAtTimestamp(m.stepData.StepT, totalSamples)```

Comment on lines +384 to +398
switch selector := node.(type) {
case *core.VectorSelector:
if selector.SampleCountFactor == 0 {
selector.SampleCountFactor = factor
}
return
case *core.MatrixSelector:
if selector.SampleCountFactor == 0 {
selector.SampleCountFactor = factor
}
return
}
for _, child := range node.Children() {
setSampleCountFactor(child, factor)
}
Copy link
Contributor

@lamida lamida Jul 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having two empty returns, what if we just add default clause to catch all?

Suggested change
switch selector := node.(type) {
case *core.VectorSelector:
if selector.SampleCountFactor == 0 {
selector.SampleCountFactor = factor
}
return
case *core.MatrixSelector:
if selector.SampleCountFactor == 0 {
selector.SampleCountFactor = factor
}
return
}
for _, child := range node.Children() {
setSampleCountFactor(child, factor)
}
switch selector := node.(type) {
case *core.VectorSelector:
if selector.SampleCountFactor == 0 {
selector.SampleCountFactor = factor
}
case *core.MatrixSelector:
if selector.SampleCountFactor == 0 {
selector.SampleCountFactor = factor
}
default:
for _, child := range node.Children() {
setSampleCountFactor(child, factor)
}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog-not-needed PRs that don't need a CHANGELOG.md entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants