-
Notifications
You must be signed in to change notification settings - Fork 619
MQE: Count samples saved on the CSE #12169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
53ce5ba
to
2335849
Compare
When detecring a duplicate expression, set the sampleCountMultiplicator on the Vector/Matrix selector in the duplicated path. Signed-off-by: Innokentii Konstantinov <[email protected]>
Signed-off-by: Innokentii Konstantinov <[email protected]>
2335849
to
ed3cbe6
Compare
Signed-off-by: Innokentii Konstantinov <[email protected]>
Signed-off-by: Innokentii Konstantinov <[email protected]>
Signed-off-by: Innokentii Konstantinov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -42,6 +42,20 @@ type Selector struct { | |||
series *seriesList | |||
|
|||
seriesIdx int | |||
|
|||
// Sample count factor for stats counting when applying CSE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Sample count factor for stats counting when applying CSE. | |
// Sample count factor for stats counting at each step when applying common sub-expression elimination. |
if testCase.skipCompareWithPrometheus == "" { | ||
require.Equal(t, testCase.expectedTotalSamples, prometheusSamplesStats.TotalSamples, "invalid test case: expected total samples does not match value from Prometheus' engine") | ||
require.Equal(t, testCase.expectedTotalSamplesPerStep, prometheusSamplesStats.TotalSamplesPerStep, "invalid test case: expected per stepsamples does not match value from Prometheus' engine") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we log the reason for skipping here?
} else {
t.Log(testCase.skipCompareWithPrometheus)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall almost looks good to me. But I left some suggestions anyway.
@@ -375,3 +377,23 @@ func (p *path) String() string { | |||
b.WriteRune(']') | |||
return b.String() | |||
} | |||
|
|||
// setSampleCountFactor sets SampleCountFactor on the VectorSelector and MatrixSelector nodes. | |||
// It sets it only if multipilcator was not set before to avoid overwriting it for nested duplicate expressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nits. I think multiplier
is a more common word for this.
// It sets it only if multipilcator was not set before to avoid overwriting it for nested duplicate expressions. | |
// It sets it only if multiplier was not set before to avoid overwriting it for nested duplicate expressions. |
if m.SampleCountFactor != 0 { | ||
selector.SetSampleCountFactor(m.SampleCountFactor) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we just push the non zero check != 0
inside SetSampleCountFactor
method itself? With that we can remove this if branch and also the one in vector_selector.go.
func (s *Selector) SetSampleCountFactor(factor uint32) { | ||
s.sampleCountFactor = factor | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in another comment, since we have setter method here, just add logic to set only for non-zero factor. Hence we can remove non-zero check in the caller.
func (s *Selector) SetSampleCountFactor(factor uint32) { | |
s.sampleCountFactor = factor | |
} | |
func (s *Selector) SetSampleCountFactor(factor uint32) { | |
if factor != 0 { | |
s.sampleCountFactor = factor | |
} | |
} |
return q.Stats().Samples | ||
} | ||
|
||
testCases := map[string]struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the testCases below are for instant query. Should we have testCase for range query too? So that we can see the case where totalSamples is not equal with totalSamplesPerStep.
@@ -103,7 +103,7 @@ func (m *RangeVectorSelector) NextStepSamples() (*types.RangeVectorStepData, err | |||
m.stepData.RangeStart = rangeStart | |||
m.stepData.RangeEnd = rangeEnd | |||
|
|||
m.Stats.IncrementSamplesAtTimestamp(m.stepData.StepT, int64(m.stepData.Floats.Count())+m.stepData.Histograms.EquivalentFloatSampleCount()) | |||
m.Stats.IncrementSamplesAtTimestamp(m.stepData.StepT, (int64(m.stepData.Floats.Count())+m.stepData.Histograms.EquivalentFloatSampleCount())*int64(m.Selector.GetSampleCountFactor())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With some castings, it takes sometime to see what is being computed. What if we extract some of the operation to local variables to make it clearer?
m.Stats.IncrementSamplesAtTimestamp(m.stepData.StepT, (int64(m.stepData.Floats.Count())+m.stepData.Histograms.EquivalentFloatSampleCount())*int64(m.Selector.GetSampleCountFactor())) | |
floatCount := int64(m.stepData.Floats.Count()) | |
histogramCount := m.stepData.Histograms.EquivalentFloatSampleCount() | |
sampleFactor := int64(m.Selector.GetSampleCountFactor()) | |
totalSamples := (floatCount + histogramCount) * sampleFactor | |
m.Stats.IncrementSamplesAtTimestamp(m.stepData.StepT, totalSamples)``` |
switch selector := node.(type) { | ||
case *core.VectorSelector: | ||
if selector.SampleCountFactor == 0 { | ||
selector.SampleCountFactor = factor | ||
} | ||
return | ||
case *core.MatrixSelector: | ||
if selector.SampleCountFactor == 0 { | ||
selector.SampleCountFactor = factor | ||
} | ||
return | ||
} | ||
for _, child := range node.Children() { | ||
setSampleCountFactor(child, factor) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having two empty return
s, what if we just add default clause to catch all?
switch selector := node.(type) { | |
case *core.VectorSelector: | |
if selector.SampleCountFactor == 0 { | |
selector.SampleCountFactor = factor | |
} | |
return | |
case *core.MatrixSelector: | |
if selector.SampleCountFactor == 0 { | |
selector.SampleCountFactor = factor | |
} | |
return | |
} | |
for _, child := range node.Children() { | |
setSampleCountFactor(child, factor) | |
} | |
switch selector := node.(type) { | |
case *core.VectorSelector: | |
if selector.SampleCountFactor == 0 { | |
selector.SampleCountFactor = factor | |
} | |
case *core.MatrixSelector: | |
if selector.SampleCountFactor == 0 { | |
selector.SampleCountFactor = factor | |
} | |
default: | |
for _, child := range node.Children() { | |
setSampleCountFactor(child, factor) | |
} | |
} |
What this PR does
When detecting a duplicate expression, set the sampleCountMultiplicator on the Vector/Matrix selector in the duplicated path. It's needed to preserve "original" number of samples processed, as there were no CSE applied. The multiplication is set on a planning stage - code
Note that during this work, I found an edge case, when MQE returns 0 samples processed when CSE is applied - plan is to tackle it separately. Here is a skipped test covering it.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
. If changelog entry is not needed, please add thechangelog-not-needed
label to the PR.about-versioning.md
updated with experimental features.