OTLP to directly write to storage.Appender #16855
Conversation
I've started to adopt this into a branch in mimir-prometheus to POC it there as well
}
}

// CombinedAppender is similar to storage.Appender, but combines updates to
nit: let's have this on the top, this is the important bit :)
When adapting this to mimir-prometheus, I've run into a problem: I have many conflicts with the customizations done in mimir-prometheus. It turns out most of them will go away with this PR, since we can hide behind the interface, so I'd like to throw them away and resync storage/remote/otlptranslator to upstream, but at which point? In the meantime #16842 came in from the 3.5 release branch, which is why this PR is also conflicting. Was it intentional to have #16842 on main @ArthurSens? I don't want to start before this is sorted out and maybe this PR #16855 is rebased @dashpole ?
Yes, it was intentional. We probably will revert the revert now, but there were some unfortunate misalignments on the spec while 3.5 was being released, so we removed the feature. We'll get it back now.
Ack, so we don't need to rebase this PR now and I can essentially sync with upstream at the same base as this PR. And we'll just wait for the revert-revert to move forward.
}

// NewOTLPWriteHandler creates a http.Handler that accepts OTLP write requests and
// writes them to the provided appendable.
-func NewOTLPWriteHandler(logger *slog.Logger, _ prometheus.Registerer, appendable storage.Appendable, configFunc func() config.Config, opts OTLPOptions) http.Handler {
+func NewOTLPWriteHandler(logger *slog.Logger, reg prometheus.Registerer, appendable storage.Appendable, configFunc func() config.Config, opts OTLPOptions) http.Handler {
This should take a CombinedAppender interface, not storage.Appendable, so we can seed it with a different implementation. Of course it would mean that NewAPI in api.go should also take it as an input parameter (dependency injection), so we could implement it in Mimir rather than in mimir-prometheus, removing the deviation from upstream completely. What do you think?
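For illustration, a minimal sketch of the injection being suggested here, assuming the handler lives in package remote and that CombinedAppender and OTLPOptions are defined alongside it; the parameter name and the placeholder body are hypothetical, not the PR's actual code:

package remote

import (
	"log/slog"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"

	"github.com/prometheus/prometheus/config"
)

// Hypothetical signature: the handler accepts the CombinedAppender interface
// instead of storage.Appendable, and NewAPI in api.go would take the same
// parameter and pass it through, so Mimir could inject its own implementation
// without carrying a fork of the translator.
func NewOTLPWriteHandler(logger *slog.Logger, reg prometheus.Registerer, app CombinedAppender, configFunc func() config.Config, opts OTLPOptions) http.Handler {
	// ...wire the OTLP translation onto the injected appender...
	return http.NotFoundHandler() // placeholder body for the sketch
}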
What public function does mimir use? NewAPI? Or NewOTLPWriteHandler?
Discussed in the SIG meeting. They use NewAPI.
Hold on for now, I might be able to use NewPrometheusConverter directly. Working on it.
Adopt prometheus/prometheus#16855: initial code with samples and exemplars only, very basic. Signed-off-by: György Krajcsovits <[email protected]>
I've finally had time to get started in Mimir: https://github.com/grafana/mimir/pull/12152/files#diff-077c782fc5e14926d78007578f8fed1d4f760875ee2311b41e8391fb16f3270dR28

The good: delete the code duplication from Prometheus and move Mimir-specific code 100% into Mimir.

The not so good:
It should be easy enough to add the storage.SeriesRef. Were you able to measure the performance impact of #2? Would using the slicelabels build tag help?
👍
That's fair, I'm not basing this on measurement, just simple deduction from looking at what the code does. Slicelabels is a build tag as far as I understand, so not that easy to use? Anyway, I'll add SeriesRef on my branch and try to do some measurements.
Well, this turned out not to be that easy actually. Nothing in the API says that data points for a metric share the same attributes, so I would have to remember the labels and compare them to the previous labels to know if I can reuse the reference, and we also turn histogram data points into multiple series, so keeping track of those is even more complicated. What you did, keeping track of hashes (and later conflict overflow), is much simpler. I'll do the same instead and keep a map of labels hash -> position in the write request.
	outOfOrderExemplars prometheus.Counter
	ingestCTZeroSample  bool
	// Used to ensure we only update metadata and created timestamps once, and to share storage.SeriesRefs.
	refs map[uint64]storage.SeriesRef
To deal with collisions, you also need to keep track of the labels.
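For illustration, a minimal sketch of a collision-aware ref cache along those lines; the type and method names are hypothetical, not the PR's actual implementation:

package otlpappender

import (
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/storage"
)

// refEntry pairs the stored labels with their series reference so that a
// hash collision can be detected by comparing labels, not just the hash.
type refEntry struct {
	ls  labels.Labels
	ref storage.SeriesRef
}

// refCache maps a labels hash to every entry sharing that hash.
type refCache map[uint64][]refEntry

// lookup returns the cached SeriesRef for ls, or 0 if it is not cached yet.
func (c refCache) lookup(ls labels.Labels) storage.SeriesRef {
	for _, e := range c[ls.Hash()] {
		if labels.Equal(e.ls, ls) {
			return e.ref
		}
	}
	return 0
}

// store remembers the SeriesRef returned by the underlying appender.
func (c refCache) store(ls labels.Labels, ref storage.SeriesRef) {
	h := ls.Hash()
	c[h] = append(c[h], refEntry{ls: ls, ref: ref})
}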
I've updated my PR. So far my results are too good to believe, so I'm trying to figure out whether they're real or I missed something. grafana/mimir#12152
I thought there was a bug, as I got some duplicate sample errors in Mimir when using OTel Collector + Prometheus receiver -> OTLP exporter -> Mimir. It turns out it was the Prometheus receiver faking start times that sometimes coincided with actual sample timestamps. I turned this off by using
I'll rebase, re-run benchmarks, and mark this ready for review once @krajorama has confirmed that this approach will work for Mimir.
	refs map[uint64]storage.SeriesRef
}

func (b *combinedAppender) AppendSample(ls labels.Labels, meta metadata.Metadata, t, ct int64, v float64, es []exemplar.Exemplar) (err error) {
When adding the metadata to remote write, we need to use the metric family name, and the old code simply set the name in the metadata to the OTel name, which doesn't have the magic _count, _bucket, _sum suffixes. However, now I get the full name only, not the base name. And there will be metadata for individual series, not families, which confuses Grafana.
The interface should pass the base name as well, so we don't have to calculate it, for cases where the storage does not yet support per-series metadata.
Suggest:

// CombinedAppender is similar to storage.Appender, but combines updates to
// metadata, created timestamps, exemplars and samples into a single call.
type CombinedAppender interface {
	// AppendSample appends a sample and related exemplars, metadata, and
	// created timestamp to the storage.
	AppendSample(metricFamily string, ls labels.Labels, meta metadata.Metadata, t, ct int64, v float64, es []exemplar.Exemplar) error
	// AppendHistogram appends a histogram and related exemplars, metadata, and
	// created timestamp to the storage.
	AppendHistogram(metricFamily string, ls labels.Labels, meta metadata.Metadata, t, ct int64, h *histogram.Histogram, es []exemplar.Exemplar) error
}
Do we need it for AppendHistogram?
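For illustration, a minimal sketch of how an implementation could use the metricFamily argument from the suggestion above to keep family-level metadata for a backend without per-series metadata; the types and names here are hypothetical, not part of the PR:

package otlpappender

import (
	"github.com/prometheus/prometheus/model/metadata"
)

// familyMetadata keeps one metadata entry per metric family, matching
// backends (and PRW 1.0 metadata) that are keyed by family rather than by
// series. Receiving metricFamily directly avoids re-deriving the base name
// from suffixed series names such as *_count, *_sum or *_bucket.
type familyMetadata map[string]metadata.Metadata

// record stores metadata for a family the first time it is seen.
func (m familyMetadata) record(metricFamily string, meta metadata.Metadata) {
	if _, ok := m[metricFamily]; !ok {
		m[metricFamily] = meta
	}
}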
Removes the intermediary OTLP -> PRW 1.0 step before writing to the appender. It also adds a CombinedAppender interface, which consolidates error handling and may make it possible to re-use the library in Mimir, which currently uses it to translate to PRW 1.0.
Similar to #16827, but maintains existing code structure and unit tests.
Alternative to #16784.
Notable Changes:
- otlp_without_metadata_appended_samples_total instead of remote_write_without_metadata_appended_samples_total.
- otlp_out_of_order_exemplars_total replaces the "Error on ingesting out-of-order exemplars" warning log.
- appender.UpdateMetadata is called with metadata. The unit is converted using the otlptranslator library.
- appender.AppendHistogramCTZeroSample and appender.AppendCTZeroSample are called when the created-timestamp-zero-ingestion feature is enabled.
TODO:
Benchmark summary: 36% less CPU, 38% fewer bytes, 19% fewer allocations
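For illustration, a minimal sketch of the created-timestamp handling described above, assuming a wrapper around storage.Appender with hypothetical fields app and ingestCTZeroSample; not the PR's actual implementation:

package otlpappender

import (
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/storage"
)

type ctAwareAppender struct {
	app                storage.Appender
	ingestCTZeroSample bool
}

// appendFloat emits a zero sample at the created timestamp before appending
// the actual value, but only when the feature is enabled and the created
// timestamp is set and older than the sample timestamp.
func (b *ctAwareAppender) appendFloat(ref storage.SeriesRef, ls labels.Labels, t, ct int64, v float64) (storage.SeriesRef, error) {
	if b.ingestCTZeroSample && ct != 0 && ct < t {
		// Out-of-order created timestamps are expected on repeated writes;
		// a real implementation might count or log them instead of failing.
		if newRef, err := b.app.AppendCTZeroSample(ref, ls, t, ct); err == nil {
			ref = newRef
		}
	}
	return b.app.Append(ref, ls, t, v)
}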