SEP 2 - Add worker_cache #3

johnskopis · 2019-02-16T06:05:02Z

No description provided.

johnskopis · 2019-02-17T12:07:29Z

Code examples are here: https://github.com/saltstack/salt/pull/51687/files
exposing is here: https://github.com/saltstack/salt/pull/51687/files

I can update with an example showing how to use inside an ext_pillar

Cache invalidation is handled by the user-supplied fun. The example showing intended usage will show this

cachedout · 2019-02-19T13:16:48Z

If we're going to head down this route, I would propose that instead of building this internal to each workers, we instead explore a fully-pluggable cache system be designed that workers reach out to. (One of these systems could even be sidecars to MWorkers, perhaps.)

I'd like to, for example, be able to put memcache next to a master and let it handle caching for me instead of doing it inside Salt.

This gives people to re-use whatever caching layers they may already have in their infrastructures and it avoids the overhead of putting this layer directly in the workers.

johnskopis · 2019-02-19T17:14:24Z

consider the extreme case where you need to deserialize 10MB of data from an external cache in an on-demand external pillar. a cache hit is still expensive for the worker as it pays this penalty. if you know that you're going to pay this penalty each time the pillar is rendered why not cache in memory? that said, using a shared cache as a second layer makes sense and is complimentary. I think the ext pillar author should use whatever cache is configured for the environment. if I want to cache in memory why shouldn't I be able to? the pr I sent might address some of your concerns?

…

On Tue, Feb 19, 2019, 13:16 Mike Place ***@***.***> wrote: If we're going to head down this route, I would propose that instead of building this internal to each workers, we instead explore a fully-pluggable cache system be designed that workers reach out to. (One of these systems could even be sidecars to MWorkers, perhaps.) I'd like to, for example, be able to put memcache next to a master and let it handle caching for me instead of doing it inside Salt. This gives people to re-use whatever caching layers they may already have in their infrastructures and it avoids the overhead of putting this layer directly in the workers. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#3 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABgWHAyZQIM27XcHcLwoP7xBNiIbvMfVks5vO_lBgaJpZM4a-z1f> .

cachedout · 2019-02-25T17:48:56Z

@johnskopis I guess I'm not clear on the argument for an in-memory cache in the worker itself versus a caching layer external to the process that could also be memory-based. Writing a robust caching layers can be a lot of effort and I am just trying to understand the case for doing that internal to Salt when many good options could be taken off the shelf and used alongside.

johnskopis · 2019-02-25T21:13:32Z

rather than going around in circles here I think it might make sense to quantify the (de)serialization overhead I'm concerned about.

how long does it take to read 10mb of json (or yaml) from redis vs read 10mb pyobject from memory?

waynew · 2019-02-25T21:18:10Z

@johnskopis That's a good question! Have you done the comparison? I know I'm certainly interested in seeing/trying out a test case.

johnskopis · 2019-03-01T09:56:02Z

Probably a contrived example but is meant to emphasize the penalty:
https://github.com/johnskopis/microbench/tree/master/serial

An aside, I do a similar exercise every few years and am always a little surprised by the results.

[TEST] data_dir:  /tmp/tmpkzz3dr_g
[RUN] datadir:  /tmp/tmpyq56iwbv
Baseline 10 (copy) in 0.07908153533935547
<class 'dict'>
Loaded json from /tmp/tmpyq56iwbv/json/10 in 0.4411916732788086
Loaded json from redis:json/collection10 in 0.5286178588867188
<class 'dict'>
Loaded mp from /tmp/tmpyq56iwbv/mp/10 in 0.12621498107910156
Loaded mp from redis:mp/collection10 in 0.24647188186645508
<class 'format_pb2.Thing'>
Loaded pb from /tmp/tmpyq56iwbv/pb/10 in 0.21497273445129395
Loaded pb from redis:pb/collection10 in 0.35294318199157715
...
[TEST] data_dir:  /tmp/tmp19hg30ni
[RUN] datadir:  /tmp/tmpzdtm8e13
Baseline 10 (copy) in 0.0826869010925293
<class 'dict'>
Loaded json from /tmp/tmpzdtm8e13/json/10 in 0.4677577018737793
Loaded json from redis:json/collection10 in 0.6676075458526611
<class 'dict'>
Loaded mp from /tmp/tmpzdtm8e13/mp/10 in 0.16786527633666992
Loaded mp from redis:mp/collection10 in 0.3710751533508301
<class 'format_pb2.Thing'>
Loaded pb from /tmp/tmpzdtm8e13/pb/10 in 0.30977416038513184
Loaded pb from redis:pb/collection10 in 0.5043141841888428

I was thinking about this some more and I think it's more than just performance. As a salt user, If I want to use a memory cache I should be able to. Why should I have to setup a cache infrastructure if I just want to use memory as my backend?

I can definitely appreciate not wanting to have a cache embedded inside salt because there are certainly things built that do a much better job. I don't think there's any maintenance cost to use cachetools implementation.

What is the reason not to be flexible and allow users to choose whichever cache backend they want?

cachedout · 2019-03-01T16:43:39Z

@johnskopis There seems to be some confusion about what I'm suggesting. Any in-memory cache that's shipped with Salt is just fine. My point is that I don't think that cache should exist internal to the worker. For example, if each worker is maintaining its own cache, how do you ensure cache consistency across the fleet of MWorkers? Why spend memory for each worker to cache duplicate data?

If there's a SaltCache worker that workers can query, that would satisify both concerns. It's not a binary choice between an external system and an in-memory system that works with Salt but rather, these can be quite complimentary and could even be made to be a multi-tiered cache, which could be quite interesting.

johnskopis · 2019-03-02T01:45:56Z

@cachedout I opened the pr because it's not possible to cache an on demand pillar using an in memory caché. also the cache api does not support IMS. if you have a better option I'm open ears. to your point, memory is cheap and if I want to burn it I should be able to. what are the other options? IMS caching solves consistency issues... what do you suggest we do instead?

…

On Fri, Mar 1, 2019, 16:43 Mike Place ***@***.***> wrote: @johnskopis <https://github.com/johnskopis> There seems to be some confusion about what I'm suggesting. Any in-memory cache that's shipped with Salt is just fine. My point is that I don't think that cache should exist *internal to the worker*. For example, if each worker is maintaining its own cache, how do you ensure cache consistency? Why spend memory for *each* worker to cache duplicate data? If there's a SaltCache worker that workers can query, that would satisify both concerns. It's not a binary choice between an external system and an in-memory system that works with Salt but rather, these can be quite complimentary. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#3 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABgWHEY4Pzuf8AxmVk_NSMfPI47dXP3Qks5vSVi8gaJpZM4a-z1f> .

cachedout · 2019-03-02T02:08:14Z

@johnskopis Caching in the way that you are proposing (internal to the master worker processes) does not solve consistency issues -- it creates them.

If you have twenty worker processes, each caching data independently in memory, you have twenty different caches operating at twenty different levels of freshness. What happens when the source pillar data changes? In order to ensure that you don't have some caches expiring before others, you have to be able to ensure that they all have the same copy of the cached data, lest you have a cache coherence problem, which in the case of a pillar would be Very Bad News (tm).

I have proposed multiple ways of solving this problem in a way that will work, cache data in-memory, while avoiding the cache coherence problem that I have just described. Please re-read my comments if they aren't clear.

johnskopis · 2019-03-02T10:31:43Z

the thinking is that the worker cache stores some meta data about the cache object, namely the etag, in the cache.the getter function takes a previously cached item as an argument, which it uses to perform a conditional GET of the underlying resource. caching immutable objects indefinitely or revalidation (conditional get) in every fetch is the *only* way to ensure consistency. using a single cache to serve inconsistent, ie stale, data doesn't address consistency but it does make it less of a concern. I want to be able to revalidate cached objects and I want to do it inside the worker process because that's where the `ext_pillar` runs. I can't cache inside my ext pillar because that module gets reloaded. The functionality I described in this rfc is meant to address this exact problem. The only thing you have suggested is to not do this inside the worker process. Following the same pattern, the user's ext_pillar should not exist in the worker process at all. instead the ext_pillar should be dumbed down to a GET request to an external service with the expectation that users deploy an ext_pillar server side car next to their salt master. I briefly consider this in the proposal under the use an external solution... is this accurate?

…

On Sat, Mar 2, 2019, 02:08 Mike Place ***@***.***> wrote: @johnskopis <https://github.com/johnskopis> Caching in the way that you are proposing (internal to the master worker processes) does *not* solve consistency issues -- it creates them. If you have twenty worker processes, each caching data independently in memory, you have fifty different caches operating at fifty different levels of freshness. What happens when the source pillar data changes? In order to ensure that you don't have some caches expiring before others, you have to be able to ensure that they all have the same copy of the cached data, lest you have a cache coherence problem, which in the case of a pillar would be Very Bad News (tm). I have proposed multiple ways of solving this problem in a way that will work, cache data in-memory, while avoiding the cache coherence problem that I have just described. Please re-read my comments if they aren't clear. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#3 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABgWHATtJOWxTMxLJ1R5i6bukdZxodjRks5vSd0PgaJpZM4a-z1f> .

cachedout · 2019-03-02T11:19:30Z

Ah, I see. This sounds, though, like it's operating under the assumption that all external pillars call an HTTP endpoint that can support conditional fetching. While HTTP supports conditional GETs, there is no guarantee that the external pillar uses HTTP at all. How do you avoid cache coherence problems across multiple workers without the guarantee that you have the ability to conditionally fetch the object to begin with?

johnskopis · 2019-03-02T15:38:19Z

the best option is to store immutable objects in cache and have them evict when the cache is full. there's nothing wrong with using a shared ttl-based cache. in this scenario the worker cache would be configured to use the shared cache only. with a tiered L1(memory) and L2(shared) cache you should be able to just use the L2 cache. I didn't spend a lot of time here but since the tiered cache should "just work" (TM): `cache.l2.cache(...). I'm trying to be flexible and build stuff that works for everyone's subjectively "best option" because some times the objectively best option isn't practical.

…

On Sat, Mar 2, 2019, 11:19 Mike Place ***@***.***> wrote: Ah, I see. This sounds, though, like it's operating under the assumption that all external pillars call an HTTP endpoint that *can* support conditional fetching. While HTTP supports conditional GETs, there is no guarantee that the external pillar uses HTTP at all. How do you avoid cache coherence problems across multiple workers without the guarantee that you have the ability to conditionally fetch the object to begin with? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#3 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABgWHHh8AXIKyvmrC2tmlbM8ZFNnIOa1ks5vSl5DgaJpZM4a-z1f> .

cachedout · 2019-03-02T19:58:28Z

I fear this is moving toward an unhelpful and pedantic argument, so let me just make this my last post and attempt to better explain my position here. Beyond this, I'll leave it to @saltstack/team-core and others to respond and give guidance about the best way to proceed here.

Cache coherency is a requirement for caching pillar data on a Salt master. If you run an orchestration, for example, and some machines receive different pillar data than others, it has to potential to result in a catastrophic outage in the managed infrastructure. This result, of course, is very different than IMS caching strategies that browsers use, where stale cache data results in an experience that is, at worst, generally just inconvenient and not at all dangerous.
TTLs used in workers with no shared knowledge will not achieve cache coherence in the manner that you describe. Imagine, for example, that you have a set of ten workers. You have a TTL set to thirty minutes on objects inserted into the cache from the upstream HTTP endpoint. Five requests come in for that data from downstream minions. You now have five workers with knowledge that will consider fresh for thirty minutes. Fifteen minutes after that, the upstream endpoint changes the object. Another request for that data comes in. What happens? Five workers give stale answers while the other five give fresh answers. This is a variation on the cache invalidation problem that @isbm mentions early in this thread.

To achieve coherence across multiple in-memory caches in the manner that you are proposing, there are a few ways one could go, none of them easy. You could implement a variation on write-back or write-through strategies. You've mentioned L1 caches, alluding to processor caches and many processor architectures handle the problem in this manner. In short, you'd have to design and implement a system by which each worker is updated about cached objects on other workers. Alternately, a request could force all workers to query the upstream system as a means to try to stay in sync but this is a bit of a scary way to implement coherence and beyond that, it totally breaks Salt's model of MWorker autonomy which is almost certainly a non-starter with Salt's core team.

The other option is the one I have advocated for during this conversation -- a type of shared memory approach. In my proposal, instead of having a literal segment of shared memory, you would instead have a single process providing caching services, acting as a singular point of truth for cached data from which workers could poll. It would be fine, as you rightly point out, to have a TTL and/or an IMS strategy in this layer. The key point is that all workers would need to query this layer which would provide consistent answers to workers on each request. This provides the added benefit of allowing a pluggable system by which hardened and performant such as memcache or Redis could be used.

Another option would be to put caches on workers but to use cache files on disk. This is not in-memory, as you describe as your goal above, but it would solve the problem and likely still be extremely performant as compared to a remote endpoint. There still needs to be consideration for file locking which would require special attention but Salt has facilities for handling that gracefully.

The tl;dr here is that I am all for a shared, in-memory cache that ships with Salt and can be used by workers but in-memory pillar caching should, not be placed inside the worker itself.

Hopefully that makes sense. I'm disengaging from this thread now and will leave it to others to follow-up. Cheers. :)

waynew

We're re-starting numbering with the SEP process, so this SEP will be number 2, so should be 0002-worker-cache.md.

waynew · 2019-03-19T19:10:16Z

0005-worker-cache.md

@@ -0,0 +1,210 @@
+- Feature Name: `worker_cache` (MWorker Cache / If-Modified-Since Cache IMSCache)
+- Start Date: 2019-02-13
+- RFC PR: (leave this empty)


Suggested change

- RFC PR: (leave this empty)

- RFC PR: https://github.com/saltstack/salt-enhancement-proposals/pull/3

waynew · 2019-03-19T19:15:16Z

So I'm not intimately familiar with all of the ins-and-outs of caching. I've experienced enough to know that here be dragons though 🙂

The tl;dr here is that I am all for a shared, in-memory cache that ships with Salt and can be used by workers but in-memory pillar caching should, not be placed inside the worker itself.

That sounds sensible to me. I like both the idea of having a cache, and it sounds like a bad idea to be placing it within the worker. Are you interested in incorporating that feedback and updating your proposal?

waynew · 2019-09-05T21:31:17Z

Hey @johnskopis are you interested in working on this, or should we go ahead and mark it as abandoned for someone else to pick up in the future?

johnskopis · 2019-09-05T22:37:30Z

Feel free to close as abandoned. Thanks

…

On Thu, Sep 5, 2019 at 10:31 PM Wayne Werner ***@***.***> wrote: Hey @johnskopis <https://github.com/johnskopis> are you interested in working on this, or should we go ahead and mark it as abandoned for someone else to pick up in the future? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#3?email_source=notifications&email_token=AAMBMHCFYZSSAIDZNT6JZW3QIF3CPA5CNFSM4GX3HVP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6A4BFQ#issuecomment-528597142>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAMBMHGIFVIMN5FUL2XDF23QIF3CPANCNFSM4GX3HVPQ> .

Add worker_cache RFC

652cf1c

johnskopis mentioned this pull request Feb 16, 2019

RFC: Implement a MWorker cache saltstack/salt#51573

Closed

waynew changed the title ~~Add worker_cache RFC~~ Add worker_cache Mar 6, 2019

waynew suggested changes Mar 19, 2019

View reviewed changes

waynew changed the title ~~Add worker_cache~~ SEP 2 - Add worker_cache Apr 5, 2019

Akm0d self-requested a review May 3, 2019 18:26

waynew merged commit 652cf1c into saltstack:master Sep 6, 2019

waynew added the Abandoned Available for a new (or the original) champion to adopt. label Sep 6, 2019

	- RFC PR: (leave this empty)
	- RFC PR: https://github.com/saltstack/salt-enhancement-proposals/pull/3

SEP 2 - Add worker_cache #3

SEP 2 - Add worker_cache #3

Uh oh!

Conversation

johnskopis commented Feb 16, 2019

Uh oh!

johnskopis commented Feb 17, 2019

Uh oh!

cachedout commented Feb 19, 2019

Uh oh!

johnskopis commented Feb 19, 2019 via email

Uh oh!

cachedout commented Feb 25, 2019

Uh oh!

johnskopis commented Feb 25, 2019

Uh oh!

waynew commented Feb 25, 2019

Uh oh!

johnskopis commented Mar 1, 2019

Uh oh!

cachedout commented Mar 1, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

johnskopis commented Mar 2, 2019 via email

Uh oh!

cachedout commented Mar 2, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

johnskopis commented Mar 2, 2019 via email

Uh oh!

cachedout commented Mar 2, 2019

Uh oh!

johnskopis commented Mar 2, 2019 via email

Uh oh!

cachedout commented Mar 2, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

waynew left a comment

Choose a reason for hiding this comment

Uh oh!

waynew Mar 19, 2019

Choose a reason for hiding this comment

Uh oh!

waynew commented Mar 19, 2019

Uh oh!

waynew commented Sep 5, 2019

Uh oh!

johnskopis commented Sep 5, 2019 via email

Uh oh!

Uh oh!

cachedout commented Mar 1, 2019 •

edited

Loading

cachedout commented Mar 2, 2019 •

edited

Loading

cachedout commented Mar 2, 2019 •

edited

Loading