This repository was archived by the owner on Jan 24, 2024. It is now read-only.

SEP 2 - Add worker_cache #3

Merged: 1 commit merged on Sep 6, 2019

Conversation

@johnskopis
Contributor

No description provided.

@johnskopis
Contributor Author

Code examples are here: https://github.com/saltstack/salt/pull/51687/files
The code exposing the cache is in the same PR.

I can update the SEP with an example showing how to use it inside an ext_pillar.

Cache invalidation is handled by the user-supplied function; the example of intended usage will demonstrate this.
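
For a feel of the shape of that usage before the SEP is updated, here is a minimal sketch, assuming a cachetools-style TTL cache held in the worker process; the names (`CACHE`, `_fetch_pillar_data`) and the HTTP source are illustrative and are not the API from the linked PR:

```python
# Hypothetical sketch only: the real worker_cache API lives in the linked
# Salt PR. The cache object, fetch function, and URL here are illustrative.
import cachetools
import requests

# One cache per worker process; entries expire after five minutes.
CACHE = cachetools.TTLCache(maxsize=128, ttl=300)


def _fetch_pillar_data(minion_id):
    # The user-supplied function: it decides how to refetch, so the
    # invalidation policy stays with the pillar author.
    resp = requests.get("https://pillar.example.com/%s" % minion_id, timeout=10)
    resp.raise_for_status()
    return resp.json()


def ext_pillar(minion_id, pillar, *args, **kwargs):
    try:
        # TTLCache raises KeyError for missing *or* expired entries.
        return CACHE[minion_id]
    except KeyError:
        data = _fetch_pillar_data(minion_id)
        CACHE[minion_id] = data
        return data
```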

@cachedout
Contributor

If we're going to head down this route, I would propose that instead of building this internal to each worker, we explore designing a fully pluggable cache system that workers reach out to. (One of these systems could even run as a sidecar to the MWorkers, perhaps.)

I'd like to, for example, be able to put memcache next to a master and let it handle caching for me instead of doing it inside Salt.

This lets people re-use whatever caching layers they may already have in their infrastructure, and it avoids the overhead of putting this layer directly in the workers.

@johnskopis
Contributor Author

johnskopis commented Feb 19, 2019 via email

@cachedout
Contributor

@johnskopis I guess I'm not clear on the argument for an in-memory cache in the worker itself versus a caching layer external to the process that could also be memory-based. Writing a robust caching layer can be a lot of effort, and I'm just trying to understand the case for doing that internal to Salt when many good options could be taken off the shelf and used alongside it.

@johnskopis
Contributor Author

Rather than going around in circles here, I think it might make sense to quantify the (de)serialization overhead I'm concerned about.

How long does it take to read 10 MB of JSON (or YAML) from Redis versus reading a 10 MB Python object from memory?
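
A rough way to quantify that might look like the following sketch; it assumes a local Redis on the default port, and the payload size and key name are arbitrary:

```python
# Rough micro-benchmark sketch; assumes a local Redis on the default port.
import copy
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379)

# Build a payload that is roughly 10 MB once serialized to JSON.
payload = {"key%d" % i: "x" * 100 for i in range(100_000)}
r.set("bench:json", json.dumps(payload))

t0 = time.perf_counter()
in_memory = copy.deepcopy(payload)             # baseline: copy the live object
t1 = time.perf_counter()
from_redis = json.loads(r.get("bench:json"))   # network round trip + JSON decode
t2 = time.perf_counter()

print("deepcopy of in-memory dict:  %.4f s" % (t1 - t0))
print("GET + json.loads from Redis: %.4f s" % (t2 - t1))
```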

@waynew
Contributor

waynew commented Feb 25, 2019

@johnskopis That's a good question! Have you done the comparison? I know I'm certainly interested in seeing/trying out a test case.

@johnskopis
Contributor Author

Probably a contrived example, but it's meant to emphasize the penalty:
https://github.com/johnskopis/microbench/tree/master/serial

As an aside, I do a similar exercise every few years and am always a little surprised by the results.

[TEST] data_dir:  /tmp/tmpkzz3dr_g
[RUN] datadir:  /tmp/tmpyq56iwbv
Baseline 10 (copy) in 0.07908153533935547
<class 'dict'>
Loaded json from /tmp/tmpyq56iwbv/json/10 in 0.4411916732788086
Loaded json from redis:json/collection10 in 0.5286178588867188
<class 'dict'>
Loaded mp from /tmp/tmpyq56iwbv/mp/10 in 0.12621498107910156
Loaded mp from redis:mp/collection10 in 0.24647188186645508
<class 'format_pb2.Thing'>
Loaded pb from /tmp/tmpyq56iwbv/pb/10 in 0.21497273445129395
Loaded pb from redis:pb/collection10 in 0.35294318199157715
...
[TEST] data_dir:  /tmp/tmp19hg30ni
[RUN] datadir:  /tmp/tmpzdtm8e13
Baseline 10 (copy) in 0.0826869010925293
<class 'dict'>
Loaded json from /tmp/tmpzdtm8e13/json/10 in 0.4677577018737793
Loaded json from redis:json/collection10 in 0.6676075458526611
<class 'dict'>
Loaded mp from /tmp/tmpzdtm8e13/mp/10 in 0.16786527633666992
Loaded mp from redis:mp/collection10 in 0.3710751533508301
<class 'format_pb2.Thing'>
Loaded pb from /tmp/tmpzdtm8e13/pb/10 in 0.30977416038513184
Loaded pb from redis:pb/collection10 in 0.5043141841888428

I was thinking about this some more and I think it's about more than just performance. As a Salt user, if I want to use a memory cache I should be able to. Why should I have to set up cache infrastructure if I just want to use memory as my backend?

I can definitely appreciate not wanting to have a cache embedded inside Salt, because there are certainly existing tools that do a much better job. I don't think there's any maintenance cost to using the cachetools implementation.

What is the reason not to be flexible and allow users to choose whichever cache backend they want?

@cachedout
Contributor

cachedout commented Mar 1, 2019

@johnskopis There seems to be some confusion about what I'm suggesting. Any in-memory cache that's shipped with Salt is just fine. My point is that I don't think that cache should exist internal to the worker. For example, if each worker is maintaining its own cache, how do you ensure cache consistency across the fleet of MWorkers? Why spend memory for each worker to cache duplicate data?

If there's a SaltCache worker that workers can query, that would satisfy both concerns. It's not a binary choice between an external system and an in-memory system that works with Salt; rather, these can be quite complementary and could even be combined into a multi-tiered cache, which could be quite interesting.
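
A two-tier lookup along those lines might look roughly like the sketch below; the memcached address, key scheme, and JSON encoding are assumptions for illustration, not a proposed Salt API:

```python
# Sketch of a two-tier lookup: per-worker memory first, a shared memcached
# sidecar second, then the origin. Address, keys, and JSON encoding are
# illustrative assumptions.
import json

import cachetools
from pymemcache.client.base import Client

local = cachetools.TTLCache(maxsize=256, ttl=60)   # tier 1: this worker only
shared = Client(("127.0.0.1", 11211))              # tier 2: shared by all workers


def cached_get(key, fetch):
    try:
        return local[key]                          # tier 1 hit
    except KeyError:
        pass
    raw = shared.get(key)
    if raw is not None:
        value = json.loads(raw)                    # tier 2 hit
    else:
        value = fetch(key)                         # miss: go to the origin
        shared.set(key, json.dumps(value).encode("utf-8"), expire=300)
    local[key] = value
    return value
```

Tier 1 would absorb the (de)serialization cost raised above, while tier 2 keeps all workers reading from a single shared copy.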

@johnskopis
Contributor Author

johnskopis commented Mar 2, 2019 via email

@cachedout
Contributor

cachedout commented Mar 2, 2019

@johnskopis Caching in the way that you are proposing (internal to the master worker processes) does not solve consistency issues -- it creates them.

If you have twenty worker processes, each caching data independently in memory, you have twenty different caches operating at twenty different levels of freshness. What happens when the source pillar data changes? In order to ensure that you don't have some caches expiring before others, you have to be able to ensure that they all have the same copy of the cached data, lest you have a cache coherence problem, which in the case of a pillar would be Very Bad News (tm).

I have proposed multiple ways of solving this problem in a way that will work, cache data in-memory, while avoiding the cache coherence problem that I have just described. Please re-read my comments if they aren't clear.

@johnskopis
Contributor Author

johnskopis commented Mar 2, 2019 via email

@cachedout
Contributor

Ah, I see. This sounds, though, like it's operating under the assumption that all external pillars call an HTTP endpoint that can support conditional fetching. While HTTP supports conditional GETs, there is no guarantee that the external pillar uses HTTP at all. How do you avoid cache coherence problems across multiple workers without the guarantee that you have the ability to conditionally fetch the object to begin with?
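
For reference, the conditional fetch under discussion, when the pillar source does happen to be HTTP, is roughly the pattern below (using `requests`; the cache structure is illustrative). As noted above, it only helps when the origin actually returns a validator:

```python
# Sketch of If-Modified-Since revalidation for an HTTP-backed pillar source.
# Only applicable when the origin returns Last-Modified and honors 304s.
import requests

_cache = {}  # url -> (last_modified_header, parsed_body)


def fetch_with_ims(url):
    headers = {}
    cached = _cache.get(url)
    if cached and cached[0]:
        headers["If-Modified-Since"] = cached[0]
    resp = requests.get(url, headers=headers, timeout=10)
    if resp.status_code == 304 and cached:
        return cached[1]                       # origin says unchanged; reuse
    resp.raise_for_status()
    body = resp.json()
    _cache[url] = (resp.headers.get("Last-Modified"), body)
    return body
```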

@johnskopis
Contributor Author

johnskopis commented Mar 2, 2019 via email

@cachedout
Contributor

cachedout commented Mar 2, 2019

I fear this is moving toward an unhelpful and pedantic argument, so let me just make this my last post and attempt to better explain my position here. Beyond this, I'll leave it to @saltstack/team-core and others to respond and give guidance about the best way to proceed here.

  1. Cache coherency is a requirement for caching pillar data on a Salt master. If you run an orchestration, for example, and some machines receive different pillar data than others, it has the potential to result in a catastrophic outage in the managed infrastructure. This result, of course, is very different from the IMS caching strategies that browsers use, where stale cache data results in an experience that is, at worst, generally just inconvenient and not at all dangerous.

  2. TTLs used in workers with no shared knowledge will not achieve cache coherence in the manner that you describe. Imagine, for example, that you have a set of ten workers. You have a TTL of thirty minutes on objects inserted into the cache from the upstream HTTP endpoint. Five requests come in for that data from downstream minions. You now have five workers holding data that they will consider fresh for thirty minutes. Fifteen minutes after that, the upstream endpoint changes the object. Another request for that data comes in. What happens? Five workers give stale answers while the other five give fresh answers. This is a variation on the cache invalidation problem that @isbm mentions earlier in this thread.

To achieve coherence across multiple in-memory caches in the manner that you are proposing, there are a few ways one could go, none of them easy. You could implement a variation on write-back or write-through strategies. You've mentioned L1 caches, alluding to processor caches, and many processor architectures handle the problem in this manner. In short, you'd have to design and implement a system by which each worker is updated about cached objects on other workers. Alternatively, a request could force all workers to query the upstream system as a means to stay in sync, but this is a bit of a scary way to implement coherence and, beyond that, it totally breaks Salt's model of MWorker autonomy, which is almost certainly a non-starter with Salt's core team.

The other option is the one I have advocated for during this conversation -- a type of shared-memory approach. In my proposal, instead of having a literal segment of shared memory, you would have a single process providing caching services, acting as a single point of truth for cached data that workers could query. It would be fine, as you rightly point out, to have a TTL and/or an IMS strategy in this layer. The key point is that all workers would need to query this layer, which would provide consistent answers to workers on each request. This has the added benefit of allowing a pluggable system by which hardened and performant backends such as memcache or Redis could be used.

Another option would be to put caches on workers but to use cache files on disk. This is not in-memory, as you describe as your goal above, but it would solve the problem and likely still be extremely performant compared to a remote endpoint. File locking would still require special attention, but Salt has facilities for handling that gracefully.
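
A disk-backed variant along those lines might look roughly like this; the cache directory, msgpack serialization, and `fcntl` advisory locking (POSIX only) are stand-ins rather than Salt's own facilities:

```python
# Sketch of an on-disk, per-key cache shared by all workers on one master.
# Paths, TTL, msgpack, and fcntl locking are illustrative assumptions; keys
# are assumed to be filename-safe.
import fcntl
import os
import time

import msgpack

CACHE_DIR = "/var/cache/salt/master/worker_cache"


def disk_cached(key, fetch, ttl=300):
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, key)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+b") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)          # serialize access per key
        try:
            raw = fh.read()
            if raw and os.path.getmtime(path) > time.time() - ttl:
                return msgpack.unpackb(raw, raw=False)   # fresh enough: reuse
            value = fetch(key)                  # stale or missing: refetch
            fh.seek(0)
            fh.truncate()
            fh.write(msgpack.packb(value, use_bin_type=True))
            return value
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```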

The tl;dr here is that I am all for a shared, in-memory cache that ships with Salt and can be used by workers, but in-memory pillar caching should not be placed inside the worker itself.

Hopefully that makes sense. I'm disengaging from this thread now and will leave it to others to follow up. Cheers. :)

@waynew changed the title from "Add worker_cache RFC" to "Add worker_cache" on Mar 6, 2019
@waynew
Contributor

We're restarting numbering with the SEP process, so this SEP will be number 2 and the file should be 0002-worker-cache.md.

@@ -0,0 +1,210 @@
- Feature Name: `worker_cache` (MWorker Cache / If-Modified-Since Cache IMSCache)
- Start Date: 2019-02-13
- RFC PR: (leave this empty)

Suggested change
- RFC PR: (leave this empty)
- RFC PR: https://github.com/saltstack/salt-enhancement-proposals/pull/3

@waynew
Copy link
Contributor

waynew commented Mar 19, 2019

So I'm not intimately familiar with all of the ins-and-outs of caching. I've experienced enough to know that here be dragons though 🙂

The tl;dr here is that I am all for a shared, in-memory cache that ships with Salt and can be used by workers, but in-memory pillar caching should not be placed inside the worker itself.

That sounds sensible to me. I like the idea of having a cache, and it does sound like a bad idea to place it within the worker. Are you interested in incorporating that feedback and updating your proposal?

@waynew changed the title from "Add worker_cache" to "SEP 2 - Add worker_cache" on Apr 5, 2019
@Akm0d self-requested a review on May 3, 2019 at 18:26
@waynew
Contributor

waynew commented Sep 5, 2019

Hey @johnskopis are you interested in working on this, or should we go ahead and mark it as abandoned for someone else to pick up in the future?

@johnskopis
Contributor Author

johnskopis commented Sep 5, 2019 via email

@waynew merged commit 652cf1c into saltstack:master on Sep 6, 2019
@waynew added the "Abandoned" label (available for a new, or the original, champion to adopt) on Sep 6, 2019