Skip to content
This repository was archived by the owner on Jan 24, 2024. It is now read-only.

SEP 23: Create salt-file-protocol-file-listing-details.md #31

Conversation

damon-atkins
Copy link

Adding SEP for proposed for having more information returned about files within salt:// so copies/sync from the master can be completed in a more efficient way.

@damon-atkins damon-atkins requested a review from a team as a code owner July 22, 2020 08:38
@ghost ghost requested review from twangboy and removed request for a team July 22, 2020 08:38
Copy link

@OrangeDog OrangeDog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good idea, but checksumming every file is going to have a performance impact, and caching it will just move the problem elsewhere.

```
cp.list_master detail=True directories_only=False path=/
'.':
Time: seconds since 1970

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is supposed to be last modification time? Not creation or last access.
I also assume this is Unix epoch time, in which case the precise specification is since 1970-01-01T00:00:00.000+00 excluding leap seconds.

Time: seconds since 1970
ETag: string/checksum
Size: int
Type: f(ile) or d(irectory)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about all the other types (b, c, l, p, s, D, J)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I am aware salt does not support other types.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is cp.list_master_symlinks

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just because they are not currently supported doesn't mean the information shouldn't be collected

'file1':
Time: seconds since 1970
ETag: string/checksum
Size: int

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Python int, not a C int?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to reflect a 64bit int.


# Drawbacks
[drawbacks]: #drawbacks
Their maybe compatibility issues between different versions of salt when introduced. As the salt:// protocol is not version
Copy link

@OrangeDog OrangeDog Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*There
*may be
*versioned

Type: d
'file1':
Time: seconds since 1970
ETag: string/checksum

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of checksum?
If there's a security concern (e.g. compromised minion has file replaced by one with a different function but same checksum) then a SHA-2 or similar is required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the doco, its not important to the minion, it does not need to calculated it.

@terminalmage
Copy link
Contributor

I don't understand how this solves anything. The fileserver already maintains checksums of each served file, so that it can prevent transmitting unmodified files.

@waynew
Copy link
Contributor

waynew commented Jul 23, 2020

Agreeing with terminalmage - do you have an example of the failures of the current process(es)?

Also, this will be SEP 23, please update your filename/contents 👍

@waynew waynew changed the title Create salt-file-protocol-file-listing-details.md SEP 23: Create salt-file-protocol-file-listing-details.md Jul 23, 2020
@damon-atkins
Copy link
Author

damon-atkins commented Jul 25, 2020

Agreeing with terminalmage - do you have an example of the failures of the current process(es)?

Also, this will be SEP 23, please update your filename/contents 👍

windows pkg.refresh_db deletes all .sls files off the minion and re-fetches them again, so the minion contains only files which exist on the master (no old files which have been removed). Most process within Salt does not matter if extra files are cached or left behind on the minion.

I am sure other items rsync ish operation with delete would benefit. e.g. cp.cache_dir, cp.cache_master & cp.get_dir does not support with delete.

@terminalmage
Copy link
Contributor

terminalmage commented Jul 25, 2020

Agreeing with terminalmage - do you have an example of the failures of the current process(es)?
Also, this will be SEP 23, please update your filename/contents +1

windows pkg.refresh_db deletes all .sls files off the minion and re-fetches them again, so the minion contains only files which exist on the master (no old files which have been removed). Most process within Salt does not matter if extra files are cached or left behind on the minion.

This seems like an argument for improving salt/modules/win_pkg.py, and not for adding a bunch of unnecessary (IMO) overhead to the fileserver.

I am sure other items rsync ish operation with delete would benefit. e.g. cp.cache_dir, cp.cache_master & cp.get_dir does not support with delete.

I don't know what you mean by "does not support with delete". But again, cp.cache_dir is just using the underlying fileclient requests to the server, which will cause the master to check the current checksum before transmitting the file, preventing retransmission of modified unmodified files. It's possible that I'm missing something here, so perhaps you could explain this in a little better detail.

@damon-atkins
Copy link
Author

If I break down state file.recursive:

  • calls master_dirs = cp.list_master_dirs to check if the directory is on the master
  • calls file.makedirs_perms if required
  • calls cp.list_master to get a list of all files.
  • calls cp.list_master_symlinks if the person wants to keep symlinks, code goes onto removes symlinks from the list of files, I assume so they are not copied as normal files
  • Starts building up a another list of files
  • calls cp_list_master_dirs if using include_empty
  • For ever file calls file.managed some of the code is not relevant.
    • calls file.get_managed which checksum the local file if it exists, and calls cp.cache_file

A lot of calls to the master, when this code could be simplified if more information was return in one hit, and less on going load for the Master. Minion could also cache the information.

The master already cache the checksum results, minion does not, and could which would save cpu on the minion. That's what eTag, which is the standard use for HTTP. The browser keeps record of the eTag it has no idea what it is, other than the file has changed if the eTag does not match.

@terminalmage
Copy link
Contributor

I think there's probably an argument that we can create a new fileclient call that returns most of what the minion needs for a file.recurse state (i.e. returning information about files, empty dirs, symlinks, etc). For one, this information is (with most fileserver backends) being compiled together and serialized on the master. So the paths are already being collected. But I am dubious that making fileclient requests to cache files presents that much of an extra load.

@OrangeDog
Copy link

OrangeDog commented Jul 27, 2020

ETags work because the files can only change on the server.
If the minion is caching hashes instead of computing them itself, then local changes to a file will not be noticed. That is a bad thing.

@damon-atkins
Copy link
Author

ETags work because the files can only change on the server.
If the minion is caching hashes instead of computing them itself, then local changes to a file will not be noticed. That is a bad thing.

Your write to an extend, /var/cache/salt/minion/ manged by the minion like a browser manages its own cache, files outside of this could be altered by a person/batch/script.

cp.cache_file uses /var/cache/salt/minion/ only. The files /var/cache/salt/minion/ are check against the target files on the system. Looking at the code all files go via /var/cache/salt/minion/ and not direct to their final location. As minion owns /var/cache/salt/minion/ and manages it, eTag is approach is fine. People/tasks/jobs do not change the files within /var/cache/salt/minion/

@sagetherage sagetherage added Draft Initial Status Final Comment Period Speak now or forever hold your peace. and removed Draft Initial Status labels Feb 5, 2021
@sagetherage
Copy link
Contributor

sagetherage commented Feb 11, 2021

@damon-atkins we are moving this into the Final Comment Period and looking to get approval and acceptance, but that does not necessarily mean we have everything ready to implement, but ready to look at possible implementations. @terminalmage we would love to hear from you on this as well and @OrangeDog. This was discussed in the Core Team Planning meeting held 2021-FEB-11.

@terminalmage
Copy link
Contributor

I do not believe that there is a coherently-communicated need here, so I would vote "no" on this.

@damon-atkins
Copy link
Author

damon-atkins commented Feb 15, 2021

The point of putting this up was to, have discussion to resolve the issues:

  • Master providing little information is inefficient, and hence force the minion to make many more requests.
  • Minion need to be able to keep in sync with the salt tree, and not have files which are deleted hanging around. Which could possible cause a security issue. e.g. a File was used a flag. Or someone places a password in an sls file and thinks it will be deleted off the minion when they remove it from the salt master and run another state.
  • Someone renames a large file, the file is left on the minion.
  • the windows package sync process is slow because it can not get the information it needs from the master to make it efficient. The larger the number of files the more inefficient the current process is. https://github.com/saltstack/salt-winrepo contains about 250 files and is growing, all of which need to be removed on the minion and then re-fetched by the minion. (250 x Number of minions if running an update at the same time)

I am happy for salt team to update the proposal if their is a better solution.

@sagetherage
Copy link
Contributor

sagetherage commented Feb 17, 2021

Looking for rough consensus and details of implementation can be hammered out in a planning session hopefully with the newly formed CAB. updated 2021-JUN-29 in the Core Team planning meeting.

@sagetherage sagetherage removed the request for review from Akm0d June 10, 2021 19:08
@gtmanfred
Copy link

Has this proposal been accepted or rejected? This is not clear from any of the approvals or responses from the salt team.

@bryceml
Copy link

bryceml commented Jul 3, 2021

Has this proposal been accepted or rejected? This is not clear from any of the approvals or responses from the salt team.

I think we need 1 more approval from a member of the core team before it can be accepted.

@gtmanfred
Copy link

There doesn't seem to have been a lot of discussion in this issue on why it has been accepted by the core team, only a lot of community involvement.

When specifically asked about for information about if it should be accepted, one former core team member who has offered compelling reasons not to do this.

Can the anyone provide any information about why it is being accepted?

@gtmanfred
Copy link

Also a current core team member who appears to agree with that former core team members assessment that this isn't a great idea to implement. #31 (comment)

@damon-atkins
Copy link
Author

People are approving in general the master needs to provide more information in one hit (if requested too), so the client can be smarter and hence reduce the number of requests required to perform a recursive rsync. More requests against the master results in more load against the master.

The current code on the master tries to cache the list of files for short period time, a better method of caching could also be used for providing the extra information on each file.

@waynew
Copy link
Contributor

waynew commented Jul 16, 2021

Hey @damon-atkins - we just discussed this again in a team meeting and I
mentioned that at some point I did get convinced of the benefit of this SEP,
though it's been long enough that I don't remember precisely why. And I think
my original point still stands:

do you have an example of the failure of the current process(es)?

To fully accept this SEP, I think that:

  • We need to document with an example the current failure/enhancements
  • We also need to document that we don't know the performance impact
    • We would need to measure and get a baseline to compare
    • We may want to gate this functionality if the penalty is severe enough

@damon-atkins
Copy link
Author

@waynew It should not be rushed, it shoud be well thought out. I will try and cover off the items you mention.

@waynew
Copy link
Contributor

waynew commented Jan 5, 2022

Hi all, sorry for the hiatus - we're renewing our SEP focus to get these open ones through the process


To summarize the current discussion:

The discussion has been split about whether or not this SEP has any effect, or if it's just an alternate form of the existing behavior. While at one point I was convinced that the SEP was useful, that discussion and information wasn't captured here in this SEP.

Assuming that past-me had better information than past-past-me, and made a more informed decision, that information should be captured within this SEP, as mentioned in my last comment.

If progress is made, we can go ahead and approve the SEP, otherwise we'll go ahead and mark it as abandoned until someone would like to pick it back up.

@waynew
Copy link
Contributor

waynew commented Feb 15, 2022

So, I still remember that something convinced me that this was a good idea.

But we also lack that information here in the SEP. For now we're going to close this SEP as Abandoned. To re-open or if someone recreates this as a new SEP, some discussion points that were unresolved:

  • chatty protocols are frustrating, we don't want to add a lot of extra network overhead or events
  • we still need to see a good, simple(?) example that demonstrates the failures in existing functionality, as well as some kind of PoC for how this idea will address those problems (without introducing new ones)

This SEP is Abandoned - feel free to either pick it back up yourself, or if anyone else is interested they can address these points in the SEP and open a new PR 👍

@waynew waynew closed this Feb 15, 2022
@waynew waynew added Abandoned Available for a new (or the original) champion to adopt. and removed Final Comment Period Speak now or forever hold your peace. labels Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Abandoned Available for a new (or the original) champion to adopt.
Projects
None yet
Development

Successfully merging this pull request may close these issues.