Skip to content

Conversation

@gabrielmougard
Copy link
Contributor

@gabrielmougard
Copy link
Contributor Author

@tomponline we can start discussing the approach. I mark it as draft as I'm stillin the process to test it but just wanted to let you know about my idea. @simondeziel what was the test setup you used so that I can reproduce a bunch of long running operations and start testing this appraoch.

@simondeziel
Copy link
Member

@simondeziel what was the test setup you used so that I can reproduce a bunch of long running operations and start testing this appraoch.

It's been a while and it seems it was an image refresh taking longer than usual. I think mimic something akin with:

$ lxc launch ubuntu-minimal-daily:24.04 c1
Launching c1
$ lxc exec c1 -- sleep inifinity

Then sudo systemctl stop snap.lxd.daemon.

@gabrielmougard gabrielmougard force-pushed the fix/improve-shutdown-sequence branch from cc831d8 to 34c2e89 Compare February 25, 2025 17:36
Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShellCheck found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@gabrielmougard gabrielmougard force-pushed the fix/improve-shutdown-sequence branch from 34c2e89 to 258618c Compare February 26, 2025 09:16
@gabrielmougard gabrielmougard force-pushed the fix/improve-shutdown-sequence branch 18 times, most recently from 2542323 to 42ae29a Compare February 27, 2025 08:53
@gabrielmougard gabrielmougard force-pushed the fix/improve-shutdown-sequence branch from 42ae29a to 8cc26b7 Compare February 27, 2025 08:58
@gabrielmougard
Copy link
Contributor Author

gabrielmougard commented Feb 27, 2025

If one wants to reproduce the tests: this should be working https://paste.ubuntu.com/p/MGxxJz35N7/ (still fuguring out why this is different in the CI)

@gabrielmougard gabrielmougard force-pushed the fix/improve-shutdown-sequence branch from 8cc26b7 to a428ad5 Compare February 27, 2025 09:29
@gabrielmougard gabrielmougard force-pushed the fix/improve-shutdown-sequence branch 2 times, most recently from 45e8db0 to 1569f90 Compare March 12, 2025 10:33
simondeziel
simondeziel previously approved these changes Mar 12, 2025
Copy link
Member

@simondeziel simondeziel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the testing side. The standalone+LVM is the most impacted in terms of test duration with ~6m10s but I don't think much can be done about it. For standalone+ceph, it takes ~3m45s so it's not that bad.

@gabrielmougard gabrielmougard force-pushed the fix/improve-shutdown-sequence branch 4 times, most recently from 12b0719 to 6e567f0 Compare March 26, 2025 17:18
@gabrielmougard
Copy link
Contributor Author

@tomponline fixed the comments. Tests are green.

Copy link
Member

@tomponline tomponline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few minor changes and a question, thanks!

`pendingInstanceOperations` is only used in the LXD shutdown sequence Stop() function at it is a blocking function.
We replace it by a non-blocking function that simply populate a map of instance URL to their operation.

It is also worth mentionning that once the LXD shutdown sequence is triggered, no more operation can be added,
so I don't see the use of "waiting".

Signed-off-by: Gabriel Mougard <[email protected]>
We will no longer call `daemonStorageVolumesUnmount` in a separate goroutine but after the instancesShutdown function.
However, we still need to be able to cancel the volume unmount after a timeout, hence the introduction of the 'select' statement with a context handling.

This change also enable a context to cancel an ongoing unmount procedure.

Signed-off-by: Gabriel Mougard <[email protected]>
…sShutdown` function

* `isInstanceBusy`: check if an instance has a running op tied to it.

Signed-off-by: Gabriel Mougard <[email protected]>
This enables `instancesShutdown` to react upon operation changes (from busy to idle) and to context cancellation
while keeping the shutdown order correct if instances have `boot.stop.priority` set.

Signed-off-by: Gabriel Mougard <[email protected]>
For the testing phase, we need a convenient way to create dummy operations in LXD with a certain duration.

Signed-off-by: Gabriel Mougard <[email protected]>
@gabrielmougard gabrielmougard force-pushed the fix/improve-shutdown-sequence branch from 6e567f0 to e08ef31 Compare March 31, 2025 13:55
Copy link
Member

@tomponline tomponline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks!

@tomponline tomponline merged commit c46f7f8 into canonical:main Apr 1, 2025
29 checks passed
for i := 0; i < maxConcurrent; i++ {
go func(instShutdownCh <-chan instance.Instance) {
for inst := range instShutdownCh {
logger.Debug("Instance received for shutdown", logger.Ctx{"project": inst.Project().Name, "instance": inst.Name(), "timestamp": time.Now().UnixNano()})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gabrielmougard are the new timestamp nano second fields required, they are rather noisy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stopping LXD should signal instances early for shutdown

3 participants