Skip to content

Conversation

silkeh
Copy link

@silkeh silkeh commented Jan 14, 2025

This PR contains two changes that significantly improve the performance of WorkTree.Status() (from 491 s to 5 s):

  • Skip ignored files when walking through the worktree.
    This signigifantly improves the performance of Status():
    In a repository with 3M ignored files Status now takes 5 s instead of 160 s.
  • Fix loading of .gitignore files in nested ignored directories.
    These were not matched in the current implementation, as that only checks the ignores of the current directory.
    As a side-effect this change also leads to significantly improved performance: in a repository with 3M ignored files Status now takes 160 s instead of 491 s.

I have a backport to master ready if desired.

@pjbgf
Copy link
Member

pjbgf commented Jan 15, 2025

@silkeh thanks for proposing this PR. Although this may improve the use case of multiple ignored files, it seems to cause a regression when that is not the case. I did a quick benchmark testing a call to Status() on a copy of the go-git repo, and these changes seem to have a negative impact on both allocation numbers and time per operation:

BenchmarkStatus-16 (v5.13.0)      	      48	  21514427 ns/op	19511754 B/op	   69999 allocs/op
BenchmarkStatus-16 (this PR)      	      34	  34658273 ns/op	42274701 B/op	  106194 allocs/op

I am struggling for time to pinpoint optimisation areas in the PR, would you be able to optimise it for the standard use case as well?

@pjbgf pjbgf added this to the v6.0.0 milestone Jan 15, 2025
@silkeh
Copy link
Author

silkeh commented Jan 15, 2025

@pjbgf: Thanks for taking a look! I'll take a look later this week and see what I can improve.

@onee-only
Copy link
Contributor

Hello @silkeh, Are you still working on this? This could be a big improvement for #181.

@silkeh
Copy link
Author

silkeh commented Jun 13, 2025

@onee-only: That was the plan, but I've been busy with other things. I'll try to make some time next week.

Fix loading of `.gitignore` files in nested ignored directories.
These were not matched in the current implementation,
as that only checks the ignores of the current directory.

As a side-effect this change also leads to significantly improved performance:
in a repository with 3M ignored files `Status` now takes 160 s instead of 491 s.
@silkeh silkeh changed the base branch from v6-exp to main September 12, 2025 14:12
@silkeh silkeh force-pushed the improve-ignored-status-performance branch from 25c3930 to 269fa79 Compare September 12, 2025 14:14
Skip ignored files when walking through the worktree.

This signigifantly improves the performance of `Status()`:
In a repository with 3M ignored files `Status` now takes 5 s instead of 160 s.
@silkeh silkeh force-pushed the improve-ignored-status-performance branch from 269fa79 to dcfb50d Compare September 12, 2025 16:01
@silkeh
Copy link
Author

silkeh commented Sep 12, 2025

I finally had some time to take a look and rebase the PR. The difference compared to origin/main (3a68d04) seems minor now. I got the following results:

BenchmarkStatus-12(v5.16.2)        433	  27579139 ns/op	18265183 B/op	   43611 allocs/op
BenchmarkStatus-12(origin/main)    324	  37745332 ns/op	 2106671 B/op	   41265 allocs/op
BenchmarkStatus-12(first commit)   321	  37490431 ns/op	 2107606 B/op	   41274 allocs/op
BenchmarkStatus-12(second commit)  307	  39710261 ns/op	 2174177 B/op	   42200 allocs/op

v5.16.2 is a lot faster, but this PR in its current form doesn't hurt performance too much. If you test with a repo on disk it actually seems to be a bit faster.

Benchmark I used:

func BenchmarkStatus(b *testing.B) {
	r, err := Clone(memory.NewStorage(), memfs.New(), &CloneOptions{
		URL: "https://github.com/go-git/go-git",
	})
	require.NoError(b, err)

	workTree, err := r.Worktree()
	require.NoError(b, err)

	_, err = workTree.Status()
	require.NoError(b, err)

	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		workTree.Status()
	}
}

If I read the profiling correctly most of the performance degradation is introduced in MatchNoder.NumChildren because it now has to run through the patterns instead of just returning the info from disk:

image

I did a tiny optimization to reduce the allocations a bit, but the rest of the performance improvements would probably have to be done in the gitignore.matcher.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants