Skip to content

change map to sync.map to support concurrency #186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

calvin-f
Copy link

No description provided.

@dustin-decker
Copy link

Hey @calvin-f, why did you close this? I also tried to solve this race with #175

Did you find something out that made it unnecessary?

@calvin-f
Copy link
Author

Hey @dustin-decker

No, it works well. Creating this pull request on the other hand was just by accident. I intended to create it in my own fork.

I came up with this solution as this was proposed already earlier in the old repository but got never merged.
src-d/go-git#1270

@dustin-decker
Copy link

I see. Could you re-open this then? I'm trying to get it fixed upstream and prefer this fix over mine.

@michenriksen
Copy link
Contributor

I would be very happy if this PR got merged in. Being able to work concurrently on repositories would be amazing! 🙏

@mcuadros
Copy link
Member

mcuadros commented Feb 9, 2021

The two main problems to merges this are:

  • The full library needs to support concurrency, and this requires an in-depth review of the code base.
  • Adding concurrency will speed down some operations (or maybe not much), but this requires do some benchmarks.

Sadly now I don't have time for either of those tasks :/

@MichaelMure
Copy link
Contributor

My two cents:

* The full library needs to support concurrency, and this requires an in-depth review of the code base.

True, but progress toward that goal is better than nothing. That said, fixing the concurrency issues reported by go test -race ./... would do wonder already.

* Adding concurrency will speed down some operations (or maybe not much), but this requires do some benchmarks.

At the moment, the alternative it to use a massive mutex around the whole repository, including for operations that don't require any locking. Everything is going to be better performance wise than that.

@gsmachado
Copy link

I agree 100% with @MichaelMure. Anything in the direction of this PR is better than the current state.

@mcuadros
Copy link
Member

mcuadros commented Mar 10, 2021

At the moment, the alternative it to use a massive mutex around the whole repository, including for operations that don't require any locking. Everything is going to be better performance wise than that.

This when your use case is processed only one repository at a time. SInce my use cause is completely the opposite, I process hundreds of repositories at the same time, your PR will degrade the performance of my deployment. So is clear that your affirmation is not true.

I will be ok merging something that is something pluggable or optional. And not a change in the internal behavior of the library.

@MichaelMure
Copy link
Contributor

@mcuadros I meant that from my project point of view, I can either add this global mutex or get panics due to concurrent access. Which is indeed the worst way to solve that problem.

I will be ok merging something that is something pluggable or optional. And not a change in the internal behavior of the library.

Does that means that you don't want to make go-git thread-safe? Or is that about not merging an ugly global mutex?

@gsmachado
Copy link

gsmachado commented Mar 14, 2021

Does that means that you don't want to make go-git thread-safe? Or is that about not merging an ugly global mutex?

This question is really important. Thanks, @MichaelMure.

@mcuadros: because (a) if you "don't want to make go-git thread-safe", then it's useless to discuss anything in this PR, since everyone is wasting their precious time. If it's because the code quality of this PR is not satisfactory, and (b) "you don't want to merge an ugly global mutex", then it gives a chance to the community to propose something better.

If it's (b), could you please point to some directions here on where we should pay attention when changing the code to make it thread-safe (and ultimately, propose something better)?

@mcuadros
Copy link
Member

I don't want to make go-git thread-safe, by default for several reason:

  • Break compatibility requires a major version upgrade
  • Make slower the library, and not everyone has your use case.

So if you make a PR where this functionality is disabled by default and could be enabled, I will be happy to merge it.

BTW I already wrote how should be 15days ago:

I will be ok merging something that is something pluggable or optional. And not a change in the internal behavior of the library.

@MichaelMure
Copy link
Contributor

@mcuadros Would you be open to merge something that 1) doesn't break compat and 2) doesn't affect performance?

I believe that should be possible as:

  • there seems to be fairly few places where concurrency is an issue
  • go-git is not massively parallel to begin with and most of the performance cost boils down to disk access, data processing and networking, not mutexes and locks

The first step would be to have proper benchmark to measure this performance cost, if any. Is there some scenario you care about especially, beside the benchmark that already exist in go-git?

@mcuadros
Copy link
Member

I will only merge it if it's an optional funcionallity, so we don't need to do benchmarks.
So you can include the lock function in a function with a boolean.

This was referenced Mar 30, 2021
@cep21
Copy link

cep21 commented Nov 3, 2021

Hi,

Thanks for this library! Are there recommended alternatives to use go-git in a thread safe way?

Thanks!

@nicerobot
Copy link

I will argue for merging this as is, without benchmarks because, IMO, a race condition is a bug, and fixing it is not a feature request nor enhancement. It's simply making the code function correctly.

If this package were implemented in a language where concurrency isn't a native language feature, a disclaimer like "this package is not safe to use concurrently" is probably valid. But concurrency is a feature of Go. So much so, that the runtime itself detects this race condition. Which means, this bug effectively forces me to not be able to use parts of the Go language. It's not forcing me to consider a different design for my code. It's like having a bug that breaks the use of for loops.

To me, this is a serious bug that deserves a patch.

@rgalanakis
Copy link

I would fully expect concurrent access to a single repository to be safe. I have been using go-git for a couple years and just found out it isn't, because we had panics in production that I still cannot reproduce synthetically. To me, and I suspect the vast majority of users, that is a bug, and quite a bad one (thankfully this issue was not hard to find).

The phrase is "make it work, make it right, make it fast" and right now it isn't right. Usually "thread unsafe" is an optional setting for specific use cases, and the default is whatever is correct.

@justinsb
Copy link

justinsb commented Dec 2, 2022

It seems that having benchmarks would be valuable. I was reading the code and wondering e.g. whether calling FindHash is a win, and not having benchmarks makes that difficult to answer.

I couldn't find an issue describing the benchmarks needed. Is there one? What are the key scenarios? I think there's an obvious scenario where we build a big git repo with lots of files and lots of commits, then reopen the repo to flush caches and visiting every file in every commit. But that doesn't cover parallel processing of repos, although if we avoid global mutexes and make the N=1 case faster, I would expect it to speed up the N=100 case also. So would that be a sufficient benchmark for your use case @mcuadros ?

@pjbgf
Copy link
Member

pjbgf commented Dec 3, 2022

There is quite a bit on this thread, so I will try to unpack a few points.

Data races

That said, fixing the concurrency issues reported by go test -race ./... would do wonder already.

This PR fixes no data races reported via go test -race on amd64 Linux.
Based on latest versions, all the data races reported originate from go-billy, and not go-git. I will propose a PR for that separately.

Performance impact

These changes have a 7x performance hit in terms of execution time. There is also an allocation impact which over time will increase GC pressure.

Below is the before and after (rebased) of the existing benchmarks, considering only the relevant deltas.

goos: linux goarch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz

name           old time/op    new time/op    delta
FindOffset-16     279ns ± 4%    2120ns ± 2%   ~     (p=0.100 n=3+3)

name           old alloc/op   new alloc/op   delta
FindOffset-16     0.00B        416.00B ± 0%   ~     (p=0.100 n=3+3)

name           old allocs/op  new allocs/op  delta
FindOffset-16      0.00          25.00 ± 0%   ~     (p=0.100 n=3+3)

But, I agree some of the points in the thread:

  • go-git should just work, any expection or optimisation should be caveated/documented. Everyone likes surprise-free experience. 😄
  • High index concurrency should not force a performance penalty on users needing High repository concurrency without a way to opt-out.

I have also benchmarked #175, which fixes the same problem. Based on the performance results (refer to #175 (comment)), I would prefer that over this PR.

@pjbgf pjbgf mentioned this pull request May 20, 2023
7 tasks
@github-actions github-actions bot added the stale Issues/PRs that are marked for closure due to inactivity label Dec 13, 2023
@github-actions github-actions bot closed this Jan 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Issues/PRs that are marked for closure due to inactivity
Projects
None yet
Development

Successfully merging this pull request may close these issues.