-
Notifications
You must be signed in to change notification settings - Fork 810
change map to sync.map to support concurrency #186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hey @dustin-decker No, it works well. Creating this pull request on the other hand was just by accident. I intended to create it in my own fork. I came up with this solution as this was proposed already earlier in the old repository but got never merged. |
I see. Could you re-open this then? I'm trying to get it fixed upstream and prefer this fix over mine. |
I would be very happy if this PR got merged in. Being able to work concurrently on repositories would be amazing! 🙏 |
The two main problems to merges this are:
Sadly now I don't have time for either of those tasks :/ |
My two cents:
True, but progress toward that goal is better than nothing. That said, fixing the concurrency issues reported by
At the moment, the alternative it to use a massive mutex around the whole repository, including for operations that don't require any locking. Everything is going to be better performance wise than that. |
I agree 100% with @MichaelMure. Anything in the direction of this PR is better than the current state. |
This when your use case is processed only one repository at a time. SInce my use cause is completely the opposite, I process hundreds of repositories at the same time, your PR will degrade the performance of my deployment. So is clear that your affirmation is not true. I will be ok merging something that is something pluggable or optional. And not a change in the internal behavior of the library. |
@mcuadros I meant that from my project point of view, I can either add this global mutex or get panics due to concurrent access. Which is indeed the worst way to solve that problem.
Does that means that you don't want to make go-git thread-safe? Or is that about not merging an ugly global mutex? |
This question is really important. Thanks, @MichaelMure. @mcuadros: because (a) if you "don't want to make go-git thread-safe", then it's useless to discuss anything in this PR, since everyone is wasting their precious time. If it's because the code quality of this PR is not satisfactory, and (b) "you don't want to merge an ugly global mutex", then it gives a chance to the community to propose something better. If it's (b), could you please point to some directions here on where we should pay attention when changing the code to make it thread-safe (and ultimately, propose something better)? |
I don't want to make go-git thread-safe, by default for several reason:
So if you make a PR where this functionality is disabled by default and could be enabled, I will be happy to merge it. BTW I already wrote how should be 15days ago:
|
@mcuadros Would you be open to merge something that 1) doesn't break compat and 2) doesn't affect performance? I believe that should be possible as:
The first step would be to have proper benchmark to measure this performance cost, if any. Is there some scenario you care about especially, beside the benchmark that already exist in go-git? |
I will only merge it if it's an optional funcionallity, so we don't need to do benchmarks. |
Hi, Thanks for this library! Are there recommended alternatives to use go-git in a thread safe way? Thanks! |
I will argue for merging this as is, without benchmarks because, IMO, a race condition is a bug, and fixing it is not a feature request nor enhancement. It's simply making the code function correctly. If this package were implemented in a language where concurrency isn't a native language feature, a disclaimer like "this package is not safe to use concurrently" is probably valid. But concurrency is a feature of Go. So much so, that the runtime itself detects this race condition. Which means, this bug effectively forces me to not be able to use parts of the Go language. It's not forcing me to consider a different design for my code. It's like having a bug that breaks the use of To me, this is a serious bug that deserves a patch. |
I would fully expect concurrent access to a single repository to be safe. I have been using go-git for a couple years and just found out it isn't, because we had panics in production that I still cannot reproduce synthetically. To me, and I suspect the vast majority of users, that is a bug, and quite a bad one (thankfully this issue was not hard to find). The phrase is "make it work, make it right, make it fast" and right now it isn't right. Usually "thread unsafe" is an optional setting for specific use cases, and the default is whatever is correct. |
It seems that having benchmarks would be valuable. I was reading the code and wondering e.g. whether calling I couldn't find an issue describing the benchmarks needed. Is there one? What are the key scenarios? I think there's an obvious scenario where we build a big git repo with lots of files and lots of commits, then reopen the repo to flush caches and visiting every file in every commit. But that doesn't cover parallel processing of repos, although if we avoid global mutexes and make the N=1 case faster, I would expect it to speed up the N=100 case also. So would that be a sufficient benchmark for your use case @mcuadros ? |
There is quite a bit on this thread, so I will try to unpack a few points. Data races
This PR fixes no data races reported via Performance impactThese changes have a 7x performance hit in terms of execution time. There is also an allocation impact which over time will increase GC pressure. Below is the before and after (rebased) of the existing benchmarks, considering only the relevant deltas.
But, I agree some of the points in the thread:
I have also benchmarked #175, which fixes the same problem. Based on the performance results (refer to #175 (comment)), I would prefer that over this PR. |
No description provided.