-
-
Notifications
You must be signed in to change notification settings - Fork 172
Move dynamic FPU synchronization inside preprocessor defines #2252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Honestly I would recommend just do these 2 changes I suggested earlier and minimize the change to the original code. This way the risk is kept to the minimum (this code is very delicate and I would be very cautious to make any change that I'm not 100% confident with) and it's easy for other dosbox forks to pick up this fix rather than having to deal with a large diff with lots of unrelated changes. |
I'm for minimising the risk since this is an extremely delicate part of the codebase. |
caff04c
to
d6ed92d
Compare
2d9b956
to
0b28734
Compare
I agree with both of you on the need to test this very thoroughly. There must be some benchmark apps out there that test the FPU for correctness. I'd also be uncomfortable merging this just based on a "visual" code review — there's so many things that can go wrong. Having said that, we shouldn't be afraid of changing the code; that's a bad situation to be in. So instead of never touching these parts of the codebase, the question we should be asking instead is how can we regressions test this to gain 100% confidence that it still works as expected? Like I said, someone must have written some FPU test benchmarks in the DOS days that check for the correctness of calculations... Maybe we can even inject "synthetic" dynamic -> normal core switching for testing purposes? Maybe that way this will get tested a lot more thoroughly than the original code ever was! So we shouldn't just shy out, that's all I'm saying. Well, and other DOSBox forks can pick up the initial patch... or also choose to clean up their code 😏 (Related: I rewrote the whole DOSBox mixer last year to use floats instead of integers, gradually, over time. Plus replaced the built-in resampling method with a library. It was somewhat less risky, but arguably every game uses sound, while the FPU is only used by a few. Not "bragging", just saying that these things can be done, they're not "impossible"; you just need to be careful and approach it systematically). |
What's difficult is that this code deals with the case when FPU is used cross core boundary, which AFAIK happens when code modifies itself or when debugger/OS doing system level operations. A benchmark suite normally stays in the dynamic core happy path and won't see anything wrong. That's why DOSBOX has been able to run games for so many years without people noticing this. |
I think in order to properly test this, we need to create a guest program in assembly code that deliberately exercises this core transition logic. |
Ok fair point. |
0b28734
to
8c2de4c
Compare
We've got two bisectable points now:
That's really all we need to track down strange bugs. The other good thing is that somehow all of eXoDOS and eXoWin3.1 are running fine with this synchronization not even done at all! Granted.. those are all games/edutainment, but still, that's about 15,000 executables. With @finalpatch's Turbo Debugger tests showing proper operation, hopefully this improvement fixes more application corner cases. Oh: and we're aware of DOSBox being used for civil engineering work, like bridge and elevated load bearing calculations. Likewise, there are German manufacturers using DOSBox to run software that drives their fabrication equipment. So hopefully the numbers generated are still OK! 🚀 |
8c2de4c
to
b4ab7fc
Compare
I'll try to put together a test executable that deterministically exercises this logic. Give me some time. |
We're currently offering them 64-bit precision instead of an actual IEEE 754 80-bit extended precision format so it should be all good. 😁 But yeah, many people are depending that we get the math correct that probably shouldn't which is yet another factor in mitigating risk. Maybe we should deploy the safer option per @finalpatch first and this extended option could be deployed later once we're satisfied everything still works with the safer option first and a test program for x87 have been written to validate this bit of the code. |
You can try Intel's MCP Diag. But be aware that it will likely fail by default, as it did originally on DOSBox-X. |
Okay, I managed to create a program that can trigger the bug prints "ok" with normal core |
This branch in its current state produces "ok" from my test program. |
The core switch logic as far as I can see, is correct (I still believe 1 memcpy is better than 3 in this performance sensitive code) However there are way too many unrelated changes in this PR that I don't have the bandwidth/motivation to review. I'll leave it to the other folks to approve the whole PR, the part in in core_dyn_x86.cpp looks good to me. |
core_dyn_x86.cpp looks good to me. |
Co-authored-by: finalpatch <[email protected]> Co-authored-by: kcgen <[email protected]>
438b7d8
to
1af5cc0
Compare
Can't really do a useful review of this myself, but the approach @kcgen mentioned with the easy bisecting, plus the asm test code by @finalpatch has probably raised our confidence levels enough to get this merged? |
Thanks. Do you have the compiled .EXE handy? (I tried w/ (To unit test it, I would launch it with the compiled dosbox-staging executable, redirect to a text file, and check for OK).
Thanks for the review.
Yes, I explained in the resolved comments that the type-specific accessors and assigning directly to the struct members is still more self-documenting and explicit versus the bulk memcpy. But to wrap this up, I dropped the two routines in godbolt and quick bench: Here's the assembly at At https://godbolt.org/z/seGvx8e5M The loopless optimized version makes it very hard to benchmark at I tried flagging variables as do-not-optimize, accumulating intermediate p_reg m1/m2/m3, randomizing the dyn_reg array, etc, to no avail. However, Clang 14 atleast is measuring /something/ :laughing: https://quick-bench.com/q/LkhSkKnQLmvE_amiu7ixUd5j-kQ All that to say - we're deep in the noise (a couple instructions either way, and for something that's only called during transitions from core types). These explorations are always fun though! |
1af5cc0
to
c027ea3
Compare
Yes - I think we're set to merge, thanks to @finalpatch's test results and everything else that has been tested (all running fine). Thanks again for this, @finalpatch - and very interested in any more additions you want to help with! ❤️ |
test program binary and source |
Nice! thanks for these, @finalpatch. Anyone know of a CLI test framework (or system)? This FPU tester will be our first (of probably many) executables we run using the compiled DOSBox Staging and inspect the results. This is essentially black-box executable to go alongside our current white-box (GoogleTest and GoogleMock) internal test framework. |
This is a follow-up adjustment mentioned by @finalpatch; thanks!
This PR also:
Hoping you can test this with your Turbo C debugger @finalpatch (and anything else you can throw at it like large and tiny values / overflow / div-by-zero / NaN / etc).
Suggest reviewing commit-by-commit.
DOS Navigator is still OK, as are UBSAN builds.