-
-
Notifications
You must be signed in to change notification settings - Fork 182
Restore sane max N%
cycles behaviour where 100% means 100%
#3141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
max N%
cycles behaviour where 100% means 100%
eb2c1dc
to
7015ef6
Compare
I have been testing this for years so i give it a pass if its functional equivalent to 105%. |
We're actually targeting 95% thread load if it's functional equivalent of 105%. That gives it a little bit of overhead so it should be safe. There's your free 5% speed bump by going from 90% thread load to 95% thread load. |
I'm not sure about that. I'm not seeing that from the fixed calculations. |
fully agree with knocking out the 105% and just making it 100%. Regarding the 90% target, here's how it works: The auto-cycler takes a ratio of completed 1ms ticks versus scheduled (or potential) 1ms ticks across a 250 ms period. It scales up cycles more and more as the system continues to prove that it can keep up.. until it can't, at which point it starts lowering cycles. Benchmark applications are the easiest for this algorithm to get right, because the load is usually 100% and constant; including not waiting for vblank. The most challenging scenario is when you have an extended period of light load (maybe minimal or no sound effecst and simple imagery).. the auto cycler sees the machine can handle it.. so keeps ladling on the cycles. Perhaps after 5 seconds of this it's dialed up to 750,000 cycles/ms.. and the host is actually able to complete them. But then the scene changes quickly, perhaps you open a door and the scene requires a lot of rendering, and there's a bunch of NPCs all generating sound effects that the game needs to mix and then DMA transfer. Right then, those cycles are going to be a lot more work.. and the host might actually need 10 ms or more to grind through 750,000 cycles to complete just one 1ms worth of DOS time. The auto-cycler sees the ratio tank, and needs to brings cycles back down to a point where the host can keep up... but even then, it brings them down in a step-wise manner to avoid turning the game into a slide slow. During this "in debt" phase, we're actually bleeding down our audio buffer and that 10% headroom (from the 90% target) will be used up (the DOSBox executable will be consuming 100% host CPU usage). So it's there more like an internal buffer or "emergency fund" when we're in over-subscribed scenarios, and less about controlling host CPU usage. |
Super informative explanation on how the auto cycler works, @kcgen 🎖️ Ok, so you're advising to change the percentage target back to 90% then? (I changed it to 100%, being not fully across all this background info.) |
Previously, `cycles = max` or `cycles = max 100%` really meant 90% CPU core utilisation, and the rather hacky `cycles = max 105%` corresponded to about 95% actual utilisation. You could not go higher than that! This perhaps made _some_ sense in the early 2000s when single-core CPUs were the norm to protect inexperienced users, but as since about 2010 onwards at least dual-cores became common, this is nothing more than some confusing legacy cruft. Even the quite old and underpowered Raspberry Pi 3B released in 2016 has a 64-bit quad-core ARM Cortex...
7015ef6
to
4cb495d
Compare
I'm consistently seeing this 5% gap between Main at 105%: ![]() Test at 100%: ![]() |
Waiting until kcgen wakes up since he knows how this works. Thankfully he seems to agree its time to retire this 2004 era assumption. My understanding always was that it targeted thread load 90% at 100% and 105% would bump to thread load 95% but i'll wait until kcgen wakes up. |
Well, now you're getting 100% utilisation with |
Pinging our Quality Assurance Team: |
Interested to hear how this works on macOS/ARM64. |
My impressions of "100% is 100%" is that the emulation isn't crapping out even if the main thread is pushed at times to 102% to 103% (in htop 5 second intervals) so it seems to be working much better than i expected. Me advocating for 100% being thread load 95% was just about being careful but in fact i'm starting to agree that "100% is 100%". With this new model: Old 105% is now 95%. Old 100% is now 90%. |
I think that's the good way: you get what you ask for. |
In PCPBench in Mode 122 (1600x1200x16bpp) i'm getting 90% in main with default settings and 99.x% using this PR. |
Should be pretty safe merge since we're just working the main thread up to full capacity which forces kernel to schedule tasks to other threads leaving the main thread exclusively for DOSBox Staging if necessary. 👍 |
Waiting for @kcgen to approve too, then yeah, I'll press the Big Green Button 😎 🤘🏻 |
Someone should check https://www.dosbox.com/wiki/Performance I'm super sleep deprived and need to sleep now. |
Tested with Screamer 2 on the Pi 400, and the audio gaps occur slightly later than versus Cycles are allowed to spool up a bit higher for a bit longer before "going into debt" (that is, when the host needs more than 1 ms to generate 1 ms worth of DOS time during heavy load periods). However I'm not counting any additional gaps across the entire Screamer 2 demo period - so the Pi is fast enough to recover (refilling the bled-down audio buffer) once the ratio crushes the subsequent cycle count back to manageable levels for the host. So I think this change is good. Merging this on the heels of @kklobe's |
Thanks for testing this on the Pi @kcgen 🚀 |
We were chatting on IRC on other benefits of this bit of code that may be directly invisible. Mainly the CPU scheduler seeing we're "ringing the bell" by running sometimes at over 100% thread load and giving us those sweet boost clocks faster and more aggressively. |
That's a very interesting point; wasn't aware of that but makes sense! |
Tested all the code paths that i know of:
|
Description
Fixes #1913
Previously,
cycles = max
orcycles = max 100%
really meant 90% CPU core utilisation, and the rather hackycycles = max 105%
corresponded to about 95% actual utilisation. You could not go higher than that! This perhaps made some sense in the early 2000s when single-core CPUs were the norm to protect inexperienced users, but as since about 2010 onwards at least dual-cores became common, this is nothing more than some confusing legacy cruft. Even the quite old and underpowered Raspberry Pi 3B released in 2016 has a 64-bit quad-core ARM Cortex...Having said that, people are always free to use
max 90%
,max 50%
, or whatever. I wanted to get this in for the 0.81.0 testing cycle so we can givecycles = max
a good workout, but I expect no issues.Related issues
#1913
Manual testing
Tried a few extremely demanding late 90s demos that use the VESA 512x384 24-bit true colour mode for CPU-based 3D rendering (kkowboy and moralhardcandy). They worked perfectly fine with
cycles = max
; 100% smooth 70 FPS rendering, zero sound glitches, and I was using Discord and a couple of browsers and tons of other stuff in the background on my 10-core M1 Mac.Checklist
Please tick the items as you have addressed them. Don't remove items; leave the ones that are not applicable unchecked.
I have: