Skip to content

Conversation

johnnovak
Copy link
Member

@johnnovak johnnovak commented Nov 20, 2023

Description

Fixes #1913

Previously, cycles = max or cycles = max 100% really meant 90% CPU core utilisation, and the rather hacky cycles = max 105% corresponded to about 95% actual utilisation. You could not go higher than that! This perhaps made some sense in the early 2000s when single-core CPUs were the norm to protect inexperienced users, but as since about 2010 onwards at least dual-cores became common, this is nothing more than some confusing legacy cruft. Even the quite old and underpowered Raspberry Pi 3B released in 2016 has a 64-bit quad-core ARM Cortex...

Having said that, people are always free to use max 90%, max 50%, or whatever. I wanted to get this in for the 0.81.0 testing cycle so we can give cycles = max a good workout, but I expect no issues.

Related issues

#1913

Manual testing

Tried a few extremely demanding late 90s demos that use the VESA 512x384 24-bit true colour mode for CPU-based 3D rendering (kkowboy and moralhardcandy). They worked perfectly fine with cycles = max; 100% smooth 70 FPS rendering, zero sound glitches, and I was using Discord and a couple of browsers and tons of other stuff in the background on my 10-core M1 Mac.

Checklist

Please tick the items as you have addressed them. Don't remove items; leave the ones that are not applicable unchecked.

I have:

  • followed the project's contributing guidelines and code of conduct.
  • performed a self-review of my code.
  • commented on the particularly hard-to-understand areas of my code.
  • split my work into well-defined, bisectable commits, and I named my commits well.
  • applied the appropriate labels (bug, enhancement, refactoring, documentation, etc.)
  • checked that all my commits can be built.
  • confirmed that my code does not cause performance regressions (e.g., by running the Quake benchmark).
  • added unit tests where applicable to prove the correctness of my code and to avoid future regressions.
  • made corresponding changes to the documentation or the website according to the documentation guidelines.
  • locally verified my website or documentation changes.

@johnnovak johnnovak self-assigned this Nov 20, 2023
@johnnovak johnnovak added the enhancement New feature or enhancement of existing features label Nov 20, 2023
@johnnovak johnnovak marked this pull request as ready for review November 20, 2023 04:24
@johnnovak johnnovak changed the title Restore sane "max N%" cycles behaviour where 100% means 100% Restore sane max N% cycles behaviour where 100% means 100% Nov 20, 2023
@johnnovak johnnovak force-pushed the jn/max-cycles-fix branch 2 times, most recently from eb2c1dc to 7015ef6 Compare November 20, 2023 04:27
@Grounded0
Copy link
Collaborator

Grounded0 commented Nov 20, 2023

I have been testing this for years so i give it a pass if its functional equivalent to 105%.

@Grounded0
Copy link
Collaborator

Grounded0 commented Nov 20, 2023

// Ratio we are aiming for is 100% usage

We're actually targeting 95% thread load if it's functional equivalent of 105%. That gives it a little bit of overhead so it should be safe. There's your free 5% speed bump by going from 90% thread load to 95% thread load.

@johnnovak
Copy link
Member Author

// Ratio we are aiming for is 100% usage

We're actually targeting 95% thread load if it's functional equivalent of 105%. That gives it a little bit of overhead so it should be safe.

I'm not sure about that. I'm not seeing that from the fixed calculations.

@Grounded0
Copy link
Collaborator

Grounded0 commented Nov 20, 2023

Yeah i have to test this out myself a bit first. Here's Duke Nukem 3D on main looking like its targeting 95% at 105% setting:

Screenshot 2023-11-20 at 7 36 54

@johnnovak
Copy link
Member Author

Yeah i have to test this out myself a bit first. Here's Duke Nukem 3D on main looking like its targeting 95% at 105% setting:
Screenshot 2023-11-20 at 7 36 54

It's tricky because I think the "100% CPU utilisation" is a potential, so it's not guaranteed that you set cycles to max 80% or 90% or 100% and you see the CPU pinned exactly to 80%, 90% and 100% all the time. There are always fluctuations.

@kcgen
Copy link
Member

kcgen commented Nov 20, 2023

fully agree with knocking out the 105% and just making it 100%.

Regarding the 90% target, here's how it works:

The auto-cycler takes a ratio of completed 1ms ticks versus scheduled (or potential) 1ms ticks across a 250 ms period. It scales up cycles more and more as the system continues to prove that it can keep up.. until it can't, at which point it starts lowering cycles.

Benchmark applications are the easiest for this algorithm to get right, because the load is usually 100% and constant; including not waiting for vblank.

The most challenging scenario is when you have an extended period of light load (maybe minimal or no sound effecst and simple imagery).. the auto cycler sees the machine can handle it.. so keeps ladling on the cycles. Perhaps after 5 seconds of this it's dialed up to 750,000 cycles/ms.. and the host is actually able to complete them.

But then the scene changes quickly, perhaps you open a door and the scene requires a lot of rendering, and there's a bunch of NPCs all generating sound effects that the game needs to mix and then DMA transfer.

Right then, those cycles are going to be a lot more work.. and the host might actually need 10 ms or more to grind through 750,000 cycles to complete just one 1ms worth of DOS time.

The auto-cycler sees the ratio tank, and needs to brings cycles back down to a point where the host can keep up... but even then, it brings them down in a step-wise manner to avoid turning the game into a slide slow.

During this "in debt" phase, we're actually bleeding down our audio buffer and that 10% headroom (from the 90% target) will be used up (the DOSBox executable will be consuming 100% host CPU usage).

So it's there more like an internal buffer or "emergency fund" when we're in over-subscribed scenarios, and less about controlling host CPU usage.

@johnnovak
Copy link
Member Author

Super informative explanation on how the auto cycler works, @kcgen 🎖️ Ok, so you're advising to change the percentage target back to 90% then? (I changed it to 100%, being not fully across all this background info.)

Previously,  `cycles = max` or `cycles = max 100%` really meant 90% CPU
core utilisation, and the rather hacky `cycles = max 105%` corresponded
to about 95% actual utilisation. You could not go higher than that! This
perhaps made _some_ sense in the early 2000s when single-core CPUs were
the norm to protect inexperienced users, but as since about 2010 onwards
at least dual-cores became common, this is nothing more than some
confusing legacy cruft. Even the quite old and underpowered Raspberry Pi
3B released in 2016 has a 64-bit quad-core ARM Cortex...
@Grounded0
Copy link
Collaborator

Grounded0 commented Nov 20, 2023

I'm consistently seeing this 5% gap between main and jn/max-cycles-fix with Jane's Combat Simulations: ATF - Advanced Tactical Fighters at 800x600 with everything maxed out (htop is calculating at 5 second interval):

Main at 105%:

main

Test at 100%:

test

@Grounded0
Copy link
Collaborator

Grounded0 commented Nov 20, 2023

Waiting until kcgen wakes up since he knows how this works. Thankfully he seems to agree its time to retire this 2004 era assumption.

My understanding always was that it targeted thread load 90% at 100% and 105% would bump to thread load 95% but i'll wait until kcgen wakes up.

@johnnovak
Copy link
Member Author

Waiting until kcgen wakes up since he knows how this works. Thankfully he seems to agree its time to retire this 2004 era assumption.

Well, now you're getting 100% utilisation with cycles = max 100%. That's what you wanted, no? And I think that's what we want, unless @kcgen disagrees.

@Grounded0
Copy link
Collaborator

Pinging our Quality Assurance Team:

@Burrito78

@Grounded0
Copy link
Collaborator

Grounded0 commented Nov 20, 2023

Interested to hear how this works on macOS/ARM64.

@Grounded0
Copy link
Collaborator

Grounded0 commented Nov 20, 2023

My impressions of "100% is 100%" is that the emulation isn't crapping out even if the main thread is pushed at times to 102% to 103% (in htop 5 second intervals) so it seems to be working much better than i expected. Me advocating for 100% being thread load 95% was just about being careful but in fact i'm starting to agree that "100% is 100%".

With this new model:

Old 105% is now 95%.

Old 100% is now 90%.

@johnnovak
Copy link
Member Author

With this new model:

Old 105% is now 95%.

Old 100% is now 90%.

I think that's the good way: you get what you ask for.

@Burrito78
Copy link
Collaborator

In PCPBench in Mode 122 (1600x1200x16bpp) i'm getting 90% in main with default settings and 99.x% using this PR.

@Grounded0
Copy link
Collaborator

Grounded0 commented Nov 20, 2023

Should be pretty safe merge since we're just working the main thread up to full capacity which forces kernel to schedule tasks to other threads leaving the main thread exclusively for DOSBox Staging if necessary. 👍

@johnnovak
Copy link
Member Author

Should be pretty safe merge since we're just working the main thread up to full capacity which forces kernel to schedule tasks to other threads leaving the main thread exclusively for DOSBox Staging if necessary. 👍

Waiting for @kcgen to approve too, then yeah, I'll press the Big Green Button 😎 🤘🏻

@Grounded0
Copy link
Collaborator

Grounded0 commented Nov 20, 2023

Someone should check cycles=auto and cycles=max too:

https://www.dosbox.com/wiki/Performance

I'm super sleep deprived and need to sleep now.

@kcgen
Copy link
Member

kcgen commented Nov 20, 2023

Tested with Screamer 2 on the Pi 400, and the audio gaps occur slightly later than versus main.

Cycles are allowed to spool up a bit higher for a bit longer before "going into debt" (that is, when the host needs more than 1 ms to generate 1 ms worth of DOS time during heavy load periods).

However I'm not counting any additional gaps across the entire Screamer 2 demo period - so the Pi is fast enough to recover (refilling the bled-down audio buffer) once the ratio crushes the subsequent cycle count back to manageable levels for the host.

So I think this change is good. Merging this on the heels of @kklobe's Adjust ticksDone with render time PR so the two are back-to-back given the common theme.

@kcgen kcgen merged commit 5b70dc2 into main Nov 20, 2023
@johnnovak
Copy link
Member Author

Thanks for testing this on the Pi @kcgen 🚀

@Grounded0
Copy link
Collaborator

Grounded0 commented Nov 20, 2023

We were chatting on IRC on other benefits of this bit of code that may be directly invisible. Mainly the CPU scheduler seeing we're "ringing the bell" by running sometimes at over 100% thread load and giving us those sweet boost clocks faster and more aggressively.

@johnnovak
Copy link
Member Author

We were chatting on IRC on other benefits of this bit of code that may be directly invisible. Mainly the CPU scheduler seeing we're "ringing the bell" by running sometimes at over 100% thread load and giving us those sweet boost clocks faster and more aggressively.

That's a very interesting point; wasn't aware of that but makes sense!

@Grounded0
Copy link
Collaborator

Tested all the code paths that i know of:

cycles=auto "ringing the bell" at 100%.

cycles=max "ringing the bell" at 100%.

@kcgen kcgen deleted the jn/max-cycles-fix branch December 4, 2023 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or enhancement of existing features

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

CPU max 100% being CPU thread load 90% is outdated

4 participants