-
-
Notifications
You must be signed in to change notification settings - Fork 172
Let Meson use a system call to detect the page size #3174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is a white-space only change made with "muon fmt".
Previously we ran local binaries to get the page size, however some systems don't have them (such as RetroPie, which is missing the 'pagesize' binary). The good news is both Microsoft and POSIX both have long lived extremely standardized calls to get the page size: - https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsysteminfo Supported on Windows 2000 and greater. - https://pubs.opengroup.org/onlinepubs/9699919799/ POSIX: supported since 2001 By using a C syscall, Meson will build and run it using the (potential) cross-compiler's runnable environment (if one is configured). This is particularly important when cross compiling, as we want the destination host's page size and not the local build machine's page size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code all looks good and this certainly looks like a win for native compilation. I don't understand how this would work for cross compilation though. If I'm compiling for ARM on my x86 machine, I compile that small program but then I can't execute the resulting binary unless I have qemu or something installed (which is not typically part of a cross compile environment).
The old method probably isn't any better though (running some command may just return host-side value instead of target value). Plus you've introduced an option to manually set page size so someone can always use that if they're compiling for some esoteric architecture.
As an addition to what @weirddan455 mentioned, if it is a syscall, isn't it also getting the info from the running kernel? Which might be the host kernel even with qemu-user? |
I like this idea. How about we just go all the way and modify the code to compute and use this at runtime instead of compile-time? I think the need for this is largely (totally?) confined to |
Yeah, maybe this makes sense to be a runtime value. It's probably out of scope for what @kcgen was planning here though (unless he feels like doing it). IIRC there's some compile time |
This is where Meson leaves the other build systems in dust; it has first-class handling for cross compiling, including three-way cross compiling (where you cross-compile your cross-compiler as an intermediary). When setting it up, you populate a cross build definition file, and one of the fields is an So that's where you'd plug in qemu / wine / etc.. or even a script that does something like Great walk through here: https://mesonbuild.com/Cross-compilation.html
The |
Certainly up to @kcgen. I've already got a version working, it's a very small change, and can't see any place else in the codebase that uses |
It's a really nice idea!
Yeah, this is my biggest worry. Currently, the compiler see these pefect-power-of-2 hard-coded page-size quantities in all the CPU and cache array sizes. So if there's anywhere it can lay down some fast SSE or NEON, it would be there (I hope!). But ultimately the benchmarks will prove it out. Unfortunately I'm not able to take on that level of benchmarking or do some finer-grained inspection of these differences right now though (of course: 100% welcome for anyone to follow up and make it a runtime value). |
@kklobe: maybe we can merge this, and then follow up w/ yours? We've got plenty of time to hit regressions and compare performance (and it would be a small revert in the worst cast). |
Sounds good; preliminary QTD and PERFDOS benchmarks show no change (I don't see any array sizing using that value, only the DOS page-size values, which of course wouldn't be affected by this). I'll throw it in a PR and let the team decide! |
Right on, @kklobe! |
@kklobe @kcgen Um, seems like massive overkill? Has anyone proven not just using 4k has any measurable benefit? I think assuming 64 byte cache lines and 4k page sizes gets you 99% there on modern hardware, so why complicate life? 😄 At least, I'd measure first, then if the gains are significant, then complicate. Especially if the likelihood for regressions and bugs is nonzero, but the likelihood for performance improvements is unproven... |
Page size needs to be correct for the dynamic cores. We definitely want runtime detection to handle aarch64 processors that support setting either 4k or 16k page size at boot time. |
Also, hardware and operating systems that support write-or-execute flagging (W^X) require flagging of whole pages. That's probably what crashed the Pi5: we asked to flag only a 4k portion of a 16K page (and were denied). So dynrec didn't even get off the ground to fill its first page. |
Description
Previously we ran local binaries to get the page size, however some systems don't have them (such as RetroPie, which is missing the
pagesize
binary).The good news is both Microsoft and POSIX both have long-lived extremely standardized calls to get the page size:
https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsysteminfo
Supported on Windows 2000 and greater.
https://pubs.opengroup.org/onlinepubs/9699919799/
Supported on POSIX systems since 2001
By using a C syscall, Meson will build and run it using the cross-compiler's runnable environment (if one is configured).
This is particularly important when cross compiling, as we want the destination host's page size and not the local build machine's page size.
(If a cross-compiler isn't configured Meson just runs it on the local build machine)
Finally, this PR also lets users (optionally) configure the page size via Meson's setup options, such as:
meson setup -Dpagesize=16384 ...
Some machines support multiple page sizes - so there's no harm in letting them override it or experiment.
Why page size is important
The dynamic core converts DOS instructions on the fly into a sequence of host machine instructions, packed into a "page", and passes it to the host as if it's native compiled code. The host then executes that page of machine byte code, one page at a time.
Some processors have boot time switches that can control the page size, like the pi5.
So there's no wiggle room; we either get it correct or DOSBox just crashes straight away as soon as the dynamic recompiler passes its first generated page of instructions in, like we saw here.
Also, hardware and operating systems that support write-or-execute flagging (W^X) require flagging of whole pages. That's probably what crashed the Pi5: we asked to flag only a 4k portion of a 16K page (and were denied). So dynrec didn't even get off the ground to fill its first page.
This is going to become a bigger problem with arm64 and aarch64 chips now becoming very common.
Related issues
Fixes #3160.
Manual testing
Tested on x86-64 Linux, the Pi4 w/ Linux, and Apple's M1.
Checklist
I have: