-
-
Notifications
You must be signed in to change notification settings - Fork 337
nv2a/vk: Update vertex ram buffer to finish only when drawing. #2345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
hw/xbox/nv2a/pgraph/vk/vertex.c
Outdated
// Vertex data changed while building the draw list. Finish drawing | ||
// before updating RAM buffer. | ||
pgraph_vk_finish(pg, VK_FINISH_REASON_VERTEX_BUFFER_DIRTY); | ||
} else if (overlaps_uploaded && !r->in_draw) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what the call stack looks like for cases where the buffer is being flushed outside of a draw?
I don't know the VK code particularly well, but I worry that this is being done to allow GPU memory to be synced back to guest-CPU accessible memory. If so, it could mean that skipping the sync works fine in many cases but fails in cases where a title manipulates the framebuffer via CPU during drawing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know in my testing it seemed that there was little to no improvement in the fix should the else branch not do anything. The clearing of the bitmap though was more a safety net for me just in case. However, you bring up a good point and I can try checking if memory is any better/worse with the else branch. Having just the if branch with the in_draw qualifier could work on its own otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is not necessarily the bitmap clear, it's whether it's safe to skip the finish at all.
If the finish is being done outside of the context of a draw in order to get CPU memory to be consistent with the GPU state it wouldn't be safe to skip the finish. You'd need to look at the call stack when in_draw
is false to know why it's being called outside of a draw and then determine whether it's safe to skip or not. The original comment implies that it may be flushing the VK draw in order to read back into guest-CPU-accessible memory, which could make this a "do the wrong thing faster" regression.
For posterity/anybody following along: the Xbox CPU and GPU have unified memory access so it's able to interleave GPU draws with CPU manipulation/readback of the framebuffer. xemu needs to do extra work to sync CPU-modified surfaces to the host GPU and vice-versa in order to emulate this, so it sometimes does flushing that would not be typical for a modern app.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my testing, which is sadly limited as I can't test every XBOX game instance, I coded in an assert in the else to trigger and have yet to ever get it to throw. Not saying it means it will never happen, but just has never happened for me in the games I've tested. I could potentially just be very lucky in the race condition as well, if said race is actually happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm confused. If the branch that you added is never hit, what does this PR do? The original change would have either performed the original flush or gone down your new else
path, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes it so it only flushes the vertex if the renderer is in_draw. The original if else of my PR just cleared the bitmap as a safety check for when we weren't in_draw, but as I said I never could find a time where this path was ever triggered. So, per your comments I removed the if else and the only change is to only process the buffer when in_draw. The current master version of XEMU without this check causes performance drops due to calling the flush whether we are in_draw or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, but if the code never hits the non in_draw
path, then this change can't have any effect, right?
I think maybe when you added your assert you had some additional guarding it that never got hit. As I mentioned, my first concern is whether it's safe to skip the flush in all conditions, the bitmap clear is a secondary concern (though that also seems potentially unsafe if partial overlaps are possible).
The path I'd like you to check is the case where pgraph_vk_update_vertex_ram_buffer
is called with a bit set but without in_draw
. So instead of clearing the bitmap in your original "else" case, set a breakpoint or otherwise print the call stack there so you can try to verify that the flush really is safe to ignore.
Since this PR has an obvious effect on a game for you, either the else
branch will be taken in that game or you can probably conclude that the change itself isn't doing anything and there's something different about how you build it versus the automated builds that produces the performance improvement. My guess is that the else
branch will be taken and the assert you added previously had a logic bug that prevented it from triggering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah a few moments after making the comment I realized what it was you were suggesting/asking. I retested, correcting the logic of my assert, and was immediately met with instances where find_next_bit()
returned true while not in a draw state (r->in_draw
).
pgraph_vk_finish()
as I understand it is called to insure that any in-flight draw operations using the old vertex buffer data are completed before the buffer is updated. If r->in_draw
is false, no draw is in progress so there are no in-flight operations that could be using staled data. If we call pgraph_vk_finish()
when r->in_draw
is false it means that no draw commands are currently referencing the buffer, and updating the buffer is safe as the GPU is not using the data at the moment. The find_next_bit()
could potentially be set due to previous operations, but unless a draw is in progress I believe there's no risk of a race or data hazard.
This handles bitmaps better when not drawing. This will improve performance in some known areas such as Dead or Alive 2 Ultimate's Burai Zenin stage and other areas with rain effects such as Miyama stage. In my personal testing it's increased performance in the affected areas by 50%, making each of the affected stages able to run at 60 FPS even at multiple internal render options. In the Video Debug this is most noticeable in the
FINISH_VERTEX_BUFFER_DIRTY
value.Before:

After:
