vulkan: Use coopmat2 for conv2d #14982
I upgraded the NVIDIA driver and the shader compiler and did a quick test with sd2 at 512x512.
before (w/ prev PR):
after:
I also noticed that the first run of a specific pipeline seems to take longer, e.g. fresh after compilation:
Any following runs don't look like this.
edit: sampling speed is now also faster with conv2d_direct used in the diffusion model.
enabled:
perf:
before:
after:
Looks good:
7a6b4d0 to 493e61b
The 4096 by 4096 case is unfortunately somewhat slower; however, that is a synthetic test, so it's not high priority. From #14933:
I have a couple more small changes that get another 10% or so, but haven't matched im2col for that case yet. I'll put those in a separate PR after this merges.
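For readers unfamiliar with the two paths being compared here, the following is a minimal plain-Python sketch of a direct conv2d next to an im2col-plus-matrix-multiply formulation. It is not the Vulkan or coopmat2 shader from this PR. The function names, the `[C][H][W]` / `[K][C][R][S]` layouts, and the stride-1, no-padding setup are all assumptions chosen for illustration.

```python
# Illustrative reference only: plain Python, not the Vulkan/coopmat2 shader.
# Assumed layouts: input x is [C][H][W], weights w are [K][C][R][S].
# Assumed convolution parameters: stride 1, no padding, no dilation.

def conv2d_direct(x, w):
    """Accumulate each output element straight from the input tensor."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    OH, OW = H - R + 1, W - S + 1
    out = [[[0.0] * OW for _ in range(OH)] for _ in range(K)]
    for k in range(K):
        for oy in range(OH):
            for ox in range(OW):
                acc = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += x[c][oy + r][ox + s] * w[k][c][r][s]
                out[k][oy][ox] = acc
    return out


def im2col(x, R, S):
    """Unfold every RxS patch into one row of length C*R*S."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    rows = []
    for oy in range(H - R + 1):
        for ox in range(W - S + 1):
            rows.append([x[c][oy + r][ox + s]
                         for c in range(C)
                         for r in range(R)
                         for s in range(S)])
    return rows


def conv2d_im2col(x, w):
    """Same convolution expressed as (OH*OW x C*R*S) times (C*R*S x K)."""
    K, C, R, S = len(w), len(w[0]), len(w[0][0]), len(w[0][0][0])
    H, W = len(x[0]), len(x[0][0])
    OH, OW = H - R + 1, W - S + 1
    cols = im2col(x, R, S)  # materialized intermediate matrix
    # Flatten each filter in the same (c, r, s) order used by im2col.
    wmat = [[w[k][c][r][s]
             for c in range(C)
             for r in range(R)
             for s in range(S)]
            for k in range(K)]
    out = [[[0.0] * OW for _ in range(OH)] for _ in range(K)]
    for k in range(K):
        for i, row in enumerate(cols):
            out[k][i // OW][i % OW] = sum(a * b for a, b in zip(row, wmat[k]))
    return out
```

Both functions compute the same result. The difference is that the im2col path first materializes an intermediate matrix with `OH*OW` rows and `C*R*S` columns and then runs a plain matrix multiply, while the direct path reads the input in place and needs no intermediate buffer.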
Stacked on #14933; draft until that's merged.
I haven't done any perf tuning on this yet, so there may still be more perf to get.
Directed perf tests:
stable-diffusion: