-
Notifications
You must be signed in to change notification settings - Fork 12.5k
tests : update for LLAMA_SET_ROWS=1 #14961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e1ebdea
to
d6233d6
Compare
tests/test-thread-safety.cpp
Outdated
// each context has a single sequence | ||
cparams.n_seq_max = 1; | ||
|
||
// prevent from launching too many threads | ||
cparams.n_threads = std::min<int>(std::max(2u, std::thread::hardware_concurrency()/params.n_parallel), cparams.n_threads); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slaren Small change to the test to make it compatible with split KV cache. Reduced the number of CPU threads because on the MacBook the process takes a long time (several minutes) to terminate (think it's some resource congestion when there are many threads started by the process, not sure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a known issue with the thread pool implementation, using more threads than available will result in the threads spending more time spinning than doing work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced that it is good to ignore the parameters of the user to workaround what essentially is a bug. Can this be solved by running the test with -t 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, -t 1
works. I was thinking to use -t 2
so we have context-level concurrency too. With -t 2
the test also runs cleanly on my devices.
target #14960
Extract the test updates from #14959 in a separate PR to be merged before enabling
LLAMA_SET_ROWS=1
by default.Test updates:
test-thread-safety
cparams.n_seq_max = 1
embedding
-np
is not specifiedsave-load-state
-np
is not specified