Added dynamic context size. This is perfect for servers running llama models as a service. #13295
base: master
…ext size once the specified context size has been reached. This is perfect for servers running llama models as a service.
The next goal is to get dynamic context sizing working without having to reset memory. Is it possible? Let's see!
Hey, @ggerganov
Hi, I am not convinced that this is a useful feature. IMO the application should pre-allocate the worst-case amount of memory that it plans to use. This way, if it is able to start, you have a guarantee that it will keep running without running out of memory at some later point. I don't see use cases where dynamically adjusting the context has an advantage compared to the existing logic.
@ggerganov
Finally achieved dynamic modification of the context size without resetting the memory.
73b85e4 to f471c74
The context size, which is used to allocate space for model execution and the KV cache, cannot be modified once the model and context params are initialized. This is a problem for servers running models, since the context they need tends to grow over time.
With a dynamic context size, there is no need to restart the server once the context size is exceeded.
Dynamic context size is achieved by updating n_ctx in cparams and then replacing the previous memory with newly created memory:
memory.reset(model.create_memory(params_mem, cparams));
Because new memory is created, the earlier context is discarded. The best way to preserve it is to save the state before the reset and load it afterwards.
In the next commit, I will make loading the saved state the default behavior when performing this operation.