
Add docs for OpenCL #18


Closed
wants to merge 33 commits

Conversation

@lhez (Collaborator) commented Feb 6, 2025

This PR adds documentation for the OpenCL backend.

engelmi and others added 30 commits January 28, 2025 08:32
The HTTP client in llama-run only printed an error when the download of
a resource failed, and a missing model name in the CLI parameter list
caused the application to crash. To prevent this, a check for the
required model parameter has been added, and errors from resource
downloads are now propagated to the caller.

Signed-off-by: Michael Engel <[email protected]>
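
A minimal sketch of the shape of this fix; `download_resource` and the argument handling below are hypothetical stand-ins, not llama-run's actual API:

```cpp
#include <cstdio>
#include <string>

// Hypothetical downloader; the real llama-run HTTP client differs.
static int download_resource(const std::string & url) {
    // ... perform the HTTP download, return non-zero on failure ...
    return 0;
}

static int run(int argc, const char ** argv) {
    // fail early on a missing required model parameter instead of crashing later
    if (argc < 2 || std::string(argv[1]).empty()) {
        fprintf(stderr, "error: missing required model parameter\n");
        return 1;
    }
    const int ret = download_resource(argv[1]);
    if (ret != 0) {
        // propagate the error to the caller rather than only printing it
        fprintf(stderr, "error: failed to download '%s'\n", argv[1]);
        return ret;
    }
    return 0;
}

int main(int argc, const char ** argv) { return run(argc, argv); }
```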
Implemented ggml_sycl_op_soft_max() F16 src1 (mask) support, for which a pragma deprecation warning was added in ggml-org#5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always assumed src1 to be of fp32 type (many OP functions depend on this).

* SYCL: SOFTMAX F16 mask support and other fixes

* test-backend-ops: Add F16 mask test cases
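
A hedged sketch of the dispatch pattern this describes; the type names and helpers below are illustrative stand-ins, not the actual ggml-sycl code (which uses `sycl::half` on device):

```cpp
#include <cstdint>
#include <cstring>

struct f16 { uint16_t bits; };  // stand-in for sycl::half

static float to_float(float v) { return v; }
static float to_float(f16 v) {
    // minimal fp16 -> fp32 conversion (no zero/inf/subnormal handling; illustration only)
    uint32_t out = ((v.bits & 0x8000u) << 16)
                 | ((((v.bits >> 10) & 0x1Fu) + 112u) << 23)
                 | ((v.bits & 0x3FFu) << 13);
    float f;
    std::memcpy(&f, &out, sizeof(f));
    return f;
}

// Templated on the mask element type instead of assuming fp32.
template <typename T_mask>
static void soft_max_with_mask(const float * x, const T_mask * mask, float * dst, int n) {
    for (int i = 0; i < n; ++i) {
        dst[i] = x[i] + (mask ? to_float(mask[i]) : 0.0f); // add mask to logits
    }
    // ... exponentiate and normalize dst[0..n) ...
}

enum mask_type { MASK_F32, MASK_F16 };

// Dispatch on the runtime mask type, picking the right kernel instantiation.
static void soft_max_dispatch(const float * x, const void * mask, mask_type mt,
                              float * dst, int n) {
    if (mt == MASK_F16) {
        soft_max_with_mask(x, static_cast<const f16 *>(mask), dst, n);
    } else {
        soft_max_with_mask(x, static_cast<const float *>(mask), dst, n);
    }
}
```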
As pulling protocols to llama-run

Signed-off-by: Eric Curtin <[email protected]>
…le instantiation bug (ggml-org#11080)

This disables the workaround on fixed rocBLAS versions (>= 4.0.0), eliminating the runtime cost and unnecessary VRAM allocation of loading all Tensile objects.
Loops with bounds not known at compile time cannot be unrolled.
When ncols_template == 0, the bounds of the loop are not constexpr, so LLVM cannot unroll the loops here.
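
An illustrative rendering of this constraint (not the actual ggml kernel): with a non-zero `ncols_template` the trip count is a compile-time constant and the unroll pragma is meaningful; with `ncols_template == 0` the bound is a runtime value, so the pragma must be dropped:

```cpp
template <int ncols_template>
void scale_row(const float * x, float * dst, int ncols_arg) {
    if constexpr (ncols_template != 0) {
        // compile-time bound: unrolling is valid and profitable
#pragma unroll
        for (int i = 0; i < ncols_template; ++i) {
            dst[i] = 2.0f * x[i];
        }
    } else {
        // runtime bound: LLVM cannot unroll this, so no pragma
        for (int i = 0; i < ncols_arg; ++i) {
            dst[i] = 2.0f * x[i];
        }
    }
}
```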
* ci : fix build CPU arm64

* failed, trying ubuntu 22

* vulkan: ubuntu 24

* vulkan : jammy --> noble
…-org#11473)

The test_completion_stream_with_openai_library() function was actually running with stream=False by default, and test_completion_with_openai_library() with stream=True.
This commit enables the `--no-warmup` option for llama-embeddings.

The motivation for this change is to allow the user to disable the
warmup when running the program.
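
A hedged sketch of what the flag does; the struct and parsing here are simplified stand-ins for llama.cpp's common argument parser:

```cpp
#include <cstring>

struct embed_params {
    bool warmup = true; // "--no-warmup" flips this off
};

static embed_params parse_args(int argc, char ** argv) {
    embed_params p;
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--no-warmup") == 0) {
            p.warmup = false;
        }
    }
    return p;
}

static void maybe_warmup(const embed_params & p) {
    if (!p.warmup) {
        return; // user opted out of the throwaway decode
    }
    // ... run a dummy decode so the first real embedding call
    //     is not slowed down by lazy initialization ...
}
```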
…(ggml/1065)

Some threads kept looping and failed to terminate properly after an abort during CPU execution.

Co-authored-by: issi <[email protected]>
* Add option to not print stack on abort

Add an option/envvar to disable stack printing on abort.
Also link some unit tests with Threads to fix link errors on
Ubuntu/g++ 11.

* Update ggml/src/ggml.c

---------

Co-authored-by: Diego Devesa <[email protected]>
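
A hedged sketch of the abort-handler behavior; the environment variable name below is hypothetical, not necessarily the one ggml actually uses:

```cpp
#include <cstdio>
#include <cstdlib>

static void print_backtrace(void) {
    // ... walk and print the stack frames ...
}

static void fatal_abort(const char * msg) {
    fprintf(stderr, "fatal: %s\n", msg);
    // GGML_NO_STACK_PRINT is a hypothetical name for the opt-out envvar
    if (getenv("GGML_NO_STACK_PRINT") == NULL) {
        print_backtrace();
    }
    abort();
}
```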
People search for Ollama models using the web UI; this change
allows one to copy the URL from the browser and have it be
compatible with llama-run.

Signed-off-by: Eric Curtin <[email protected]>
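
A hedged sketch of the normalization this implies; the helper name and exact prefix handling are illustrative, not llama-run's actual code:

```cpp
#include <string>

static std::string normalize_ollama_url(std::string model) {
    const std::string web_prefix = "https://ollama.com/library/";
    if (model.rfind(web_prefix, 0) == 0) {
        // a URL copied from the Ollama web UI becomes a plain model name
        model = model.substr(web_prefix.size());
    }
    return model;
}

// normalize_ollama_url("https://ollama.com/library/granite-code") == "granite-code"
```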
…gml-org#11436)

* vulkan: Catch pipeline creation failure and print an error message

Also, fix some warnings from my on-demand compile change.

* vulkan: fix pipeline creation logging
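
A hedged sketch of the error-handling pattern; the wrapper below is illustrative rather than the actual ggml-vulkan code:

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>

static VkPipeline create_compute_pipeline(VkDevice device,
                                          const VkComputePipelineCreateInfo * info,
                                          const char * name) {
    VkPipeline pipeline = VK_NULL_HANDLE;
    VkResult res = vkCreateComputePipelines(device, VK_NULL_HANDLE, 1, info, nullptr, &pipeline);
    if (res != VK_SUCCESS) {
        // report the failure instead of silently continuing with a null pipeline
        fprintf(stderr, "vulkan: failed to create pipeline '%s' (VkResult %d)\n", name, (int) res);
    }
    return pipeline;
}
```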
* server : update auto gen files comments

This commit updates the 'auto generated files' comments in server.cpp
and removes `deps.sh` from the comment.

The motivation for this change is that `deps.sh` was removed in
commit 91c36c2 ("server : (web ui) Various improvements, now use
vite as bundler (ggml-org#10599)").

* squash! server : update auto gen files comments [no ci]

Move comments about file generation to README.md.

* squash! server : update auto gen files comments [no ci]

Remove the comments in server.cpp that mention that information
can be found in the README.md file.
…-org#11360)

* vulkan: initial support for IQ3_S

* vulkan: initial support for IQ3_XXS

* vulkan: initial support for IQ2_XXS

* vulkan: initial support for IQ2_XS

* vulkan: optimize Q3_K by removing branches (see the branchless-select sketch after this list)

* vulkan: implement dequantize variants for coopmat2

* vulkan: initial support for IQ2_S

* vulkan: vertically realign code

* port failing dequant callbacks from mul_mm

* Fix array length mismatches

* vulkan: avoid using workgroup size before it is referenced

* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <[email protected]>
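
A generic illustration of the branch-removal idea referenced in the Q3_K item above (not the actual shader code): a data-dependent `if/else` is replaced with arithmetic selection, so all GPU lanes execute the same instructions:

```cpp
#include <cstdint>

// branchy: divergent on GPUs when lanes disagree on `use_high`
static int32_t select_branchy(bool use_high, int32_t lo, int32_t hi) {
    return use_high ? hi : lo;
}

// branchless: a mask derived from the condition picks the value
static int32_t select_branchless(bool use_high, int32_t lo, int32_t hi) {
    const int32_t mask = -(int32_t) use_high; // 0 or -1 (all bits set)
    return (hi & mask) | (lo & ~mask);
}
```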
…ja functionality (ggml-org#11489)

* add /apply-template endpoint to server

* remove unnecessary line

* add /apply-template documentation

* return only "prompt" field in /apply-template

* use suggested idea instead of my overly verbose way
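
A hedged usage sketch of the endpoint using cpp-httplib (which llama-server bundles); the request shape mirrors the chat-completion `messages` format, and per the item above the response carries only a `prompt` field:

```cpp
#include "httplib.h"
#include <cstdio>

int main() {
    httplib::Client cli("http://localhost:8080");
    const char * body = R"({"messages":[{"role":"user","content":"Hello"}]})";
    auto res = cli.Post("/apply-template", body, "application/json");
    if (res && res->status == 200) {
        printf("%s\n", res->body.c_str()); // e.g. {"prompt": "..."}
    }
    return 0;
}
```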
This commit updates some of the JSON snippets in the README.md file and
removes the `json` language tag from the code blocks.

The motivation for this change is that invalid JSON in a code snippet
gets highlighted in red, which can make it somewhat difficult to read
and a little distracting.
This commit replaces the two usages of `std::bind` with lambdas for the
`callback_new_task` and `callback_update_slots` callback functions.

The motivation for this change is consistency with the rest of the code
in server.cpp, where lambdas are used for all other callbacks/handlers.
Lambdas are also arguably more readable and are recommended over
`std::bind` in modern C++.

Ref: https://github.com/LithoCoders/dailycpp/blob/master/EffectiveModernC%2B%2B/chapter6/Item34_Prefer_lambdas_to_std::bind.md
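
An illustration of the refactor; the handler names are hypothetical rather than the exact server.cpp signatures:

```cpp
#include <functional>

struct server_context {
    void handle_new_task(int task_id) { (void) task_id; /* ... */ }
};

int main() {
    server_context ctx;

    // before: std::bind with a placeholder
    std::function<void(int)> cb_bind =
        std::bind(&server_context::handle_new_task, &ctx, std::placeholders::_1);

    // after: an equivalent lambda, consistent with the other handlers
    std::function<void(int)> cb_lambda =
        [&ctx](int task_id) { ctx.handle_new_task(task_id); };

    cb_bind(1);
    cb_lambda(2);
    return 0;
}
```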
ochafik and others added 3 commits January 30, 2025 19:13
…istral, Firefunction, DeepSeek) w/ lazy grammars (ggml-org#9639)


---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>