
Add docs for OpenCL #18


Closed
wants to merge 33 commits

Conversation

@lhez (Collaborator) commented Feb 6, 2025

This PR adds documentation for the OpenCL backend.

engelmi and others added 30 commits January 28, 2025 08:32
The HTTP client in llama-run only printed an error when the download of
a resource failed, and a missing model name in the CLI parameter list
caused the application to crash. To prevent this, a check for the
required model parameter has been added, and errors from resource
downloads are now propagated to the caller.

Signed-off-by: Michael Engel <[email protected]>
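
A minimal sketch of the shape of this fix; `download_resource` and the argument handling below are hypothetical stand-ins, not llama-run's actual API:

```cpp
#include <cstdio>
#include <string>

// Hypothetical downloader; the real llama-run HTTP client differs.
static int download_resource(const std::string & url) {
    // ... perform the HTTP download, return non-zero on failure ...
    return 0;
}

static int run(int argc, const char ** argv) {
    // fail early on a missing required model parameter instead of crashing later
    if (argc < 2 || std::string(argv[1]).empty()) {
        fprintf(stderr, "error: missing required model parameter\n");
        return 1;
    }
    const int ret = download_resource(argv[1]);
    if (ret != 0) {
        // propagate the error to the caller rather than only printing it
        fprintf(stderr, "error: failed to download '%s'\n", argv[1]);
        return ret;
    }
    return 0;
}

int main(int argc, const char ** argv) { return run(argc, argv); }
```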
Implemented ggml_sycl_op_soft_max() F16 src1 (mask) support, for which a pragma deprecation warning was added in ggml-org#5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always assumed src1 to be of fp32 type (many OP functions depend on this).

* SYCL: SOFTMAX F16 mask support and other fixes

* test-backend-ops: Add F16 mask test cases
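
A hedged sketch of the dispatch pattern this describes; the type names and helpers below are illustrative stand-ins, not the actual ggml-sycl code (which uses `sycl::half` on device):

```cpp
#include <cstdint>
#include <cstring>

struct f16 { uint16_t bits; };  // stand-in for sycl::half

static float to_float(float v) { return v; }
static float to_float(f16 v) {
    // minimal fp16 -> fp32 conversion (no zero/inf/subnormal handling; illustration only)
    uint32_t out = ((v.bits & 0x8000u) << 16)
                 | ((((v.bits >> 10) & 0x1Fu) + 112u) << 23)
                 | ((v.bits & 0x3FFu) << 13);
    float f;
    std::memcpy(&f, &out, sizeof(f));
    return f;
}

// Templated on the mask element type instead of assuming fp32.
template <typename T_mask>
static void soft_max_with_mask(const float * x, const T_mask * mask, float * dst, int n) {
    for (int i = 0; i < n; ++i) {
        dst[i] = x[i] + (mask ? to_float(mask[i]) : 0.0f); // add mask to logits
    }
    // ... exponentiate and normalize dst[0..n) ...
}

enum mask_type { MASK_F32, MASK_F16 };

// Dispatch on the runtime mask type, picking the right kernel instantiation.
static void soft_max_dispatch(const float * x, const void * mask, mask_type mt,
                              float * dst, int n) {
    if (mt == MASK_F16) {
        soft_max_with_mask(x, static_cast<const f16 *>(mask), dst, n);
    } else {
        soft_max_with_mask(x, static_cast<const float *>(mask), dst, n);
    }
}
```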
As pulling protocols to llama-run

Signed-off-by: Eric Curtin <[email protected]>
…le instantiation bug (ggml-org#11080)

This disables the workaround on fixed rocBLAS versions (>= 4.0.0), eliminating the runtime cost and unnecessary VRAM allocation of loading all Tensile objects.
Loops with bounds not known at compile time cannot be unrolled.
When ncols_template == 0, the bounds of the loop are not constexpr, so LLVM cannot unroll the loops here.
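
An illustrative rendering of this constraint (not the actual ggml kernel): with a non-zero `ncols_template` the trip count is a compile-time constant and the unroll pragma is meaningful; with `ncols_template == 0` the bound is a runtime value, so the pragma must be dropped:

```cpp
template <int ncols_template>
void scale_row(const float * x, float * dst, int ncols_arg) {
    if constexpr (ncols_template != 0) {
        // compile-time bound: unrolling is valid and profitable
#pragma unroll
        for (int i = 0; i < ncols_template; ++i) {
            dst[i] = 2.0f * x[i];
        }
    } else {
        // runtime bound: LLVM cannot unroll this, so no pragma
        for (int i = 0; i < ncols_arg; ++i) {
            dst[i] = 2.0f * x[i];
        }
    }
}
```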
* ci : fix build CPU arm64

* failed, trying ubuntu 22

* vulkan: ubuntu 24

* vulkan : jammy --> noble
…-org#11473)

The test_completion_stream_with_openai_library() function was actually running with stream=False by default, and test_completion_with_openai_library() with stream=True.
This commit enables the `--no-warmup` option for llama-embeddings.

The motivation for this change is to allow the user to disable the
warmup when running the program.
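
A hedged sketch of what the flag does; the struct and parsing here are simplified stand-ins for llama.cpp's common argument parser:

```cpp
#include <cstring>

struct embed_params {
    bool warmup = true; // "--no-warmup" flips this off
};

static embed_params parse_args(int argc, char ** argv) {
    embed_params p;
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--no-warmup") == 0) {
            p.warmup = false;
        }
    }
    return p;
}

static void maybe_warmup(const embed_params & p) {
    if (!p.warmup) {
        return; // user opted out of the throwaway decode
    }
    // ... run a dummy decode so the first real embedding call
    //     is not slowed down by lazy initialization ...
}
```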
…(ggml/1065)

Some threads kept looping and failed to terminate properly after an abort during CPU execution.

Co-authored-by: issi <[email protected]>
* Add option to not print stack on abort

Add an option/envvar to disable stack printing on abort.
Also link some unit tests with Threads to fix link errors on
Ubuntu/g++ 11.

* Update ggml/src/ggml.c

---------

Co-authored-by: Diego Devesa <[email protected]>
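
A hedged sketch of the abort-handler behavior; the environment variable name below is hypothetical, not necessarily the one ggml actually uses:

```cpp
#include <cstdio>
#include <cstdlib>

static void print_backtrace(void) {
    // ... walk and print the stack frames ...
}

static void fatal_abort(const char * msg) {
    fprintf(stderr, "fatal: %s\n", msg);
    // GGML_NO_STACK_PRINT is a hypothetical name for the opt-out envvar
    if (getenv("GGML_NO_STACK_PRINT") == NULL) {
        print_backtrace();
    }
    abort();
}
```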
People search for Ollama models using the web UI; this change
allows one to copy the URL from the browser and have it be
compatible with llama-run.

Signed-off-by: Eric Curtin <[email protected]>
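
A hedged sketch of the normalization this implies; the helper name and exact prefix handling are illustrative, not llama-run's actual code:

```cpp
#include <string>

static std::string normalize_ollama_url(std::string model) {
    const std::string web_prefix = "https://ollama.com/library/";
    if (model.rfind(web_prefix, 0) == 0) {
        // a URL copied from the Ollama web UI becomes a plain model name
        model = model.substr(web_prefix.size());
    }
    return model;
}

// normalize_ollama_url("https://ollama.com/library/granite-code") == "granite-code"
```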
…gml-org#11436)

* vulkan: Catch pipeline creation failure and print an error message

Also, fix some warnings from my on-demand compile change.

* vulkan: fix pipeline creation logging
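
A hedged sketch of the error-handling pattern; the wrapper below is illustrative rather than the actual ggml-vulkan code:

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>

static VkPipeline create_compute_pipeline(VkDevice device,
                                          const VkComputePipelineCreateInfo * info,
                                          const char * name) {
    VkPipeline pipeline = VK_NULL_HANDLE;
    VkResult res = vkCreateComputePipelines(device, VK_NULL_HANDLE, 1, info, nullptr, &pipeline);
    if (res != VK_SUCCESS) {
        // report the failure instead of silently continuing with a null pipeline
        fprintf(stderr, "vulkan: failed to create pipeline '%s' (VkResult %d)\n", name, (int) res);
    }
    return pipeline;
}
```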
* server : update auto gen files comments

This commit updates the 'auto generated files' comments in server.cpp
and removes `deps.sh` from the comment.

The motivation for this change is that `deps.sh` was removed in
commit 91c36c2 ("server : (web ui) Various improvements, now use
vite as bundler (ggml-org#10599)").

* squash! server : update auto gen files comments [no ci]

Move comments about file generation to README.md.

* squash! server : update auto gen files comments [no ci]

Remove the comments in server.cpp that mention that information
can be found in the README.md file.
…-org#11360)

* vulkan: initial support for IQ3_S

* vulkan: initial support for IQ3_XXS

* vulkan: initial support for IQ2_XXS

* vulkan: initial support for IQ2_XS

* vulkan: optimize Q3_K by removing branches (see the branchless-select sketch after this list)

* vulkan: implement dequantize variants for coopmat2

* vulkan: initial support for IQ2_S

* vulkan: vertically realign code

* port failing dequant callbacks from mul_mm

* Fix array length mismatches

* vulkan: avoid using workgroup size before it is referenced

* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <[email protected]>
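
A generic illustration of the branch-removal idea referenced in the Q3_K item above (not the actual shader code): a data-dependent `if/else` is replaced with arithmetic selection, so all GPU lanes execute the same instructions:

```cpp
#include <cstdint>

// branchy: divergent on GPUs when lanes disagree on `use_high`
static int32_t select_branchy(bool use_high, int32_t lo, int32_t hi) {
    return use_high ? hi : lo;
}

// branchless: a mask derived from the condition picks the value
static int32_t select_branchless(bool use_high, int32_t lo, int32_t hi) {
    const int32_t mask = -(int32_t) use_high; // 0 or -1 (all bits set)
    return (hi & mask) | (lo & ~mask);
}
```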
…ja functionality (ggml-org#11489)

* add /apply-template endpoint to server

* remove unnecessary line

* add /apply-template documentation

* return only "prompt" field in /apply-template

* use suggested idea instead of my overly verbose way
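
A hedged usage sketch of the endpoint using cpp-httplib (which llama-server bundles); the request shape mirrors the chat-completion `messages` format, and per the item above the response carries only a `prompt` field:

```cpp
#include "httplib.h"
#include <cstdio>

int main() {
    httplib::Client cli("http://localhost:8080");
    const char * body = R"({"messages":[{"role":"user","content":"Hello"}]})";
    auto res = cli.Post("/apply-template", body, "application/json");
    if (res && res->status == 200) {
        printf("%s\n", res->body.c_str()); // e.g. {"prompt": "..."}
    }
    return 0;
}
```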
This commit updates some of the JSON snippets in the README.md file and
removes the `json` language tag from the code blocks.

The motivation for this change is that invalid JSON in a code snippet
gets highlighted in red, which can make it somewhat difficult to read
and a little distracting.
This commit replaces the two usages of `std::bind` with lambdas for the
`callback_new_task` and `callback_update_slots` callback functions.

The motivation for this change is consistency with the rest of the code
in server.cpp, where lambdas are used for all other callbacks/handlers.
Lambdas are also arguably more readable and are recommended over
`std::bind` in modern C++.

Ref: https://github.com/LithoCoders/dailycpp/blob/master/EffectiveModernC%2B%2B/chapter6/Item34_Prefer_lambdas_to_std::bind.md
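
An illustration of the refactor; the handler names are hypothetical rather than the exact server.cpp signatures:

```cpp
#include <functional>

struct server_context {
    void handle_new_task(int task_id) { (void) task_id; /* ... */ }
};

int main() {
    server_context ctx;

    // before: std::bind with a placeholder
    std::function<void(int)> cb_bind =
        std::bind(&server_context::handle_new_task, &ctx, std::placeholders::_1);

    // after: an equivalent lambda, consistent with the other handlers
    std::function<void(int)> cb_lambda =
        [&ctx](int task_id) { ctx.handle_new_task(task_id); };

    cb_bind(1);
    cb_lambda(2);
    return 0;
}
```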
ochafik and others added 3 commits January 30, 2025 19:13
…istral, Firefunction, DeepSeek) w/ lazy grammars (ggml-org#9639)


---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>