Releases · ggml-org/llama.cpp
b6081
b6080
imatrix : fix 3d activation handling for hybrid and recurrent models …
b6079
memory : handle kv_unified for hybrid models (#15050)
b6078
vocab : JetBrains Mellum pre-tokenizer (#15045)
b6076
vulkan: Use coopmat2 for conv2d (#14982)
b6075
opencl: fix adreno compiler detection logic (#15029)
b6074
CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (#15035)
b6073
cuda: make im2col a little faster (#15025)
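For context on the b6073 entry: im2col unfolds the input windows of a convolution into the columns of a matrix, so that conv2d reduces to a single GEMM. Below is a minimal single-channel reference in C++ to illustrate the operation itself; it is a sketch, not the CUDA kernel or the ggml API, and the function name and layout are illustrative assumptions.

```cpp
// Reference im2col: unfold KHxKW windows of an HxW input (stride 1, no
// padding) into a (KH*KW) x (OH*OW) matrix, so conv2d becomes one GEMM.
#include <cstdio>

static void im2col(const float *src, float *dst,
                   int H, int W, int KH, int KW) {
    const int OH = H - KH + 1, OW = W - KW + 1;
    for (int ky = 0; ky < KH; ++ky)
        for (int kx = 0; kx < KW; ++kx)
            for (int oy = 0; oy < OH; ++oy)
                for (int ox = 0; ox < OW; ++ox)
                    // row = kernel tap, column = output position
                    dst[(ky*KW + kx)*(OH*OW) + oy*OW + ox] =
                        src[(oy + ky)*W + (ox + kx)];
}

int main() {
    const float src[9] = {1,2,3, 4,5,6, 7,8,9}; // 3x3 input
    float dst[16];                              // 4 taps x 4 output positions
    im2col(src, dst, 3, 3, 2, 2);               // 2x2 kernel
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) std::printf("%4.0f", dst[r*4 + c]);
        std::printf("\n");
    }
}
```

After this transform, each convolution output is the dot product of the flattened kernel with one column, which is what makes the operation a natural target for fast GEMM-style GPU kernels.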
b6071
llama : enable LLAMA_SET_ROWS=1 by default (#14959)
b6070
cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (#15038)
* cont : fix cont types
* cont : adopt variable names and comment from the other branch
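For context on the b6070 fix: in ggml's batched mul_mat, src0 is broadcast across the batch dimensions (dims 2 and 3) of src1, and the corner case named in the title is ne02 == 1 with ne03 > 1, where the dim-2 batch index of src0 must collapse to 0 while the dim-3 index keeps stepping. The standalone C++ sketch below illustrates that broadcast indexing under assumed layouts; the ne names follow ggml's convention, but this is not the CUDA/SYCL code that was patched.

```cpp
// Broadcast indexing for batched gemm, sketching the ne02 == 1 && ne03 > 1
// case: every src1 batch (i12, i13) maps back to a src0 batch (i02, i03),
// with i02 collapsing to 0 while i03 still varies.
#include <cstdio>
#include <vector>

int main() {
    const int K = 2, M = 2, N = 2;   // shared dim, rows, cols
    const int ne02 = 1, ne03 = 2;    // src0 batch dims (the corner case)
    const int ne12 = 3, ne13 = 2;    // src1 batch dims (broadcast target)

    std::vector<float> src0(M*K*ne02*ne03, 1.0f);
    std::vector<float> src1(K*N*ne12*ne13, 2.0f);
    std::vector<float> dst (M*N*ne12*ne13, 0.0f);

    for (int i13 = 0; i13 < ne13; ++i13)
    for (int i12 = 0; i12 < ne12; ++i12) {
        // Map src1 batch indices to src0 batch indices. With ne02 == 1
        // every i12 lands on i02 == 0, but i03 must keep following i13.
        const int i02 = i12 / (ne12 / ne02);
        const int i03 = i13 / (ne13 / ne03);
        const float *a = &src0[(i03*ne02 + i02)*M*K];
        const float *b = &src1[(i13*ne12 + i12)*K*N];
        float       *c = &dst [(i13*ne12 + i12)*M*N];
        for (int m = 0; m < M; ++m)
            for (int n = 0; n < N; ++n) {
                float s = 0.0f;
                for (int k = 0; k < K; ++k)
                    s += a[m*K + k] * b[n*K + k];
                c[n*M + m] = s;
            }
    }
    std::printf("dst[0] = %.1f (expect %.1f)\n", dst[0], 2.0f*K);
}
```

A bug in this kind of index mapping silently reuses the wrong src0 matrix for some batches, producing plausible-looking but incorrect results, which is why the shape condition appears verbatim in the commit title.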