Releases: ggml-org/llama.cpp

b6081

03 Aug 20:49
d31192b
imatrix : use GGUF by default (#14842)

* imatrix : use GGUF by default

* imatrix : use GGUF regardless of the output filename

The legacy format can only be produced with --output-format dat
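As a sketch of the new behavior (model and calibration file names here are hypothetical), the importance-matrix tool now writes GGUF by default regardless of the output filename, and the legacy `.dat` format must be requested explicitly:

```shell
# Output is GGUF by default as of this release, even with a .dat extension.
./llama-imatrix -m model.gguf -f calibration.txt -o imatrix.gguf

# The legacy binary format now requires the explicit flag:
./llama-imatrix -m model.gguf -f calibration.txt -o imatrix.dat --output-format dat
```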

b6080

03 Aug 20:43
0a2f549
imatrix : fix 3d activation handling for hybrid and recurrent models …

b6079

03 Aug 20:30
11a3811
memory : handle kv_unified for hybrid models (#15050)

b6078

03 Aug 20:28
97366dc
vocab : JetBrains Mellum pre-tokenizer (#15045)

b6076

03 Aug 12:42
6c7a441
vulkan: Use coopmat2 for conv2d (#14982)

b6075

02 Aug 18:11
5c0eb5e
opencl: fix adreno compiler detection logic (#15029)

b6074

02 Aug 15:29
03d4698
CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (#15035)

b6073

02 Aug 15:27
3303c19
cuda: make im2col a little faster (#15025)

b6071

02 Aug 15:18
a4569c4
llama : enable LLAMA_SET_ROWS=1 by default (#14959)

ggml-ci

b6070

02 Aug 15:04
15e92fd
cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (#15038)

* cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1

ggml-ci

* cont : fix cont types

ggml-ci

* cont : adopt variable names and comment from the other branch