Releases · ggml-org/llama.cpp
b6081
b6080
imatrix : fix 3d activation handling for hybrid and recurrent models …
b6079
memory : handle kv_unified for hybrid models (#15050)
b6078
vocab : JetBrains Mellum pre-tokenizer (#15045)
b6076
vulkan: Use coopmat2 for conv2d (#14982)
b6075
opencl: fix adreno compiler detection logic (#15029)
b6074
CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (#15035)
b6073
cuda: make im2col a little faster (#15025)
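For context on the b6073 entry: im2col unfolds the input windows of a convolution into the columns of a matrix, so that conv2d reduces to a single GEMM. Below is a minimal single-channel reference in C++ to illustrate the operation itself; it is a sketch, not the CUDA kernel or the ggml API, and the function name and layout are illustrative assumptions.

```cpp
// Reference im2col: unfold KHxKW windows of an HxW input (stride 1, no
// padding) into a (KH*KW) x (OH*OW) matrix, so conv2d becomes one GEMM.
#include <cstdio>

static void im2col(const float *src, float *dst,
                   int H, int W, int KH, int KW) {
    const int OH = H - KH + 1, OW = W - KW + 1;
    for (int ky = 0; ky < KH; ++ky)
        for (int kx = 0; kx < KW; ++kx)
            for (int oy = 0; oy < OH; ++oy)
                for (int ox = 0; ox < OW; ++ox)
                    // row = kernel tap, column = output position
                    dst[(ky*KW + kx)*(OH*OW) + oy*OW + ox] =
                        src[(oy + ky)*W + (ox + kx)];
}

int main() {
    const float src[9] = {1,2,3, 4,5,6, 7,8,9}; // 3x3 input
    float dst[16];                              // 4 taps x 4 output positions
    im2col(src, dst, 3, 3, 2, 2);               // 2x2 kernel
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) std::printf("%4.0f", dst[r*4 + c]);
        std::printf("\n");
    }
}
```

After this transform, each convolution output is the dot product of the flattened kernel with one column, which is what makes the operation a natural target for fast GEMM-style GPU kernels.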
b6071
llama : enable LLAMA_SET_ROWS=1 by default (#14959)
b6070
cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (#15038)
* cont : fix cont types
* cont : adopt variable names and comment from the other branch
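For context on the b6070 fix: in ggml's batched mul_mat, src0 is broadcast across the batch dimensions (dims 2 and 3) of src1, and the corner case named in the title is ne02 == 1 with ne03 > 1, where the dim-2 batch index of src0 must collapse to 0 while the dim-3 index keeps stepping. The standalone C++ sketch below illustrates that broadcast indexing under assumed layouts; the ne names follow ggml's convention, but this is not the CUDA/SYCL code that was patched.

```cpp
// Broadcast indexing for batched gemm, sketching the ne02 == 1 && ne03 > 1
// case: every src1 batch (i12, i13) maps back to a src0 batch (i02, i03),
// with i02 collapsing to 0 while i03 still varies.
#include <cstdio>
#include <vector>

int main() {
    const int K = 2, M = 2, N = 2;   // shared dim, rows, cols
    const int ne02 = 1, ne03 = 2;    // src0 batch dims (the corner case)
    const int ne12 = 3, ne13 = 2;    // src1 batch dims (broadcast target)

    std::vector<float> src0(M*K*ne02*ne03, 1.0f);
    std::vector<float> src1(K*N*ne12*ne13, 2.0f);
    std::vector<float> dst (M*N*ne12*ne13, 0.0f);

    for (int i13 = 0; i13 < ne13; ++i13)
    for (int i12 = 0; i12 < ne12; ++i12) {
        // Map src1 batch indices to src0 batch indices. With ne02 == 1
        // every i12 lands on i02 == 0, but i03 must keep following i13.
        const int i02 = i12 / (ne12 / ne02);
        const int i03 = i13 / (ne13 / ne03);
        const float *a = &src0[(i03*ne02 + i02)*M*K];
        const float *b = &src1[(i13*ne12 + i12)*K*N];
        float       *c = &dst [(i13*ne12 + i12)*M*N];
        for (int m = 0; m < M; ++m)
            for (int n = 0; n < N; ++n) {
                float s = 0.0f;
                for (int k = 0; k < K; ++k)
                    s += a[m*K + k] * b[n*K + k];
                c[n*M + m] = s;
            }
    }
    std::printf("dst[0] = %.1f (expect %.1f)\n", dst[0], 2.0f*K);
}
```

A bug in this kind of index mapping silently reuses the wrong src0 matrix for some batches, producing plausible-looking but incorrect results, which is why the shape condition appears verbatim in the commit title.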