0.8.5
Release date: 2024-05-25 17:06:24
Latest Mozilla-Ocho/llamafile release: 0.8.13 (2024-08-19 01:22:48)
This release fixes bugs and introduces @kawrakow's latest quant
performance enhancements (a feature exclusive to llamafile). As of #435,
the K quants now run consistently 2x faster than llama.cpp upstream. On
big CPUs like Threadripper, we've doubled the performance of tiny models
for both prompt processing and token generation (see the benchmarks
below). The llamafile-bench and llamafile-upgrade-engine commands have
been introduced.
- a86e7ce Add Script To Upgrade llamafile Archives (#412)
- 07e87bf 261dfe7 Fix llamafile-quantize and rewrite documentation
- 938cf72 Faster AVX2 matrix multiplications for MoE models (#428)
- eaa756d Faster AVX2 matrix multiplications for legacy quants (#405)
- 7cb15c6 Another performance optimization for Zen4 + refactoring (#435)
- 9206719 8b2f8d8 e675719 4451c6d Introduce llamafile-bench command (cpu mode only)
- 87d4ce1 Fix f16 cpuid check (caused crashes on sandybridge)
- 5c40565 8d1afe4 Avoid crashing on llava ctrl-c
- c0aa43e Introduce bf16 cuda support
- 00e4f72 Enable GGML_CUDA_FORCE_MMQ in tinyBLAS mode
- d228e01 0b5997d 64fbffc Sync with llama.cpp upstream (#427)
- c660d38 Add text embedding models to 'other example llamafiles' table (#422)
- 49cc13c Updated README with instructions to load models from third-party apps (#417)
Note: Please use llamafile v0.8.4 if you need prebuilt (driver-only) AMD GPU support on Windows, at least for the next few weeks, until https://github.com/ggerganov/llama.cpp/issues/7156 is resolved.
Binaries run on Linux, Windows, macOS, FreeBSD, OpenBSD, and NetBSD for AMD64 and ARM64. Supported GPU backends are CUDA, ROCm, and Metal. Prebuilt GPU binaries are provided for CUDA/ROCm on Linux, and CUDA on Windows. To install this release on systems with a POSIX-style shell:
sudo -s
cd /usr/local
wget https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.5/llamafile-0.8.5.zip
unzip llamafile-0.8.5.zip
exit
llamafile --help
To upgrade your old llamafiles without needing to redownload, run:
llamafile-upgrade-engine old.llamafile new.llamafile
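If you have several llamafiles to upgrade, the command above can be scripted. Here is a minimal sketch in Python, assuming `llamafile-upgrade-engine` is on your PATH and accepts the two-argument form shown above. The helper names (`build_upgrade_commands`, `upgrade_all`) and the `-upgraded` output naming are hypothetical choices for illustration, not part of llamafile.

```python
import subprocess
from pathlib import Path


def build_upgrade_commands(directory, suffix="-upgraded"):
    """Return one argv list per *.llamafile found in `directory`.

    Assumes the two-argument form from the release notes:
        llamafile-upgrade-engine old.llamafile new.llamafile
    The output naming (stem + suffix) is a hypothetical convention.
    """
    commands = []
    for old in sorted(Path(directory).glob("*.llamafile")):
        if old.stem.endswith(suffix):
            continue  # skip files produced by an earlier run
        new = old.with_name(old.stem + suffix + old.suffix)
        commands.append(["llamafile-upgrade-engine", str(old), str(new)])
    return commands


def upgrade_all(directory):
    """Run each upgrade in turn and stop at the first failure."""
    for argv in build_upgrade_commands(directory):
        subprocess.run(argv, check=True)
```

Building the argument lists separately from running them lets you print and inspect the commands before anything is executed.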
Prebuilt llamafiles that have the LLM weights included are available at:
- https://huggingface.co/Mozilla (official)
- https://huggingface.co/models?library=llamafile (community)
Here are some tutorials:
- https://justine.lol/oneliners/
- https://github.com/mozilla-ocho/llamafile/
- https://future.mozilla.org/news/llamafiles-for-embeddings-in-local-rag-applications/
- https://blog.mozilla.ai/local-llm-as-judge-evaluation-with-lm-buddy-prometheus-and-llamafile/
- https://www.docker.com/blog/a-quick-guide-to-containerizing-llamafile-with-docker-for-ai-applications/
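Here are some performance benchmarks for various quantization formats, on the world's flagship CPUs. See https://justine.lol/matmul/ to compare these numbers to where we were two months ago, in March. In the table, pp512 is prompt processing over a 512-token prompt, tg16 is generation of 16 tokens, and t/s is tokens per second.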
cpu_info | model_filename | size | test | t/s |
---|---|---|---|---|
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.BF16 | 86.99 GiB | pp512 | 447.01 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.BF16 | 86.99 GiB | tg16 | 11.35 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.F16 | 86.99 GiB | pp512 | 340.63 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.F16 | 86.99 GiB | tg16 | 11.01 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q8_0 | 46.22 GiB | pp512 | 288.16 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q8_0 | 46.22 GiB | tg16 | 15.82 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q6_K | 35.74 GiB | pp512 | 431.51 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q6_K | 35.74 GiB | tg16 | 22.73 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q5_K_M | 30.95 GiB | pp512 | 427.71 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q5_K_M | 30.95 GiB | tg16 | 24.90 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q4_K_M | 26.49 GiB | pp512 | 440.03 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q4_K_M | 26.49 GiB | tg16 | 27.31 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q4_0 | 24.63 GiB | pp512 | 287.51 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q4_0 | 24.63 GiB | tg16 | 18.92 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q3_K_M | 21.00 GiB | pp512 | 433.89 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q3_K_M | 21.00 GiB | tg16 | 30.30 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q3_K_S | 19.03 GiB | pp512 | 432.36 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q3_K_S | 19.03 GiB | tg16 | 31.34 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q2_K | 16.12 GiB | pp512 | 449.64 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mixtral-8x7b-instruct-v0.1.Q2_K | 16.12 GiB | tg16 | 33.71 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.F32 | 4.10 GiB | pp512 | 2103.25 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.F32 | 4.10 GiB | tg16 | 57.34 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.BF16 | 2.05 GiB | pp512 | 2603.84 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.BF16 | 2.05 GiB | tg16 | 77.18 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.F16 | 2.05 GiB | pp512 | 2038.64 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.F16 | 2.05 GiB | tg16 | 80.23 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q8_0 | 1.09 GiB | pp512 | 2203.77 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q8_0 | 1.09 GiB | tg16 | 100.78 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q6_K | 860.86 MiB | pp512 | 2838.05 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q6_K | 860.86 MiB | tg16 | 135.27 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q5_1 | 791.50 MiB | pp512 | 2328.06 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q5_1 | 791.50 MiB | tg16 | 138.15 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q5_K_M | 745.11 MiB | pp512 | 2676.14 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q5_K_M | 745.11 MiB | tg16 | 143.58 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q5_0 | 729.84 MiB | pp512 | 2281.44 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q5_0 | 729.84 MiB | tg16 | 145.02 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q5_K_S | 729.84 MiB | pp512 | 2757.59 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q5_K_S | 729.84 MiB | tg16 | 143.59 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q4_1 | 668.18 MiB | pp512 | 2444.11 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q4_1 | 668.18 MiB | tg16 | 148.50 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q4_K_M | 636.18 MiB | pp512 | 2758.90 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q4_K_M | 636.18 MiB | tg16 | 149.92 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q4_K_S | 609.53 MiB | pp512 | 2847.95 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q4_K_S | 609.53 MiB | tg16 | 150.84 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q4_0 | 606.53 MiB | pp512 | 2420.58 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q4_0 | 606.53 MiB | tg16 | 154.27 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q3_K_L | 563.42 MiB | pp512 | 2743.74 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q3_K_L | 563.42 MiB | tg16 | 155.29 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q3_K_M | 522.30 MiB | pp512 | 2779.92 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q3_K_M | 522.30 MiB | tg16 | 157.92 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q3_K_S | 475.51 MiB | pp512 | 2758.16 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q3_K_S | 475.51 MiB | tg16 | 162.65 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q2_K | 411.41 MiB | pp512 | 2777.59 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | TinyLlama-1.1B-Chat-v1.0.Q2_K | 411.41 MiB | tg16 | 166.82 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.F32 | 4.10 GiB | pp512 | 384.37 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.F32 | 4.10 GiB | tg16 | 40.00 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.BF16 | 2.05 GiB | pp512 | 386.59 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.BF16 | 2.05 GiB | tg16 | 49.91 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.F16 | 2.05 GiB | pp512 | 703.34 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.F16 | 2.05 GiB | tg16 | 47.44 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q8_0 | 1.09 GiB | pp512 | 700.94 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q8_0 | 1.09 GiB | tg16 | 94.79 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q6_K | 860.86 MiB | pp512 | 225.57 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q6_K | 860.86 MiB | tg16 | 103.42 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q5_1 | 791.50 MiB | pp512 | 224.11 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q5_1 | 791.50 MiB | tg16 | 103.06 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q5_K_M | 745.11 MiB | pp512 | 248.61 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q5_K_M | 745.11 MiB | tg16 | 106.27 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q5_0 | 729.84 MiB | pp512 | 250.70 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q5_0 | 729.84 MiB | tg16 | 108.10 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q5_K_S | 729.84 MiB | pp512 | 237.00 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q5_K_S | 729.84 MiB | tg16 | 104.68 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q4_1 | 668.18 MiB | pp512 | 281.29 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q4_1 | 668.18 MiB | tg16 | 115.67 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q4_K_M | 636.18 MiB | pp512 | 316.26 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q4_K_M | 636.18 MiB | tg16 | 119.35 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q4_K_S | 609.53 MiB | pp512 | 306.96 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q4_K_S | 609.53 MiB | tg16 | 107.95 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q4_0 | 606.53 MiB | pp512 | 659.77 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q4_0 | 606.53 MiB | tg16 | 135.96 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q3_K_L | 563.42 MiB | pp512 | 207.70 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q3_K_L | 563.42 MiB | tg16 | 102.14 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q3_K_M | 522.30 MiB | pp512 | 230.59 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q3_K_M | 522.30 MiB | tg16 | 93.07 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q3_K_S | 475.51 MiB | pp512 | 205.75 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q3_K_S | 475.51 MiB | tg16 | 100.52 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q2_K | 411.41 MiB | pp512 | 247.06 |
Apple M2 Ultra (+fp16+dotprod) | TinyLlama-1.1B-Chat-v1.0.Q2_K | 411.41 MiB | tg16 | 106.44 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.F32 | 4.10 GiB | pp512 | 27.84 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.F32 | 4.10 GiB | tg16 | 2.10 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.BF16 | 2.05 GiB | pp512 | 28.09 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.BF16 | 2.05 GiB | tg16 | 4.55 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.F16 | 2.05 GiB | pp512 | 58.27 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.F16 | 2.05 GiB | tg16 | 4.89 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q8_0 | 1.09 GiB | pp512 | 44.60 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q8_0 | 1.09 GiB | tg16 | 8.36 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q6_K | 860.86 MiB | pp512 | 18.21 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q6_K | 860.86 MiB | tg16 | 11.47 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q5_1 | 791.50 MiB | pp512 | 16.89 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q5_1 | 791.50 MiB | tg16 | 12.43 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q5_K_M | 745.11 MiB | pp512 | 19.38 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q5_K_M | 745.11 MiB | tg16 | 13.22 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q5_0 | 729.84 MiB | pp512 | 18.35 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q5_0 | 729.84 MiB | tg16 | 13.20 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q5_K_S | 729.84 MiB | pp512 | 19.51 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q5_K_S | 729.84 MiB | tg16 | 13.68 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q4_1 | 668.18 MiB | pp512 | 20.12 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q4_1 | 668.18 MiB | tg16 | 14.67 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q4_K_M | 636.18 MiB | pp512 | 24.52 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q4_K_M | 636.18 MiB | tg16 | 14.61 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q4_K_S | 609.53 MiB | pp512 | 25.78 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q4_K_S | 609.53 MiB | tg16 | 15.69 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q4_0 | 606.53 MiB | pp512 | 42.03 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q4_0 | 606.53 MiB | tg16 | 15.32 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q3_K_L | 563.42 MiB | pp512 | 17.40 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q3_K_L | 563.42 MiB | tg16 | 13.83 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q3_K_M | 522.30 MiB | pp512 | 18.82 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q3_K_M | 522.30 MiB | tg16 | 14.47 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q3_K_S | 475.51 MiB | pp512 | 16.29 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q3_K_S | 475.51 MiB | tg16 | 13.77 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q2_K | 411.41 MiB | pp512 | 19.77 |
Raspberry Pi 5 | TinyLlama-1.1B-Chat-v1.0.Q2_K | 411.41 MiB | tg16 | 16.48 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.F32 | 26.98 GiB | pp512 | 398.57 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.F32 | 26.98 GiB | tg16 | 10.18 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.BF16 | 13.49 GiB | pp512 | 759.25 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.BF16 | 13.49 GiB | tg16 | 19.29 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.F16 | 13.49 GiB | pp512 | 559.94 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.F16 | 13.49 GiB | tg16 | 19.26 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q8_0 | 7.17 GiB | pp512 | 518.76 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q8_0 | 7.17 GiB | tg16 | 26.31 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q6_K | 5.53 GiB | pp512 | 726.13 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q6_K | 5.53 GiB | tg16 | 38.65 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q5_1 | 5.07 GiB | pp512 | 534.04 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q5_1 | 5.07 GiB | tg16 | 38.68 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q5_K_M | 4.78 GiB | pp512 | 723.25 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q5_K_M | 4.78 GiB | tg16 | 41.13 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q5_0 | 4.65 GiB | pp512 | 536.67 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q5_0 | 4.65 GiB | tg16 | 42.46 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q5_K_S | 4.65 GiB | pp512 | 651.05 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q5_K_S | 4.65 GiB | tg16 | 42.14 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q4_1 | 4.24 GiB | pp512 | 572.67 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q4_1 | 4.24 GiB | tg16 | 43.19 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q4_K_M | 4.07 GiB | pp512 | 728.48 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q4_K_M | 4.07 GiB | tg16 | 44.29 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q4_K_S | 3.86 GiB | pp512 | 666.82 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q4_K_S | 3.86 GiB | tg16 | 45.18 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q4_0 | 3.83 GiB | pp512 | 562.96 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q4_0 | 3.83 GiB | tg16 | 48.02 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q3_K_L | 3.56 GiB | pp512 | 706.64 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q3_K_L | 3.56 GiB | tg16 | 46.82 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q3_K_M | 3.28 GiB | pp512 | 715.62 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q3_K_M | 3.28 GiB | tg16 | 48.29 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q3_K_S | 2.95 GiB | pp512 | 722.11 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q3_K_S | 2.95 GiB | tg16 | 49.76 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q2_K | 2.53 GiB | pp512 | 739.28 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q2_K | 2.53 GiB | tg16 | 53.01 |
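To compare quantization formats programmatically, the rows above can be parsed with a few lines of Python. This is a minimal sketch that assumes every data row has the five pipe-separated cells shown in the table; `parse_row` and `fastest` are hypothetical helper names, and the sample rows are copied from the mistral-7b results above.

```python
def parse_row(line):
    """Split one 'cpu | model | size | test | t/s |' row into a dict."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    cpu, model, size, test, rate = cells
    return {"cpu": cpu, "model": model, "size": size,
            "test": test, "tokens_per_sec": float(rate)}


def fastest(rows, cpu_prefix, model_prefix, test):
    """Return the matching row with the highest tokens per second."""
    matches = [r for r in rows
               if r["cpu"].startswith(cpu_prefix)
               and r["model"].startswith(model_prefix)
               and r["test"] == test]
    return max(matches, key=lambda r: r["tokens_per_sec"])


# Four prompt-processing rows copied from the table above.
sample = """\
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.F16 | 13.49 GiB | pp512 | 559.94 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q8_0 | 7.17 GiB | pp512 | 518.76 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q4_K_M | 4.07 GiB | pp512 | 728.48 |
AMD Ryzen Threadripper PRO 7995WX (znver4) | mistral-7b-instruct-v0.2.Q4_0 | 3.83 GiB | pp512 | 562.96 |
"""

rows = [parse_row(line) for line in sample.splitlines() if line.strip()]
best = fastest(rows, "AMD", "mistral-7b", "pp512")
print(best["model"], best["tokens_per_sec"])
```

Among these four sample rows, the K quant (Q4_K_M) has the highest prompt-processing rate, ahead of the legacy Q4_0 and Q8_0 formats.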
1. llamafile-0.8.5 (33.79 MB)
2. llamafile-0.8.5.zip (72.87 MB)
3. llamafile-bench-0.8.5 (7.51 MB)
4. llamafile-quantize-0.8.5 (7.24 MB)
5. zipalign-0.8.5 (752.92 KB)