
0.8.13

Mozilla-Ocho/llamafile

Release date: 2024-08-19 01:22:48


[line drawing of llama animal head in front of slightly open manilla folder filled with files]

llamafile lets you distribute and run LLMs with a single file

llamafile is a local LLM inference tool introduced by Mozilla Ocho in November 2023. It offers superior performance and binary portability across six operating systems, with no installation required. It combines the best of llama.cpp and cosmopolitan libc while aiming to stay ahead of the curve by including the most cutting-edge performance and accuracy enhancements. llamafile gives you a fun web GUI chatbot, a turnkey OpenAI API compatible server, and a shell-scriptable CLI interface, which together put you in control of artificial intelligence.

v0.8.13 changes

This release synchronizes with upstream projects, bringing with it support for the newest models (e.g. Gemma 2B). Support for LLaMA v3 has been significantly improved.

The new llamafiler server is now able to serve 2400 embeddings per second on CPU. That's 3x faster than the llama.cpp server upstream. It has also been hardened for security, so you should be able to safely use it as a public-facing web server. There's a man page for llamafiler, and you can also read the docs online: /llamafile/server/doc/index.md.

The new llamafiler server now fully supports all the old embedding endpoints that were provided by llamafile --server. Support for serving embeddings has been removed from the old server.
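As a sketch of how the embedding endpoints are typically used (the port, model filename, and request fields below are assumptions for illustration, not taken from these notes), a request against the OpenAI-style endpoint might look like:

```shell
# Start the server (model path here is a placeholder):
# ./llamafiler -m all-MiniLM-L6-v2.F32.gguf

# Request an embedding via the OpenAI-compatible endpoint:
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "llamafile lets you run LLMs locally"}'
```

Consult the llamafiler man page or /llamafile/server/doc/index.md for the authoritative endpoint paths and parameters.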


This release introduces whisperfile, a single-file implementation of OpenAI's Whisper model. It lets you transcribe speech to text and even translate it as well. Our implementation is based on Georgi Gerganov's whisper.cpp project. The effort to turn it into a whisperfile was founded by CJ Pais, who has handed over maintenance of his awesome work. There's a man page for whisperfile (which can also be viewed by running ./whisperfile --help), and we have online documentation with markdown tutorials at /whisper.cpp/doc/index.md.

We developed a faster, more accurate implementation of GeLU. This helps improve the performance of tiny models. It leads to measurable quality improvements in whisper model output.
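For background, GeLU is commonly computed either exactly via the error function or with the widespread tanh approximation. llamafile's actual implementation is not reproduced in these notes; the sketch below only compares the two standard textbook formulations, which is where accuracy/speed trade-offs like the one described above arise:

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GeLU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Common tanh approximation used by many inference engines
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"{x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
```

The two curves agree to roughly three decimal places over typical activation ranges, which is why small errors in the approximation can still be measurable in tiny models like Whisper.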

We've been improving floating point numerical stability for very large models, e.g. Mixtral 8x22b and Command-R-Plus. tinyBLAS on CPU for F32, F16, and BF16 weights now uses a new zero-overhead divide-and-conquer approach to computing dot products, which we call ruler reduction, that can result in a 10x reduction in worst case roundoff error accumulation.
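The ruler-reduction code itself isn't shown in these notes. As a hedged illustration of the general divide-and-conquer idea only (this is plain pairwise summation, not llamafile's exact scheme), a dot product can be accumulated as a balanced tree rather than a single running sum, which keeps worst-case roundoff growth roughly logarithmic in the vector length instead of linear:

```python
def dot_naive(a, b):
    # Sequential accumulation: roundoff error can grow linearly in len(a).
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc

def dot_pairwise(a, b, lo=0, hi=None):
    # Divide-and-conquer accumulation: split the range in half, sum each
    # half recursively, then add the two partial sums.
    if hi is None:
        hi = len(a)
    n = hi - lo
    if n <= 8:  # small base case accumulated directly
        return sum(a[i] * b[i] for i in range(lo, hi))
    mid = lo + n // 2
    return dot_pairwise(a, b, lo, mid) + dot_pairwise(a, b, mid, hi)

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
print(dot_pairwise(a, b))  # 70.0
```

Both functions compute the same dot product in exact arithmetic; the difference only shows up in the accumulated floating point error for long vectors.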

This release introduces sdfile, which is our implementation of stable diffusion. No documentation is yet provided for this command, other than the docs provided by the upstream stable-diffusion.cpp project on which it's based.

The new architectures and tokenizers introduced by this version are: Open ELM, GPT NEOX, Arctic, DeepSeek2, ChatGLM, BitNet, T5, JAIS, Poro, Viking, Tekken, and CodeShell.

Known Issues

The llamafile executable size has increased from 30 MB to 200 MB in this release. This is caused by ggerganov/llama.cpp#7156. We're already employing some workarounds to minimize the impact of upstream development contributions on binary size, and we're aiming to find more in the near future.

Related links: original release page · download (tar) · download (zip)

1. llamafile-0.8.13 (230.17 MB)

2. llamafile-0.8.13.zip (472.09 MB)

3. llamafile-bench-0.8.13 (8.41 MB)

4. sdfile-0.8.13 (17.47 MB)

5. whisperfile-0.8.13 (225.79 MB)
