
0.8.9

Mozilla-Ocho/llamafile

Release date: 2024-07-02 03:11:46

Latest release of Mozilla-Ocho/llamafile: 0.8.13 (2024-08-19 01:22:48)

This release gets Gemma2 working closer to how Google intended.

This release fixes Android support. You can now run LLMs on your phone using Cosmopolitan software like llamafile. Thank you @aj47 (techfren.net) for bug reports and testing efforts. See also the other bug fixes described in the Cosmopolitan v3.5.4 and v3.5.3 release notes.

Our future replacement for the server now has an /embedding endpoint. On my workstation, it's currently able to serve 851 requests per second for a prompt with 52 tokens, using the all-MiniLM-L6-v2.Q6_K.gguf embeddings model. None of the requests fail and 99th percentile latency is 56.74ms.

You can try the new embedding server as follows:

make -j o//llamafile/server/main
o//llamafile/server/main -m /weights/all-MiniLM-L6-v2.F32.gguf
curl 'http://127.0.0.1:8080/embedding?prompt=orange'
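
For prompts that contain spaces or punctuation, the query string needs percent-encoding. The following is a minimal Python sketch of the same GET request, assuming the server started above is listening on 127.0.0.1:8080. The helper names `embedding_url` and `fetch_embedding` are hypothetical and not part of llamafile, and the response body is returned as raw text because these notes do not describe its format.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the embedding server from the commands above is running here.
BASE_URL = "http://127.0.0.1:8080"


def embedding_url(prompt: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for /embedding, percent-encoding the prompt."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode({"prompt": prompt}, quote_via=quote)
    return f"{base_url}/embedding?{query}"


def fetch_embedding(prompt: str, base_url: str = BASE_URL,
                    timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text.

    The response schema is not assumed; parse it once you have
    inspected what the server actually returns.
    """
    with urlopen(embedding_url(prompt, base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `print(fetch_embedding("orange"))` should show the same body as the curl command.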

Compatibility with the old server's API of posting JSON content will be added in upcoming changes. The same goes for the OpenAI API. The goal is to be compatible with everything.


1. llamafile-0.8.9 (28.62 MB)

2. llamafile-0.8.9.zip (59.19 MB)
