0.8.7
Release date: 2024-06-24 23:00:18
Latest release of Mozilla-Ocho/llamafile: 0.8.13 (2024-08-19 01:22:48)
This release includes important performance enhancements for quants.
- 293a528 Performance improvements on Arm for legacy and k-quants (#453)
- c38feb4 Optimized matrix multiplications for i-quants on __aarch64__ (#464)
This release fixes bugs. For example, we're now using a brand new memory manager, which is believed to support platforms like Android that have a virtual address space with fewer than 47 bits. This release also restores our prebuilt Windows AMD GPU support, thanks to tinyBLAS.
- 0c0e72a Upgrade to Cosmopolitan v3.5.1
- 629e208 Fix server crash due to /dev/urandom
- 60404a8 Always use tinyBLAS with AMD GPUs on Windows
- 6d3590c Pacify --temp flag when running in server mode
- a28250b Update GGML_HIP_UMA (#473)
- e973fa2 Improve CPU brand detection
- 9cd8d70 Update server README build/testing instructions (#461)
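To put the address-space remark above in perspective, here is a minimal Python sketch of the arithmetic. It only computes how many bytes a given number of virtual address bits can cover. The 39-bit figure is an illustrative assumption (a configuration sometimes seen on 64-bit Arm kernels), not something stated in these notes, and the helper names are hypothetical.

```python
def addressable_bytes(bits: int) -> int:
    """Number of bytes addressable with the given count of virtual address bits."""
    return 1 << bits


def human(n: int) -> str:
    """Format a power-of-two byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


# 47 bits is the threshold mentioned in the notes; 39 is an assumed example
# of a smaller address space, chosen only for comparison.
for bits in (39, 47):
    print(f"{bits}-bit address space: {human(addressable_bytes(bits))}")
```

A memory manager that assumes it can place mappings anywhere in a 47-bit range would pick addresses that simply do not exist on a platform with fewer bits, which is the kind of limitation the new memory manager is meant to avoid.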
In future releases, we plan to introduce a new server for llamafile, designed for performance and production-worthiness. It's not included in this release, since it currently supports only a tokenization endpoint. However, that endpoint can handle 2 million requests per second, whereas the most we've ever seen from the current server is a few thousand.
- e0656ea Introduce new llamafile server
1. llamafile-0.8.7 (24.38 MB)
2. llamafile-0.8.7.zip (50.87 MB)
3. llamafile-bench-0.8.7 (8.17 MB)
4. zipalign-0.8.7 (604.49 KB)