v0.2.0
Released: 2024-07-19 20:51:35
Latest release of EricLBuehler/mistral.rs: v0.3.1 (2024-09-29 23:39:44)
New features
- Support .bin, .pt, .pth extensions
- Add Starcoder 2 GGUF
- 🔥 PagedAttention - beating llama.cpp running GGUF plus all the throughput benefits 😉
- Optimized performance and memory usage
Rust MSRV
The MSRV of mistral.rs v0.2.0 is 1.75.
What's Changed
- Fix SWA order (flip it) for Gemma 2 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/554
- Support .bin, .pt, .pth extensions by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/557
- Update readme by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/558
- Fix Starcoder 2 ISQ by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/559
- Update deps by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/560
- Add the starcoder2 GGUF arch by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/561
- Readme update for starcoder2 gguf by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/562
- Fix PyPI release trigger by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/566
- Optimize multi-batch and inference performance with PagedAttention by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/552
- [Breaking] Version 0.2.0 by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/527
- Paged attention support for vision models by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/567
- Automatically use paged attn on cuda, get memory size by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/568
- Add docs link for vision loader by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/570
- Add matching for valid model weight names by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/571
- Remove ensure about no paged attn for vision models by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/573
- Add percentage utilization support to paged attn by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/574
- Include block engine in paged attn metadata by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/576
- Update deps and sync Candle by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/578
- Optimize CLIP model by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/579
- Use softmax_last_dim in CLIP by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/580
- Fix method of calculating paged attn with util percent by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/581
- Handle windows in paged attn build by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/577
- Warn instead of error when paged attn not supported by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/583
- Warn instead of error when paged attn for adapters not supported by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/584
- Add support for lm_head to adapter models by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/586
- Add default plotly feature by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/587
- Improve memory handling of PagedAttention with GGUF by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/590
- Fix Windows build on cuda w/ PagedAttention by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/589
- Update cuda kernels build.rs on windows by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/591
- Bump version to 0.2.0 and update docs by @EricLBuehler in https://github.com/EricLBuehler/mistral.rs/pull/582
Full Changelog: https://github.com/EricLBuehler/mistral.rs/compare/v0.1.26...v0.2.0
Install mistralrs-server 0.2.0
Install prebuilt binaries via shell script
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/EricLBuehler/mistral.rs/releases/download/v0.2.0/mistralrs-server-installer.sh | sh
Download mistralrs-server 0.2.0
| File | Platform | Checksum |
|---|---|---|
| mistralrs-server-aarch64-apple-darwin.tar.xz | Apple Silicon macOS | checksum |
| mistralrs-server-x86_64-apple-darwin.tar.xz | Intel macOS | checksum |
| mistralrs-server-x86_64-unknown-linux-gnu.tar.xz | x64 Linux | checksum |
1. dist-manifest.json (12.71 KB)
2. mistralrs-server-aarch64-apple-darwin-update (5.79 MB)
3. mistralrs-server-aarch64-apple-darwin.tar.xz (7.6 MB)
4. mistralrs-server-aarch64-apple-darwin.tar.xz.sha256 (111 B)
5. mistralrs-server-installer.sh (30.87 KB)
6. mistralrs-server-x86_64-apple-darwin-update (6.01 MB)
7. mistralrs-server-x86_64-apple-darwin.tar.xz (8.23 MB)
8. mistralrs-server-x86_64-apple-darwin.tar.xz.sha256 (110 B)
9. mistralrs-server-x86_64-unknown-linux-gnu-update (11.32 MB)
10. mistralrs-server-x86_64-unknown-linux-gnu.tar.xz (8.84 MB)
11. mistralrs-server-x86_64-unknown-linux-gnu.tar.xz.sha256 (115 B)
12. source.tar.gz (407.21 KB)
13. source.tar.gz.sha256 (80 B)
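
Each archive above ships with a matching `.sha256` file, so a manually downloaded archive can be checked before unpacking. The following is a minimal sketch, not part of the release itself: the function name `verify_sha256` is hypothetical, and it assumes the `.sha256` file uses the common `<hex digest>  <filename>` layout and that either `sha256sum` or `shasum` is installed.

```shell
# Hypothetical helper (not shipped with mistral.rs): compare an archive's
# SHA-256 digest with the digest stored in its .sha256 sidecar file.
# Assumption: the sidecar's first whitespace-separated field is the hex digest.
verify_sha256() {
    archive="$1"
    sidecar="${2:-$1.sha256}"   # default sidecar: <archive>.sha256

    # Read the expected digest; fail if the sidecar cannot be read.
    expected=$(awk '{print $1; exit}' "$sidecar") || return 2

    # Use sha256sum where available (most Linux systems), else shasum (macOS).
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$archive" | awk '{print $1}')
    else
        actual=$(shasum -a 256 "$archive" | awk '{print $1}')
    fi

    # Succeed only when a digest was found and it matches.
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

With both files in the current directory, `verify_sha256 mistralrs-server-x86_64-unknown-linux-gnu.tar.xz && tar -xJf mistralrs-server-x86_64-unknown-linux-gnu.tar.xz` would unpack the archive only if the digests match.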