v1.7.0
Release date: 2024-03-15 10:14:03
Latest release of neuralmagic/deepsparse: v1.8.0 (2024-07-20 02:18:07)
New Features:
- DeepSparse Pipelines v2 was introduced, enabling more complex pipelines to be represented. The Text Generation (compatible with Hugging Face Transformers) and Image Classification pipelines have been refactored to the v2 format. (#1324, #1385, #1460, #1596, #1502, #1626)
- OpenAI Server compatibility added on top of Pipelines v2. (#1445, #1477)
- `deepsparse.evaluate` APIs and CLIs added with plugins for perplexity and lm-eval-harness for LLM evaluations. (#1596)
- An example was added demonstrating how to use LLMPerf for benchmarking DeepSparse LLM servers. (#1502)
- Continuous batching support has been added for text generation pipelines and inference server pathways, enabling inference over multiple text streams at once. (#1569, #1571)
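
For readers unfamiliar with the perplexity metric mentioned above, the following is a minimal sketch of the standard definition: the exponential of the mean negative log-likelihood over the scored tokens. It is not the `deepsparse.evaluate` plugin itself; the function name `perplexity` and the sample log-probabilities are hypothetical, and the plugin's tokenization and windowing choices are not described in these notes.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical log-probabilities a model assigned to four target tokens.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
print(perplexity(log_probs))
```

A model that assigns probability 1/4 to every target token has a perplexity of 4 under this definition, which is why the metric is often read as an effective branching factor; lower is better.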
Changes:
- Exposed `sequence_length` for greater control over text generation pipelines. (#1518)
- `deepsparse.analyze` functionality has been updated to work properly with LLMs. (#1324)
- The logging and timing infrastructure for Pipelines was expanded to enable more thorough tracking and logging, in addition to furthering support for integrations with Prometheus and other standard logging platforms. (#1614)
- UX improved for text generation pipelines to more closely match Hugging Face Transformers pipelines. (#1583, #1584, #1590, #1592, #1598)
Resolved Issues:
- Compile time for dense LLMs is no longer very slow.
- Text generation pipeline bug fixes: corrected sampling logic errors and inappropriate in-place logits mutation resulting in incorrect answers for LLMs when using sampling. (#1406, #1414)
- KV cache was fixed for improper handling of the `kv_cache` input while using external KV cache management, which resulted in inaccurate model inference for ONNX Runtime comparison pathways. (#1337)
- Benchmarking runs for LLMs with internal KV cache no longer crash or report inaccurate numbers. (#1512, #1514)
- SciPy dependencies were removed to address issues for CV pipelines where they would fail on import of `scipy` and crash. (#1604, #1602)
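
The in-place logits mutation mentioned among the sampling fixes above is a general class of bug, and a small illustration may help. The sketch below is not DeepSparse's code: the helper names and values are hypothetical, and temperature scaling is used only as a stand-in for any sampling transform. It contrasts a transform that overwrites the caller's logits with one that returns a copy.

```python
import math


def softmax(logits):
    """Convert logits to probabilities, shifting by the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_temperature_in_place(logits, temperature):
    """Risky pattern: rescales the caller's list, so later readers see altered logits."""
    for i in range(len(logits)):
        logits[i] /= temperature
    return logits


def apply_temperature(logits, temperature):
    """Safer pattern: returns a new list and leaves the caller's logits untouched."""
    return [x / temperature for x in logits]


original = [2.0, 1.0, 0.0]

shared = list(original)
apply_temperature_in_place(shared, 0.5)  # 'shared' itself is now rescaled

kept = list(original)
scaled = apply_temperature(kept, 0.5)  # 'kept' still holds the raw logits

print(shared != original, kept == original)
print(softmax(scaled))
```

If the same logits buffer is reused after an in-place transform, for example to score or re-sample a token, the second use silently operates on modified values, which is one way sampling can yield incorrect answers.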
Known Issues:
- OPT models produce incorrect outputs and are no longer supported.
- Streaming support is limited within the DeepSparse Pipeline v2 framework for tasks other than text generation.
Assets:
1. deepsparse-1.7.0-cp310-cp310-macosx_13_0_arm64.whl 31.81 MB
2. deepsparse-1.7.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl 38.6 MB
3. deepsparse-1.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl 44.9 MB
4. deepsparse-1.7.0-cp311-cp311-macosx_13_0_arm64.whl 31.81 MB
5. deepsparse-1.7.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl 38.6 MB
6. deepsparse-1.7.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl 44.9 MB
7. deepsparse-1.7.0-cp38-cp38-macosx_13_0_arm64.whl 31.81 MB
8. deepsparse-1.7.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl 38.6 MB
9. deepsparse-1.7.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl 44.9 MB
10. deepsparse-1.7.0-cp39-cp39-macosx_13_0_arm64.whl 31.81 MB
11. deepsparse-1.7.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl 38.6 MB
12. deepsparse-1.7.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl 44.9 MB
13. deepsparse-1.7.0.tar.gz 44.45 MB
14. deepsparse-macosx_api_demo.tar.gz 31.09 MB
15. deepsparse-neon_api_demo.tar.gz 53.15 MB
16. deepsparse_api_demo.tar.gz 75.93 MB
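
The wheel filenames above follow the standard `name-version-pythontag-abitag-platformtag.whl` convention, so the matching file for a given interpreter and machine can be read straight off the name. The sketch below is a simplified illustration only: `parse_wheel` and `pick_wheel` are hypothetical helpers, the substring match on the platform tag is a shortcut, and `pip` performs the real compatibility check automatically during installation.

```python
# Rebuild the twelve wheel filenames listed in this release.
PLATFORM_TAGS = (
    "macosx_13_0_arm64",
    "manylinux_2_17_aarch64.manylinux2014_aarch64",
    "manylinux_2_17_x86_64.manylinux2014_x86_64",
)
WHEELS = [
    f"deepsparse-1.7.0-{py}-{py}-{plat}.whl"
    for py in ("cp310", "cp311", "cp38", "cp39")
    for plat in PLATFORM_TAGS
]


def parse_wheel(filename):
    """Split a wheel filename (without a build tag) into its five components."""
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        # A dot joins several platform tags into one compressed tag set.
        "platforms": platform_tag.split("."),
    }


def pick_wheel(filenames, python_tag, arch):
    """Return the first wheel matching the Python tag and architecture, else None."""
    for filename in filenames:
        info = parse_wheel(filename)
        if info["python"] == python_tag and any(arch in p for p in info["platforms"]):
            return filename
    return None


print(pick_wheel(WHEELS, "cp310", "x86_64"))
print(pick_wheel(WHEELS, "cp312", "x86_64"))  # no cp312 wheel is listed
```

The list covers CPython 3.8 through 3.11 on macOS arm64, Linux aarch64, and Linux x86_64; for any other combination the lookup returns `None`, which leaves the source archive `deepsparse-1.7.0.tar.gz` as the only remaining Python package in the list.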