v1.41.22
Release date: 2024-07-15 23:50:22
Latest BerriAI/litellm release: v1.44.15-stable (2024-09-04 00:07:25)
What's Changed
- feat mem utils debugging return size of in memory cache by @ishaan-jaff in https://github.com/BerriAI/litellm/pull/4705
- [Fix Memory Usage] - only use per request tracking if slack alerting is being used by @ishaan-jaff in https://github.com/BerriAI/litellm/pull/4703
- [Debug-Utils] Add some useful memory usage debugging utils by @ishaan-jaff in https://github.com/BerriAI/litellm/pull/4704
- Return `retry-after` header for rate-limited requests by @krrishdholakia in https://github.com/BerriAI/litellm/pull/4706 (a client-side sketch follows this list)
- add azure ai pricing + token info (mistral/jamba instruct/llama3) by @krrishdholakia in https://github.com/BerriAI/litellm/pull/4702
- Allow setting `logging_only` in guardrails config by @krrishdholakia in https://github.com/BerriAI/litellm/pull/4696
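With PR #4706, rate-limited responses from the proxy carry a `retry-after` header. A minimal sketch of honoring it on the client side, assuming a proxy at `http://localhost:4000` with a placeholder API key and model name (all hypothetical, not values from this release):

```python
import time
import requests

# Hypothetical local proxy URL and API key; substitute your deployment's values.
PROXY_URL = "http://localhost:4000/chat/completions"
API_KEY = "sk-1234"

payload = {
    "model": "gpt-3.5-turbo",  # placeholder; use a model configured on your proxy
    "messages": [{"role": "user", "content": "hello"}],
}
headers = {"Authorization": f"Bearer {API_KEY}"}

resp = requests.post(PROXY_URL, headers=headers, json=payload)

if resp.status_code == 429:
    # Rate-limited requests now include a retry-after header (PR #4706);
    # sleep for the advertised interval before retrying once.
    delay = float(resp.headers.get("retry-after", "1"))
    time.sleep(delay)
    resp = requests.post(PROXY_URL, headers=headers, json=payload)

print(resp.status_code)
```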
Full Changelog: https://github.com/BerriAI/litellm/compare/v1.41.21...v1.41.22
Docker Run LiteLLM Proxy
```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.41.22
```
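Once the container is up, the proxy exposes an OpenAI-compatible API on port 4000, so any OpenAI client can point at it. A minimal sketch with the official `openai` Python SDK (the key and model name below are placeholders, not values from this release):

```python
from openai import OpenAI

# Hypothetical key; the proxy accepts whatever keys your deployment is configured with.
client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder; use a model configured on your proxy
    messages=[{"role": "user", "content": "Hello from the LiteLLM proxy!"}],
)
print(response.choices[0].message.content)
```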
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
| Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
|---|---|---|---|---|---|---|---|---|---|
| /chat/completions | Passed ✅ | 150.0 | 170.61999054511887 | 6.341044942359792 | 0.0 | 1895 | 0 | 122.10332300003301 | 1263.2002629999874 |
| Aggregated | Passed ✅ | 150.0 | 170.61999054511887 | 6.341044942359792 | 0.0 | 1895 | 0 | 122.10332300003301 | 1263.2002629999874 |
Attachments:
1. load_test.html (1.59 MB)
2. load_test_stats.csv (536 B)