v0.7.0
Release date: 2024-06-18 23:52:25
This release switches all examples to use cloud-hosted, GPU-accelerated LLM and embedding models from the Nvidia API Catalog by default. It also deprecates support for deploying on-prem models using the NeMo Inference Framework Container and adds support for deploying accelerated generative AI models across cloud, data center, and workstation using the latest Nvidia NIM-LLM.
Added
- Added model auto-download and caching support for `nemo-retriever-embedding-microservice` and `nemo-retriever-reranking-microservice`. Updated steps to deploy the services can be found here.
- Multimodal RAG Example enhancements (see the sketch after this list):
  - Moved to the PDF Plumber library for parsing text and images.
  - Added `pgvector` vector DB support.
  - Added support for ingesting files with the .pptx extension.
  - Improved the accuracy of image parsing by using tesseract-ocr.
- Added a new notebook showcasing a RAG use case using accelerated NIM-based models deployed on-prem.
- Added a new experimental example showcasing how to create a developer-focused RAG chatbot using RAPIDS cuDF source code and API documentation.
- Added a new experimental example demonstrating how NVIDIA Morpheus, NIMs, and RAG pipelines can be integrated to create LLM-based agent pipelines.
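
To make the parsing and ingestion changes above concrete, here is a minimal sketch combining pdfplumber text extraction, tesseract-ocr for embedded images, and a `pgvector` store; the input file name, embedding model, connection string, and collection name are all illustrative assumptions, not values taken from the example itself.

```python
# Minimal sketch: parse a PDF with pdfplumber, OCR embedded images with
# pytesseract, and ingest the results into pgvector via LangChain.
# Requires: pdfplumber, pytesseract (with the tesseract binary installed),
# langchain-community, langchain-nvidia-ai-endpoints, psycopg2.
import pdfplumber
import pytesseract
from langchain_community.vectorstores import PGVector
from langchain_core.documents import Document
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

docs = []
with pdfplumber.open("slides.pdf") as pdf:  # hypothetical input file
    for page in pdf.pages:
        text = page.extract_text() or ""  # native text layer
        for img in page.images:  # run OCR over each embedded image
            region = page.crop((img["x0"], img["top"], img["x1"], img["bottom"]))
            ocr_text = pytesseract.image_to_string(
                region.to_image(resolution=150).original
            )
            text += "\n" + ocr_text
        docs.append(Document(page_content=text, metadata={"page": page.page_number}))

# Store the parsed pages in pgvector (connection string is a placeholder).
PGVector.from_documents(
    documents=docs,
    embedding=NVIDIAEmbeddings(model="snowflake/arctic-embed-l"),
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/vectordb",
    collection_name="multimodal_rag",
)
```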
Changed
- All examples now use llama3 models from the Nvidia API Catalog by default. A summary of the updated examples and the models they use is available here.
- Switched the default embedding model of all examples to the Snowflake arctic-embed-l model (a usage sketch follows this list).
- Added more verbose logs and support for configuring the chain server's log level using the LOG_LEVEL environment variable (see the sketch after this list).
- Bumped up the versions of the `langchain-nvidia-ai-endpoints` and `sentence-transformers` packages and the `milvus` containers.
- Updated base containers to use the Ubuntu 22.04 image `nvcr.io/nvidia/base/ubuntu:22.04_20240212`.
- Added `llama-index-readers-file` as a dependency to avoid runtime package installation within the chain server.
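
As context for the model switch above, here is a minimal sketch of targeting the new API Catalog defaults through the `langchain-nvidia-ai-endpoints` package; it assumes an NVIDIA_API_KEY is set in the environment, and the model IDs shown are assumptions based on the catalog's llama3 and arctic-embed-l listings.

```python
# Minimal sketch: querying the new default models from the Nvidia API Catalog.
# Assumes NVIDIA_API_KEY is exported; model IDs below are assumed catalog names.
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

llm = ChatNVIDIA(model="meta/llama3-8b-instruct")              # llama3 chat default
embedder = NVIDIAEmbeddings(model="snowflake/arctic-embed-l")  # embedding default

print(llm.invoke("What is retrieval-augmented generation?").content)
print(len(embedder.embed_query("hello world")))  # embedding dimensionality
```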
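The LOG_LEVEL change maps naturally onto Python's standard logging; the following is a minimal sketch of that pattern, not the chain server's actual implementation.

```python
# Minimal sketch: deriving the logging level from a LOG_LEVEL environment
# variable, defaulting to INFO when the variable is unset or unrecognized.
import logging
import os

level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
logging.basicConfig(level=getattr(logging, level_name, logging.INFO))
logging.getLogger(__name__).debug("debug logging enabled")  # emitted only at DEBUG
```

Launching the server with, for example, `LOG_LEVEL=DEBUG` would then surface the more verbose logs.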
Deprecated
- Deprecated support for on-prem LLM model deployment using the NeMo Inference Framework Container. Developers can use Nvidia NIM-LLM to deploy TensorRT-optimized models on-prem and plug them into the existing examples (see the sketch after this list).
- Deprecated Kubernetes operator support.
- The `nvolveqa_40k` embedding model was deprecated from the Nvidia API Catalog. Updated all notebooks and experimental artifacts to use the Nvidia embed-qa-4 model instead.
- Removed notebooks numbered 00-04, which used on-prem LLM model deployment with the deprecated NeMo Inference Framework Container.
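
To illustrate the NIM-LLM migration path mentioned above, here is a minimal sketch of pointing an example's chat client at a locally deployed NIM endpoint via `langchain-nvidia-ai-endpoints`; the port and model name are placeholders for whatever the local NIM actually serves.

```python
# Minimal sketch: plugging a self-hosted NIM-LLM endpoint into an example.
# The base_url port and model name are assumptions; match them to your deployment.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(
    base_url="http://localhost:8000/v1",  # local NIM OpenAI-compatible endpoint
    model="meta/llama3-8b-instruct",      # whichever model the NIM hosts
)
print(llm.invoke("Hello from an on-prem NIM").content)
```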