
v0.2.0

ray-project/ray-llm

Release date: 2023-08-04 11:34:02

Latest release of ray-project/ray-llm: v0.5.0 (2024-01-19 04:38:00)

What's changed?

This update introduces breaking changes to model configuration YAMLs and the Aviary SDK. Refer to the migration guide below for more details.

In order to use the Aviary backend, ensure you are using the official Docker image anyscale/aviary:latest. Using the backend without Docker is not a supported use case. The anyscale/aviary:latest-tgi image has been superseded by anyscale/aviary:latest.

Migration Guide For Model YAMLs

The most recent version of Aviary introduces breaking changes to the model YAMLs. This guide will help you migrate your existing model YAMLs to the new format.

Changes

  1. Move any fields under model_config.initialization to be under model_config, then remove model_config.initialization.

    Then remove the following sections/fields and everything under them:

    • model_config.initializer
    • model_config.pipeline
    • model_config.batching

  2. Rename model_config to engine_config.

    In v0.2, we introduce Engine, the Aviary abstraction for interacting with a model. In short, Engine combines the functionality of initializers, pipelines, and predictors.

    Pipeline and initializer parameters are no longer configurable. In v0.2 we remove the option to specify static batching and instead perform continuous batching by default for improved performance. (A combined before/after sketch for steps 1 and 2 follows this list.)

  3. Add the Scheduler and Policy configs.

    The scheduler is a component of the engine that determines which requests to run inference on. The policy is a component of the scheduler that determines the scheduling strategy. These components previously existed in Aviary; however, they were not explicitly configurable.

    Previously the following parameters were specified under model_config.generation:

    • max_batch_total_tokens
    • max_total_tokens
    • max_waiting_tokens
    • max_input_length
    • max_batch_prefill_tokens

    Rename max_waiting_tokens to max_iterations_curr_batch and place these parameters under engine_config.scheduler.policy. For example:

    
    engine_config:
      scheduler:
        policy:
          max_iterations_curr_batch: 100
          max_batch_total_tokens: 100000
          max_total_tokens: 100000
          max_input_length: 100
          max_batch_prefill_tokens: 100000
    
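To tie steps 1 and 2 together (step 3 is already covered by the scheduler/policy example above), here is a minimal before/after sketch of the YAML changes. The fields shown under model_config.initialization (hf_model_id, dtype) are illustrative placeholders rather than an authoritative schema; keep whatever fields your existing YAML defines there.

    # Before (v0.1-style; hf_model_id and dtype are hypothetical placeholder fields)
    model_config:
      initialization:
        hf_model_id: mosaicml/mpt-7b-chat   # moves up one level in v0.2
        dtype: bfloat16                     # moves up one level in v0.2
      initializer: {}                       # remove, along with everything under it
      pipeline: {}                          # remove, along with everything under it
      batching: {}                          # remove, along with everything under it

    # After (v0.2-style)
    engine_config:                          # renamed from model_config
      hf_model_id: mosaicml/mpt-7b-chat     # moved up from initialization
      dtype: bfloat16                       # moved up from initialization
      # generation-level scheduling parameters move to engine_config.scheduler.policy,
      # as shown in the example above

The net effect is that initialization details live directly on the engine config, while batching is no longer user-configurable because the engine always performs continuous batching.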
