
v0.10.0

argmaxinc/WhisperKit

Release date: 2024-12-20 08:17:17

Latest release of argmaxinc/WhisperKit: v0.10.0 (2024-12-20 08:17:17)

Highlights

This release adds support for protocol-defined model input and output types, enabling full MLX or MLTensor pipelines without converting to MLMultiArray between encoder and decoder stages. For example, instead of

func encodeFeatures(_ features: MLMultiArray) async throws -> MLMultiArray?

you can now define the types by protocol:

func encodeFeatures(_ features: any FeatureExtractorOutputType) async throws -> (any AudioEncoderOutputType)?

where the types are defined as follows:

public protocol FeatureExtractorOutputType {}
extension MLMultiArray: FeatureExtractorOutputType {}
public protocol AudioEncoderOutputType {}
extension MLMultiArray: AudioEncoderOutputType {}

or for a type that is a struct:

public struct TextDecoderMLMultiArrayOutputType: TextDecoderOutputType {
    public var logits: MLMultiArray?
    public var cache: DecodingCache?
}

so the entire structure can be handled by any model that conforms to the protocol. This adds flexibility for passing different data types between models and reduces the number of conversion steps compared with previous versions, which assumed every input and output was an MLMultiArray.
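
As a rough illustration of the pattern, here is a minimal sketch written without Core ML so that it compiles on its own. The two marker protocols reuse the names shown above. `FloatBuffer`, `UnsupportedFeatures`, `SketchAudioEncoding`, and `DoublingEncoder` are hypothetical stand-ins invented for this example and are not part of WhisperKit.

```swift
// Marker protocols, redeclared here so the sketch is self-contained.
// In WhisperKit these are public and MLMultiArray conforms to them.
protocol FeatureExtractorOutputType {}
protocol AudioEncoderOutputType {}

// Hypothetical backend-specific container (assumption, not a WhisperKit type).
// One type can serve as both a feature-extractor output and an encoder output.
struct FloatBuffer: FeatureExtractorOutputType, AudioEncoderOutputType {
    var values: [Float]
}

// Hypothetical second feature type, used to show the rejection path.
struct UnsupportedFeatures: FeatureExtractorOutputType {}

// Hypothetical encoder protocol shaped like the signature in the notes above.
protocol SketchAudioEncoding {
    func encodeFeatures(_ features: any FeatureExtractorOutputType) async throws -> (any AudioEncoderOutputType)?
}

// Toy encoder: doubles every value, standing in for a real model call.
struct DoublingEncoder: SketchAudioEncoding {
    func encodeFeatures(_ features: any FeatureExtractorOutputType) async throws -> (any AudioEncoderOutputType)? {
        // Downcast to the concrete type this backend understands;
        // return nil for anything else rather than converting.
        guard let buffer = features as? FloatBuffer else { return nil }
        return FloatBuffer(values: buffer.values.map { $0 * 2 })
    }
}
```

Calling `encodeFeatures` with a `FloatBuffer` returns another `FloatBuffer` that can be passed straight to the next stage, while an `UnsupportedFeatures` value yields `nil`. The design choice worth noting is that each backend decides which concrete types it accepts, so no intermediate conversion is forced on the caller.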

We've made a start on using different inference types by adopting the new MLTensor for token sampling on devices with the latest OS support, which resulted in a 2x speedup for that operation. Future work will shift the entire pipeline to these types.

There are also some important fixes included:

⚠️ Breaking changes

Finally, there were some great open-source contributions listed below, with a broad range of improvements to the library. Huge thanks to all the contributors 🙏

What's Changed

New Contributors

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.9.4...v0.10.0
