v0.6.0

Released: 2024-04-18 14:22:52

Latest argmaxinc/WhisperKit release: v0.9.4 (2024-11-07 09:51:16)

Highlights

Async batch transcription is here 🎉 contributed by @jkrukowski
- With this release, you can transcribe multiple audio files concurrently, fully utilizing the new async prediction APIs released with iOS 17/macOS 14 (see the related WWDC video).
- New interface with `audioPaths` input:
  ```
  let audioPaths = [
      "/path/to/file1.wav",
      "/path/to/file2.wav"
  ]
  let whisperKit = try await WhisperKit()
  let transcriptionResults: [[TranscriptionResult]?] = await whisperKit.transcribe(audioPaths: audioPaths)
  ```
- You can also use it from the CLI with the new argument `--audio-folder "path/to/folder/"`.
- Future work will chunk single files to significantly speed up long-form transcription.
- Note that this release includes breaking changes and deprecations. See the full upgrade guide below.
Several bug fixes, accuracy improvements, and quality-of-life upgrades by @hewigovens, @shawiz, and @jkrukowski
- Every issue raised and PR merged from the community helps make WhisperKit better with every release. Thank you, and keep them coming! 🙏
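
The batch interface returns one optional entry per input file, so callers typically need to match entries back to paths. The following is a minimal sketch of that bookkeeping, not part of WhisperKit: `summarizeBatch` is a hypothetical helper, `TranscriptionResult` is a reduced stand-in so the snippet compiles on its own, and it assumes that results come back in input order and that a `nil` entry means the file could not be transcribed.

```swift
// Stand-in for WhisperKit's TranscriptionResult, reduced to one field so the
// sketch compiles on its own. The real type carries more data.
struct TranscriptionResult {
    var text: String
}

// Hypothetical helper (not a WhisperKit API): pairs each input path with its
// batch entry and separates files that produced results from those that did not.
func summarizeBatch(
    audioPaths: [String],
    results: [[TranscriptionResult]?]
) -> (succeeded: [String: [TranscriptionResult]], failedPaths: [String]) {
    var succeeded: [String: [TranscriptionResult]] = [:]
    var failedPaths: [String] = []

    // Assumption: results are returned in the same order as the input paths.
    for (path, fileResults) in zip(audioPaths, results) {
        if let fileResults = fileResults {
            succeeded[path] = fileResults
        } else {
            // Assumption: a nil entry means this file could not be transcribed.
            failedPaths.append(path)
        }
    }
    return (succeeded, failedPaths)
}
```

With the call shown in the highlights, this would be used as `summarizeBatch(audioPaths: audioPaths, results: transcriptionResults)`.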

⚠️ Upgrade Guide

We aim to minimize breaking changes, so this update adds deprecation flags for the changed interfaces. The deprecated interfaces will be removed later, but for now they remain usable and will not cause build errors. There are some breaking changes in lower-level and newer methods, so if you do notice build errors, see the full guide below.

Full Upgrade Guide

API changes

Deprecations

`WhisperKit`

Deprecated

public func transcribe(
    audioPath: String,
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> TranscriptionResult?

use instead

public func transcribe(
    audioPath: String,
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> [TranscriptionResult]

Deprecated

public func transcribe(
    audioArray: [Float],
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> TranscriptionResult?

use instead

public func transcribe(
    audioArray: [Float],
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> [TranscriptionResult]
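
Call sites that previously read a single optional result now receive an array. One way to migrate gradually is a small shim that collapses the array back into the old shape. This is only a sketch under stated assumptions: `combinedText` is a hypothetical helper, `TranscriptionResult` is a reduced stand-in so the snippet compiles on its own, and joining with a single space is an illustrative choice rather than WhisperKit's own merging behaviour.

```swift
// Stand-in for WhisperKit's TranscriptionResult, reduced to one field so the
// sketch compiles on its own.
struct TranscriptionResult {
    var text: String
}

// Hypothetical shim (not a WhisperKit API): returns nil for an empty array,
// mirroring the old optional return, and otherwise joins the text of every
// result in order.
func combinedText(of results: [TranscriptionResult]) -> String? {
    guard !results.isEmpty else { return nil }
    return results.map { $0.text }.joined(separator: " ")
}
```

Code that used to read `result?.text` from the deprecated overload could then read `combinedText(of: results)` until it is updated to handle each result individually.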

`TextDecoding`

Deprecated

func decodeText(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options decoderOptions: DecodingOptions,
    callback: ((TranscriptionProgress) -> Bool?)?
) async throws -> [DecodingResult]

use instead

func decodeText(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options decoderOptions: DecodingOptions,
    callback: ((TranscriptionProgress) -> Bool?)?
) async throws -> DecodingResult

Deprecated

func detectLanguage(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options: DecodingOptions,
    temperature: FloatType
) async throws -> [DecodingResult]

use instead

func detectLanguage(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options: DecodingOptions,
    temperature: FloatType
) async throws -> DecodingResult

Breaking changes

Removed the `Transcriber` protocol

`AudioProcessing`

static func loadAudio(fromPath audioFilePath: String) -> AVAudioPCMBuffer?

becomes

static func loadAudio(fromPath audioFilePath: String) throws -> AVAudioPCMBuffer

`AudioStreamTranscriber`

public init(
    audioProcessor: any AudioProcessing, 
    transcriber: any Transcriber, 
    decodingOptions: DecodingOptions, 
    requiredSegmentsForConfirmation: Int = 2, 
    silenceThreshold: Float = 0.3, 
    compressionCheckWindow: Int = 20, 
    useVAD: Bool = true, 
    stateChangeCallback: AudioStreamTranscriberCallback?
)

becomes

public init(
    audioEncoder: any AudioEncoding,
    featureExtractor: any FeatureExtracting,
    segmentSeeker: any SegmentSeeking,
    textDecoder: any TextDecoding,
    tokenizer: any WhisperTokenizer,
    audioProcessor: any AudioProcessing,
    decodingOptions: DecodingOptions,
    requiredSegmentsForConfirmation: Int = 2,
    silenceThreshold: Float = 0.3,
    compressionCheckWindow: Int = 20,
    useVAD: Bool = true,
    stateChangeCallback: AudioStreamTranscriberCallback?
)

`TextDecoding`

func prepareDecoderInputs(withPrompt initialPrompt: [Int]) -> DecodingInputs?

becomes

func prepareDecoderInputs(withPrompt initialPrompt: [Int]) throws -> DecodingInputs
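
Both `loadAudio(fromPath:)` and `prepareDecoderInputs(withPrompt:)` follow the same pattern: an optional return became a throwing function with a non-optional return. The sketch below shows two common ways to adapt a call site in general Swift terms. `loadSamples`, `SampleLoadError`, `loadOrNil`, and `sampleCountOrError` are hypothetical names used only to keep the example self-contained; they are not WhisperKit APIs.

```swift
// Hypothetical error type and loader, used only so the sketch compiles on
// its own. The loader mirrors the new shape: it throws instead of returning nil.
enum SampleLoadError: Error {
    case emptyPath
}

func loadSamples(fromPath path: String) throws -> [Float] {
    guard !path.isEmpty else { throw SampleLoadError.emptyPath }
    return [0.0, 0.1, 0.2] // placeholder samples
}

// Option 1: `try?` collapses any thrown error to nil, which matches the old
// optional-returning behaviour and keeps existing `if let` call sites working.
func loadOrNil(_ path: String) -> [Float]? {
    return try? loadSamples(fromPath: path)
}

// Option 2: do/catch keeps the error so it can be logged or shown to the user.
func sampleCountOrError(_ path: String) -> Result<Int, Error> {
    do {
        return .success(try loadSamples(fromPath: path).count)
    } catch {
        return .failure(error)
    }
}
```

Option 1 is the smallest change when the reason for failure does not matter. Option 2 is preferable when the new error information should reach the caller.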

What's Changed

Add microphoneUnavailable error by @hewigovens in https://github.com/argmaxinc/WhisperKit/pull/113
Improve token timestamps and language detection by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/114
Respect skipSpecialTokens option in the decodingCallback function by @shawiz in https://github.com/argmaxinc/WhisperKit/pull/115
Disallow invalid --language values by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/116
Run tests in parallel on CI by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/117
Async batch predictions by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/107

New Contributors

@hewigovens made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/113
@shawiz made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/115

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.5.0...v0.6.0
