v0.6.0
版本发布时间: 2024-04-18 14:22:52
argmaxinc/WhisperKit最新发布版本:v0.9.4(2024-11-07 09:51:16)
Highlights
- Async batch transcription is here 🎉 contributed by @jkrukowski
- With this release, you can now simultaneously transcribe multiple audio files at once, fully utilizing the new async prediction APIs released with iOS17/macOS14 (see the wwdc video here).
- New interface with
audioPaths
input: -
let audioPaths = [ "/path/to/file1.wav", "/path/to/file2.wav" ] let whisperKit = try await WhisperKit() let transcriptionResults: [[TranscriptionResult]?] = await whisperKit.transcribe(audioPaths: audioPaths)
- You can also use it via the CLI using the new argument
--audio-folder "path/to/folder/"
- Future work will be chunking up single files to significantly speed up long-form transcription
- Note that this entails breaking changes and deprecations, see below for the full upgrade guide.
- Several bug fixes, accuracy improvements, and quality of life upgrades by @hewigovens @shawiz and @jkrukowski
- Every issue raised and PR merged from the community helps make WhisperKit better every release, thank you and keep them coming! 🙏
⚠️ Upgrade Guide
We aim to minimize breaking changes, so with this update we added a few deprecation flags for changed interfaces, which will be removed later but for now are still usable and will not throw build errors. There are some breaking changes for lower level and newer methods so if you do notice build errors click the dropdown below to see the full guide.
Full Upgrade Guide
API changes
Deprecations
WhisperKit
Deprecated
public func transcribe(
audioPath: String,
decodeOptions: DecodingOptions? = nil,
callback: TranscriptionCallback = nil
) async throws -> TranscriptionResult?
use instead
public func transcribe(
audioPath: String,
decodeOptions: DecodingOptions? = nil,
callback: TranscriptionCallback = nil
) async throws -> [TranscriptionResult]
Deprecated
public func transcribe(
audioArray: [Float],
decodeOptions: DecodingOptions? = nil,
callback: TranscriptionCallback = nil
) async throws -> TranscriptionResult?
use instead
public func transcribe(
audioArray: [Float],
decodeOptions: DecodingOptions? = nil,
callback: TranscriptionCallback = nil
) async throws -> [TranscriptionResult]
TextDecoding
Deprecated
func decodeText(
from encoderOutput: MLMultiArray,
using decoderInputs: DecodingInputs,
sampler tokenSampler: TokenSampling,
options decoderOptions: DecodingOptions,
callback: ((TranscriptionProgress) -> Bool?)?
) async throws -> [DecodingResult]
use instead
func decodeText(
from encoderOutput: MLMultiArray,
using decoderInputs: DecodingInputs,
sampler tokenSampler: TokenSampling,
options decoderOptions: DecodingOptions,
callback: ((TranscriptionProgress) -> Bool?)?
) async throws -> DecodingResult
Deprecated
func detectLanguage(
from encoderOutput: MLMultiArray,
using decoderInputs: DecodingInputs,
sampler tokenSampler: TokenSampling,
options: DecodingOptions,
temperature: FloatType
) async throws -> [DecodingResult]
use instead
func detectLanguage(
from encoderOutput: MLMultiArray,
using decoderInputs: DecodingInputs,
sampler tokenSampler: TokenSampling,
options: DecodingOptions,
temperature: FloatType
) async throws -> DecodingResult
Breaking changes
- removed
Transcriber
protocol
AudioProcessing
static func loadAudio(fromPath audioFilePath: String) -> AVAudioPCMBuffer?
becomes
static func loadAudio(fromPath audioFilePath: String) throws -> AVAudioPCMBuffer
AudioStreamTranscriber
public init(
audioProcessor: any AudioProcessing,
transcriber: any Transcriber,
decodingOptions: DecodingOptions,
requiredSegmentsForConfirmation: Int = 2,
silenceThreshold: Float = 0.3,
compressionCheckWindow: Int = 20,
useVAD: Bool = true,
stateChangeCallback: AudioStreamTranscriberCallback?
)
becomes
public init(
audioEncoder: any AudioEncoding,
featureExtractor: any FeatureExtracting,
segmentSeeker: any SegmentSeeking,
textDecoder: any TextDecoding,
tokenizer: any WhisperTokenizer,
audioProcessor: any AudioProcessing,
decodingOptions: DecodingOptions,
requiredSegmentsForConfirmation: Int = 2,
silenceThreshold: Float = 0.3,
compressionCheckWindow: Int = 20,
useVAD: Bool = true,
stateChangeCallback: AudioStreamTranscriberCallback?
)
TextDecoding
func prepareDecoderInputs(withPrompt initialPrompt: [Int]) -> DecodingInputs?
becomes
func prepareDecoderInputs(withPrompt initialPrompt: [Int]) throws -> DecodingInputs
What's Changed
- Add
microphoneUnavailable
error by @hewigovens in https://github.com/argmaxinc/WhisperKit/pull/113 - Improve token timestamps and language detection by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/114
- Respect skipSpecialTokens option in the decodingCallback function by @shawiz in https://github.com/argmaxinc/WhisperKit/pull/115
- Disallow invalid
--language
values by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/116 - Run tests in parallel on CI by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/117
- Async batch predictions by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/107
New Contributors
- @hewigovens made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/113
- @shawiz made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/115
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.5.0...v0.6.0