v0.7.0

Release date: 2024-05-24 18:02:09

Latest argmaxinc/WhisperKit release: v0.10.1 (2024-12-21 13:48:53)

This is a very exciting release because we're seeing yet another massive speedup in offline throughput thanks to VAD-based chunking 🚀

Highlights

Energy VAD based chunking 🗣️ @jkrukowski
- There is a new decoding option called chunkingStrategy which can significantly speed up your single-file transcriptions with minimal WER downsides.
- It works by finding a clip point in the middle of the longest silence (lowest audio energy) in the last 15s of each 30s window, then uses those clip points to split up all the audio ahead of time so the chunks can be decoded asynchronously in parallel.
- Here's a video of it in action, comparing the .none chunking strategy with .vad:

https://github.com/argmaxinc/WhisperKit/assets/1981179/0f865caa-3a08-412e-a0bf-080ec16a439a
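
To make the clip-point idea above more concrete, here is a minimal, self-contained sketch of energy-based chunking. It is not WhisperKit's actual implementation: the function names (frameEnergies, longestSilence, clipPoints), the fixed energy threshold, and the frame size are all assumptions made for illustration. It only shows the general technique of searching the second half of each window for the longest low-energy run and clipping near its middle.

```swift
// Illustrative sketch only; names and parameters are hypothetical, not WhisperKit API.

/// RMS energy of each fixed-size frame of `samples`.
func frameEnergies(_ samples: [Float], frameLength: Int) -> [Float] {
    precondition(frameLength > 0, "frameLength must be positive")
    var energies: [Float] = []
    var start = 0
    while start < samples.count {
        let end = min(start + frameLength, samples.count)
        var sumSquares: Float = 0
        for i in start..<end { sumSquares += samples[i] * samples[i] }
        energies.append((sumSquares / Float(end - start)).squareRoot())
        start = end
    }
    return energies
}

/// Half-open frame range of the longest run with energy below `threshold`, or nil if none.
func longestSilence(in energies: [Float], threshold: Float) -> Range<Int>? {
    var best: Range<Int>? = nil
    var runStart: Int? = nil
    for (i, energy) in energies.enumerated() {
        if energy < threshold {
            if runStart == nil { runStart = i }
        } else if let s = runStart {
            if best == nil || (i - s) > best!.count { best = s..<i }
            runStart = nil
        }
    }
    // Close a run that extends to the end of the search region.
    if let s = runStart, best == nil || (energies.count - s) > best!.count {
        best = s..<energies.count
    }
    return best
}

/// Sample indices at which to split the audio. Each window is `windowLength` samples long;
/// only its second half is searched for silence. Falls back to the window end if no
/// frame is below the threshold.
func clipPoints(samples: [Float], windowLength: Int, frameLength: Int, threshold: Float) -> [Int] {
    precondition(windowLength >= 2, "windowLength must be at least 2 samples")
    var points: [Int] = []
    var windowStart = 0
    while samples.count - windowStart > windowLength {
        let windowEnd = windowStart + windowLength
        let searchStart = windowStart + windowLength / 2
        let energies = frameEnergies(Array(samples[searchStart..<windowEnd]), frameLength: frameLength)
        var clip = windowEnd
        if let silence = longestSilence(in: energies, threshold: threshold) {
            // Clip at the start of the frame in the middle of the silent run.
            let midFrame = (silence.lowerBound + silence.upperBound) / 2
            clip = min(searchStart + midFrame * frameLength, windowEnd)
        }
        points.append(clip)
        windowStart = clip  // The next window begins at the clip point.
    }
    return points
}

// Tiny example: an 8-sample window whose second half contains a 3-sample silence.
let toySamples: [Float] = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1]
let toyPoints = clipPoints(samples: toySamples, windowLength: 8, frameLength: 1, threshold: 0.1)
print(toyPoints)
```

For 16 kHz audio, a 30s window would correspond to windowLength = 480_000 samples. Once the clip points are known up front, each chunk is independent of the others, which is what allows the chunks to be decoded in parallel.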

Detect language helper:
- You can now call detectLanguage on the main whisperKit object with just an audio path as input. It returns a simple language code and probability as a tuple, and has minimal logging/timing.
- Example:

let whisperKit = try await WhisperKit()
let (language, probs) = try await whisperKit.detectLanguage(audioPath: "your/audio/path/spanish.wav")
print(language) // "es"

WhisperKit via Expo @seb-sep
- For anyone who has been wanting to use WhisperKit in React Native, @seb-sep is maintaining a repo that makes it easy, and has also set up an automation that updates it with each new WhisperKit release. Check it out here: https://github.com/seb-sep/whisper-kit-expo

Bug fixes and enhancements:
- @jiangdi0924 and @fengcunhan contributed some nice fixes in this release with #136 and #138 (see below)
- Also moved the decoding progress callback to be fully async so that it doesn't block the decoder thread

What's Changed

Fix language detection by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/133
Fix the reset operation exception in transcribeFile in the Demo. by @jiangdi0924 in https://github.com/argmaxinc/WhisperKit/pull/136
gh action for making pr to whisper-kit-expo on whisperkit release by @seb-sep in https://github.com/argmaxinc/WhisperKit/pull/137
add reStartRecordingLive function by @fengcunhan in https://github.com/argmaxinc/WhisperKit/pull/138
Added @_disfavoredOverload for deprecated methods by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/143
VAD audio chunking by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/135
Async Progress Callback by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/145
Detect language helper by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/146

New Contributors

@jiangdi0924 made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/136
@seb-sep made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/137
@fengcunhan made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/138

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.6.1...v0.7.0
