v0.7.0
Release date: 2024-05-24 18:02:09
Latest argmaxinc/WhisperKit release: v0.9.4 (2024-11-07 09:51:16)
This is a very exciting release because we're seeing yet another massive speedup in offline throughput, thanks to VAD-based chunking 🚀
Highlights
- Energy VAD-based chunking 🗣️ @jkrukowski
- There is a new decoding option called `chunkingStrategy` that can significantly speed up single-file transcriptions with minimal impact on WER.
- It works by finding a clip point in the middle of the longest silence (lowest audio energy) in the last 15s of a 30s window, and using it to split up all the audio ahead of time so the chunks can be decoded asynchronously in parallel.
- Here's a video of it in action, comparing the `.none` chunking strategy with `.vad`:
https://github.com/argmaxinc/WhisperKit/assets/1981179/0f865caa-3a08-412e-a0bf-080ec16a439a
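The chunking idea above can be sketched in a few lines of Swift. This is a minimal illustration under stated assumptions, not WhisperKit's implementation: audio is assumed to be mono `Float` samples, energy is measured as mean-square amplitude over 0.1s frames, and a frame counts as silent when its energy falls below an assumed threshold of 0.02. The function names (`vadChunkRanges`, `clipPoint`, `frameEnergy`) are hypothetical. Each full 30s window is cut at the middle of the longest quiet run in its last 15s, and the final remainder becomes its own chunk.

```swift
/// Mean-square energy of `samples[range]` (assumed energy measure).
func frameEnergy(_ samples: [Float], _ range: Range<Int>) -> Float {
    let sumOfSquares = samples[range].reduce(Float(0)) { $0 + $1 * $1 }
    return sumOfSquares / Float(range.count)
}

/// Returns the midpoint of the longest run of quiet frames in `lower..<upper`,
/// or `upper` (a hard cut) when no frame falls below `threshold`.
func clipPoint(_ samples: [Float], lower: Int, upper: Int, frame: Int, threshold: Float) -> Int {
    var bestStart = -1
    var bestLength = 0
    var runStart = -1
    var frameStart = lower
    while frameStart + frame <= upper {
        if frameEnergy(samples, frameStart..<frameStart + frame) < threshold {
            if runStart < 0 { runStart = frameStart }       // a quiet run begins
            let runLength = frameStart + frame - runStart
            if runLength > bestLength {                     // longest quiet run so far
                bestLength = runLength
                bestStart = runStart
            }
        } else {
            runStart = -1                                   // loud frame ends the run
        }
        frameStart += frame
    }
    return bestStart < 0 ? upper : bestStart + bestLength / 2
}

/// Hypothetical helper: splits audio into chunks of at most `windowSeconds`,
/// clipping each full window inside its last `searchSeconds`.
func vadChunkRanges(
    samples: [Float],
    sampleRate: Int = 16_000,
    windowSeconds: Int = 30,
    searchSeconds: Int = 15,
    frameSeconds: Double = 0.1,
    energyThreshold: Float = 0.02
) -> [Range<Int>] {
    precondition(searchSeconds < windowSeconds, "search region must be shorter than the window")
    let window = windowSeconds * sampleRate
    let search = searchSeconds * sampleRate
    let frame = max(1, Int(Double(sampleRate) * frameSeconds))
    var chunks: [Range<Int>] = []
    var start = 0
    while start < samples.count {
        let end = min(start + window, samples.count)
        if end == samples.count {
            chunks.append(start..<end)   // the remainder fits in a single window
            break
        }
        let clip = clipPoint(samples, lower: end - search, upper: end,
                             frame: frame, threshold: energyThreshold)
        chunks.append(start..<clip)
        start = clip                     // the next window starts at the clip point
    }
    return chunks
}
```

Because every range is computed up front, each one can be handed to an independent decoding task (for example, a child task in a Swift `TaskGroup`), which is what makes parallel decoding possible. When a window contains no quiet frame, this sketch falls back to a hard cut at the window boundary.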
- Detect language helper:
- You can now call `detectLanguage` on the main `whisperKit` object with just an audio path as input. It returns the detected language code and the language probabilities as a tuple, and keeps logging and timing to a minimum.
- Example:
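```swift
import WhisperKit

// Initialize WhisperKit with its default configuration, then detect the spoken language
let whisperKit = try await WhisperKit()
let (language, probs) = try await whisperKit.detectLanguage(audioPath: "your/audio/path/spanish.wav")
print(language) // "es"
```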
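Conceptually, the returned tuple pairs the most likely language with a probability distribution over candidate languages. The sketch below illustrates that relationship with a softmax followed by an argmax. It is not WhisperKit's code: the input `logits` dictionary (language code mapped to a raw model score) and the helper name `detectLanguageSketch` are assumptions for illustration only.

```swift
import Foundation

/// Hypothetical helper: converts per-language scores into probabilities with a
/// softmax and returns the most likely language code alongside all probabilities.
func detectLanguageSketch(logits: [String: Float]) -> (language: String, langProbs: [String: Float])? {
    guard let maxLogit = logits.values.max() else { return nil }
    // Subtract the largest score before exponentiating for numerical stability.
    let exps = logits.mapValues { exp($0 - maxLogit) }
    let total = exps.values.reduce(0, +)
    let probs = exps.mapValues { $0 / total }
    // The language with the highest probability is reported as the detected language.
    guard let best = probs.max(by: { $0.value < $1.value }) else { return nil }
    return (best.key, probs)
}
```

For example, scores of `["en": 1.0, "es": 3.0, "fr": 0.5]` yield `"es"` with a probability of roughly 0.82.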
- WhisperKit via Expo @seb-sep
- For anyone who has wanted to use WhisperKit in React Native, @seb-sep maintains a repo that makes it easy, and has set up an automation that updates it with each new WhisperKit release. Check it out here: https://github.com/seb-sep/whisper-kit-expo
- Bug fixes and enhancements:
- @jiangdi0924 and @fengcunhan contributed some nice fixes in this release with #136 and #138 (see below)
- The decoding progress callback is now fully async, so it no longer blocks the decoder thread.
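One way to keep a progress callback from blocking a decoding loop is to dispatch it asynchronously onto a separate serial queue, as in this hypothetical sketch. It uses GCD purely for illustration and is not WhisperKit's implementation; `ToyDecoder`, `decode(tokenCount:onProgress:)`, and `pendingCallbacks` are invented names. The decoder takes a snapshot of its progress at each step and moves on without waiting, while the serial queue preserves callback order.

```swift
import Dispatch

/// Toy decoder (hypothetical) whose progress callback never runs on the decoding thread.
final class ToyDecoder {
    private let callbackQueue = DispatchQueue(label: "progress-callbacks") // serial queue
    let pendingCallbacks = DispatchGroup()

    func decode(tokenCount: Int, onProgress: @escaping (Int) -> Void) -> [Int] {
        var tokens: [Int] = []
        for token in 0..<tokenCount {
            tokens.append(token)             // stand-in for one decoding step
            let decodedSoFar = tokens.count  // snapshot, not the mutable array
            // Enqueue the callback and continue immediately instead of waiting for it.
            callbackQueue.async(group: pendingCallbacks) {
                onProgress(decodedSoFar)
            }
        }
        return tokens
    }
}
```

A slow callback in this design delays only later callbacks, not the decode loop itself; callers that need every update delivered before continuing can wait on `pendingCallbacks`.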
What's Changed
- Fix language detection by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/133
- Fix the reset operation exception in transcribeFile in the Demo by @jiangdi0924 in https://github.com/argmaxinc/WhisperKit/pull/136
- gh action for making pr to whisper-kit-expo on whisperkit release by @seb-sep in https://github.com/argmaxinc/WhisperKit/pull/137
- add reStartRecordingLive function by @fengcunhan in https://github.com/argmaxinc/WhisperKit/pull/138
- Added `@_disfavoredOverload` for deprecated methods by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/143
- VAD audio chunking by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/135
- Async Progress Callback by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/145
- Detect language helper by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/146
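For context on #143: `@_disfavoredOverload` is an underscored (officially unsupported) Swift attribute that lowers an overload's rank during overload resolution. The hypothetical sketch below shows why it helps with deprecated methods; the `decode(path:)` functions are illustrative and are not WhisperKit's API. When a deprecated overload and its replacement differ only in return type, a call without type context would typically be ambiguous.

```swift
// Hypothetical legacy overload: returns a single optional result.
@available(*, deprecated, message: "Use the overload that returns [String]")
@_disfavoredOverload
func decode(path: String) -> String? {
    return "legacy:\(path)"
}

// Hypothetical replacement overload: returns an array of results.
func decode(path: String) -> [String] {
    return ["segment:\(path)"]
}

// With no type annotation, the compiler now prefers the non-disfavored overload.
let results = decode(path: "audio.wav")
```

Callers who still need the legacy overload can select it with an explicit type annotation, such as `let legacy: String? = decode(path: "audio.wav")`, at the cost of a deprecation warning.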
New Contributors
- @jiangdi0924 made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/136
- @seb-sep made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/137
- @fengcunhan made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/138
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.6.1...v0.7.0