v0.8.0
Release date: 2024-07-13 02:50:16
Latest release of argmaxinc/WhisperKit: v0.10.1 (2024-12-21 13:48:53)
This release focuses heavily on reliability: memory usage (especially for large files), common crashes, and various correctness bugs that the community has reported in issues.
Highlights
- Memory-efficient Handling of Large Files: WhisperKit is now much more memory-efficient for large files, thanks to improvements on top of #158 by @finnvoor. These changes speed up audio resampling significantly and remove several unnecessary data copies. They also fix a buffer misalignment issue that caused #183. For more aggressive memory savings, the default audio file chunk size can be configured through maxReadFrameSize. The memory chart below is for a ~200 MB compressed audio file from #174 and shows up to 3x faster resampling with 50% less memory. Note that WhisperKit requires uncompressed Float values as MLModel input, so after the file is read and resampled to 16 kHz mono, it occupies roughly 1 GB of memory at minimum.
[Memory usage chart: Before | After]
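The ~1 GB figure follows from simple arithmetic: 16 kHz mono audio stored as 32-bit Float needs 16,000 × 4 = 64,000 bytes per second, or about 230 MB per hour of audio. The Swift sketch below computes that footprint and shows how capping each read at a maxReadFrameSize-style limit splits a file into bounded chunks. The helpers decodedAudioBytes and chunkRanges are hypothetical illustrations, not part of the WhisperKit API, and this is not WhisperKit's actual reading code.

```swift
// Hypothetical helpers for illustration only; not WhisperKit API.

/// Bytes needed to hold `seconds` of Float32 audio at `sampleRate` Hz.
func decodedAudioBytes(seconds: Double, sampleRate: Double = 16_000, channels: Int = 1) -> Int {
    let frames = Int((seconds * sampleRate).rounded(.up))
    return frames * channels * MemoryLayout<Float>.size  // Float is 4 bytes
}

/// Splits `totalFrames` into consecutive ranges of at most `maxReadFrameSize` frames,
/// so each read only buffers one bounded chunk of the source file at a time.
func chunkRanges(totalFrames: Int, maxReadFrameSize: Int) -> [Range<Int>] {
    precondition(maxReadFrameSize > 0, "chunk size must be positive")
    return stride(from: 0, to: totalFrames, by: maxReadFrameSize).map { start in
        start..<min(start + maxReadFrameSize, totalFrames)
    }
}

let oneHour = decodedAudioBytes(seconds: 3_600)
print("1 hour at 16 kHz mono Float32: \(oneHour) bytes")  // 230400000 bytes

// A hypothetical 1-hour, 44.1 kHz source read in chunks of 1,000,000 frames.
let ranges = chunkRanges(totalFrames: 44_100 * 3_600, maxReadFrameSize: 1_000_000)
print("chunks: \(ranges.count)")  // 159
```

In this sketch a smaller chunk limit lowers the size of each read buffer at the cost of more read iterations; the fully resampled output still needs the full footprint computed by decodedAudioBytes.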
- Progress Bar: @finnvoor also contributed a fix for progress reporting in VAD chunking mode. WhisperAX now shows an indicator while the file is being resampled, as well as the overall decoding progress. Note that the progress bar is not strictly linear: it is based on how many windows have finished decoding, so it speeds up toward the end of the process as more windows complete.
- Various other improvements: We also went through our open issues and resolved many of them. If you have one pending, please test this version to verify that it is fixed. Thanks again to everyone who contributes to these issues; it helps immensely in making WhisperKit better for everyone 🚀.
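To illustrate why window-based progress is not linear, the Swift sketch below reports progress as the fraction of windows that have finished by a given time. It assumes, purely for illustration, that several windows decode concurrently and finish close together near the end; the function progress(at:finishTimes:) and the finish times are hypothetical and are not WhisperKit's progress code.

```swift
// Hypothetical illustration; not WhisperKit's implementation.

/// Fraction of windows finished by time `t`, given each window's finish time in seconds.
func progress(at t: Double, finishTimes: [Double]) -> Double {
    guard !finishTimes.isEmpty else { return 0 }
    let done = finishTimes.filter { $0 <= t }.count
    return Double(done) / Double(finishTimes.count)
}

// Made-up finish times for four concurrently decoded windows.
let finishTimes: [Double] = [8, 9, 9.5, 10]
for t in stride(from: 0.0, through: 10.0, by: 1.0) {
    print("t=\(t)s progress=\(progress(at: t, finishTimes: finishTimes))")
}
```

Under these assumptions the reported progress stays at 0 for most of the run and then climbs quickly, because the count of completed windows only changes when a window finishes, not while it is being decoded.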
What's Changed
- Remove purported OGG support from CLI by @iandundas in https://github.com/argmaxinc/WhisperKit/pull/153
- Resample audio files in 10mb chunks by @finnvoor in https://github.com/argmaxinc/WhisperKit/pull/158
- feat: add version output by @chenrui333 in https://github.com/argmaxinc/WhisperKit/pull/148
- Fix TEST_HOST name mismatch by @CongLeSolutionX in https://github.com/argmaxinc/WhisperKit/pull/177
- feat: copy text with eager decoding, add keyboard shortcut by @iGerman00 in https://github.com/argmaxinc/WhisperKit/pull/178
- Fix progress when using VAD chunking by @finnvoor in https://github.com/argmaxinc/WhisperKit/pull/179
- Fix indeterminate tests by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/180
- Fix resampling large files by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/183
New Contributors
- @iandundas made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/153
- @chenrui333 made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/148
- @CongLeSolutionX made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/177
- @iGerman00 made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/178
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.7.2...v0.8.0