v0.4.0
Release date: 2023-04-03 23:29:31
Latest release of SYSTRAN/faster-whisper: v1.0.3 (2024-07-01 18:05:08)
Integration of Silero VAD
The Silero VAD model is integrated to ignore parts of the audio without speech:
model.transcribe(..., vad_filter=True)
The default behavior is conservative and only removes silence longer than 2 seconds. See the README to find how to customize the VAD parameters.
Note: the Silero model is executed with onnxruntime, which is currently not released for Python 3.11. The dependency is excluded for this Python version and so the VAD features cannot be used.
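To make the conservative default more concrete, here is a minimal sketch of the thresholding idea: speech spans separated by a short gap are kept together, and only gaps longer than the threshold are cut out. This is an illustration in plain Python, not the Silero VAD or faster-whisper implementation; the function drop_long_silences and its parameter min_silence_s are hypothetical names chosen for this example.

```python
def drop_long_silences(speech_spans, min_silence_s=2.0):
    """Merge speech spans whose separating gap is at most min_silence_s.

    speech_spans: sorted list of (start, end) times in seconds where
    speech was detected. Returns the (start, end) chunks that are kept;
    everything between two returned chunks is silence that gets removed.
    """
    kept = []
    for start, end in speech_spans:
        if kept and start - kept[-1][1] <= min_silence_s:
            # Gap is not longer than the threshold: keep it and extend the chunk.
            kept[-1] = (kept[-1][0], end)
        else:
            # First span, or a gap longer than the threshold: start a new chunk.
            kept.append((start, end))
    return kept


# Hypothetical detections: a 0.5 s pause, then a 3.5 s pause.
spans = [(0.0, 1.5), (2.0, 4.0), (7.5, 9.0)]
print(drop_long_silences(spans))
```

With the 2-second threshold, only the 3.5-second pause is dropped; lowering min_silence_s in this sketch makes the filter more aggressive.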
Speaker diarization using stereo channels
The function decode_audio has a new argument split_stereo to split stereo audio into separate left and right channels:
left, right = decode_audio(audio_file, split_stereo=True)
# model.transcribe(left)
# model.transcribe(right)
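Once each channel has been transcribed separately, the two results still have to be combined into one conversation. A possible way to do that, assuming each channel carries exactly one speaker, is to label the segments by channel and sort them by start time. The sketch below uses hand-written (start, end, text) tuples as stand-ins for transcription output; Turn and merge_channels are hypothetical helpers, not part of faster-whisper.

```python
from typing import List, NamedTuple, Tuple


class Turn(NamedTuple):
    speaker: str
    start: float
    end: float
    text: str


def merge_channels(
    left: List[Tuple[float, float, str]],
    right: List[Tuple[float, float, str]],
    left_name: str = "left",
    right_name: str = "right",
) -> List[Turn]:
    """Label per-channel segments with a speaker name and order them by start time."""
    turns = [Turn(left_name, start, end, text) for start, end, text in left]
    turns += [Turn(right_name, start, end, text) for start, end, text in right]
    return sorted(turns, key=lambda turn: turn.start)


# Made-up segments standing in for the results of transcribing each channel.
left_segments = [(0.0, 2.1, "Hello, how can I help?"), (5.0, 6.2, "Sure, one moment.")]
right_segments = [(2.4, 4.8, "I'd like to check my order.")]

for turn in merge_channels(left_segments, right_segments):
    print(f"[{turn.start:.1f}s -> {turn.end:.1f}s] {turn.speaker}: {turn.text}")
```

This only works as diarization when the recording really places each speaker on their own channel, as in many call recordings.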
Other changes
- Add Segment attributes avg_log_prob and no_speech_prob (same definition as openai/whisper)
- Ignore audio frames raising an av.error.InvalidDataError exception during decoding
- Fix option prefix to be passed only to the first 30-second window
- Extend suppress_tokens with some special tokens that should always be suppressed (unless suppress_tokens is None)
- Raise a more helpful error message when the selected model size is invalid
- Disable the progress bar when the model to download is already in the cache
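The avg_log_prob and no_speech_prob attributes listed above can be used to discard segments that are likely silence or hallucinated text. The sketch below applies a rule in the spirit of the one openai/whisper uses (high no-speech probability combined with low average log probability). The Segment namedtuple here is a stand-in with only the fields needed for the example, and the threshold values are assumptions to tune for your own audio.

```python
from collections import namedtuple

# Stand-in for a transcription segment; only the fields used below.
Segment = namedtuple("Segment", ["text", "avg_log_prob", "no_speech_prob"])


def is_probably_silence(segment, no_speech_threshold=0.6, log_prob_threshold=-1.0):
    """Return True when the segment looks like non-speech.

    A segment is flagged only if the model both assigns a high no-speech
    probability and is not confident in the decoded text.
    """
    return (
        segment.no_speech_prob > no_speech_threshold
        and segment.avg_log_prob < log_prob_threshold
    )


# Made-up values for illustration.
segments = [
    Segment("Thanks for calling.", avg_log_prob=-0.25, no_speech_prob=0.02),
    Segment("...", avg_log_prob=-1.8, no_speech_prob=0.91),
    Segment("Okay.", avg_log_prob=-0.4, no_speech_prob=0.7),
]

kept = [s.text for s in segments if not is_probably_silence(s)]
print(kept)
```

The third segment is kept even though its no-speech probability is high, because the decoded text itself scored a confident average log probability.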