v0.3.0
Release date: 2023-03-16 16:15:02
Latest modelscope/FunASR release: v0.3.0
What's new:
2023.3.17, funasr-0.3.0, modelscope-1.4.1
- New Features:
  - Added a GPU runtime solution based on NVIDIA Triton (nv-triton), which makes it easy to export Paraformer models from ModelScope and deploy them as Triton services. In benchmark tests on a single V100 GPU, it achieved an RTF of 0.0032 and a throughput of 300.
  - Added a CPU runtime quantization solution that supports exporting quantized ONNX and LibTorch models from ModelScope. In benchmark tests on an 8369B CPU, quantization roughly halved the RTF (0.00438 -> 0.00226, about a 50% improvement) and doubled the throughput (228 -> 442).
  - Added a C++ gRPC service deployment solution. Combined with the C++ ONNX Runtime and the quantization solution, it delivers twice the performance of the Python runtime (demo).
  - Added streaming inference to the ModelScope pipelines of the 16k and 8k VAD models, accepting audio input chunks as short as 10 ms (demo).
  - Improved the punctuation prediction model, raising punctuation accuracy (F-score up 0.9 points absolute, from 55.6 to 56.5).
  - Added a real-time subtitle demo built on the gRPC service, using two-pass recognition: the Paraformer streaming model displays text in real time, and the Paraformer-large offline model then corrects the recognition results (demo).
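The RTF and throughput (speedup) figures above are related. RTF is processing time divided by audio duration, and the CPU throughput figures match its reciprocal (1/0.00438 ≈ 228, 1/0.00226 ≈ 442). The sketch below checks that arithmetic. It assumes throughput is defined as 1/RTF, which the notes do not state explicitly, and the helper names are hypothetical, not FunASR APIs.

```python
# Hypothetical helpers for illustration; these are not FunASR APIs.


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF: wall-clock processing time divided by the duration of audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def throughput(rtf: float) -> float:
    """Assumed definition: seconds of audio processed per second of compute, i.e. 1 / RTF."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


if __name__ == "__main__":
    # RTF values quoted in the release notes.
    for label, rtf in [("GPU V100", 0.0032), ("CPU baseline", 0.00438), ("CPU quantized", 0.00226)]:
        print(f"{label}: RTF={rtf}, 1/RTF={throughput(rtf):.1f}")
```

This prints 1/RTF values of 312.5, 228.3, and 442.5. The CPU values line up with the reported 228 and 442. The GPU throughput of 300 is only approximately 1/0.0032, so that figure may have been measured or rounded differently.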
- New Models:
  - Added a 16k Paraformer streaming model that performs real-time speech recognition on streaming audio input (demo). It can be deployed with the gRPC service to provide real-time subtitles.
  - Added a streaming punctuation model that punctuates text in streaming speech recognition scenarios, invoked incrementally at each VAD endpoint. Used together with real-time ASR models, it produces readable real-time subtitles (demo).
  - Added the TP-Aligner timestamp model, which takes audio and its corresponding text as input and outputs character-level timestamps. Its performance is comparable to that of the Kaldi forced-alignment (FA) model (60.3 ms vs. 69.3 ms), and it can be freely combined with ASR models (demo).
  - Added a financial-domain model (8k Paraformer-large-3445vocab), fine-tuned on 1,000 hours of data. On the financial-domain test set, recognition accuracy improved by 5% relative, and domain keyword recall improved by 7% relative.
  - Added an audio-visual domain model (16k Paraformer-large-3445vocab), fine-tuned on 10,000 hours of data. On the audio-visual domain test set, recognition accuracy improved by 8% relative.
  - Added an 8k English speaker verification model for the CallHome dataset, which can also be used to extract speaker embeddings.
  - Added speaker diarization models, a 16k SOND Chinese model and an 8k SOND English model, which achieved the best performance on AliMeeting and CallHome with DERs of 4.46% and 11.13%, respectively.
  - Added UniASR models that unify streaming and offline recognition: 16k Burmese, 16k Hebrew, 16k Urdu, 8k Mandarin (financial domain), and 16k Mandarin (audio-visual domain).
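The DER figures reported for the SOND diarization models use the diarization error rate: missed speech, false-alarm speech, and speaker-confusion time summed and divided by total reference speech time. The minimal sketch below computes it from error durations. The function name and sample durations are hypothetical, and these notes do not give the scoring setup (forgiveness collar, overlap handling) behind the 4.46% and 11.13% figures.

```python
# Hypothetical helper for illustration; not a FunASR API. Real scoring tools
# also apply details such as forgiveness collars and overlapped-speech handling.


def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_speech: float) -> float:
    """DER = (missed speech + false alarm speech + speaker confusion) / total reference speech.

    All arguments are durations in seconds.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


if __name__ == "__main__":
    # Made-up error durations for a 600 s reference recording, purely illustrative.
    der = diarization_error_rate(missed=12.0, false_alarm=8.0, confusion=6.8, total_speech=600.0)
    print(f"DER = {der:.2%}")
```

Because all three error types share one denominator, DER can exceed 100% when false alarms are large. It is therefore a time-weighted error measure, not an accuracy percentage.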
New Contributors
- @dingbig made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/147
- @yuekaizhang made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/161
- @zhuzizyf made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/180
- @znsoftm made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/185
- @songtaoshi made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/227
Full Changelog: https://github.com/alibaba-damo-academy/FunASR/compare/v0.2.0...v0.3.0