k2-fsa/sherpa-onnx
Forks: 389, Stars: 3336 (updated 2024-10-17 07:50:25)
License: Apache-2.0
Language: C++
Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, and Rust.
Latest release: v1.10.28 (2024-10-13 15:30:04)
Supported functions
Speech recognition | Speech synthesis |
---|---|
✔️ | ✔️ |
Speaker identification | Speaker diarization | Speaker verification |
---|---|---|
✔️ | ✔️ | ✔️ |
Spoken Language identification | Audio tagging | Voice activity detection |
---|---|---|
✔️ | ✔️ | ✔️ |
Keyword spotting | Add punctuation |
---|---|
✔️ | ✔️ |
Supported platforms
Architecture | Android | iOS | Windows | macOS | Linux |
---|---|---|---|---|---|
x64 | ✔️ | | ✔️ | ✔️ | ✔️ |
x86 | ✔️ | | ✔️ | | |
arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
arm32 | ✔️ | | | | ✔️ |
riscv64 | | | | | ✔️ |
Supported programming languages
1. C++ | 2. C | 3. Python | 4. JavaScript |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |
5. Java | 6. C# | 7. Kotlin | 8. Swift |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |
9. Go | 10. Dart | 11. Rust | 12. Pascal |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |
For Rust support, please see sherpa-rs.
It also supports WebAssembly.
Introduction
This repository supports running the following functions locally:
- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker diarization
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., silero-vad)
- Keyword spotting
on the following platforms and operating systems:
- x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
- Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- NodeJS
- WebAssembly
- Raspberry Pi
- RV1126
- LicheePi4A
- VisionFive 2
- 旭日X3派
- 爱芯派
- etc.
with the following APIs
- C++, C, Python, Go, C#
- Java, Kotlin, JavaScript
- Swift, Rust
- Dart, Object Pascal
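Several of the demos and APKs listed below combine VAD with non-streaming speech recognition. The idea is that the VAD cuts a long recording into speech segments, and the recognizer then decodes each segment separately. The sketch below shows only the segmentation step, under loudly stated assumptions. It is not sherpa-onnx code and does not use its API. It swaps silero-vad for a plain frame-energy threshold, and the function name `energy_vad_segments` and its defaults (30 ms frames, a -40 dBFS threshold, a 3-frame hangover) are hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Tuple


def energy_vad_segments(
    samples: List[float],
    sample_rate: int = 16000,
    frame_ms: int = 30,
    threshold_db: float = -40.0,
    min_silence_frames: int = 3,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) speech segments found by a frame-energy threshold.

    Hypothetical helper for illustration only; sherpa-onnx itself uses silero-vad.
    Samples are floats in [-1, 1]. A trailing partial frame is ignored.
    """
    frame_len = sample_rate * frame_ms // 1000
    num_frames = len(samples) // frame_len
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # first frame of the current speech run
    silence_run = 0  # consecutive quiet frames seen inside a speech run

    def to_sec(frame_index: int) -> float:
        return frame_index * frame_len / sample_rate

    for i in range(num_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20 * math.log10(rms) if rms > 0 else -math.inf
        if level_db > threshold_db:
            if start is None:
                start = i  # speech begins
            silence_run = 0
        elif start is not None:
            silence_run += 1
            # Close the segment only after enough quiet frames (a simple hangover)
            if silence_run >= min_silence_frames:
                end = i - silence_run + 1  # one past the last loud frame
                segments.append((to_sec(start), to_sec(end)))
                start, silence_run = None, 0

    if start is not None:  # speech ran until the end of the input
        segments.append((to_sec(start), to_sec(num_frames - silence_run)))
    return segments


if __name__ == "__main__":
    sr = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 2)]
    audio = [0.0] * (sr // 4) + tone + [0.0] * (sr // 4)
    print(energy_vad_segments(audio, sr))
```

In a real pipeline, each (start, end) pair would be used to slice the samples, and each slice would be passed to a non-streaming recognizer. silero-vad replaces the fixed energy threshold with a neural model, which is generally more robust to background noise.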
Links for Huggingface Spaces
You can visit the following Huggingface spaces to try sherpa-onnx without installing anything. All you need is a browser.
Description | URL |
---|---|
Speaker diarization | Click me |
Speech recognition | Click me |
Speech recognition with Whisper | Click me |
Speech synthesis | Click me |
Generate subtitles | Click me |
Audio tagging | Click me |
Spoken language identification with Whisper | Click me |
We also have spaces built using WebAssembly. They are listed below:
Description | Huggingface space | ModelScope space |
---|---|---|
Voice activity detection with silero-vad | Click me | Address |
Real-time speech recognition (Chinese + English) with Zipformer | Click me | Address |
Real-time speech recognition (Chinese + English) with Paraformer | Click me | Address |
Real-time speech recognition (Chinese + English + Cantonese) with Paraformer-large | Click me | Address |
Real-time speech recognition (English) | Click me | Address |
VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with SenseVoice | Click me | Address |
VAD + speech recognition (English) with Whisper tiny.en | Click me | Address |
VAD + speech recognition (English) with Zipformer trained with GigaSpeech | Click me | Address |
VAD + speech recognition (Chinese) with Zipformer trained with WenetSpeech | Click me | Address |
VAD + speech recognition (Japanese) with Zipformer trained with ReazonSpeech | Click me | Address |
VAD + speech recognition (Thai) with Zipformer trained with GigaSpeech2 | Click me | Address |
VAD + speech recognition (Chinese, multiple dialects) with a TeleSpeech-ASR CTC model | Click me | Address |
VAD + speech recognition (English + Chinese, plus multiple Chinese dialects) with Paraformer-large | Click me | Address |
VAD + speech recognition (English + Chinese, plus multiple Chinese dialects) with Paraformer-small | Click me | Address |
Speech synthesis (English) | Click me | Address |
Speech synthesis (German) | Click me | Address |
Speaker diarization | Click me | Address |
Links for pre-built Android APKs
You can find pre-built Android APKs for this repository in the following table:
Description | URL | Users in China |
---|---|---|
Speaker diarization | Address | Click here |
Streaming speech recognition | Address | Click here |
Text-to-speech | Address | Click here |
Voice activity detection (VAD) | Address | Click here |
VAD + non-streaming speech recognition | Address | Click here |
Two-pass speech recognition | Address | Click here |
Audio tagging | Address | Click here |
Audio tagging (WearOS) | Address | Click here |
Speaker identification | Address | Click here |
Spoken language identification | Address | Click here |
Keyword spotting | Address | Click here |
Links for pre-built Flutter APPs
Real-time speech recognition
Description | URL | Users in China |
---|---|---|
Streaming speech recognition | Address | Click here |
Text-to-speech
Description | URL | Users in China |
---|---|---|
Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here |
Linux (x64) | Address | Click here |
macOS (x64) | Address | Click here |
macOS (arm64) | Address | Click here |
Windows (x64) | Address | Click here |
Note: You need to build from source for iOS.
Links for pre-built Lazarus APPs
Links for pre-trained models
Description | URL |
---|---|
Speech recognition (speech to text, ASR) | Address |
Text-to-speech (TTS) | Address |
VAD | Address |
Keyword spotting | Address |
Audio tagging | Address |
Speaker identification (Speaker ID) | Address |
Spoken language identification (Language ID) | See the multilingual Whisper ASR models under Speech recognition |
Punctuation | Address |
Speaker segmentation | Address |
Useful links
- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi
How to reach us
Please see https://k2-fsa.github.io/sherpa/social-groups.html for the Next-gen Kaldi (新一代 Kaldi) WeChat and QQ discussion groups.
Projects using sherpa-onnx
voiceapi
Streaming ASR and TTS based on FastAPI
It shows how to use the ASR and TTS Python APIs with FastAPI.
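TMSpeech (腾讯会议摸鱼工具, roughly "a Tencent Meeting slacking-off tool")
It uses streaming ASR in C# with a graphical user interface.
Video demo in Chinese: 【开源】Windows实时字幕软件(网课/开会必备) (roughly: "[Open source] Real-time subtitle software for Windows, a must-have for online classes and meetings")
lol互动助手 (a League of Legends interaction assistant)
It uses the JavaScript API of sherpa-onnx along with Electron.
Video demo in Chinese: 爆了!炫神教你开打字挂!真正影响胜率的英雄联盟工具!英雄联盟的最后一块拼图!和游戏中的每个人无障碍沟通! (roughly: "It blew up! 炫神 shows you a typing 'cheat'! A League of Legends tool that really affects your win rate! The last piece of the League of Legends puzzle! Communicate freely with everyone in the game!")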
Recent releases (data updated 2024-10-17 07:50:08):
2024-10-13 15:30:04 v1.10.28
2024-09-29 14:08:32 speaker-segmentation-models
2024-09-19 11:09:13 v1.10.27
2024-09-14 14:38:46 v1.10.26
2024-09-13 15:00:34 v1.10.25
2024-08-30 17:52:59 v1.10.24
2024-08-24 23:48:32 v1.10.23
2024-08-16 22:42:10 v1.10.22
2024-08-08 11:00:56 v1.10.21
2024-07-29 12:43:04 v1.10.20
Topics:
aarch64, android, arm32, asr, cpp, csharp, dotnet, ios, lazarus, linux, macos, mfc, object-pascal, onnx, raspberry-pi, risc-v, speech-to-text, text-to-speech, vits, windows