argmaxinc/WhisperKit
Forks: 343, Stars: 4042 (updated 2024-12-22 05:14:48)
License: MIT
Language: Swift
On-device Speech Recognition for Apple Silicon
Latest release: v0.10.1 (2024-12-21 13:48:53)
WhisperKit is a Swift package that integrates OpenAI's popular Whisper speech recognition model with Apple's CoreML framework for efficient, local inference on Apple devices.
Check out the demo app on TestFlight.
[Blog Post] [Python Tools Repo]
Installation
Swift Package Manager
WhisperKit can be integrated into your Swift project using the Swift Package Manager.
Prerequisites
- macOS 14.0 or later.
- Xcode 15.0 or later.
Xcode Steps
- Open your Swift project in Xcode.
- Navigate to File > Add Package Dependencies...
- Enter the package repository URL: https://github.com/argmaxinc/whisperkit
- Choose the version range or specific version.
- Click Finish to add WhisperKit to your project.
Package.swift
If you're using WhisperKit as part of a Swift package, you can include it in your Package.swift dependencies as follows:
dependencies: [
    .package(url: "https://github.com/argmaxinc/WhisperKit.git", from: "0.9.0"),
],
Then add WhisperKit as a dependency for your target:
.target(
    name: "YourApp",
    dependencies: ["WhisperKit"]
),
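For context, the two fragments above might sit in a complete manifest roughly as follows. This is a minimal sketch, not the project's own manifest: the package name `YourApp` is carried over from the fragment, and the tools version and `platforms` entry are assumptions based on the macOS 14.0 / Xcode 15.0 prerequisites listed earlier.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "YourApp",
    // Assumption: mirrors the "macOS 14.0 or later" prerequisite above.
    platforms: [.macOS(.v14)],
    dependencies: [
        // Same dependency line as in the fragment above.
        .package(url: "https://github.com/argmaxinc/WhisperKit.git", from: "0.9.0"),
    ],
    targets: [
        .target(
            name: "YourApp",
            dependencies: ["WhisperKit"]
        ),
    ]
)
```

Adjust the `platforms` list if your package also targets iOS or other Apple platforms.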
Homebrew
You can install the WhisperKit command-line app using Homebrew by running the following command:
brew install whisperkit-cli
Getting Started
To get started with WhisperKit, you need to initialize it in your project.
Quick Example
This example demonstrates how to transcribe a local audio file:
import WhisperKit

// Initialize WhisperKit with default settings
Task {
    let pipe = try? await WhisperKit()
    let transcription = try? await pipe!.transcribe(audioPath: "path/to/your/audio.{wav,mp3,m4a,flac}")?.text
    print(transcription)
}
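The Quick Example discards errors with `try?` and force-unwraps the pipeline, so a failed model download or load would crash without explanation. A variant that surfaces the error might look like the sketch below. It uses only the two calls shown in the example; the helper name `transcribeFile(at:)` is hypothetical and not part of WhisperKit.

```swift
import WhisperKit

// Hypothetical helper; the name and signature are illustrative only.
func transcribeFile(at path: String) async {
    do {
        // Same initializer as in the Quick Example, but errors propagate to `catch`.
        let pipe = try await WhisperKit()
        // Same transcribe call as in the Quick Example.
        if let text = try await pipe.transcribe(audioPath: path)?.text {
            print(text)
        } else {
            print("No transcription was produced.")
        }
    } catch {
        // Initialization and transcription failures both end up here.
        print("Transcription failed: \(error)")
    }
}
```

It could be invoked from a `Task { await transcribeFile(at: "path/to/your/audio.wav") }` in the same way as the original example.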
Model Selection
WhisperKit automatically downloads the recommended model for the device if not specified. You can also select a specific model by passing in the model name:
let pipe = try? await WhisperKit(WhisperKitConfig(model: "large-v3"))
This method also supports glob search, so you can use wildcards to select a model:
let pipe = try? await WhisperKit(WhisperKitConfig(model: "distil*large-v3"))
Note that the model search must return a single model from the source repo, otherwise an error will be thrown.
For a list of available models, see our HuggingFace repo.
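To make the single-match rule concrete, the sketch below applies a shell-style wildcard to a list of model names and fails unless exactly one name matches. It illustrates the rule only and is not WhisperKit's implementation: the error type, the function name, and the candidate names in `available` are assumptions chosen for the example.

```swift
import Foundation

// Illustrative error type; not WhisperKit's own.
enum ModelSearchError: Error, Equatable {
    case noMatch(pattern: String)
    case ambiguous(pattern: String, matches: [String])
}

/// Returns the single candidate whose full name matches a shell-style glob pattern.
func resolveModel(pattern: String, in candidates: [String]) throws -> String {
    // fnmatch returns 0 when the whole string matches the pattern.
    let matches = candidates.filter { fnmatch(pattern, $0, 0) == 0 }
    switch matches.count {
    case 1:
        return matches[0]
    case 0:
        throw ModelSearchError.noMatch(pattern: pattern)
    default:
        throw ModelSearchError.ambiguous(pattern: pattern, matches: matches)
    }
}

// Hypothetical repository contents, for illustration only.
let available = [
    "openai_whisper-large-v3",
    "distil-whisper_distil-large-v3",
    "openai_whisper-tiny",
]

// Exactly one name starts with "distil" and ends with "large-v3".
let resolved = try? resolveModel(pattern: "distil*large-v3", in: available)
print(resolved ?? "no unique match")
```

With this candidate list, a broader pattern such as `*large-v3` matches two names and therefore throws instead of silently picking one.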
Generating Models
WhisperKit also comes with the supporting repo whisperkittools, which lets you create and deploy your own fine-tuned versions of Whisper in CoreML format to HuggingFace. Once generated, they can be loaded by simply changing the repo name to the one used to upload the model:
let config = WhisperKitConfig(model: "large-v3", modelRepo: "username/your-model-repo")
let pipe = try? await WhisperKit(config)
Swift CLI
The Swift CLI allows for quick testing and debugging outside of an Xcode project. To install it, run the following:
git clone https://github.com/argmaxinc/whisperkit.git
cd whisperkit
Then, set up the environment and download your desired model.
make setup
make download-model MODEL=large-v3
Note:
- This will download only the model specified by MODEL (see what's available in our HuggingFace repo, where we use the prefix openai_whisper-{MODEL}).
- Before running download-model, make sure git-lfs is installed.
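The `openai_whisper-{MODEL}` prefix determines the folder name that is later passed to the CLI. As a small sketch of that mapping, the hypothetical helper below builds the local path for a given model name. The default base directory is taken from the CLI command in this section; the function itself is not part of WhisperKit.

```swift
import Foundation

/// Builds the local folder path for a model fetched with `make download-model`.
/// Hypothetical helper; the default base directory follows the CLI example in this section.
func modelPath(for model: String, baseDirectory: String = "Models/whisperkit-coreml") -> String {
    // Folder names use the "openai_whisper-" prefix followed by the model name.
    let folderName = "openai_whisper-\(model)"
    return "\(baseDirectory)/\(folderName)"
}

print(modelPath(for: "large-v3"))
// prints "Models/whisperkit-coreml/openai_whisper-large-v3"
```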
If you would like to download all available models to your local folder, use this command instead:
make download-models
You can then run them via the CLI with:
swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" --audio-path "path/to/your/audio.{wav,mp3,m4a,flac}"
This should print a transcription of the audio file. If you would like to stream the audio directly from a microphone, use:
swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" --stream
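A model downloaded this way can presumably also be used from Swift code rather than the CLI. The sketch below assumes that `WhisperKitConfig` accepts a `modelFolder` argument pointing at an already-downloaded model directory; that parameter does not appear elsewhere in this document, so verify it against the version you are using. The helper name is hypothetical.

```swift
import WhisperKit

// Hypothetical helper; the name and signature are illustrative only.
func transcribeWithLocalModel(modelFolder: String, audioPath: String) async throws -> String? {
    // Assumption: `modelFolder` points WhisperKit at a local model directory,
    // mirroring the CLI's --model-path flag.
    let config = WhisperKitConfig(modelFolder: modelFolder)
    let pipe = try await WhisperKit(config)
    // Same transcribe call as in the Quick Example.
    return try await pipe.transcribe(audioPath: audioPath)?.text
}
```

For example, `modelFolder` could be `"Models/whisperkit-coreml/openai_whisper-large-v3"`, the same path passed to `--model-path` above.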
Contributing & Roadmap
Our goal is to make WhisperKit better and better over time and we'd love your help! Just search the code for "TODO" for a variety of features that are yet to be built. Please refer to our contribution guidelines for submitting issues, pull requests, and coding standards, where we also have a public roadmap of features we are looking forward to building in the future.
License
WhisperKit is released under the MIT License. See LICENSE for more details.
Citation
If you use WhisperKit for something cool or just find it useful, please drop us a note at info@takeargmax.com!
If you use WhisperKit for academic work, here is the BibTeX:
@misc{whisperkit-argmax,
title = {WhisperKit},
author = {Argmax, Inc.},
year = {2024},
URL = {https://github.com/argmaxinc/WhisperKit}
}
Recent releases (data updated 2024-12-22 15:27:08):
2024-12-21 13:48:53 v0.10.1
2024-12-20 08:17:17 v0.10.0
2024-11-07 09:51:16 v0.9.4
2024-11-06 02:01:25 v0.9.3
2024-11-03 04:20:56 v0.9.2
2024-10-09 10:10:17 v0.9.0
2024-07-13 02:50:16 v0.8.0
2024-05-30 21:11:16 v0.7.2
2024-05-26 06:55:04 v0.7.1
2024-05-24 18:02:09 v0.7.0
Topics:
inference, ios, macos, speech-recognition, swift, transformers, visionos, watchos, whisper