2.10.0
Release date: 2023-12-05 22:09:37
Latest xenova/transformers.js release: 2.17.2 (2024-05-29 22:36:30)
What's new?
🎵 New task: Zero-shot audio classification
Zero-shot audio classification is the task of assigning audio to classes that were not seen during training. See here for more information.
Example: Perform zero-shot audio classification with Xenova/clap-htsat-unfused.
import { pipeline } from '@xenova/transformers';
// Create a zero-shot audio classification pipeline
const classifier = await pipeline('zero-shot-audio-classification', 'Xenova/clap-htsat-unfused');
const audio = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/dog_barking.wav';
const candidate_labels = ['dog', 'vaccum cleaner'];
const scores = await classifier(audio, candidate_labels);
// [
// { score: 0.9993992447853088, label: 'dog' },
// { score: 0.0006007603369653225, label: 'vaccum cleaner' }
// ]
💻 New architectures: CLAP, Audio Spectrogram Transformer, ConvNeXT, and ConvNeXT-v2
We added support for 4 new architectures, bringing the total up to 65!
- CLAP for zero-shot audio classification, text embeddings, and audio embeddings (https://github.com/xenova/transformers.js/pull/427). See here for the list of available models.
  - Zero-shot audio classification (same as above)
  - Text embeddings with Xenova/clap-htsat-unfused:

    import { AutoTokenizer, ClapTextModelWithProjection } from '@xenova/transformers';

    // Load tokenizer and text model
    const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clap-htsat-unfused');
    const text_model = await ClapTextModelWithProjection.from_pretrained('Xenova/clap-htsat-unfused');

    // Run tokenization
    const texts = ['a sound of a cat', 'a sound of a dog'];
    const text_inputs = tokenizer(texts, { padding: true, truncation: true });

    // Compute embeddings
    const { text_embeds } = await text_model(text_inputs);
    // Tensor {
    //   dims: [ 2, 512 ],
    //   type: 'float32',
    //   data: Float32Array(1024) [ ... ],
    //   size: 1024
    // }

  - Audio embeddings with Xenova/clap-htsat-unfused:

    import { AutoProcessor, ClapAudioModelWithProjection, read_audio } from '@xenova/transformers';

    // Load processor and audio model
    const processor = await AutoProcessor.from_pretrained('Xenova/clap-htsat-unfused');
    const audio_model = await ClapAudioModelWithProjection.from_pretrained('Xenova/clap-htsat-unfused');

    // Read audio and run processor
    const audio = await read_audio('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cat_meow.wav');
    const audio_inputs = await processor(audio);

    // Compute embeddings
    const { audio_embeds } = await audio_model(audio_inputs);
    // Tensor {
    //   dims: [ 1, 512 ],
    //   type: 'float32',
    //   data: Float32Array(512) [ ... ],
    //   size: 512
    // }
- Audio Spectrogram Transformer for audio classification (https://github.com/xenova/transformers.js/pull/427). See here for the list of available models.

  import { pipeline } from '@xenova/transformers';

  // Create an audio classification pipeline
  const classifier = await pipeline('audio-classification', 'Xenova/ast-finetuned-audioset-10-10-0.4593');

  // Predict class
  const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cat_meow.wav';
  const output = await classifier(url, { topk: 4 });
  // [
  //   { label: 'Meow', score: 0.5617874264717102 },
  //   { label: 'Cat', score: 0.22365376353263855 },
  //   { label: 'Domestic animals, pets', score: 0.1141069084405899 },
  //   { label: 'Animal', score: 0.08985692262649536 },
  // ]
- ConvNeXT for image classification (https://github.com/xenova/transformers.js/pull/428). See here for the list of available models.

  import { pipeline } from '@xenova/transformers';

  // Create image classification pipeline
  const classifier = await pipeline('image-classification', 'Xenova/convnext-tiny-224');

  // Classify an image
  const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
  const output = await classifier(url);
  // [{ label: 'tiger, Panthera tigris', score: 0.6153212785720825 }]
- ConvNeXT-v2 for image classification (https://github.com/xenova/transformers.js/pull/428). See here for the list of available models.

  import { pipeline } from '@xenova/transformers';

  // Create image classification pipeline
  const classifier = await pipeline('image-classification', 'Xenova/convnextv2-atto-1k-224');

  // Classify an image
  const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
  const output = await classifier(url);
  // [{ label: 'tiger, Panthera tigris', score: 0.6391205191612244 }]
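The CLAP text and audio embedding examples above produce vectors in a shared 512-dimensional space, so they can be compared directly. The following is a minimal sketch of how such embeddings could be turned into zero-shot scores with cosine similarity and a softmax. It is not the library's internal implementation: `cosineSimilarity`, `softmax`, and `zeroShotScores` are hypothetical helpers, the `scale` parameter is an assumed temperature, and the 4-dimensional vectors are made-up stand-ins for real `text_embeds` and `audio_embeds` data.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; ++i) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Numerically stable softmax over an array of logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((acc, x) => acc + x, 0);
  return exps.map((x) => x / sum);
}

// Hypothetical helper: score one audio embedding against several text embeddings.
// `scale` is an assumed temperature, not a value taken from the model.
function zeroShotScores(audioEmbed, textEmbeds, labels, scale = 1) {
  const logits = textEmbeds.map((t) => scale * cosineSimilarity(audioEmbed, t));
  const probs = softmax(logits);
  return labels
    .map((label, i) => ({ label, score: probs[i] }))
    .sort((x, y) => y.score - x.score);
}

// Made-up 4-dimensional stand-ins for the 512-dimensional embeddings.
const labels = ['a sound of a cat', 'a sound of a dog'];
const textEmbeds = [
  new Float32Array([1, 0, 0, 0]),
  new Float32Array([0, 1, 0, 0]),
];
const audioEmbed = new Float32Array([0.9, 0.1, 0, 0]);

const scores = zeroShotScores(audioEmbed, textEmbeds, labels, 10);
console.log(scores);
```

With real tensors, each row would presumably be sliced out of the flat `data` array using `dims` (for example, row `i` of a `[2, 512]` tensor spans indices `512 * i` to `512 * (i + 1)`). For actual classification, the `zero-shot-audio-classification` pipeline remains the simpler route.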
🔨 Other improvements
- Support decoding of tensors in https://github.com/xenova/transformers.js/pull/416
Full Changelog: https://github.com/xenova/transformers.js/compare/2.9.0...2.10.0