2.14.0
Release date: 2024-01-11 01:50:52
What's new?
🚀 Segment Anything Model (SAM)
The Segment Anything Model (SAM) can be used to generate segmentation masks for objects in a scene, given an input image and input points. See here for the full list of pre-converted models. Support for this model was added in https://github.com/xenova/transformers.js/pull/510.
Demo + source code: https://huggingface.co/spaces/Xenova/segment-anything-web
Example: Perform mask generation with `Xenova/slimsam-77-uniform`.
```js
import { SamModel, AutoProcessor, RawImage } from '@xenova/transformers';

const model = await SamModel.from_pretrained('Xenova/slimsam-77-uniform');
const processor = await AutoProcessor.from_pretrained('Xenova/slimsam-77-uniform');

const img_url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/corgi.jpg';
const raw_image = await RawImage.read(img_url);
const input_points = [[[340, 250]]]; // 2D localization of a window

const inputs = await processor(raw_image, input_points);
const outputs = await model(inputs);

const masks = await processor.post_process_masks(outputs.pred_masks, inputs.original_sizes, inputs.reshaped_input_sizes);
console.log(masks);
// [
//   Tensor {
//     dims: [ 1, 3, 410, 614 ],
//     type: 'bool',
//     data: Uint8Array(755220) [ ... ],
//     size: 755220
//   }
// ]

const scores = outputs.iou_scores;
console.log(scores);
// Tensor {
//   dims: [ 1, 1, 3 ],
//   type: 'float32',
//   data: Float32Array(3) [
//     0.8350210189819336,
//     0.9786665439605713,
//     0.8379436731338501
//   ],
//   size: 3
// }
```
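Selecting the best of the three candidate masks amounts to an argmax over the flat `iou_scores` data (shape `[1, 1, 3]`, so just three floats). A minimal sketch in plain JavaScript; `bestMaskIndex` is a hypothetical helper for illustration, not part of transformers.js:

```javascript
// Return the index of the highest IoU score in a flat scores array.
function bestMaskIndex(scoresData) {
  let best = 0;
  for (let i = 1; i < scoresData.length; ++i) {
    if (scoresData[i] > scoresData[best]) best = i;
  }
  return best;
}

// The three scores printed above, rounded for brevity.
const scoresData = new Float32Array([0.8350, 0.9787, 0.8379]);
console.log(bestMaskIndex(scoresData)); // 1 (the second channel)
```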
You can then visualize the 3 predicted masks with:
```js
const image = RawImage.fromTensor(masks[0][0].mul(255));
image.save('mask.png');
```
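The `.mul(255)` call scales the boolean (0/1) mask values up to 0/255 so the mask renders as black and white. The same scaling can be sketched with a plain typed array, independent of the library:

```javascript
// Scale 0/1 mask bytes to 0/255 so the mask renders as black/white.
const maskBytes = new Uint8Array([0, 1, 1, 0]);
const visible = maskBytes.map(v => v * 255);
console.log(visible); // Uint8Array [ 0, 255, 255, 0 ]
```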
| Input image | Visualized output |
|---|---|
Next, select the channel with the highest IoU score, which in this case is the second (green) channel. Intersecting this with the original image gives us an isolated version of the subject:
| Selected Mask | Intersected |
|---|---|
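The intersection step above amounts to keeping only the image pixels where the selected mask is set. A library-free sketch with flat RGB data; `applyMask` is a hypothetical helper for illustration, not a transformers.js API:

```javascript
// Zero out pixels outside the mask. `pixels` is flat RGB data
// (3 bytes per pixel); `mask` holds one 0/1 byte per pixel.
function applyMask(pixels, mask) {
  const out = new Uint8Array(pixels.length);
  for (let p = 0; p < mask.length; ++p) {
    if (mask[p]) {
      out[3 * p] = pixels[3 * p];
      out[3 * p + 1] = pixels[3 * p + 1];
      out[3 * p + 2] = pixels[3 * p + 2];
    }
  }
  return out;
}

// Tiny 2-pixel example: keep the first pixel, drop the second.
const pixels = new Uint8Array([10, 20, 30, 40, 50, 60]);
const mask = new Uint8Array([1, 0]);
console.log(applyMask(pixels, mask)); // Uint8Array [ 10, 20, 30, 0, 0, 0 ]
```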
🛠️ Improvements
- Add support for processing non-square images w/ `ConvNextFeatureExtractor` in https://github.com/xenova/transformers.js/pull/503
- Encode revision in remote URL (https://github.com/xenova/transformers.js/pull/507)
Full Changelog: https://github.com/xenova/transformers.js/compare/2.13.4...2.14.0