2.14.0
Release date: 2024-01-11 01:50:52
What's new?
🚀 Segment Anything Model (SAM)
The Segment Anything Model (SAM) can be used to generate segmentation masks for objects in a scene, given an input image and input points. See here for the full list of pre-converted models. Support for this model was added in https://github.com/xenova/transformers.js/pull/510.
Demo + source code: https://huggingface.co/spaces/Xenova/segment-anything-web
Example: Perform mask generation w/ Xenova/slimsam-77-uniform.
```js
import { SamModel, AutoProcessor, RawImage } from '@xenova/transformers';

const model = await SamModel.from_pretrained('Xenova/slimsam-77-uniform');
const processor = await AutoProcessor.from_pretrained('Xenova/slimsam-77-uniform');

const img_url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/corgi.jpg';
const raw_image = await RawImage.read(img_url);
const input_points = [[[340, 250]]]; // 2D localization of a window

const inputs = await processor(raw_image, input_points);
const outputs = await model(inputs);

const masks = await processor.post_process_masks(outputs.pred_masks, inputs.original_sizes, inputs.reshaped_input_sizes);
console.log(masks);
// [
//   Tensor {
//     dims: [ 1, 3, 410, 614 ],
//     type: 'bool',
//     data: Uint8Array(755220) [ ... ],
//     size: 755220
//   }
// ]

const scores = outputs.iou_scores;
console.log(scores);
// Tensor {
//   dims: [ 1, 1, 3 ],
//   type: 'float32',
//   data: Float32Array(3) [
//     0.8350210189819336,
//     0.9786665439605713,
//     0.8379436731338501
//   ],
//   size: 3
// }
```
You can then visualize the 3 predicted masks with:
```js
const image = RawImage.fromTensor(masks[0][0].mul(255));
image.save('mask.png');
```
(Figure: Input image | Visualized output)
Next, select the channel with the highest IoU score, which in this case is the second (green) channel. Intersecting this with the original image gives us an isolated version of the subject:
(Figure: Selected mask | Intersected result)
🛠️ Improvements
- Add support for processing non-square images w/ ConvNextFeatureExtractor in https://github.com/xenova/transformers.js/pull/503 (see the sketch below)
- Encode revision in remote URL in https://github.com/xenova/transformers.js/pull/507
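For the first item, a minimal usage sketch follows. The checkpoint name Xenova/convnext-tiny-224 is an assumption; any pre-converted ConvNeXt model with a ConvNextFeatureExtractor config should behave the same way:

```js
import { AutoProcessor, RawImage } from '@xenova/transformers';

// Assumed checkpoint; substitute any pre-converted ConvNeXt model.
const processor = await AutoProcessor.from_pretrained('Xenova/convnext-tiny-224');

// A non-square image (614x410) is now preprocessed by ConvNextFeatureExtractor as expected.
const image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/corgi.jpg');
const { pixel_values } = await processor(image);
console.log(pixel_values.dims); // e.g. [ 1, 3, 224, 224 ]
```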
Full Changelog: https://github.com/xenova/transformers.js/compare/2.13.4...2.14.0