v1.2.0
Release date: 2023-01-18 20:53:17
Chinese Version
This release adds 38 new models, 14 of which support finetuning.
Model features
- High-performance detection for popular applications: built on DAMO-YOLO, a high-performance, industry-oriented detection framework that surpasses the classic YOLO series in both accuracy and speed. New real-time mask detection, real-time safety-helmet detection, real-time human detection and real-time cigarette detection models are now online, providing an efficient out-of-the-box experience.
- ASR, TTS and KWS models can be finetuned via the ModelScope Python SDK.
- TTS: added dialect models for Sichuanese, Cantonese and Shanghainese, plus Russian and Korean foreign-language models:
  - SambertHifigan TTS, Sichuanese, general domain, 16k, speaker chuangirl: Sichuanese female voice
  - SambertHifigan TTS, Cantonese, general domain, 16k, speaker jiajia: Cantonese female voice
  - SambertHifigan TTS, Shanghainese, general domain, 16k, speaker xiaoda: Shanghainese female voice
  - SambertHifigan TTS, Russian, general domain, 16k, speaker masha: Russian female voice
  - SambertHifigan TTS, Korean, general domain, 16k, speaker kyong: Korean female voice
- Speech file post-processing:
  - Added text normalization models for 11 languages: English, German, Filipino, Korean, Vietnamese, Japanese, Russian, Indonesian, Portuguese, French and Spanish.
- Image face fusion:
  - Automatically extracts and aligns the face region and extracts facial features; no extra preprocessing needed.
  - Introduces a 3D reconstruction network to fit and transfer face shape, so the fused result matches the target face shape more closely.
- Face and human body:
  - GPEN portrait enhancement and restoration for large-resolution faces: 1024 and 2048 models trained on ultra-high-resolution face data using the GPEN framework.
- Visual editing:
  - DDColor image colorization: a large improvement in color richness and semantic fidelity over previous methods such as DeOldify.
  - VFI-RAFT video frame interpolation: interpolates better than other SOTA models in scenes with large motion and repeated textures.
  - DUT-RAFT video stabilization: removes many kinds of video shake stably, and preserves video sharpness better than the original DUT.
- Low-level vision:
  - RealBasicVSR video super-resolution: good results on most real-world videos; may underperform on the small fraction of severely degraded inputs.
Breaking changes
- Text-to-image task output changed to a list of images
- TTS task output changed from output_pcm to output_wav
New model list and quick access
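Downstream code that still consumed the old raw-PCM payload can unwrap the new WAV bytes with Python's standard wave module. A minimal sketch; the helper names are illustrative, and the parameters assume 16 kHz, 16-bit mono audio, which may not match every voice:

```python
import io
import wave

def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               channels: int = 1, sampwidth: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container (the new output format)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()

def wav_to_pcm(wav_bytes: bytes) -> bytes:
    """Strip the WAV header and return the raw PCM frames (the old format)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.readframes(w.getnframes())
```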
English Version
Highlights
- Add finetune support for DAMO-YOLO
- Add new real-time mask detection model, real-time helmet detection model, real-time human body detection model, real-time cigarette detection model
- Add finetune support for ASR, TTS and KWS models
- Batch inference support for NLP and OFA-based multi-modal tasks
- Add high-resolution GPEN model for face restoration
- Add DDColor model for image colorization
- Add VFI-RAFT model for video frame interpolation
- Add DUT-RAFT model for video stabilization
- Add RealBasicVSR model for video-super-resolution
Breaking changes
- Change output of text-to-image task to a list of images
- Change output of TTS task from output_pcm to output_wav
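Code written against the old single-image output can handle both shapes by normalizing the result dict. A hedged sketch; the key names "output_img" and "output_imgs" are assumptions for illustration, not confirmed API constants:

```python
def as_image_list(result: dict) -> list:
    """Normalize a text-to-image pipeline result to a list of images,
    accepting both the old single-image key and the new list key."""
    if "output_imgs" in result:          # new multi-image output
        return list(result["output_imgs"])
    img = result.get("output_img")       # old single-image output
    return [img] if img is not None else []
```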
Feature
- Add easyrobust-models for image classification
- Video depth estimation support cpu mode
- Add output_dir parameter to ASR pipeline
- Add RTS face recognition ood model
- Add image-defrcn-fewshot-detection
- Add hires gpen model
- Add mgeo finetune and pipeline
- Add ASR finetune support and update inference
- Add quadtree image matching pipeline
- Add finetune for DAMO-YOLO
- Add FLRGB Face Liveness RGB Model
- Add speech separation finetune
- ASR inference: support new models, punctuation, VAD and SV
- Add vop retrieval
- Add NAFNet Image Deblurring pipeline and finetune support
- Add megatron bert
- Add panovit-layout-estimation-pipeline
- Add vision middleware
- Add panorama_depth_estimation
- Unify token classification model output
- FAQ task supports finetune and multilingual models
- Support Stable Diffusion and add DAMO Chinese Stable Diffusion model
- Add cv-bnext-image-classification-pipeline
- Add VFI-RAFT model for video frame interpolation
- Add face changing pipeline
- Add DUT-RAFT model for video stabilization
- Update token_cls default sequence_length: 128 -> 512
- Add structure tasks for ofa: sudoku & text2sql
- Add new ASR model speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-offline and speech_UniASR_asr_2pass-pt-16k-com
- Add model for multiple object tracking in video
- Add ConvNeXt model
- Add ppl metric
- Add image colorization
- Add User satisfaction estimation pipeline
- OFA finetune support configuration file
- Add vldoc model
- Add space-t trainer for finetune
- Add speech separation pipeline
- Add cv_casmvs_multi-view-depth-esimation_general
- Add finetune support for mask2former
- Add domain specific object detection models
- Add maskdino model
- Add FLIR Face Liveness Model
- Add finetune support for kws nearfield
- Add cv_pointnet2_sceneflow-estimation_general
- Add HiTeA model for VideoQA and Caption
- Add GPT-2 model
- Add real-time human detection model
- Add video depth estimation pipeline
- Add ocr-detection-vlpt-pipeline
- Add RealBasicVSR model for video-super-resolution
- Add image skychange model
- Add support for cv_rdevos_video-object-segmentation
- Support kantts infer and finetune
- Add gpt-moe model
- Add hand detection
Improvements
- Text-error-correction support batch inference
- Add beam search and pair finetune for GPT-3
- Optimize ast_index logic
- Refactor msdataset modules
- Save a video with h264 vcodec for video_super_resolution
- Enhance interface standards and refactor card_detection, face detection, tinynas object detection and image classification pipelines
- Audio pipelines: support byte input and refine fp implementations
- Remove opencv-python from framework requirements and remove easynlp from nlp default requirements
- GPT-3 model supports batch input
- Batch inference for all ofa models
- AST scanner prebuilt in whl to speed up import process
BugFix
- Fix best-checkpoint saver not actually saving the best checkpoint
- Fix logger file handler problem
- Fix missing self. (#61)
- Fix: hub test suites cannot run in parallel
- Fix loading custom cv data error (#59)
- Fix saved GPT-3 checkpoint not running with pipeline
- Fix video type check with cv2.VideoCapture and add unit test
- Fix a bug in PLUG inference
- Fix memory leak bug in eval for movie scene segmentation
- Fix user-agent string and trainer invoked-by field
- Fix: statistics header not set correctly
- Fix timeout issue for uni-fold list_oss_objects api
- Fix card-detection model unregistered error and fix log warning
- Fix multimer input for science/protein_structure
- Fix demo service and copy license for cv/language_guided_video_summarization
- Fix directory creation error in DDP training