v1.2.0
Release date: 2023-01-18 20:53:17
Chinese Version
This release adds 38 new models, 14 of which support finetuning.
Model features
- High-performance detection for popular applications: built on DAMO-YOLO, a high-performance, industry-oriented detection framework that surpasses the classic YOLO series in both accuracy and speed. New real-time mask detection, real-time safety-helmet detection, real-time human detection and real-time cigarette detection models are now online, providing an efficient out-of-the-box experience.
- ASR, TTS and KWS models can be finetuned via the ModelScope Python SDK.
- TTS: added dialect models for Sichuanese, Cantonese and Shanghainese, plus Russian and Korean foreign-language models:
  - SambertHifigan TTS, Sichuanese, general domain, 16k, speaker chuangirl: Sichuanese female voice
  - SambertHifigan TTS, Cantonese, general domain, 16k, speaker jiajia: Cantonese female voice
  - SambertHifigan TTS, Shanghainese, general domain, 16k, speaker xiaoda: Shanghainese female voice
  - SambertHifigan TTS, Russian, general domain, 16k, speaker masha: Russian female voice
  - SambertHifigan TTS, Korean, general domain, 16k, speaker kyong: Korean female voice
- Speech file post-processing:
  - Added text normalization models for 11 languages: English, German, Filipino, Korean, Vietnamese, Japanese, Russian, Indonesian, Portuguese, French and Spanish.
- Image face fusion:
  - Automatically extracts and aligns the face region and extracts facial features; no extra preprocessing needed.
  - Introduces a 3D reconstruction network to fit and transfer face shape, so the fused result matches the target face shape more closely.
- Face and human body:
  - GPEN portrait enhancement and restoration for large-resolution faces: 1024 and 2048 models trained on ultra-high-resolution face data using the GPEN framework.
- Visual editing:
  - DDColor image colorization: a large improvement in color richness and semantic fidelity over previous methods such as DeOldify.
  - VFI-RAFT video frame interpolation: interpolates better than other SOTA models in scenes with large motion and repeated textures.
  - DUT-RAFT video stabilization: removes many kinds of video shake stably, and preserves video sharpness better than the original DUT.
- Low-level vision:
  - RealBasicVSR video super-resolution: good results on most real-world videos; may underperform on the small fraction of severely degraded inputs.
Breaking changes
- Text-to-image task output changed to a list of images
- TTS task output changed from output_pcm to output_wav
New model list and quick access
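Downstream code that still consumed the old raw-PCM payload can unwrap the new WAV bytes with Python's standard wave module. A minimal sketch; the helper names are illustrative, and the parameters assume 16 kHz, 16-bit mono audio, which may not match every voice:

```python
import io
import wave

def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               channels: int = 1, sampwidth: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container (the new output format)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()

def wav_to_pcm(wav_bytes: bytes) -> bytes:
    """Strip the WAV header and return the raw PCM frames (the old format)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.readframes(w.getnframes())
```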
English Version
Highlights
- Add finetune support for DAMO-YOLO
- Add new real-time mask detection model, real-time helmet detection model, real-time human body detection model, real-time cigarette detection model
- Add finetune support for ASR, TTS and KWS models
- Batch inference support for NLP and OFA-based multi-modal tasks
- Add high-resolution GPEN model for face restoration
- Add DDColor model for image colorization
- Add VFI-RAFT model for video frame interpolation
- Add DUT-RAFT model for video stabilization
- Add RealBasicVSR model for video-super-resolution
Breaking changes
- Change output of text-to-image task to a list of images
- Change output of TTS task from output_pcm to output_wav
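Code written against the old single-image output can handle both shapes by normalizing the result dict. A hedged sketch; the key names "output_img" and "output_imgs" are assumptions for illustration, not confirmed API constants:

```python
def as_image_list(result: dict) -> list:
    """Normalize a text-to-image pipeline result to a list of images,
    accepting both the old single-image key and the new list key."""
    if "output_imgs" in result:          # new multi-image output
        return list(result["output_imgs"])
    img = result.get("output_img")       # old single-image output
    return [img] if img is not None else []
```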
Feature
- Add easyrobust-models for image classification
- Video depth estimation support cpu mode
- Add output_dir parameter to ASR pipeline
- Add RTS face recognition ood model
- Add image-defrcn-fewshot-detection
- Add hires gpen model
- Add mgeo finetune and pipeline
- Add ASR finetune support and update inference
- Add quadtree image matching pipeline
- Add finetune for DAMO-YOLO
- Add FLRGB Face Liveness RGB Model
- Add speech separation finetune
- ASR inference: support new models, punctuation, VAD and SV
- Add vop retrieval
- Add NAFNet Image Deblurring pipeline and finetune support
- Add megatron bert
- Add panovit-layout-estimation-pipeline
- Add vision middleware
- Add panorama_depth_estimation
- Unify token classification model output
- FAQ task supports finetune and multilingual models
- Support Stable Diffusion and add DAMO Chinese Stable Diffusion model
- Add cv-bnext-image-classification-pipeline
- Add VFI-RAFT model for video frame interpolation
- Add face changing pipeline
- Add DUT-RAFT model for video stabilization
- Update token_cls default sequence_length: 128 -> 512
- Add structure tasks for ofa: sudoku & text2sql
- Add new ASR model speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-offline and speech_UniASR_asr_2pass-pt-16k-com
- Add model for multiple object tracking in video
- Add ConvNeXt model
- Add ppl metric
- Add image colorization
- Add User satisfaction estimation pipeline
- OFA finetune support configuration file
- Add vldoc model
- Add space-t trainer for finetune
- Add speech separation pipeline
- Add cv_casmvs_multi-view-depth-esimation_general
- Add finetune support for mask2former
- Add domain specific object detection models
- Add maskdino model
- Add FLIR Face Liveness Model
- Add finetune support for kws nearfield
- Add cv_pointnet2_sceneflow-estimation_general
- Add HiTeA model for VideoQA and Caption
- Add GPT-2 model
- Add real-time human detection model
- Add video depth estimation pipeline
- Add ocr-detection-vlpt-pipeline
- Add RealBasicVSR model for video-super-resolution
- Add image skychange model
- Add support for cv_rdevos_video-object-segmentation
- Support kantts infer and finetune
- Add gpt-moe model
- Add hand detection
Improvements
- Text-error-correction support batch inference
- Add beam search and pair finetune for GPT-3
- Optimize ast_index logic
- Refactor msdataset modules
- Save a video with h264 vcodec for video_super_resolution
- Enhance interface standards and refactor card_detection, face detection, tinynas object detection and image classification pipelines
- Audio pipelines: support byte input and refine fp implementations
- Remove opencv-python from framework requirements and remove easynlp from nlp default requirements
- GPT-3 model supports batch input
- Batch inference for all ofa models
- AST scanner prebuilt in whl to speed up import process
BugFix
- Fix best-checkpoint saver not actually saving the best checkpoint
- Fix logger file handler problem
- Fix missing self. (#61)
- Fix: hub test suites cannot run in parallel
- Fix loading custom cv data error (#59)
- Fix saved GPT-3 checkpoint not running with pipeline
- Fix video type check with cv2.VideoCapture and add unit test
- Fix a bug in PLUG inference
- Fix memory leak bug in eval for movie scene segmentation
- Fix user-agent string and trainer invoked-by field
- Fix: statistics header not set correctly
- Fix timeout issue for uni-fold list_oss_objects api
- Fix card-detection model unregistered error and fix log warning
- Fix multimer input for science/protein_structure
- Fix demo service and copy license for cv/language_guided_video_summarization
- Fix directory creation error in DDP training