v0.8.2dev0
Release date: 2022-12-24 08:19:44
huggingface/pytorch-image-models latest release: v1.0.9 (2024-08-24 07:42:07)
Part way through the conversion of models to multi-weight support (`model_arch.pretrain_tag`), module reorg for future building, and lots of new weights and model additions as we go...
This is considered a development release. Please stick to 0.6.x if you need stability. Some model names and tags will shift a bit; some old names have already been deprecated, and remapping support has not been added yet. For code, the 0.6.x branch is considered 'stable': https://github.com/rwightman/pytorch-image-models/tree/0.6.x
Dec 23, 2022 🎄☃
- Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
- NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
- Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
- More model pretrained tags and adjustments, some model names changed (working on deprecation translations; consider the main branch a DEV branch right now, use 0.6.x for stable use)
- More ImageNet-12k (subset of 22k) pretrain models popping up:
- `efficientnet_b5.in12k_ft_in1k` - 85.9 @ 448x448
- `vit_medium_patch16_gap_384.in12k_ft_in1k` - 85.5 @ 384x384
- `vit_medium_patch16_gap_256.in12k_ft_in1k` - 84.5 @ 256x256
- `convnext_nano.in12k_ft_in1k` - 82.9 @ 288x288
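
For reference, the `model_arch.pretrained_tag` names listed above can be taken apart with plain string handling. The sketch below is illustrative only: `split_model_name` and `tag_stages` are hypothetical helpers, not timm functions, and reading `_ft_` as the boundary between the pretraining source and the fine-tune datasets is an assumption based on the names in these notes.

```python
from typing import List, NamedTuple, Optional


class ModelName(NamedTuple):
    arch: str            # architecture part, e.g. "convnext_nano"
    tag: Optional[str]   # pretrained tag, e.g. "in12k_ft_in1k", or None


def split_model_name(name: str) -> ModelName:
    """Split 'arch.tag' at the first dot; tag is None when no dot is present."""
    arch, sep, tag = name.partition(".")
    return ModelName(arch, tag if sep else None)


def tag_stages(tag: str) -> List[str]:
    """Split a tag at '_ft_' (assumed here to mark the start of fine-tuning)."""
    return tag.split("_ft_")


name = split_model_name("convnext_nano.in12k_ft_in1k")
print(name.arch)             # convnext_nano
print(tag_stages(name.tag))  # ['in12k', 'in1k']
```

A name without a dot, such as an older un-tagged model name, comes back with `tag=None`.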
Dec 8, 2022
- Add 'EVA l' to `vision_transformer.py`, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
- original source: https://github.com/baaivision/EVA
model | top1 | param_count | gmac | macts | hub |
---|---|---|---|---|---|
eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 | link |
eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 | link |
eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 | link |
eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 | link |
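
The ViT-style names in these tables encode patch size and input resolution (e.g. `patch14_336`), which together set how many patch tokens the model processes. A minimal sketch of that arithmetic, assuming square inputs and non-overlapping patches; `patch_grid` is a hypothetical helper, not part of timm, and it does not count class tokens.

```python
import re
from typing import Tuple


def patch_grid(model_name: str) -> Tuple[int, int, int]:
    """Return (patch_size, image_size, num_patches) parsed from a model name."""
    arch = model_name.split(".", 1)[0]  # drop the pretrained tag, if any
    # Matches e.g. "patch14_336" or "patch16_clip_224" at the end of the arch name.
    match = re.search(r"patch(\d+)_(?:[a-z]+_)*(\d+)$", arch)
    if match is None:
        raise ValueError(f"no patch/image size found in {arch!r}")
    patch, image = int(match.group(1)), int(match.group(2))
    per_side = image // patch  # patches along one side of the image
    return patch, image, per_side * per_side


print(patch_grid("eva_large_patch14_336.in22k_ft_in1k"))  # (14, 336, 576)
print(patch_grid("eva_large_patch14_196.in22k_ft_in1k"))  # (14, 196, 196)
```

Going from 196 to 336 pixels roughly triples the patch count (196 to 576), in line with the jump in the gmac column between those rows.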
Dec 6, 2022
- Add 'EVA g', BEiT style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain to `beit.py`.
- original source: https://github.com/baaivision/EVA
- paper: https://arxiv.org/abs/2211.07636
model | top1 | param_count | gmac | macts | hub |
---|---|---|---|---|---|
eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 | link |
eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 | link |
eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 | link |
eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 | link |
Dec 5, 2022
- Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
- vision_transformer, maxvit, convnext are the first three model impl w/ support
- model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
- bugs are likely, but I need feedback so please try it out
- if stability is needed, please use 0.6.x pypi releases or clone from 0.6.x branch
- Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use `--torchcompile` argument
- Inference script allows more control over output, select k for top-class index + prob json, csv or parquet output
- Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models
model | top1 | param_count | gmac | macts | hub |
---|---|---|---|---|---|
vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | link |
vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | link |
vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | link |
vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | link |
vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | link |
vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | link |
vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | link |
vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | link |
vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | link |
vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | link |
vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | link |
vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | link |
vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | link |
vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | link |
vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | link |
vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | link |
vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | link |
vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | link |
vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | link |
vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | link |
vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | link |
vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | link |
- Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
- There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights; a detail is possibly missing, but the 21k FT did seem sensitive to small preprocessing differences
model | top1 | param_count | gmac | macts | hub |
---|---|---|---|---|---|
maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | link |
maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | link |
maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | link |
maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | link |
maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | link |
maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | link |
maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | link |
maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | link |
maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | link |
maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | link |
maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | link |
maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | link |
maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | link |
maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | link |
maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | link |
maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | link |
maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | link |
maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | link |
Oct 15, 2022
- Train and validation script enhancements
- Non-GPU (ie CPU) device support
- SLURM compatibility for train script
- HF datasets support (via ReaderHfds)
- TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
- in_chans !=3 support for scripts / loader
- Adan optimizer
- Can enable per-step LR scheduling via args
- Dataset 'parsers' renamed to 'readers', more descriptive of purpose
- AMP args changed, APEX via `--amp-impl apex`, bfloat16 supported via `--amp-dtype bfloat16`
- main branch switched to 0.7.x version, 0.6.x forked for stable release of weight only adds
- master -> main branch rename
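
On the per-step LR scheduling item above: the difference from per-epoch scheduling is only in how training progress is measured when the schedule is evaluated. A minimal sketch with a cosine schedule in plain Python; the function names and the `base_lr` / `min_lr` parameters are hypothetical, and this is not timm's scheduler code.

```python
import math


def cosine_lr(progress: float, base_lr: float = 0.1, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr (progress=0) to min_lr (progress=1)."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def lr_per_epoch(epoch: int, step: int, steps_per_epoch: int, num_epochs: int) -> float:
    """LR only changes at epoch boundaries; the step within the epoch is ignored."""
    return cosine_lr(epoch / num_epochs)


def lr_per_step(epoch: int, step: int, steps_per_epoch: int, num_epochs: int) -> float:
    """LR changes on every optimizer update, based on the global step count."""
    global_step = epoch * steps_per_epoch + step
    return cosine_lr(global_step / (steps_per_epoch * num_epochs))


# Halfway through the first of two epochs (10 steps per epoch):
print(lr_per_epoch(0, 5, 10, 2))  # still at the initial LR
print(lr_per_step(0, 5, 10, 2))   # already decayed part of the way
```

With few epochs or very long epochs, the per-epoch variant holds the LR flat for many updates, which is where stepping per update makes the most difference.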