v1.0.3
版本发布时间: 2024-05-16 02:19:44
huggingface/pytorch-image-models最新发布版本:v1.0.9(2024-08-24 07:42:07)
May 14, 2024
- Support loading PaliGemma jax weights into SigLIP ViT models with average pooling.
- Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
- Add
normalize=
flag for transorms, return non-normalized torch.Tensor with original dytpe (forchug
) - Version 1.0.3 release
May 11, 2024
-
Searching for Better ViT Baselines (For the GPU Poor)
weights and vit variants released. Exploring model shapes between Tiny and Base.
- AttentionExtract helper added to extract attention maps from
timm
models. See example in https://github.com/huggingface/pytorch-image-models/discussions/1232#discussioncomment-9320949 -
forward_intermediates()
API refined and added to more models including some ConvNets that have other extraction methods. - 1017 of 1047 model architectures support
features_only=True
feature extraction. Remaining 34 architectures can be supported but based on priority requests. - Remove torch.jit.script annotated functions including old JIT activations. Conflict with dynamo and dynamo does a much better job when used.
April 11, 2024
- Prepping for a long overdue 1.0 release, things have been stable for a while now.
- Significant feature that's been missing for a while,
features_only=True
support for ViT models with flat hidden states or non-std module layouts (so far covering'vit_*', 'twins_*', 'deit*', 'beit*', 'mvitv2*', 'eva*', 'samvit_*', 'flexivit*'
) - Above feature support achieved through a new
forward_intermediates()
API that can be used with a feature wrapping module or direclty.
model = timm.create_model('vit_base_patch16_224')
final_feat, intermediates = model.forward_intermediates(input)
output = model.forward_head(final_feat) # pooling + classifier head
print(final_feat.shape)
torch.Size([2, 197, 768])
for f in intermediates:
print(f.shape)
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
print(output.shape)
torch.Size([2, 1000])
model = timm.create_model('eva02_base_patch16_clip_224', pretrained=True, img_size=512, features_only=True, out_indices=(-3, -2,))
output = model(torch.randn(2, 3, 512, 512))
for o in output:
print(o.shape)
torch.Size([2, 768, 32, 32])
torch.Size([2, 768, 32, 32])
- TinyCLIP vision tower weights added, thx Thien Tran