v0.6.0
Release date: 2024-09-01 18:12:02
Latest microsoft/torchgeo release: v0.6.1 (2024-10-11 02:22:20)
TorchGeo 0.6.0 Release Notes
TorchGeo 0.6 adds 18 new datasets, 15 new datamodules, and 27 new pre-trained models, encompassing 11 months of hard work by 23 contributors from around the world.
Highlights of this release
Multimodal foundation models
There are thousands of Earth observation satellites orbiting the Earth at any given time. Historically, in order to use one of these satellites in a deep learning pipeline, you would first need to collect millions of manually-labeled images from this sensor in order to train a model. Self-supervised learning enabled label-free pre-training, but still required millions of diverse sensor-specific images, making it difficult to use newly launched or expensive commercial satellites.
TorchGeo 0.6 adds multiple new multimodal foundation models capable of being used with imagery from any satellite/sensor, even ones the model was not explicitly trained on. While GASSL and Scale-MAE only support RGB images, DOFA supports RGB, SAR, MSI, and HSI with any number of spectral bands. It uses a novel wavelength-based encoder to map the spectral wavelength of each band to a known range of wavelengths seen during training.
The following table describes the dynamic spatial (resolution), temporal (time span), and/or spectral (wavelength) support, either via their training data (implicit) or via their model architecture (explicit), offered by each of these models:
Model | Spatial | Temporal | Spectral |
---|---|---|---|
DOFA | implicit | - | explicit |
GASSL | implicit | - | - |
Scale-MAE | explicit | - | - |
TorchGeo 0.6 also adds multiple new unimodal foundation models, including DeCUR and SatlasPretrain.
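A model that conditions on wavelength needs to be told which wavelength each input channel corresponds to. The sketch below shows one way to build that per-band list for a band selection. The dictionary, the function name, and the approximate nominal Sentinel-2 wavelengths are illustrative assumptions, not part of TorchGeo; check the model documentation for the exact values and units it expects.

```python
# Approximate nominal central wavelengths in micrometers for a few
# Sentinel-2 bands. Illustrative assumptions only: verify against the
# sensor documentation and the units your model expects.
SENTINEL2_WAVELENGTHS_UM = {
    "B02": 0.490,  # blue
    "B03": 0.560,  # green
    "B04": 0.665,  # red
    "B08": 0.842,  # near infrared
}


def wavelengths_for(bands: list[str]) -> list[float]:
    """Return one central wavelength per band, in the order given."""
    missing = [band for band in bands if band not in SENTINEL2_WAVELENGTHS_UM]
    if missing:
        raise KeyError(f"No wavelength known for bands: {missing}")
    return [SENTINEL2_WAVELENGTHS_UM[band] for band in bands]


# Channel order matters: the list must line up with the image channels.
rgb_wavelengths = wavelengths_for(["B04", "B03", "B02"])
```

Keeping the lookup next to the band selection makes it harder for the channel order and the wavelength order to drift apart.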
Source Cooperative migration
TorchGeo contains a number of datasets from the recently defunct Radiant MLHub:
- AgriFieldNet Competition Dataset
- Smallholder Cashew Plantations in Benin
- Sentinel-2 Cloud Cover Segmentation Dataset
- CV4A Kenya Crop Type Competition
- Tropical Cyclone Wind Estimation Competition
- Marine Debris Dataset for Object Detection in Planetscope Imagery
- Rwanda Field Boundary Competition Dataset
- South Africa Crop Type Competition
- SpaceNet Datasets
- Western USA Live Fuel Moisture
These datasets were recently migrated to Source Cooperative (and AWS in the case of SpaceNet), but with a completely different file format and directory structure. It took a lot of effort, but we have finally ported all of these datasets to the new download location and file hierarchy. As an added bonus, the new data loader code is significantly simpler, allowing us to remove 2.5K lines of code in the process!
OSGeo community project
TorchGeo is now officially a member of the OSGeo community! OSGeo is a not-for-profit foundation for open source geospatial software, providing financial, organizational, and legal support. We are in good company, with other OSGeo projects including GDAL, PROJ, GEOS, QGIS, and PostGIS. Membership in OSGeo helps promote TorchGeo to the community, and also ensures that we follow best practices for the stability, health, and interoperability of the open source geospatial ecosystem.
All TorchGeo users are encouraged to join us on Slack, join our Hugging Face organization, and join us in OSGeo using the badges in our README.
Lightning Studios support
TorchGeo has always had a close collaboration with Lightning AI, including active contributions to PyTorch Lightning and TorchMetrics. In this release, we added buttons allowing users to launch our tutorial notebooks in the new Lightning Studios platform. Lightning Studios is a more powerful version of Google Colab, with reproducible software and data environments allowing you to pick up where you left off, VS Code and terminal support, and the ability to quickly scale up to a large number of GPUs. All TorchGeo tutorials have been confirmed to work in both Lightning Studios and Google Colab, allowing users to get started with TorchGeo without having to invest in their own hardware.
Backwards-incompatible changes
- All Radiant MLHub datasets have been ported to the Source Cooperative file hierarchy (#1830)
- GeoDataset: the bbox sample key was renamed to bounds in order to support Kornia (#2199)
- Chesapeake7 and Chesapeake13: datasets were removed when updating to the 2022 edition (#2214)
- Benin Cashews and Rwanda Field Boundary: remove os.path.expanduser for consistency (#1705)
- LEVIR-CD and OSCD: images key was split into image1 and image2 for change detection (#1684, #1696)
- EuroSAT: B08A was renamed to B8A to match Sentinel-2 (#1646)
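Downstream code that still uses the old names may need a small shim while it is being updated. The following is a minimal sketch, assuming samples are plain dictionaries and band selections are lists of strings; `migrate_sample` and `migrate_bands` are hypothetical helpers, not TorchGeo functions.

```python
from typing import Any


def migrate_sample(sample: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an old-style sample that uses the new 'bounds' key."""
    migrated = dict(sample)  # shallow copy so the caller's dict is untouched
    if "bbox" in migrated and "bounds" not in migrated:
        migrated["bounds"] = migrated.pop("bbox")
    return migrated


def migrate_bands(bands: list[str]) -> list[str]:
    """Replace the old EuroSAT band name B08A with B8A."""
    return ["B8A" if band == "B08A" else band for band in bands]


# Placeholder values; real samples hold tensors and bounding boxes.
old_sample = {"image": [[0.0]], "bbox": (0, 1, 0, 1, 0, 1)}
new_sample = migrate_sample(old_sample)
new_bands = migrate_bands(["B04", "B08A"])
```

A shim like this is only a stopgap; updating the code that reads the sample is the longer-term fix.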
Dependencies
New (optional) dependencies
- aws-cli: to download datasets from AWS (#2203)
- azcopy: to download datasets from Azure (#2064)
- prettier: for YAML file formatting (#2018)
- ruff: for code style and documentation testing (#1994)
Removed (optional) dependencies
- radiant-mlhub: website no longer exists (#1830)
- rarfile: datasets rehosted as zip files (#2210)
- zipfile-deflate: no longer needed for newer Chesapeake data (#2214)
- black: replaced by ruff (#1994)
- flake8: replaced by ruff (#1994)
- isort: replaced by ruff (#1994)
- pydocstyle: replaced by ruff (#1994)
- pyupgrade: replaced by ruff (#1994)
Changes to existing dependencies
- python: 3.10+ required following SPEC 0 (#1966)
- fiona: 1.8.21+ required (#1966)
- kornia: 0.7.3+ required (#1979, #2144)
- lightly: 1.4.5+ required (#2196)
- lightning: 2.3 not supported due to bug (#2155, #2211)
- matplotlib: 3.5+ required (#1966)
- numpy: 1.21.2+ required (#1966), numpy 2 support added (#2151)
- pandas: 1.3.3+ required (#1966)
- pillow: 3.3+ required (#1966), jpeg2000 support required (#2209)
- pyproj: 3.3+ required (#1966)
- rasterio: 1.3+ required (#1966)
- shapely: 1.8+ required (#1966)
- torch: 1.13+ required (#1358)
- torchvision: 0.14+ required (#1358)
- h5py: 3.6+ required (#1966)
- opencv: 4.5.4+ required (#1966)
- pycocotools: 2.0.7+ required (#1966)
- scikit-image: 0.19+ required (#1966)
- scipy: 1.7.2+ required (#1966)
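To see whether an existing environment meets these minimums before upgrading, something like the following sketch can help. It uses only the standard library and a deliberately simplified numeric comparison; the `MINIMUMS` subset and all function names are assumptions made for illustration, and a dedicated version-parsing library would handle pre-releases and other edge cases more robustly.

```python
from importlib import metadata

# A subset of the minimum versions listed above, for brevity.
MINIMUMS = {"numpy": "1.21.2", "rasterio": "1.3", "torch": "1.13"}


def parse(version: str) -> tuple[int, ...]:
    """Turn '1.21.2' into (1, 21, 2), dropping suffixes like 'rc1' or '+cu118'."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings numerically, padding with zeros."""
    a, b = parse(installed), parse(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> dict[str, str]:
    """Report 'ok', 'too old', or 'missing' for each package in MINIMUMS."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report
```

Calling `check_environment()` returns one status per package, which can be printed or asserted on in a CI job.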
Datamodules
New datamodules
- AgriFieldNet (#1873)
- CaBuAr (#2235)
- ChaBuD (#1259)
- Digital Typhoon (#1748)
- EuroSAT Spatial (#2074)
- GeoNRW (#2209)
- I/O Bench (#1972)
- LEVIR-CD (#1770)
- LEVIR-CD+ (#1707)
- QuakeSet (#1997)
- Sentinel-2 + CDL (#1889)
- Sentinel-2 + EuroCrops (#1869)
- Sentinel-2 + NCCM (#1950)
- Sentinel-2 + South America Soybean (#1959)
- South Africa Crop Type (#1970)
- VHR-10 (#1082)
Changes to existing datamodules
- Remove torchgeo.datamodules.utils.dataset_split (#2005)
- EuroSAT: make sure normalization is actually applied (#2176)
Changes to existing base classes
- Fix plotting in datamodules when dataset is a subset (#2003)
Datasets
New datasets
- AgriFieldNet (#1459)
- Airphen (#1803)
- CaBuAr (#2235)
- ChaBuD (#1259)
- CropHarvest (#1677)
- Digital Typhoon (#1748)
- EuroCrops (#1813)
- EuroSAT Spatial (#2074)
- GeoNRW (#2209)
- I/O Bench (#1972)
- LEVIR-CD (#1770)
- Northeast China Crop Map (#1666)
- PRISMA (#1743)
- QuakeSet (#1997)
- SkyScript (#2253)
- South Africa Crop Type (#1840)
- South America Soybean (#1668)
- SpaceNet 8 (#2203)
Changes to existing datasets
- Benin Cashews: migrate to Source Cooperative (#2116)
- Benin Cashews: remove os.path.expanduser for consistency (#1705)
- BigEarthNet: fix broken download link (#2174)
- CDL: add 2023 checksum (#1844)
- Chesapeake: update to 2022 edition (#2214)
- ChesapeakeCVPR: reuse NLCD colormap (#1690)
- Cloud Cover: migrate to Source Cooperative (#2117)
- CV4A Kenya Crop Type: migrate to Source Cooperative (#2090)
- EuroSAT: rename B08A to B8A to match Sentinel-2 (#1646)
- FireRisk: redistribute on Hugging Face (#2000)
- GlobBiomass: add min/max timestamp (#2086)
- GlobBiomass: use float32 for pixelwise regression mask (#2086)
- GlobBiomass: fix length of dataset (#2086)
- L7 Irish: convert to IntersectionDataset (#2034)
- L8 Biome: convert to IntersectionDataset (#2058)
- LEVIR-CD+: split image into image1 and image2 for change detection (#1696)
- NASA Marine Debris: migrate to Source Cooperative (#2206)
- OSCD: support fine-grained band selection (#1684)
- OSCD: split image into image1 and image2 for change detection (#1696)
- PatternNet: redistribute on Hugging Face (#2100)
- RESISC45: redistribute on Hugging Face (#2210)
- Rwanda Field Boundary: don't plot empty masks during testing (#2254)
- Rwanda Field Boundary: migrate to Source Cooperative (#2118)
- Rwanda Field Boundary: remove os.path.expanduser for consistency (#1705)
- SpaceNet 1–7: migrate to Source Cooperative (#2203)
- Tropical Cyclone: migrate to Source Cooperative (#2068)
- VHR-10: redistribute on Hugging Face (#2210)
- VHR-10: improved plotting (#2092)
- Western USA Live Fuel Moisture: migrate to Source Cooperative (#2206)
Changes to existing base classes
- Add support for pathlib.Path to all datasets (#2173)
- Datasets can now use command-line utilities to download (#2064)
- GeoDataset: bbox key was renamed to bounds (#2199)
- GeoDataset: ignore other bands for separate files (#2222)
- GeoDataset: don't warn about missing files for downloadable datasets (#2033)
- RasterDataset: allow subclasses to specify which resampling algorithm to use (#2015)
- RasterDataset: use nearest neighbors for int and bilinear for float by default (#2015)
- RasterDataset: calculate resolution after changing CRS (#2193)
- RasterDataset: support date_str containing % character (#2233)
- RasterDataset: users can now specify the min/max time of a dataset (#2086)
- VectorDataset: add dtype attribute to match RasterDataset (#1869)
- VectorDataset: extract timestamp from filename to match RasterDataset (#1814)
- IntersectionDataset: ignore 0 area overlap (#1985)
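The zero-area overlap case in the last item can be pictured with two bounding boxes that touch along an edge: they technically intersect, but the intersection contains no pixels. The sketch below illustrates that idea in two dimensions. `Box` and `overlap` are hypothetical stand-ins written for this example, not TorchGeo's actual types or implementation.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned 2D bounding box (hypothetical stand-in)."""

    minx: float
    maxx: float
    miny: float
    maxy: float


def overlap(a: Box, b: Box) -> Optional[Box]:
    """Return the intersection of two boxes, or None if its area is zero."""
    minx, maxx = max(a.minx, b.minx), min(a.maxx, b.maxx)
    miny, maxy = max(a.miny, b.miny), min(a.maxy, b.maxy)
    # Boxes that are disjoint, or that share only an edge or a corner,
    # have no area in common and are skipped.
    if maxx <= minx or maxy <= miny:
        return None
    return Box(minx, maxx, miny, maxy)


left = Box(0, 10, 0, 10)
right = Box(10, 20, 0, 10)  # touches `left` along the line x = 10
inner = Box(5, 15, 5, 15)   # genuinely overlaps `left`
```

Here `overlap(left, right)` is `None`, while `overlap(left, inner)` is a box with non-zero area.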
New error classes
- DatasetNotFoundError: when a dataset has not yet been downloaded (#1714, #2053)
- DependencyNotFoundError: when an optional dependency is not installed (#2054)
- RGBBandsMissingError: when you try to plot a dataset but don't use RGB bands (#1737, #2053)
Models
New model architectures
- DOFA (#1903, #2052)
- Scale-MAE (#2057)
- Swin Transformer v2 (#1358, #2052)
New model weights
- DeCUR (#2191)
- DOFA (#1903)
- Scale-MAE (#2057)
- SatlasPretrain (#1358, #1884, #2038)
Samplers
Changes to existing samplers
- RandomGeoSampler: fix performance regression, 60% speedup with preprocessed data (#1968)
Trainers
New trainers
- I/O Bench (#1972)
Changes to existing trainers
- Explicitly specify batch size (#1928, #1933)
- MoCo: explicitly specify memory bank size (#1931)
- Semantic Segmentation: support ignore_index when using Jaccard loss (#1898)
- SimCLR: switch from Adam to LARS optimizer (#2196)
- SimCLR: explicitly specify memory bank size (#1931)
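To illustrate what an ignore index means for a Jaccard (intersection over union) computation, the sketch below evaluates the metric on hard class labels and skips every pixel whose target equals the ignored value. It is a simplified illustration, assuming flat lists of integer labels; it is neither the differentiable loss nor the trainer's implementation, and `mean_jaccard` is a hypothetical name.

```python
from typing import Optional


def mean_jaccard(
    preds: list[int],
    targets: list[int],
    num_classes: int,
    ignore_index: Optional[int] = None,
) -> float:
    """Mean intersection over union across classes that actually occur."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    scores = []
    for cls in range(num_classes):
        if cls == ignore_index:
            continue
        intersection = 0
        union = 0
        for p, t in zip(preds, targets):
            if t == ignore_index:
                continue  # pixels labeled with ignore_index never count
            intersection += int(p == cls and t == cls)
            union += int(p == cls or t == cls)
        if union:
            scores.append(intersection / union)
    return sum(scores) / len(scores) if scores else 0.0


preds = [0, 1, 1, 0]
targets = [0, 1, 255, 255]  # 255 marks unlabeled pixels in this example
with_ignore = mean_jaccard(preds, targets, num_classes=2, ignore_index=255)
without_ignore = mean_jaccard(preds, targets, num_classes=2)
```

In this example the two labeled pixels are predicted correctly, so ignoring the unlabeled ones avoids penalizing predictions that cannot be checked.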
Transforms
- Use Kornia's AugmentationSequential for all model weights (#1979)
- Update TorchGeo's AugmentationSequential to support object detection (#1082)
Documentation
Changes to API docs
- Datasets: add license information about every dataset (#1732)
- Datasets: update link to cite SSL4EO-L dataset (#1942)
- Models: emphasize new multimodal foundation models (#2236)
- Trainers: update num_classes parameter description (#2101)
Changes to user docs
- Alternatives: update metrics (#2259)
- Contributing: explain how to use new I/O Bench dataset (#1972)
Changes to tutorials
- Add button for the new Lightning Studios (#2146)
- Remove button for the recently defunct Planetary Computer Hub (#2107)
- Custom Raster Datasets: download the dataset before calling super (#2177)
- Custom Raster Datasets: fix typo (#1987)
- Transforms: update EuroSAT band names to match Sentinel-2 (#1646)
Other documentation changes
- README: fix CLI example (#2142)
- README: add Hugging Face badge (#1957)
- README: fix example of creating fake raster data (#2162)
- Read the Docs: use latest Ubuntu version to build (#1954)
- Allow horizontal scrolling of wide tables (#1958)
- Fix broken links and redirects (#2267)
Testing
Style
- Use prettier for configuration files (#2018)
- Use ruff for code files (#1994, #2001)
Type hints
- Ensure all functions have type hints (#2217)
- Make all class variables immutable (#2218)
- Check for unreachable code (#2241)
Unit testing
- Datasets: test dataset length (#2084, #2089)
- Datamodules: don't download during testing (#2215, #2231)
- download_url: add shared fixture to avoid code duplication (#2232)
- load_state_dict: add shared fixture to avoid code duplication (#1932)
- load_state_dict_from_url: add shared fixture to avoid code duplication (#2223)
- torch_hub: add fixture to avoid downloading checkpoints to home directory (#2265)
- Pytest: silence warnings (#1929, #1930, #2224)
- PyVista: headless plotting (#1667)
Other CI changes
- Check numpy 2 compliance (#2151)
- Coverage: use newer flag to override ignores (#2260)
- Dependabot: update devcontainer (#2025)
- Dependabot: group torch and torchvision (#2025)
- Labeler: update to v5 (#1759)
- macOS: disable pip caching (#2024)
- Windows: fail fast mode (#2225)
Contributors
This release is thanks to the following contributors:
@adamjstewart @alhridoy @ashnair1 @burakekim @calebrob6 @cookie-kyu @DarthReca @Domejko @favyen2 @GeorgeHuber @isaaccorley @kcrans @nilsleh @oddeirikigland @pioneerHitesh @piperwolters @robmarkcole @sfalkena @ShadowXZT @shreyakannan1205 @TropicolX @wangyi111 @yichiac