v0.13.0
Release date: 2024-04-13 04:12:06
The Burn Release 0.13 is a significant update introducing numerous new features and performance enhancements. One major change is the removal of the `Sync` trait implementation from most Burn types (see Core User APIs below). Additionally, the release introduces several new tensor operations, module features, and optimizers, as well as improvements to the autodiff backend. Notably, a new bridge mechanism facilitates runtime switching between backends, and significant work has been done on the Just-in-Time and Wgpu backends. The release also includes numerous bug fixes, documentation improvements, infrastructure updates, CI enhancements, and miscellaneous changes to improve code quality and usability.
Core User APIs
A major change in this release is that most Burn types no longer implement the `Sync` trait; this includes modules, optimizers, and tensors. The change should not impact users of the `Learner` struct for model training. However, it may affect those who implemented their own training loop or inference server. While modules, optimizers, and tensors can be sent to other threads, they can no longer be accessed concurrently by multiple threads. This aligns with Burn's workflow, where each tensor operation requires an owned version of the tensor. The change was made to safely reduce the number of locks needed when modifying the state of the autodiff graph, fusion state, allocation cache, and various other use cases. While not all locks have been removed, the type signature no longer poses a problem for follow-up optimizations. Note that the same tensor can still be sent to multiple threads without copying the underlying data; it simply needs to be cloned before being sent (see the sketch below). (#1575) @nathanielsimard
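In practice, sharing work across threads now means cloning the tensor first; the clone is cheap since the underlying data is shared rather than copied. A minimal sketch, assuming the `NdArray` backend (any backend behaves the same way):

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

type B = NdArray;

fn main() {
    let device = Default::default();
    let tensor = Tensor::<B, 1>::from_floats([1.0, 2.0, 3.0], &device);

    // `Tensor` is still `Send`: clone it (cheap, the underlying data is
    // shared rather than copied) and move the clone into the new thread.
    let for_worker = tensor.clone();
    let handle = std::thread::spawn(move || for_worker.add_scalar(1.0));

    let result = handle.join().unwrap();
    println!("{result} {tensor}");
}
```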
Tensor
- Support signed value for `Tensor::arange` (#1238) @Nikaidou-Shinku
- Add `Tensor::unsqueeze_dims` op (#1236) @skewballfox
- Add support for `Any`, `All` operations to Tensor (#1342) @ashdtu
- Add `not_equal` and `not_equal_elem` tensor ops (#1374) @laggui
- Element-wise `min`/`max` between a pair of tensors (#1385) @boondocklabs
- Add `is_close` and `all_close` tensor operators (#1389) @antimora
- Interpolate tensor operation (Inference Only) (#1246) @Nikaidou-Shinku @antimora @ashdtu
- Autodiff/training support for Nearest Interpolation (#1414) @Nikaidou-Shinku @ashdtu @antimora
- Add `argwhere` and `nonzero` boolean tensor ops (#1394) @laggui
- Add `bool()` op for numerical tensor (#1402) @antimora
- Tensor `permute` operator (#1410) @antimora
- Add `sign` tensor operator (#1446) @antimora
- Rename `diagonal` to `eye` tensor op and add missing entry for diagonal to Book tensor section (#1449) @antimora
- Add `prod` and `prod_dim` tensor ops (#1460) @antimora
- Add `tril_mask`, `triu_mask` and `diag_mask` ops (#1479) @antimora
- Add `flip` tensor operator (#1468) @carrotflakes
- Add tensor `sorting` operations (#1488) (#1494) @laggui
- Add `topk` tensor operation (#1497) @laggui
- Tensor `expand` operator (#1508) @antimora
- Provide Tensor Padding Helpers (#960) (#1097) @jcmullwh @antimora
- Move `log_sigmoid` to activation ops (#1558) @laggui
- Add `repeat` autodiff and fusion support (#1600) @louisfd
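A quick illustration of a few of the ops listed above. This is a sketch: the op names come from the list, but the exact signatures (for example the axis argument types) may differ slightly:

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

type B = NdArray;

fn main() {
    let device = Default::default();
    let t = Tensor::<B, 2>::from_floats([[3.0, -1.0], [2.0, 4.0]], &device);

    let transposed = t.clone().permute([1, 0]); // swap the two dimensions
    let reversed = t.clone().flip([0]);         // reverse along dim 0
    let row_prod = t.clone().prod_dim(1);       // product over dim 1, keeps rank
    let signs = t.clone().sign();               // -1, 0 or 1 per element
    let top2 = t.topk(2, 1);                    // 2 largest values along dim 1

    println!("{transposed}\n{reversed}\n{row_prod}\n{signs}\n{top2}");
}
```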
Module
- Feature Addition: PRelu Module (#1328) @Arjun31415
- Implement Instance Normalization (#1321) @tushushu
- Add enum module support (#1337) @laggui
- Make the parameters of `conv1d` and `conv2d` public (#1245) @Arjun31415
- Parameters are now lazily initialized, so you no longer need to implement both the `init` and `init_with(record)` methods for training/inference (see the sketch after this list) (#1539) @nathanielsimard
- Support multilabel binary cross entropy (#1571)
- Implement Huber loss (#1444) @WorldSEnder
- Add LeakyRelu module (#1467) @Arjun31415
- Add SwiGLU activation module (#1507) @ashdtu
- Add rotary positional encoding to transformer modules (#1604) @ashdtu
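With lazy initialization, a single `init(&device)` call covers both fresh training and loading from a record. A minimal sketch using the built-in `Linear` module, assuming the 0.13 config API:

```rust
use burn::backend::NdArray;
use burn::nn::{Linear, LinearConfig};
use burn::tensor::Tensor;

type B = NdArray;

fn main() {
    let device = Default::default();

    // One `init` call; parameters are initialized lazily, so a separate
    // `init_with(record)` implementation is no longer required.
    let linear: Linear<B> = LinearConfig::new(16, 4).init(&device);

    let input = Tensor::<B, 2>::ones([2, 16], &device);
    let output = linear.forward(input); // shape [2, 4]
    println!("{output}");
}
```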
Optimizer
- Add linear learning rate scheduler (#1443) @astral4
- Exponential learning rate scheduler (#1481) @rubenjr0
- Cosine annealing learning rate scheduler with cold restarts (#1481) @rubenjr0 (schedule shapes sketched below)
- Add Rank0 variant to AdaptorRecordV1 and AdaptorRecordItemV1 (#1442) @carrotflakes
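Independent of Burn's scheduler types, the schedules themselves are simple functions of the training step. A plain-Rust sketch of the shapes the linear and cosine-annealing schedulers produce (the function names and constants here are illustrative, not Burn APIs):

```rust
// Linear schedule: interpolate from `lr_start` to `lr_end` over `total` steps.
fn linear_lr(step: usize, total: usize, lr_start: f64, lr_end: f64) -> f64 {
    let t = step.min(total) as f64 / total as f64;
    lr_start + (lr_end - lr_start) * t
}

// Cosine annealing with restarts: follow a half cosine from `lr_max` down
// to `lr_min` over `cycle` steps, then restart from `lr_max`.
fn cosine_annealing_lr(step: usize, cycle: usize, lr_min: f64, lr_max: f64) -> f64 {
    let t = (step % cycle) as f64 / cycle as f64;
    lr_min + 0.5 * (lr_max - lr_min) * (1.0 + (std::f64::consts::PI * t).cos())
}

fn main() {
    for step in [0, 250, 500, 750, 1000] {
        println!(
            "step {step}: linear = {:.6}, cosine = {:.6}",
            linear_lr(step, 1000, 1e-3, 1e-5),
            cosine_annealing_lr(step, 400, 1e-5, 1e-3),
        );
    }
}
```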
Train
- Add multi-label classification dataset and metric (#1572) @laggui
- Add learner training summary (#1591) @laggui
Backend
This release also introduces the backend bridge, a new mechanism for switching between backends at runtime. While it is an improvement, it remains compatible with the previous approach to supporting mixed precision. (#1529) @nathanielsimard
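Conceptually, the bridge standardizes what previously required a manual data round-trip between backends. The sketch below shows that manual route using ordinary tensor APIs (not the bridge API itself) to move a tensor from `NdArray` to `Wgpu` at runtime:

```rust
use burn::backend::{NdArray, Wgpu};
use burn::tensor::Tensor;

fn main() {
    let cpu_device = Default::default();
    let gpu_device = Default::default();

    let on_cpu = Tensor::<NdArray, 2>::ones([2, 2], &cpu_device);

    // Manual "bridge": extract the data and rebuild the tensor on the other
    // backend. The new bridge mechanism abstracts this kind of transfer.
    let on_gpu = Tensor::<Wgpu, 2>::from_data(on_cpu.into_data().convert(), &gpu_device);
    println!("{on_gpu}");
}
```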
JIT
Significant effort has been devoted over the past few months to refactor the previous Wgpu backend into a shader-agnostic Just-in-Time backend. All lower-level dependencies have been abstracted into the Just-in-Time Runtime trait, requiring a compiler, compute server, and storage. The bulk of this work was carried out by @nathanielsimard and @louisfd.
Commits: #1274 #1280 #1313 #1340 #1356 #1359 #1378 #1391 #1396 #1398 #1417 #1429 #1423 #1424 #1433 #1456 #1474 #1457 #1480 #1472 #1493 #1509 #1530 #1528 #1541 #1550 #1569
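To give a feel for the architecture, the shape of such a runtime can be sketched as a trait bundling the three pieces named above. This is purely illustrative and not Burn's actual trait definition:

```rust
// A hypothetical sketch of the shape described above, NOT Burn's real trait:
// a JIT runtime ties together a compiler (targeting some shader language),
// a compute server (kernel dispatch), and a storage strategy (device memory).
trait JitRuntime {
    type Compiler;      // e.g. a WGSL compiler for the Wgpu runtime
    type ComputeServer; // executes compiled kernels
    type Storage;       // owns and reuses device buffers
}
```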
Wgpu
- Enable `burn-fusion` by default. (#1223) @nathanielsimard
- Autotune support for int ops (#1136) @agelas
- Add runtime options in Wgpu init methods. (#1505) @nathanielsimard
- Decent speedup of transposed convolution @louisfd
Autodiff
Extensive work has also been undertaken on Burn's autodiff backend. The backend now supports gradient checkpointing to reduce memory usage and has been refactored into a client/server architecture. These updates result in significantly less blocking when tracking gradients, enhancing performance particularly on smaller models. Furthermore, various bugs have been fixed where some graph nodes weren't used, potentially truncating the autodiff graph. Overall, these changes make the autodiff process more reliable and efficient. (#1575) (#1358) @louisfd @nathanielsimard
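The user-facing gradient workflow is unchanged by this refactor. A minimal sketch of tracking and reading gradients with the `Autodiff` decorator over `NdArray`:

```rust
use burn::backend::{Autodiff, NdArray};
use burn::tensor::Tensor;

type B = Autodiff<NdArray>;

fn main() {
    let device = Default::default();
    let x = Tensor::<B, 1>::from_floats([1.0, 2.0, 3.0], &device).require_grad();

    // y = sum(x * x), so dy/dx = 2x.
    let y = (x.clone() * x.clone()).sum();

    let grads = y.backward();
    let dx = x.grad(&grads).unwrap(); // a tensor on the inner NdArray backend
    println!("{dx}");
}
```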
Candle
- Upgrade to Candle 0.4.1. (#1382) @laggui
Data
- Add an image folder dataset implementation. (#1232) (#1132) @laggui
- Add an image folder dataset implementation. (#1232) (#1132) @laggui
- Add `burn::data::network::downloader`. (#1283) @laggui
Import
- [PyTorchRecorder] Allow multiple pattern matches in a chain (see the sketch after this list). (#1269) @laggui
- [PyTorchRecorder] Pytorch config extraction (#1323) @antimora
- [PyTorchRecorder] Pass top-level key to extract state_dict (#1300) @antimora
- [PyTorchRecorder] print debug option (#1425) @antimora
- [PyTorchRecorder] Truncate debug display for NestedValue (#1428) @antimora
- [PyTorchRecorder] Support for non-contiguous indexes in PyTorchFileRecorder keys (#1432) @antimora
- [PyTorchRecorder] Add Enum module support (#1436) @antimora
- [ONNX] Parser rewrite (#1296) @skewballfox
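As an illustration of the chained key remapping, here is a hedged sketch: `Net` is a stand-in for your own module, and the exact `LoadArgs` helpers and `load` signature may differ slightly from what is shown:

```rust
use burn::backend::NdArray;
use burn::module::Module;
use burn::nn::Linear;
use burn::prelude::Backend;
use burn::record::{FullPrecisionSettings, Recorder};
use burn_import::pytorch::{LoadArgs, PyTorchFileRecorder};

// A toy module; `#[derive(Module)]` generates a `NetRecord<B>` record type.
#[derive(Module, Debug)]
struct Net<B: Backend> {
    fc: Linear<B>,
}

fn main() {
    let device = Default::default();

    // Chain several key-remap patterns; with #1269 every matching pattern
    // in the chain is applied, not just the first one.
    let args = LoadArgs::new("net.pt".into())
        .with_key_remap("linear\\.(.*)", "fc.$1")
        .with_debug_print(); // print the remapped keys (#1425)

    let record: NetRecord<NdArray> = PyTorchFileRecorder::<FullPrecisionSettings>::default()
        .load(args, &device)
        .expect("failed to load PyTorch weights");
    let _ = record;
}
```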
Benchmarks
We have implemented a system that enables the comparison of backends across a variety of tasks. Currently, most of these tasks consist of micro-benchmarks, but we plan to expand the range of benchmarks in the future. To ensure Burn's portability and performance across different devices, the community can run and upload benchmarks! 🔥
- Created the `burnbench` CLI. (#1260) @syl20bnr
- Added GitHub authentication to the `burnbench` CLI. (#1285) @syl20bnr
- Updated GitHub App ID with the official application. (#1397) @syl20bnr
- Implemented benchmark upload functionality to the server. (#1381) @syl20bnr
- Compiled benchmarks in a dedicated target directory. (#1435) @syl20bnr
- Enhanced benchmark result presentation with a neat table and attempted to run every benchmark. (#1464) @akhildevelops
- Improved access token refreshing and displayed the authenticated user name. (#1483) @syl20bnr
- Added system information to benchmark results. (#1495) @syl20bnr
- Included operating system information in benchmark results. (#1531) @syl20bnr
- Fixed automatic fusion activation issue with Wgpu. (#1542) @syl20bnr
- Tweaked and added kinds to Gelu benchmark names. (#1533) @syl20bnr
- Ensured backend names in JSON reports match the `burnbench` CLI. (#1375) @errordeveloper @syl20bnr
- Added 'all' choice to the `--benches` and `--backends` options. (#1567) @syl20bnr
- Revamped `burnbench` output for improved readability and compactness. (#1568) @syl20bnr
- Added a URL to browse results on the burn.dev website. (#1573) @syl20bnr
Bug Fix
- Fix the pow backward pass when one of the tensors wasn't tracking gradients. (#1225) (#1224) @nathanielsimard
- Fix batch norm on the LibTorch backend when the aggregation was on the same device. (#1226) @nathanielsimard
- Fix training dashboard metrics switch on macOS & Linux (#1228) @nathanielsimard
- Fix a bug introduced in (#1138) where arithmetic could fail on usize type. (#1287) @louisfd
- [PyTorchRecorder] Fix out of memory bug (#1270) (#1286) @antimora
- [PyTorchRecorder] Fix chain pattern matching when multiple patterns are provided (#1273) @laggui
- Fix LogEventStore end epoch log (#1314) @laggui
- Hugging Face dataset importer: check that `pa_type` is valid before checking `is_binary` (#1354) @laggui
- Fix implicit casting of bool in wgpu backend (#1391) @louisfd
- Fix switched arguments in `reshape_args_usize` check (#1409) @jackdarlison
- Fix tch view data corruption (#1434) @nathanielsimard
- Missing Debug derive for Group Norm Config (#1482) @Arjun31415
- Numerically stable `log_sigmoid` (#1548) @laggui (see the note after this list)
- Fix pytorch recorder adapt_linear when using autodiff backend (#1576) @laggui
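For context on the `log_sigmoid` fix: evaluating $-\log(1 + e^{-x})$ directly overflows for large negative $x$. The standard numerically stable rewrite, which a fix of this kind typically adopts, is

$$
\log \sigma(x) = -\log\left(1 + e^{-x}\right) = \min(x, 0) - \log\left(1 + e^{-|x|}\right)
$$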
Infrastructure
The minimum Rust version has been updated to 1.75. (#1297) @syl20bnr
Docs
- Improve the doc feature flags for docs.rs (#1212) @syl20bnr
- Include the backends in the documentation (#1229) @nathanielsimard
- Started the burn developer book. (#1184) @skewballfox @syl20bnr @antimora
- Update TORCH_CUDA_VERSION usage. (#1284) @laggui
- fix(book): add missing device parameter to `model.init()`. (#1302) @apertureless
- fix(book): add missing second parameter to the `CrossEntropyLoss` constructor (#1301) @apertureless
- docs(book-&-examples): modify book and examples with new prelude module (#1372) @bioinformatist
- Update tensor book (#1401) @antimora
- Fix book MNIST reference (no more huggingface) (#1471) @laggui
- Update SUPPORTED-ONNX-OPS.md (#1547) @antimora
- Update book module (#1557) @antimora
- Update pytorch-model.md (#1570) @antimora
- Fixes to code examples in section 5.2 (#1594) @hrishim
CI
- Add a semantic versioning checker. (#1219) @Luni-4
- Simplify CI binaries updating. (#1235) @Luni-4
- Trigger test suite when Cargo.lock file is updated (#1326) @syl20bnr
- Fix codecov and update the cargo dependabot config to a weekly schedule (#1320) @Luni-4
- Refactor xtask (#1288) @iamricks
- Fix broken test and run-checks script (#1347) @antimora
- Add stale action (#1383) @Luni-4
- Update Cargo.lock workflow to trigger commit checks (#1399) @syl20bnr
- Use GitHub's own action to generate GitHub App token (#1437) @syl20bnr
- Add support for cargo metadata new workspace member format (#1500) @syl20bnr
- Switch codecov to informational mode (#1540) @syl20bnr
- Migrate workflows to use Blaze runners (#1596) @dcvz
- Add a retry on adding ppa for kisak (#1599) @dcvz
Tests
- Add `NaN` and `Inf` detection in `assert_approx_eq` to catch potential numerical bugs. (#1209) @skewballfox
Misc
- Make all structs CamelCase (#1316) (#1311) @antimora
- Move burn crates to their own crates directory (#1336) @syl20bnr
- Add sub-crates as members of workspace (#1348) @antimora
- Pytorch message updates (#1344) @antimora
- Chore: update main README links to crate-specific READMEs (#1415) @ekalosak
- [Wasm] remove exit in scripts (#1543) @AlexErrant
- Use num-traits for float ops (#1584) @antimora