v0.7.0
Released: 2023-08-28 00:40:37
Bigger again at 4942 lines :( But tons of new features this time!
Just over 500 commits since 0.6.0.
Release Highlights
- Windows support has been dropped to focus on Linux and macOS.
  - Some functionality may work on Windows, but no support will be provided; use WSL instead.
- DiskTensors: a way to store tensors on disk has been added.
  - This is coupled with functionality in `state.py`, which supports saving/loading safetensors and loading torch weights; a sketch follows below.
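A minimal sketch of the `state.py` round trip, assuming the import path `tinygrad.state` as laid out in this release (later releases moved this to `tinygrad.nn.state`); the two-parameter `Model` class is hypothetical:

```python
from tinygrad.tensor import Tensor
from tinygrad.state import safe_save, safe_load, get_state_dict

# a hypothetical model with two parameters
class Model:
  def __init__(self):
    self.w = Tensor.randn(4, 4)
    self.b = Tensor.zeros(4)

model = Model()
safe_save(get_state_dict(model), "/tmp/model.safetensors")  # write safetensors to disk
weights = safe_load("/tmp/model.safetensors")  # dict of tensors lazily backed by the file
```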
- Tensor Cores are supported on M1/Apple Silicon and on the 7900 XTX (WMMA).
  - Support on the 7900 XTX requires weights and data to be in float16; full float16 compute support will come in a later release.
  - Tensor Core behaviour/usage is controlled by the `TC` envvar.
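For orientation, a float16 matmul that is eligible for the WMMA path looks like the sketch below. Import paths (`tinygrad.helpers.dtypes`) match this release's layout; the accepted values of the `TC` envvar aren't described here, see /docs/env_vars.md:

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes

a = Tensor.randn(256, 256).cast(dtypes.float16)  # 7900 XTX needs both operands in float16
b = Tensor.randn(256, 256).cast(dtypes.float16)
c = a.matmul(b).realize()  # a matmul shape eligible for tensor core codegen
```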
- Kernel optimization with nevergrad.
  - This optimizes the shapes going into the kernel, gated by the `KOPT` envvar.
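Envvars like `KOPT` are generally read once and cached, so set them before importing tinygrad; a sketch, equivalent to running `KOPT=1 python3 your_script.py`:

```python
import os
os.environ["KOPT"] = "1"  # set before importing tinygrad so it is picked up

from tinygrad.tensor import Tensor

# kernels realized from here on are eligible for the nevergrad shape search
out = Tensor.randn(64, 64).matmul(Tensor.randn(64, 64)).realize()
```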
- P2P buffer transfers are supported on most AMD GPUs when using a single Python process.
  - This is controlled by the `P2P` envvar.
- LLaMA 2 support.
  - A requirement of this is bfloat16 support for loading the weights. This is semi-supported by casting them to float16; proper bfloat16 support is tracked at #1290.
  - The LLaMA example now also supports 8-bit quantization using the `--quantize` flag; the sketch below illustrates the general idea.
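As an illustration of what 8-bit weight quantization does, here is a generic per-row absmax scheme in plain numpy; this is not necessarily the exact scheme used in examples/llama.py:

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
  scale = np.maximum(np.abs(w).max(axis=-1, keepdims=True) / 127.0, 1e-8)  # per-row absmax scale
  return np.round(w / scale).astype(np.int8), scale.astype(np.float32)

def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
  return q.astype(np.float32) * scale  # approximate reconstruction of the weights

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize_int8(q, s)).max())  # small quantization error
```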
- Most MLPerf models have working inference examples. Training these models is currently being worked on.
- Initial multigpu training support.
  - Slow multigpu training by copying through host shared memory.
  - Somewhat follows torch's multiprocessing and DistributedDataParallel high-level design (see the sketch after this list).
  - See the hlb_cifar10.py example.
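A minimal sketch of the host-memory idea: each process computes gradients on its shard and they are averaged through host-shared storage. Everything here (the fake gradients, the `Manager` dict) is illustrative stdlib code, not tinygrad's actual helpers:

```python
import multiprocessing as mp
import numpy as np

def worker(rank: int, grads):
  # in real training this would be a tinygrad backward pass on GPU `rank`
  local_grad = np.full(4, float(rank), dtype=np.float32)
  grads[rank] = local_grad  # copy device gradients out through host memory

if __name__ == "__main__":
  world_size = 2
  with mp.Manager() as m:
    grads = m.dict()
    procs = [mp.Process(target=worker, args=(r, grads)) for r in range(world_size)]
    for p in procs: p.start()
    for p in procs: p.join()
    avg = sum(grads[r] for r in range(world_size)) / world_size
    print(avg)  # each rank applies the averaged gradient, a la DistributedDataParallel
```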
- SymbolicShapeTracker and Symbolic JIT.
  - These two things combined allow models with changing shapes, like transformers, to be jitted.
  - This means that LLaMA can now be jitted for a massive increase in performance.
  - Be warned that the API for this is very WIP and may change in the future, as may the rest of the tinygrad API.
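For orientation, a fixed-shape `TinyJit` sketch (import path per this release); the symbolic-shape variant additionally binds symbolic `Variable`s for the changing dimensions, but that is exactly the WIP part of the API, so it is not shown here:

```python
from tinygrad.tensor import Tensor
from tinygrad.jit import TinyJit

w = Tensor.randn(64, 64)

@TinyJit
def step(x: Tensor) -> Tensor:
  return x.matmul(w).relu().realize()  # jitted functions must return realized tensors

for _ in range(3):
  out = step(Tensor.randn(8, 64).realize())  # first calls trace, later calls replay cached kernels
```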
- aarch64 and PTX assembly backends.
- WebGPU backend; see the compile_efficientnet.py example.
- Support for torch-like tensor indexing by other tensors, sketched below.
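For example, gathering rows of a tensor with an index tensor; the explicit int32 dtype on the index is an assumption for safety, as the default dtype is float:

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes

t = Tensor([[1, 2], [3, 4], [5, 6]])
idx = Tensor([2, 0], dtype=dtypes.int32)  # integer index tensor
print(t[idx].numpy())  # rows 2 and 0 of t, i.e. [[5., 6.], [1., 2.]]
```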
- Some more `nn` layers were promoted, namely `Embedding` and various `Conv` layers (see the sketch below).
- VITS and so-vits-svc examples added.
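A short sketch of the promoted layers; the constructor arguments follow tinygrad's `nn` module, but treat the exact signatures as assumptions for this release:

```python
from tinygrad.tensor import Tensor
from tinygrad.nn import Embedding, Conv2d
from tinygrad.helpers import dtypes

emb = Embedding(1000, 32)  # vocab_size, embed_size
tok = emb(Tensor([[1, 2, 3]], dtype=dtypes.int32))  # -> shape (1, 3, 32)

conv = Conv2d(3, 16, 3, padding=1)  # in_channels, out_channels, kernel_size
img = conv(Tensor.randn(1, 3, 32, 32))  # -> shape (1, 16, 32, 32)
```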
- Initial documentation work.
  - Quickstart guide: /docs/quickstart.md
  - Environment variable reference: /docs/env_vars.md
And lots of small optimizations all over the codebase.