v0.6.0

mosaicml/composer

Release date: 2022-04-21 09:49:15

Latest mosaicml/composer release: v0.25.0 (2024-09-25 04:56:05)

🚀 Composer v0.6.0

Composer v0.6.0 is released! Install via pip:

pip install --upgrade mosaicml==0.6.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.6.0

Major Changes

  1. 🗃️ Automatic Gradient Accumulation

    Composer v0.6.0 can automatically pick an appropriate value for gradient accumulation. The trainer catches OutOfMemory exceptions and handles them gracefully, so there is no need to manually tune this parameter for each model, batch size, and hardware combination!

    To use automatic gradient accumulation, set grad_accum='auto'. For example:

    trainer = Trainer(
        ...,
        grad_accum='auto',
    )
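
The notes above do not describe how the trainer recovers from an out-of-memory error. One plausible strategy, sketched below purely as an assumption, is to retry the failed batch with a doubled accumulation factor until every microbatch fits. `SimulatedOutOfMemory`, `split_into_microbatches`, `train_batch_with_auto_grad_accum`, and `fake_microbatch_step` are hypothetical names for illustration, not Composer APIs.

```python
class SimulatedOutOfMemory(RuntimeError):
    """Stand-in for a device out-of-memory error (hypothetical, for illustration)."""


def split_into_microbatches(batch, grad_accum):
    """Split a batch (a list of samples) into at most `grad_accum` chunks."""
    size = -(-len(batch) // grad_accum)  # ceiling division
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def train_batch_with_auto_grad_accum(batch, run_microbatch, grad_accum=1):
    """Retry a batch with a doubled grad_accum until every microbatch fits.

    Returns the grad_accum value that succeeded so later batches can reuse it.
    A real trainer would also discard partially accumulated gradients
    before each retry; that step is omitted here.
    """
    while True:
        try:
            for microbatch in split_into_microbatches(batch, grad_accum):
                run_microbatch(microbatch)
            return grad_accum
        except SimulatedOutOfMemory:
            if grad_accum >= len(batch):
                raise  # even a single sample does not fit; nothing left to split
            grad_accum = min(grad_accum * 2, len(batch))


def fake_microbatch_step(microbatch, max_samples=4):
    # Pretend the device can hold at most `max_samples` samples at once.
    if len(microbatch) > max_samples:
        raise SimulatedOutOfMemory(f"{len(microbatch)} samples do not fit")


chosen = train_batch_with_auto_grad_accum(list(range(16)), fake_microbatch_step)
print(chosen)
```

With a batch of 16 samples and a simulated capacity of 4, the loop fails at accumulation factors 1 and 2 and settles on 4.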
    
  2. 💾 Artifact Logging

    Training on spot instances? Composer v0.6.0 introduces artifact logging, making it possible to store checkpoints and other artifacts directly to cloud storage. See the Object Store Logger and the Checkpointing Guide for more information.

    Artifact Logging has replaced the run directory and the run directory uploader, which have been removed.

  3. 📊 Metric Values on the State

    Composer v0.6.0 binds the computed metric values on the State. Go ahead and read these values from your own callbacks! We'll be releasing an early stopping callback in an upcoming Composer release.
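
As an illustration of what a callback could do with these values, the sketch below implements simple patience-based stopping logic. These notes do not name the attribute that holds the metrics on the State, so the example uses a plain dictionary as a stand-in. `PatienceTracker` is a hypothetical helper, not the upcoming Composer callback.

```python
class PatienceTracker:
    """Tracks whether a monitored metric has stopped improving."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience      # evaluations to tolerate without improvement
        self.min_delta = min_delta    # smallest increase that counts as improvement
        self.best = None
        self.stale_evals = 0

    def should_stop(self, metrics, name):
        """Update with the latest metric dict; return True once patience runs out."""
        value = float(metrics[name])
        if self.best is None or value > self.best + self.min_delta:
            self.best = value
            self.stale_evals = 0
        else:
            self.stale_evals += 1
        return self.stale_evals >= self.patience


# Stand-in for metric values read from the State after each evaluation.
history = [{"Accuracy": 0.61}, {"Accuracy": 0.70}, {"Accuracy": 0.69}, {"Accuracy": 0.70}]
tracker = PatienceTracker(patience=2)
decisions = [tracker.should_stop(metrics, "Accuracy") for metrics in history]
print(decisions)
```

The tracker assumes a higher value is better; a loss-style metric would need the comparison reversed.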

  4. ⚠️ NoEffectWarning and NotIntendedUseWarning for Algorithms

    Some algorithms, such as BlurPool, now emit a NoEffectWarning or a NotIntendedUseWarning when they're not being used appropriately.

Minor Improvements

  1. 🏃‍♀️ Training Run Names

    We introduced a run_name parameter in the Trainer to help organize training runs.

    trainer = Trainer(
        ...,
        run_name='awesome-training-run',
    )
    

    We'll automatically pick one if the run name is not specified.

  2. 💈 Automatic Progress Bars

    The ProgressBarLogger, formerly called the TQDMLogger, is automatically enabled for all training runs.

    To disable the progress bar, set progress_bar=False. For example:

    trainer = Trainer(
        ...,
        progress_bar=False,
    )
    
  3. 🪵 Logged Data in the Console

    To print Logger calls to the console, set the log_to_console and the console_log_level arguments.

    trainer = Trainer(
        ...,
        log_to_console=True,
        console_log_level="epoch",
    )
    

    By default, the console logger will only be enabled when progress_bar=False. The default console log level is epoch.

  4. 📃 Capturing stdout and stderr in Log Files

    The FileLogger now captures stdout and stderr by default, so tracebacks are captured alongside other logging statements.

  5. ⬆️ PyTorch 1.11 Support

    We've tested Composer on PyTorch 1.11. Go ahead and upgrade your dependencies!

  6. ✅ Checkpointing

    We changed the checkpoint format to store the underlying model, not the DistributedDataParallel wrapped model. If you're using Composer to read checkpoints, there's nothing to change. But if you're reading Composer checkpoints manually, note that the module checkpoints will be formatted differently.
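
For anyone reading checkpoints manually, the practical difference is in the parameter names: PyTorch's DistributedDataParallel exposes the wrapped model as its `module` attribute, so a state dict taken from the wrapper has every key prefixed with `module.`. The sketch below shows one way to normalize such keys. The layout of the Composer checkpoint file itself is not described here, so the example works on a plain dictionary, and `strip_ddp_prefix` is a hypothetical helper.

```python
def strip_ddp_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading DDP prefix removed from each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Keys as they would appear when saved from a DDP-wrapped model...
old_style = {"module.fc.weight": [0.1, 0.2], "module.fc.bias": [0.0]}
# ...and as they appear when saved from the underlying model.
new_style = {"fc.weight": [0.1, 0.2], "fc.bias": [0.0]}

normalized = strip_ddp_prefix(old_style)
print(normalized == new_style)
```

Keys without the prefix pass through unchanged, so the helper is safe to apply to either format.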

    In addition, we changed the checkpointing argument names for the trainer.

    • The new parameters save_artifact_name and save_latest_artifact_name allow checkpoints to be saved directly to artifact stores.
    • The new parameter save_num_checkpoints_to_keep helps preserve local disk storage by automatically removing old checkpoints.
    • load_path replaces load_path_format.
    • save_name replaces save_path_format.
    • save_latest_filename replaces save_latest_format.
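
To make the idea behind save_num_checkpoints_to_keep concrete, here is a minimal sketch of checkpoint retention: keep the newest N files in a folder and delete the rest. It is not Composer's implementation. The `prune_checkpoints` helper, the `*.pt` pattern, and the treatment of negative values as "keep everything" are all assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path


def prune_checkpoints(folder, num_to_keep, pattern="*.pt"):
    """Delete all but the `num_to_keep` most recently modified checkpoint files.

    A negative `num_to_keep` is treated here as "keep everything".
    Returns the names of the removed files, oldest first.
    """
    if num_to_keep < 0:
        return []
    checkpoints = sorted(Path(folder).glob(pattern), key=lambda p: p.stat().st_mtime)
    stale = checkpoints[:max(len(checkpoints) - num_to_keep, 0)]
    for path in stale:
        path.unlink()
    return [path.name for path in stale]


with tempfile.TemporaryDirectory() as folder:
    for epoch in range(5):
        path = Path(folder) / f"ep{epoch}.pt"
        path.write_bytes(b"checkpoint")
        # Give each file a distinct, increasing modification time.
        os.utime(path, (1_000_000 + epoch, 1_000_000 + epoch))
    removed = prune_checkpoints(folder, num_to_keep=2)
    remaining = sorted(p.name for p in Path(folder).iterdir())

print(removed, remaining)
```

Sorting by modification time avoids parsing file names, which matters when the checkpoint naming format is configurable.
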
  7. 🏎️ Profiling

    We added support for custom scheduling functions and re-designed how the profiler saves traces. Each profiling cycle will now have its own trace file. Trace merging happens automatically throughout the training process. Long-running profiling is now possible without the long wait at the end of training for the trace merge.

    As part of this refactor, the profiler arguments have changed:

    • prof_trace_handlers replaces prof_event_handlers.
    • prof_schedule replaces prof_skip_first, prof_wait, prof_warmup, prof_active, and prof_repeat. See the cyclic schedule function.
    • torch_prof_folder replaces torch_profiler_trace_dir.
    • The new arguments torch_prof_filename, torch_prof_artifact_name, torch_prof_overwrite, and torch_prof_num_traces_to_keep allow for customization of how PyTorch Profiler traces are saved.
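
To show how a single schedule function can stand in for the five removed arguments, the sketch below builds a cyclic schedule from the same quantities. It is an illustration only: `Action` and `make_cyclic_schedule` are hypothetical names, and the schedule takes a batch index, whereas the signature Composer expects for prof_schedule is not given in these notes.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical stand-in for the profiler's per-batch action."""
    SKIP = "skip"
    WARMUP = "warmup"
    ACTIVE = "active"


def make_cyclic_schedule(skip_first=0, wait=0, warmup=1, active=4, repeat=1):
    """Build a function mapping a batch index to a profiler action.

    After `skip_first` batches, each cycle is `wait` skipped batches, then
    `warmup` batches, then `active` recorded batches. `repeat=0` cycles forever.
    """
    cycle_len = wait + warmup + active
    if cycle_len <= 0:
        raise ValueError("wait + warmup + active must be positive")

    def schedule(batch_idx):
        if batch_idx < skip_first:
            return Action.SKIP
        offset = batch_idx - skip_first
        if repeat > 0 and offset >= cycle_len * repeat:
            return Action.SKIP  # all requested cycles are finished
        position = offset % cycle_len
        if position < wait:
            return Action.SKIP
        if position < wait + warmup:
            return Action.WARMUP
        return Action.ACTIVE

    return schedule


schedule = make_cyclic_schedule(skip_first=1, wait=1, warmup=1, active=2, repeat=1)
actions = [schedule(i).value for i in range(7)]
print(actions)
```

Because the schedule is an ordinary function, a custom one can encode any pattern, for example profiling only the first batches of each epoch.
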
  8. 🏗️ TorchVision Model Architectures

    We switched our vision models to use the TorchVision model architecture implementations where possible.

Bug Fixes

Changelog

New Contributors

Full Changelog: https://github.com/mosaicml/composer/compare/v0.5.0...v0.6.0
