ray-1.8.0
Release date: 2021-11-03 02:33:14
Highlights
- Ray SGD has been rebranded to Ray Train! The new documentation landing page can be found here.
- Ray Datasets is now in beta! The beta release includes a new integration with Ray Train yielding scalable ML ingest for distributed training. Check out the docs here, try it out for your ML ingest and batch inference workloads, and let us know how it goes!
- This Ray release supports Apple Silicon (M1 Macs). Check out the installation instructions for more information!
Ray Autoscaler
🎉 New Features:
- Fake multi-node mode for autoscaler testing (#18987)
💫Enhancements:
- Improve unschedulable task warning messages by integrating with the autoscaler (#18724)
Ray Client
💫Enhancements:
- Use async rpc for remote call and actor creation (#18298)
Ray Core
💫Enhancements:
- Eagerly install job-level runtime_env (#19449, #17949)
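To illustrate the change above: a job-level runtime_env is passed to `ray.init()` and, with this release, is installed eagerly at job startup rather than lazily when the first task or actor runs. A minimal, hedged sketch (the pip dependency and env var are illustrative):
```python
import ray

# The job-level runtime_env below is now installed eagerly at ray.init()
# time instead of lazily on first task/actor launch.
ray.init(
    runtime_env={
        "pip": ["requests"],           # illustrative per-job dependency
        "env_vars": {"MY_FLAG": "1"},  # per-job environment variables
    }
)

@ray.remote
def fetch(url: str) -> int:
    import requests  # resolved from the job-level environment
    return requests.get(url).status_code
```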
🔨 Fixes:
- Fixed resource demand reporting for infeasible 1-CPU tasks (#19000)
- Fixed printing Python stack trace in Python worker (#19423)
- Fixed macOS security popups (#18904)
- Fixed thread safety issues for the core worker (#18902, #18910, #18913, #19343)
- Fixed placement group performance and resource leaking issues (#19277, #19141, #19138, #19129, #18842, #18652)
- Improved unschedulable task warning messages by integrating with the autoscaler (#18724)
- Improved Windows support (#19014, #19062, #19171, #19362)
- Fixed runtime_env issues (#19491, #19377, #18988)
Ray Data
Ray Datasets is now in beta! The beta release includes a new integration with Ray Train yielding scalable ML ingest for distributed training. It supports repeating and rewindowing pipelines, zipping two pipelines together, better cancellation of Datasets workloads, and many performance improvements. Check out the docs here, try it out for your ML ingest and batch inference workloads, and let us know how it goes!
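As a rough illustration of the features listed below, here is a hedged sketch of zipping two datasets and consuming a repeated pipeline per epoch; exact signatures may differ slightly between versions:
```python
import pandas as pd
import ray

ray.init()

# from_pandas() now accepts plain pandas DataFrames (#18992).
left = ray.data.from_pandas(pd.DataFrame({"x": range(8)}))
right = ray.data.from_pandas(pd.DataFrame({"y": range(8)}))

# Zip the two datasets together (#18833).
zipped = left.zip(right)

# Repeat the dataset into a DatasetPipeline and iterate per epoch
# (#19091, #19217).
pipe = zipped.repeat(3)
for epoch in pipe.iter_epochs():
    for batch in epoch.iter_batches(batch_size=4):
        pass  # e.g., feed each batch to training or batch inference
```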
🎉 New Features:
- Ray Train integration (#17626)
- Add support for repeating and rewindowing a DatasetPipeline (#19091)
- .iter_epochs() API for iterating over epochs in a DatasetPipeline (#19217)
- Add support for zipping two datasets together (#18833)
- Transformation operations are now cancelled when one fails or the entire workload is killed (#18991)
- Expose from_pandas()/to_pandas() APIs that accept/return plain Pandas DataFrames (#18992)
- Customize compression, read/write buffer size, metadata, etc. in the IO layer (#19197)
- Add spread resource prefix for manual round-robin resource-based task load balancing
💫Enhancements:
- Only a minimal number of rows are now dropped when doing an equalized split (#18953)
- Parallelized metadata fetches when reading Parquet datasets (#19211)
🔨 Fixes:
- Tensor columns now properly support table slicing (#19534)
- Prevent Datasets tasks from being captured by Ray Tune placement groups (#19208)
- Empty datasets are properly handled in most transformations (#18983)
🏗 Architecture refactoring:
- Tensor dataset representation changed to a table with a single tensor column (#18867)
RLlib
🎉 New Features:
- Allow n-step > 1 and prioritized replay for R2D2 and RNNSAC agents. (#18939)
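A hedged sketch of enabling both for R2D2, assuming the 1.8-era DQN-family config keys (`n_step`, `prioritized_replay`); check the installed version for exact key names and defaults:
```python
from ray.rllib.agents.dqn.r2d2 import R2D2Trainer

# n-step > 1 can now be combined with prioritized replay for R2D2
# (and RNNSAC). Key names assume the 1.8-era config schema.
trainer = R2D2Trainer(
    env="CartPole-v0",
    config={
        "num_workers": 1,
        "n_step": 3,                  # multi-step bootstrapped targets
        "prioritized_replay": True,   # prioritized experience replay
        "model": {"use_lstm": True},  # R2D2 expects a recurrent model
    },
)
print(trainer.train()["episode_reward_mean"])
```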
🔨 Fixes:
- Fix memory leaks in TF2 eager mode. (#19198)
- Faster inference of worker spaces when they are specified through the configuration. (#18805)
- Fix bug for complex obs spaces containing Box([2D shape]) and discrete components. (#18917)
- Protect Torch multi-GPU stats against race conditions. (#18937)
- Fix SAC agent with dict space. (#19101)
- Fix A3C/IMPALA in multi-agent setting. (#19100)
🏗 Architecture refactoring:
- Unify the results dictionary returned from Trainer.train() across agents, regardless of framework (TF or PyTorch), multi-agent setup, multi-GPU usage, or algorithms that use more than one SGD iteration (e.g. PPO) (#18879)
Ray Workflow
🎉 New Features:
- Introduce workflow.delete (#19178)
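A hedged sketch of the new API, assuming the 1.8-era alpha workflow interface (`workflow.init`, `@workflow.step`):
```python
from ray import workflow

workflow.init()

@workflow.step
def add(a: int, b: int) -> int:
    return a + b

# Run a workflow under an explicit id, then remove its stored state
# and checkpoints with the new workflow.delete() (#19178).
add.step(1, 2).run(workflow_id="add_demo")
workflow.delete(workflow_id="add_demo")
```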
🔨 Fixes:
- Fix a bug that allowed a workflow step to be executed multiple times (#19090)
🏗 Architecture refactoring:
- Object reference serialization is decoupled from workflow storage (#18328)
Tune
🎉 New Features:
- PBT: Add burn-in period (#19321)
💫Enhancements:
- Optional forcible trial cleanup; return default autofilled metrics even if the Trainable doesn't report at least once (#19144)
- Use queue to display JupyterNotebookReporter updates in Ray client (#19137)
- Add resume="AUTO" and enhance resume error messages (#19181)
- Provide information about resource deadlocks, early stopping in Tune docs (#18947)
- Fix HEBOSearch installation docs (#18861)
- OptunaSearch: check compatibility of search space with evaluated_rewards (#18625)
- Add `save` and `restore` methods for searchers that were missing them, and add tests (#18760)
- Add documentation for reproducible runs (setting seeds) (#18849)
- Deprecate `max_concurrent` in `TuneBOHB` (#18770)
- Add `on_trial_result` to `ConcurrencyLimiter` (#18766); see the sketch after this list
- Ensure arguments passed to tune `remote_run` match (#18733)
- Only disable ipython in remote actors (#18789)
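With `max_concurrent` deprecated in `TuneBOHB`, trial concurrency is capped by wrapping the searcher instead. A hedged sketch using `OptunaSearch` (Optuna must be installed; module paths follow the 1.8-era `ray.tune.suggest` layout):
```python
from ray import tune
from ray.tune.suggest import ConcurrencyLimiter
from ray.tune.suggest.optuna import OptunaSearch

def objective(config):
    tune.report(score=config["x"] ** 2)

# Wrap any searcher to cap concurrent trials; ConcurrencyLimiter now
# also forwards on_trial_result to the wrapped searcher (#18766).
searcher = ConcurrencyLimiter(
    OptunaSearch(metric="score", mode="min"), max_concurrent=4
)

tune.run(
    objective,
    config={"x": tune.uniform(0.0, 1.0)},
    search_alg=searcher,
    num_samples=16,
)
```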
🔨 Fixes:
- Only try to sync driver if sync_to_driver is actually enabled (#19589)
- sync_client: Fix delete template formatting (#19553)
- Force no result buffering for hyperband schedulers (#19140)
- Exclude trial checkpoints in experiment sync (#19185)
- Fix how durable trainable is retained in global registry (#19223, #19184)
- Ensure `loc` column in progress reporter is filled (#19182)
- Deflake PBT Async test (#19135)
- Fix `Analysis.dataframe()` documentation and enable passing of `mode=None` (#18850)
Ray Train (SGD)
Ray SGD has been rebranded to Ray Train! The new documentation landing page can be found here. Ray Train is integrated with Ray Datasets for distributed data loading while training, documentation available here.
🎉 New Features:
- Ray Datasets Integration (#17626)
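A hedged sketch of the integration, assuming the 1.8-era `ray.train.Trainer` API with `train.get_dataset_shard()` inside the training function:
```python
import ray
from ray import train
from ray.train import Trainer

def train_func():
    # Each worker receives its own shard of the dataset passed to run().
    shard = train.get_dataset_shard()
    for batch in shard.iter_batches(batch_size=32):
        pass  # run one training step per batch

trainer = Trainer(backend="torch", num_workers=2)
trainer.start()
trainer.run(train_func, dataset=ray.data.range(1000))
trainer.shutdown()
```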
🔨 Fixes:
- Improved support for multi-GPU training (#18824, #18958)
- Make actor creation async (#19325)
📖Documentation:
- Rename Ray SGD v2 to Ray Train (#19436)
- Added migration guide from Ray SGD v1 (#18887)
Serve
🎉 New Features:
- Add ability to recover from a checkpoint on cluster failure (#19125)
- Support kwargs to deployment constructors (#19023)
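A hedged sketch of constructor kwargs, assuming they are forwarded through `deploy()` in the 1.8-era deployment API:
```python
import ray
from ray import serve

ray.init()
serve.start()

@serve.deployment
class Greeter:
    # Deployment constructors can now accept keyword arguments (#19023).
    def __init__(self, greeting: str, punctuation: str = "!"):
        self.message = f"{greeting}{punctuation}"

    def __call__(self, request):
        return self.message

# Positional and keyword init args are forwarded to __init__.
Greeter.deploy("Hello", punctuation="?")
```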
🔨 Fixes:
- Fix asyncio compatibility issue (#19298)
- Catch spurious ConnectionErrors during shutdown (#19224)
- Fix error with uris=None in runtime_env (#18874)
- Fix shutdown logic with exit_forever (#18820)
🏗 Architecture refactoring:
- Progress towards Serve autoscaling (#18793, #19038, #19145)
- Progress towards Java support (#18630)
- Simplifications for long polling (#19154, #19205)
Dashboard
🎉 New Features:
- Basic support for the dashboard on Windows (#19319)
🔨 Fixes:
- Fix healthcheck issue causing the dashboard to crash under load (#19360)
- Work around aiohttp 4.0.0+ issues (#19120)
🏗 Architecture refactoring:
- Improve dashboard agent retry logic (#18973)
Thanks
Many thanks to all those who contributed to this release! @rkooo567, @lchu-ibm, @scv119, @pdames, @suquark, @antoine-galataud, @sven1977, @mvindiola1, @krfricke, @ijrsvt, @sighingnow, @marload, @jmakov, @clay4444, @mwtian, @pcmoritz, @iycheng, @ckw017, @chenk008, @jovany-wang, @jjyao, @hauntsaninja, @franklsf95, @jiaodong, @wuisawesome, @odp, @matthewdeng, @duarteocarmo, @czgdp1807, @gjoliver, @mattip, @richardliaw, @max0x7ba, @Jasha10, @acxz, @xwjiang2010, @SongGuyang, @simon-mo, @zhisbug, @ccssmnn, @Yard1, @hazeone, @o0olele, @froody, @robertnishihara, @amogkam, @sasha-s, @xychu, @lixin-wei, @architkulkarni, @edoakes, @clarkzinzow, @DmitriGekhtman, @avnishn, @liuyang-my, @stephanie-wang, @Chong-Li, @ericl, @juliusfrost, @carlogrisetti