v1.0.0
Release date: 2023-11-07 14:12:05
Latest ray-project/kuberay release: v1.2.1 (2024-08-31 14:43:49)
KubeRay is officially in General Availability!
- Bump the CRD version from v1alpha1 to v1.
- Relocate almost all documentation to the Ray website.
- Improve RayJob UX.
- Improve GCS fault tolerance.
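For existing manifests, the CRD bump in the first highlight shows up in the `apiVersion` field. The following is a minimal sketch of that change, not an official migration tool. It assumes KubeRay resources use the `ray.io` API group (so `ray.io/v1alpha1` becomes `ray.io/v1`) and that the field is written unquoted on its own line. The helper name `bump_api_version` and the sample resource name are hypothetical.

```python
import re

# Assumption: KubeRay custom resources live in the "ray.io" API group, so the
# v1alpha1 -> v1 bump changes "ray.io/v1alpha1" to "ray.io/v1".
OLD_API_VERSION = "ray.io/v1alpha1"
NEW_API_VERSION = "ray.io/v1"

# Match only an unquoted apiVersion field on its own line, so other mentions
# of the old version (comments, annotations) are left untouched.
_API_VERSION_LINE = re.compile(
    r"^(?P<prefix>[ \t]*apiVersion:[ \t]*)"
    + re.escape(OLD_API_VERSION)
    + r"(?P<suffix>[ \t]*)$",
    re.MULTILINE,
)


def bump_api_version(manifest_text: str) -> str:
    """Return manifest_text with KubeRay apiVersion fields moved to v1.

    Hypothetical helper for illustration; it edits text only and does not
    validate the manifest against the v1 CRD schema.
    """
    return _API_VERSION_LINE.sub(
        lambda m: m.group("prefix") + NEW_API_VERSION + m.group("suffix"),
        manifest_text,
    )


if __name__ == "__main__":
    # Hypothetical sample manifest, trimmed to the fields relevant here.
    sample = (
        "apiVersion: ray.io/v1alpha1\n"
        "kind: RayCluster\n"
        "metadata:\n"
        "  name: example-cluster\n"
    )
    print(bump_api_version(sample))
```

A text-level rewrite keeps comments and formatting intact, which a parse-and-dump round trip through a YAML library generally would not. Manifests that quote the value or template it (for example in Helm charts) would need separate handling.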
GCS fault tolerance
- [GCS FT] Improve GCS FT cleanup UX (#1592, @kevin85421)
- [Bug][RayCluster] Fix RAY_REDIS_ADDRESS parsing with redis scheme and… (#1556, @rueian)
- [Bug] RayService with GCS FT HA issue (#1551, @kevin85421)
- [Test][GCS FT] End-to-end test for cleanup_redis_storage (#1422)(#1459) (#1466, @rueian)
- [Feature][GCS FT] Clean up Redis once a GCS FT-Enabled RayCluster is deleted (#1412, @kevin85421)
- Update GCS fault tolerance YAML (#1404, @kevin85421)
- [GCS FT] Consider the case of sidecar containers (#1386, @kevin85421)
- [GCS FT] Give readiness / liveness probes good default values (#1364, @kevin85421)
- [GCS FT][Refactor] Redefine the behavior for deleting Pods and stop listening to Kubernetes events (#1341, @kevin85421)
CRD versioning
- [CRD] Inject CRD version to the Autoscaler sidecar container (#1496, @kevin85421)
- [CRD][2/n] Update from CRD v1alpha1 to v1 (#1482, @kevin85421)
- [CRD][1/n] Create v1 CRDs (#1481, @kevin85421)
- [CRD] Set maxDescLen to 0 (#1449, @kevin85421)
RayService
- [Hotfix][Bug] Avoid unnecessary zero-downtime upgrade (#1581, @kevin85421)
- [Feature] Add an example for RayService high availability (#1566, @kevin85421)
- [Feature] Add a flag to make zero downtime upgrades optional (#1564, @kevin85421)
- [Bug][RayService] KubeRay does not recreate Serve applications if a head Pod without GCS FT recovers from a failure. (#1420, @kevin85421)
- [Bug] Fix the filename of text summarizer YAML (#1415, @kevin85421)
- [serve] Change text ml yaml to use french in user config (#1403, @zcin)
- [services] Add text ml rayservice yaml (#1402, @zcin)
- [Bug] Fix flakiness of RayService e2e tests (#1385, @kevin85421)
- Add RayService sample test (#1377, @Darren221)
- [RayService] Revisit the conditions under which a RayService is considered unhealthy and the default threshold (#1293, @kevin85421)
- [RayService][Observability] Add more loggings about networking issues (#1282, @kevin85421)
RayJob
- [Feature] Improve observability for flaky RayJob test (#1587, @kevin85421)
- [Bug][RayJob] Fix FailedToGetJobStatus by allowing transition to Running (#1583, @architkulkarni)
- [RayJob] Fix RayJob status reconciliation (#1539, @astefanutti)
- [RayJob]: Always use target RayCluster image as default RayJob submitter image (#1548, @astefanutti)
- [RayJob] Add default CPU and memory for job submitter pod (#1319, @architkulkarni)
- [Bug][RayJob] Check dashboard readiness before creating job pod (#1381) (#1429, @rueian)
- [Feature][RayJob] Use RayContainerIndex instead of 0 (#1397) (#1427, @rueian)
- [RayJob] Enable job log streaming by setting PYTHONUNBUFFERED in job container (#1375, @architkulkarni)
- Add field to expose entrypoint num cpus in rayjob (#1359, @shubhscoder)
- [RayJob] Add runtime env YAML field (#1338, @architkulkarni)
- [Bug][RayJob] RayJob with custom head service name (#1332, @kevin85421)
- [RayJob] Add e2e sample yaml test for shutdownAfterJobFinishes (#1269, @architkulkarni)
RayCluster
- [Enhancement] Remove unused variables in constant.go (#1474, @evalaiyc98)
- [Enhancement] GPU RayCluster doesn't work on GKE Autopilot (#1470, @kevin85421)
- [Refactor] Parameterize TestGetAndCheckServeStatus (#1450, @evalaiyc98)
- [Feature] Make replicas optional for WorkerGroupSpec (#1443, @kevin85421)
- use raycluster app's name as podgroup name key word (#1446, @lowang-bh)
- [Refactor] Make port name variables consistent and meaningful (#1389, @evalaiyc98)
- [Feature] Use image of Ray head container as the default Ray Autoscaler container (#1401, @kevin85421)
- Update Autoscaler YAML for the Autoscaler tutorial (#1400, @kevin85421)
- [Feature] Ray container must be the first application container (#1379, @kevin85421)
- [release blocker][Feature] Only Autoscaler can make decisions to delete Pods (#1253, @kevin85421)
- [release blocker][Autoscaler] Randomly delete Pods when scaling down the cluster (#1251, @kevin85421)
Helm charts
- Remove miniReplicas in raycluster-cluster.yaml (#1473, @evalaiyc98)
- Helm chart ray-cluster template reference fix (#1469, @chrisxstyles)
- fix: Issue #1391 - Custom labels not being pulled in (#1398, @rxraghu)
- Remove unnecessary kustomize in make helm (#1370, @shubhscoder)
- [Feature] Allow RayCluster Helm chart to specify different images for different worker groups (#1352, @Darren221)
- Allow manually creating init containers in Kuberay helm charts (#1287, @richardsliu)
KubeRay API Server
- Added Python API server client (#1561, @blublinsky)
- updating url use v1 (#1577, @blublinsky)
- Fixed processing of job submitter (#1562, @blublinsky)
- extended job APIs (#1537, @blublinsky)
- fixed volumes test in cluster test (#1498, @blublinsky)
- Add documentation for API Server monitoring (#1479, @blublinsky)
- created HA example for API server (#1461, @blublinsky)
- Numerous fixes to the API server to make RayJob APIs working (#1447, @blublinsky)
- Updated API server documentation (#1435, @z103cb)
- servev2 support for API server (#1419, @blublinsky)
- replacement for https://github.com/ray-project/kuberay/pull/1312 (#1409, @blublinsky)
- Updates to the apiserver swagger-ui (#1410, @z103cb)
- implemented liveness/readiness probe for the API server (#1369, @blublinsky)
- Operator support for openShift (#1371, @blublinsky)
- Removed use of BUILD_FLAGS in apiserver makefile (#1336, @z103cb)
- Api server makefile (#1301, @z103cb)
Documentation
- [Doc] Update release docs (#1621, @kevin85421)
- [Doc] Fix release doc format (#1578, @kevin85421)
- Update kuberay mcad integration doc (#1373, @tedhtchang)
- [Release][Doc] Add instructions to release Go modules. (#1546, @kevin85421)
- [Post v1.0.0-rc.1] Reenable sample YAML tests for latest release and update some docs (#1544, @kevin85421)
- Update operator development instruction (#1458, @tedhtchang)
- doc: fix moved link (#1462, @hongchaodeng)
- Fix mkDocs (#1448, @kevin85421)
- Update Kuberay doc to version 1.0.0 rc.0 (#1441, @Yicheng-Lu-llll)
- [Doc] Delete unused docs (#1440, @kevin85421)
- [Post Ray 2.7.0 Release] Update Ray versions to Ray 2.7.0 (#1423, @GeneDer)
- [Doc] Update README (#1433, @kevin85421)
- [release] Redirect users to Ray website (#1431, @kevin85421)
- [Docs] Update Security Guidance on Dashboard Ingress (#1413, @ijrsvt)
- Update Volcano integration doc (#1380, @annajung)
- [Doc] Add gke bucket yaml (#1372, @architkulkarni)
- [RayJob] [Doc] Add real-world Ray Job use case tutorial for KubeRay (#1361, @architkulkarni)
- Delete ray_v1alpha1_rayjob.batch-inference.yaml (#1360, @architkulkarni)
- Documentation and example for running simple NLP service on kuberay (#1340, @gvspraveen)
- Add a document for profiling (#1299, @Yicheng-Lu-llll)
- Fix: Typo (#1295, @ArgonQQ)
- [Post release v0.6.0] Update CHANGELOG.md (#1274, @kevin85421)
- Release v0.6.0 doc validation (#1271, @kevin85421)
- [Doc] Develop Ray Serve Python script on KubeRay (#1250, @kevin85421)
- [Doc] Fix the order of comments in sample Job YAML file (#1242, @architkulkarni)
- [Doc] Upload a screenshot for the Serve page in Ray dashboard (#1236, @kevin85421)
- Fix typo (#1241, @mmourafiq)
CI
- [Bug] Fix flaky sample YAML tests (#1590, @kevin85421)
- Allow to install and remove operator via scripts (#1545, @jiripetrlik)
- [CI] Create release tag for ray-operator Go module (#1574, @astefanutti)
- [Test][Bug] Update worker replicas idempotently in rayjob autoscaler envtest (#1471) (#1543, @rueian)
- Update Dockerfiles to address CVE-2023-44487 (HTTP/2 Rapid Reset) (#1540, @astefanutti)
- [CI] Skip redis raycluster sample YAML test (#1465, @architkulkarni)
- Revert "[CI] Skip redis raycluster sample YAML test" (#1490, @rueian)
- Remove GOARCH in ray-operator/Dockerfile to support multi-arch images (#1442, @ideal)
- Update Dockerfile to address closed CVEs (#1488, @anishasthana)
- [CI] Update latest release to v1.0.0-rc.0 in tests (#1467, @architkulkarni)
- [CI] Reenable rayjob sample yaml latest test (#1464, @architkulkarni)
- Updating logrus and net packages in go.mod (#1495, @jbusche)
- Allow E2E tests to run with arbitrary k8s cluster (#1306, @jiripetrlik)
- Bump golang.org/x/net from 0.0.0-20210405180319-a5a99cb37ef4 to 0.7.0 in /proto (#1345, @dependabot[bot])
- Bump golang.org/x/text from 0.3.5 to 0.3.8 in /proto (#1344, @dependabot[bot])
- Bump go.mongodb.org/mongo-driver from 1.3.4 to 1.5.1 in /apiserver (#1407, @dependabot[bot])
- Bump golang.org/x/sys from 0.0.0-20210510120138-977fb7262007 to 0.1.0 in /proto (#1346, @dependabot[bot])
- Bump golang.org/x/net from 0.0.0-20210813160813-60bc85c4be6d to 0.7.0 in /cli (#1405, @dependabot[bot])
- Bump github.com/emicklei/go-restful from 2.9.5+incompatible to 2.16.0+incompatible in /ray-operator (#1348, @dependabot[bot])
- Bump golang.org/x/sys from 0.0.0-20211210111614-af8b64212486 to 0.1.0 in /cli (#1347, @dependabot[bot])
- [CI] Remove RayService tests from compatibility-test.py (#1395, @kevin85421)
- [CI] Remove extraPortMappings from kind configurations (#1366, @kevin85421)
- [CI] Update latest ray version 2.5.0 -> 2.6.3 (#1320, @architkulkarni)
- Bump the golangci-lint version in the api server makefile (#1342, @z103cb)
- [CI] Refactor pipeline and test RayCluster sample yamls (#1321, @architkulkarni)
- Update doc and base image for Go 1.19 (#1330, @tedhtchang)
- Fix release actions (#1323, @anishasthana)
- Upgrade to Go 1.19 (#1325, @kevin85421)
- [CI] Run sample job YAML tests in buildkite (#1315, @architkulkarni)
- [CI] Downgrade kind from v0.20.0 to v0.11.1 (#1313, @architkulkarni)
- [CI] Publish KubeRay operator / apiserver images to Quay (#1307, @kevin85421)
- [CI] Install kuberay operator in buildkite test (#1308, @architkulkarni)
- [CI] Verify kubectl in kind-in-docker step (#1305, @architkulkarni)
- [Quay] Sanity check for KubeRay repository setup (#1300, @kevin85421)
- [CI] Only run test_ray_serve for Ray 2.6.0 and later (#1288, @kevin85421)
- Update ray operator Dockerfile (#1213, @anishasthana)
- [Golang] Remove go get (#1283, @ijrsvt)
- Dependencies: Upgrade golang.org/x packages (#1281, @ijrsvt)
- [CI] Add kind-in-Docker test to Buildkite CI (#1243, @architkulkarni)
Others
- Fix: odd number of arguments (#1594, @chenk008)
- [Feature][Observability] Scrape Autoscaler and Dashboard metrics (#1493, @kevin85421)
- [Benchmark] KubeRay memory / scalability benchmark (#1324, @kevin85421)
- Do not update pod labels if they haven't changed (#1304, @JoshKarpel)
- Add Ray cluster spec for TPU pods (#1292, @richardsliu)
- [Grafana][Observability] Embed Grafana dashboard panels into Ray dashboard (#1278, @kevin85421)
- [Feature] Allow custom labels&annotations for kuberay operator (#1275) (#1276, @mariusp)
1. v1.0.0-darwin-amd64.zip (7.55 MB)
2. v1.0.0-linux-amd64.zip (7.74 MB)