python-v0.19.0
Release date: 2024-08-15 06:14:22
Latest delta-io/delta-rs release: rust-v0.20.1 (2024-09-28 21:21:07)
Breaking changes!
The default writer engine is now rust. Replace your partition_filters with a SQL predicate instead. The PyArrow engine is deprecated and will be removed in v1.0.
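As a rough illustration of the migration, the sketch below turns an old-style partition_filters list into a SQL predicate string. The helper names `filters_to_predicate` and `_quote` are hypothetical and not part of deltalake. The sketch assumes partition values are strings, as they were in partition_filters; numeric columns may need unquoted literals.

```python
def _quote(value):
    # Render a value as a SQL string literal, doubling embedded single quotes.
    return "'" + str(value).replace("'", "''") + "'"


def filters_to_predicate(partition_filters):
    """Hypothetical helper: join (column, op, value) tuples into one SQL predicate."""
    clauses = []
    for column, op, value in partition_filters:
        if op.lower() in ("in", "not in"):
            # List-valued operators take a parenthesised list of literals.
            rendered = "(" + ", ".join(_quote(v) for v in value) + ")"
        else:
            rendered = _quote(value)
        clauses.append(f"{column} {op.upper()} {rendered}")
    # Tuples in partition_filters are combined conjunctively.
    return " AND ".join(clauses)


old_filters = [("year", "=", "2024"), ("country", "in", ["US", "CA"])]
predicate = filters_to_predicate(old_filters)
print(predicate)  # year = '2024' AND country IN ('US', 'CA')
```

The resulting string would then be passed as the `predicate` argument of `write_deltalake` in overwrite mode, in place of `partition_filters`.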
Highlights
- CDF support in write_deltalake, delete, and merge operations
- Expired logs cleanup during post-commit. Can be disabled with delta.enableExpiredLogCleanup = false
- Improved MERGE performance by using predicate non-partition columns min/max for prefiltering
- ADD column operation
- Speed up log parsing
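A minimal sketch of turning off the expired-log cleanup mentioned above, assuming the property is supplied through the `configuration` argument of `write_deltalake` when the table is first created. The function name `create_table_without_log_cleanup` is hypothetical, and existing tables may need the property set another way.

```python
# Table property named in the release notes. Delta table properties are
# stored as strings, so the value is "false" rather than the boolean False.
TABLE_CONFIG = {"delta.enableExpiredLogCleanup": "false"}


def create_table_without_log_cleanup(table_uri, data):
    """Hypothetical wrapper: create a table with expired-log cleanup disabled."""
    # Imported lazily so the sketch can be loaded without deltalake installed.
    from deltalake import write_deltalake

    write_deltalake(table_uri, data, configuration=TABLE_CONFIG)
```

Calling `create_table_without_log_cleanup("path/to/table", data)` would create the table with the property set; the path here is only a placeholder.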
Performance improvements
- perf: apply projection when reading checkpoint parquet by @alexwilcoxson-rel in https://github.com/delta-io/delta-rs/pull/2717
- perf: grab file size in rust by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2734
- feat: improve merge performance by using predicate non-partition columns min/max for prefiltering by @JonasDev1 in https://github.com/delta-io/delta-rs/pull/2513
- perf: early stop if all values in arr are null by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2764
New features
- feat(python, rust): cdc write-support for delete operation by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2721
- feat(python, rust): cdc write-support for overwrite and replacewhere writes by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2722
- feat: introduce CDC generation for merge operations by @rtyler in https://github.com/delta-io/delta-rs/pull/2747
- feat: use logical plan in delete, delta planner refactoring by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2725
- feat: use logical plan in update, refactor/simplify CDCTracker by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2727
- feat(python, rust): arrow large/view types passthrough, rust default engine by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2738
- feat(python, rust): cleanup expired logs post-commit hook by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2459
- feat(python, rust): add column operation by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2562
- feat(python): handle PyCapsule interface objects in write_deltalake by @kylebarron in https://github.com/delta-io/delta-rs/pull/2534
- feat(rust): fix size_in_bytes in last_checkpoint_ to i64 by @sherlockbeard in https://github.com/delta-io/delta-rs/pull/2649
- feat(rust,python): cast each parquet file to delta schema by @HawaiianSpork in https://github.com/delta-io/delta-rs/pull/2615
- feat: support userMetadata in CommitInfo by @jkylling in https://github.com/delta-io/delta-rs/pull/2670
- feat(python, rust): add projection in CDF reads by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2704
- feat(python): add DeltaTable.is_deltatable static method (#2662) by @omkar-foss in https://github.com/delta-io/delta-rs/pull/2715
- feat: improved test fixtures by @roeap in https://github.com/delta-io/delta-rs/pull/2749
- feat: fail fast on forked process by @Tom-Newton in https://github.com/delta-io/delta-rs/pull/2765
- feat: restore the TryFrom for DeltaTablePartition by @rtyler in https://github.com/delta-io/delta-rs/pull/2767
- feat: more economic data skipping with datafusion by @roeap in https://github.com/delta-io/delta-rs/pull/2772
Bug Fixes
- fix(rust): inconsistent order of partitioning columns (#2494) by @aditanase in https://github.com/delta-io/delta-rs/pull/2614
- fix(rust,python): checkpoint with column nullable false by @sherlockbeard in https://github.com/delta-io/delta-rs/pull/2680
- fix: update delta kernel version by @jeppe742 in https://github.com/delta-io/delta-rs/pull/2685
- fix(python): empty dataset fix for "pyarrow" engine by @sherlockbeard in https://github.com/delta-io/delta-rs/pull/2689
- fix: ensure DataFusion SessionState Parquet options are applied to DeltaScan by @alexwilcoxson-rel in https://github.com/delta-io/delta-rs/pull/2702
- fix(python, rust): use url encoder when encoding partition values by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2705
- fix(python, rust): use input schema to get correct schema in cdf reads by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2723
- fix: change arrow map root name to follow with parquet root name by @sclmn in https://github.com/delta-io/delta-rs/pull/2538
- fix: schema adapter doesn't map partial batches correctly by @alexwilcoxson-rel in https://github.com/delta-io/delta-rs/pull/2735
- fix: optimize Spark written tables by @rtyler in https://github.com/delta-io/delta-rs/pull/1650
- fix(python, rust): cdc in writer not creating inserts by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2751
- fix(python, rust): don't flatten fields during cdf read by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2763
- fix: column parsing to include nested columns and enclosing char by @gtrawinski in https://github.com/delta-io/delta-rs/pull/2737
Other Changes
- chore: missed one macos runner reference in actions by @rtyler in https://github.com/delta-io/delta-rs/pull/2645
- chore: add a reproduction case for merge failures with struct by @rtyler in https://github.com/delta-io/delta-rs/pull/2644
- ci: update CODEOWNERS by @hntd187 in https://github.com/delta-io/delta-rs/pull/2650
- chore: increase subcrate versions by @rtyler in https://github.com/delta-io/delta-rs/pull/2648
- docs: fix bullets on hdfs docs by @Kimahriman in https://github.com/delta-io/delta-rs/pull/2653
- docs: improve navigation fixes by @avriiil in https://github.com/delta-io/delta-rs/pull/2660
- docs: add integration docs for s3 backend by @avriiil in https://github.com/delta-io/delta-rs/pull/2658
- chore: bump ruff to 0.5.2 by @fpgmaas in https://github.com/delta-io/delta-rs/pull/2673
- chore: enable RUF ruleset for ruff by @fpgmaas in https://github.com/delta-io/delta-rs/pull/2677
- chore: pin ruff and mypy versions in the lint stage in the CI pipeline by @fpgmaas in https://github.com/delta-io/delta-rs/pull/2679
- chore: update README.md by @veronewra in https://github.com/delta-io/delta-rs/pull/2684
- chore: create separate action to setup python and rust in the cicd pipeline by @fpgmaas in https://github.com/delta-io/delta-rs/pull/2687
- chore: add test coverage command to Makefile by @fpgmaas in https://github.com/delta-io/delta-rs/pull/2688
- chore: improve contributing.md by @fpgmaas in https://github.com/delta-io/delta-rs/pull/2672
- chore: remove stale code for conditional import of Literal by @fpgmaas in https://github.com/delta-io/delta-rs/pull/2676
- chore: remove references to black from the project by @fpgmaas in https://github.com/delta-io/delta-rs/pull/2674
- chore: refactor write_deltalake in writer.py by @fpgmaas in https://github.com/delta-io/delta-rs/pull/2695
- chore: upgrade to datafusion 40 by @rtyler in https://github.com/delta-io/delta-rs/pull/2661
- chore: prepare python release 0.18.3 by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2707
- chore: enabling actions for merge groups by @rtyler in https://github.com/delta-io/delta-rs/pull/2718
- chore(deps): update sqlparser requirement from 0.47 to 0.49 by @dependabot in https://github.com/delta-io/delta-rs/pull/2714
- chore: try an alternative docke compose invocation syntax by @rtyler in https://github.com/delta-io/delta-rs/pull/2724
- chore(deps): update which requirement from 4 to 6 by @dependabot in https://github.com/delta-io/delta-rs/pull/2730
- chore: update changelog and versions for next release by @rtyler in https://github.com/delta-io/delta-rs/pull/2740
- chore: add to code_owner crates by @ion-elgreco in https://github.com/delta-io/delta-rs/pull/2741
- chore: update delta_kernel to 0.3.0 by @alexwilcoxson-rel in https://github.com/delta-io/delta-rs/pull/2742
- docs: fix broken link in docs by @astrojuanlu in https://github.com/delta-io/delta-rs/pull/2746
- chore: upgrade to datafusion 41 by @rtyler in https://github.com/delta-io/delta-rs/pull/2761
- chore: prepare the next notable release of 0.19.0 by @rtyler in https://github.com/delta-io/delta-rs/pull/2768
- chore: fix a bunch of clippy lints and re-enable tests by @rtyler in https://github.com/delta-io/delta-rs/pull/2773
New Contributors
- @aditanase made their first contribution in https://github.com/delta-io/delta-rs/pull/2614
- @fpgmaas made their first contribution in https://github.com/delta-io/delta-rs/pull/2673
- @kylebarron made their first contribution in https://github.com/delta-io/delta-rs/pull/2534
- @veronewra made their first contribution in https://github.com/delta-io/delta-rs/pull/2684
- @jeppe742 made their first contribution in https://github.com/delta-io/delta-rs/pull/2685
- @sclmn made their first contribution in https://github.com/delta-io/delta-rs/pull/2538
- @astrojuanlu made their first contribution in https://github.com/delta-io/delta-rs/pull/2746
- @gtrawinski made their first contribution in https://github.com/delta-io/delta-rs/pull/2737
Full Changelog: https://github.com/delta-io/delta-rs/compare/python-v0.18.2...python-v0.19.0