hoodie-0.4.5
Release date: 2019-05-29 11:00:12
Latest apache/hudi release: release-1.0.0-beta2 (2024-07-16 15:41:55)
Highlights
- Dockerized demo with support for different Hive versions
- Smoother handling of append log on cloud stores
- Introduced a global bloom index that enforces a uniqueness constraint across partitions
- CLI commands to analyze workloads and manage compactions
- Migration guide for folks wanting to move datasets to Hudi
- Added Spark Structured Streaming support, with a Hudi sink
- Built-in support for filtering duplicates in DeltaStreamer
- Support for plugging in custom transformations in DeltaStreamer
- Better support for non-partitioned Hive tables
- Support hard deletes for Merge on Read storage
- New Slack URL and site URLs
- Added Presto bundle for easier integration
- Many bug fixes and reliability improvements
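
The global bloom index highlight above can be illustrated with a small sketch. This is not Hudi's implementation: it swaps bloom filters for plain dictionaries, and the class names `PartitionScopedIndex` and `GlobalIndex` are hypothetical. It only aims to show the difference between looking up a record key within one partition and looking it up across all partitions.

```python
class PartitionScopedIndex:
    """Hypothetical index keyed by (partition, record key).

    The same record key can be registered once per partition,
    so duplicates across partitions are possible.
    """

    def __init__(self):
        self._locations = {}  # (partition, key) -> partition holding the record

    def tag(self, partition, key):
        # Returns the partition where this (partition, key) pair lives.
        return self._locations.setdefault((partition, key), partition)


class GlobalIndex:
    """Hypothetical index keyed by record key alone.

    A key is bound to the first partition it was seen in, so a later
    write with the same key resolves to that existing location.
    """

    def __init__(self):
        self._locations = {}  # key -> partition holding the record

    def tag(self, partition, key):
        # Returns the partition that already owns the key, if any.
        return self._locations.setdefault(key, partition)


scoped = PartitionScopedIndex()
scoped.tag("2019/05/28", "uuid-1")
scoped_second = scoped.tag("2019/05/29", "uuid-1")  # treated as a new record

global_index = GlobalIndex()
global_index.tag("2019/05/28", "uuid-1")
global_second = global_index.tag("2019/05/29", "uuid-1")  # resolves to the original partition
```

In this sketch the partition-scoped lookup lets the same key land in two partitions, while the global lookup routes the second write back to the partition that first held the key.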
Full PR List
- @bhasudha - Create hoodie-presto bundle jar. fixes #567 #571
- @bhasudha - Close FSDataInputStream for meta file open in HoodiePartitionMetadata. Fixes issue #573 #574
- @yaooqinn - Handle no such element exception in HoodieSparkSqlWriter #576
- @vinothchandar - Update site url in README
- @yaooqinn - typo: bundle jar with unrecognized variables #570
- @bvaradar - Table rollback for inflight compactions MUST not delete instant files at any time to avoid race conditions #565
- @bvaradar - Fix Hoodie Record Reader to work with non-partitioned dataset (ISSUE-561) #569
- @bvaradar - Hoodie Delta Streamer Features : Transformation and Hoodie Incremental Source with Hive integration #485
- @vinothchandar - Updating new slack signup link #566
- @yaooqinn - Using immutable maps instead of mutable ones to generate parameters #559
- @n3nash - Fixing behavior of buffering in Create/Merge handles for invalid/wrong schema records #558
- @n3nash - Cleaner should now use commit timeline and not include deltacommits #539
- @n3nash - Adding compaction to HoodieClient example #551
- @n3nash - Filtering partition paths before performing a list status on all partitions #541
- @n3nash - Passing a path filter to avoid including folders under .hoodie directory as partition paths #548
- @n3nash - Enabling hard deletes for MergeOnRead table type #538
- @msridhar - Add .m2 directory to Travis cache #534
- @artem0 - General enhancements #520
- @bvaradar - Ensure Hoodie works for non-partitioned Hive table #515
- @xubo245 - Fix some spelling errors in Hudi #530
- @leletan - feat(SparkDataSource): add structured streaming sink #486
- @n3nash - Serializing the complete payload object instead of serializing just the GenericRecord in HoodieRecordConverter #495
- @n3nash - Returning empty statuses for an empty Spark partition caused by incorrect bin packing #510
- @bvaradar - Avoid WriteStatus collect() call when committing batch to prevent Driver side OOM errors #512
- @vinothchandar - Explicitly handle lack of append() support during LogWriting #511
- @n3nash - Fixing number of insert buckets to be generated by rounding off to the closest greater integer #500
- @vinothchandar - Enabling auto tuning of insert splits by default #496
- @bvaradar - Useful Hudi CLI commands to debug/analyze production workloads #477
- @bvaradar - Compaction validate, unschedule and repair #481
- @shangxinli - Fix addMetadataFields() to carry over 'props' #484
- @n3nash - Adding documentation for migration guide and COW vs MOR tradeoffs #470
- @leletan - Add additional feature to drop later-arriving dups #468
- @bvaradar - Fix regression bug that broke HoodieInputFormat handling of non-hoodie datasets #482
- @vinothchandar - Add --filter-dupes to DeltaStreamer #478
- @bvaradar - A quickstart demo to showcase Hudi functionalities using docker along with support for integration-tests #455
- @bvaradar - Ensure Hoodie metadata folder and files are filtered out when constructing Parquet Data Source #473
- @leletan - Adds HoodieGlobalBloomIndex #438