
v3.1.0rc3

delta-io/delta

Release date: 2024-01-27 07:07:11

Latest release of delta-io/delta: v3.2.1rc1 (2024-09-05 00:48:36)

Delta Lake 3.1.0

We are excited to announce the preview release of Delta Lake 3.1.0. This release includes several exciting new features.

Delta Spark

Delta Spark 3.1.0 is built on Apache Spark™ 3.5. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

The key features of this release are:


Delta Sharing Spark

This release adds a new module, delta-sharing-spark, which enables reading Delta tables shared through the Delta Sharing protocol in Apache Spark™. The module has been migrated from the https://github.com/delta-io/delta-sharing/tree/main/spark repository to https://github.com/delta-io/delta/tree/master/sharing. The last version released from the previous location is 1.0.4; the next release of delta-sharing-spark ships with the current Delta release, 3.1.0.

Three read types are supported: reading a snapshot of the table, incrementally reading the table with streaming, and reading the changes (Change Data Feed) between two versions of the table.

“Delta Format Sharing”, newly introduced in delta-sharing-spark 3.1, supports reading shared Delta tables that use advanced Delta features such as deletion vectors and column mapping.

Below is an example of reading a Delta table shared using the Delta Sharing protocol in a Spark environment. For more examples refer to the documentation.

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("...")
  .master("...")
  .config(
    "spark.sql.extensions",
    "io.delta.sql.DeltaSparkSessionExtension"
  )
  .config(
    "spark.sql.catalog.spark_catalog",
    "org.apache.spark.sql.delta.catalog.DeltaCatalog"
  )
  .getOrCreate()

val tablePath = "<profile-file-path>#<share-name>.<schema-name>.<table-name>"

// Batch query
spark.read
  .format("deltaSharing")
  .option("responseFormat", "delta")
  .load(tablePath)
  .show(10)
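
The batch query covers the snapshot read. As a rough sketch of the other two supported read types (Change Data Feed and streaming), the following assumes a session configured with the Delta extensions and a table path in the `<profile-file-path>#<share-name>.<schema-name>.<table-name>` form. The `readChangeFeed`, `startingVersion`, and `endingVersion` option names are taken from the Delta Sharing connector documentation; the object name, the helper names, and the version arguments are illustrative and not part of this release.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SharedTableReads {
  // Hypothetical helper: builds "<profile-file-path>#<share>.<schema>.<table>".
  def sharedTablePath(profile: String, share: String, schema: String, table: String): String =
    s"$profile#$share.$schema.$table"

  // Change Data Feed read between two table versions (inclusive).
  // The caller supplies the version numbers; they are placeholders here.
  def readChanges(spark: SparkSession, tablePath: String, from: Long, to: Long): DataFrame =
    spark.read
      .format("deltaSharing")
      .option("responseFormat", "delta")
      .option("readChangeFeed", "true")
      .option("startingVersion", from)
      .option("endingVersion", to)
      .load(tablePath)

  // Incremental read of the shared table as a streaming DataFrame.
  def readStream(spark: SparkSession, tablePath: String): DataFrame =
    spark.readStream
      .format("deltaSharing")
      .option("responseFormat", "delta")
      .load(tablePath)
}
```

A streaming DataFrame returned by `readStream` still needs a sink before it runs, for example `SharedTableReads.readStream(spark, tablePath).writeStream.format("console").start()`.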

Delta Universal Format (UniForm)

Delta Universal Format (UniForm) allows you to read Delta tables from Iceberg and Hudi (coming soon) clients. Delta 3.1.0 provides the following improvements:

Delta Kernel

The Delta Kernel project is a set of Java libraries (Rust will be coming soon!) for building Delta connectors that can read (and, soon, write to) Delta tables without needing to understand the Delta protocol details.

Delta 3.0.0 released the first version of Kernel. This release further enhances read support and solidifies the APIs, taking into account feedback from connectors that tried out the first version in Delta 3.0.0.

For more information, refer to:

Delta Flink

Delta-Flink 3.1.0 is built on top of Apache Flink™ 1.16.1.

The key features of this release are:

Delta Standalone

There are no updates to Delta Standalone in this release.

Credits

Ala Luszczak, Allison Portis, Ami Oka, Amogh Akshintala, Andreas Chatzistergiou, Bart Samwel, BjarkeTornager, Christos Stavrakakis, Costas Zarifis, Daniel Tenedorio, Dhruv Arya, EJ Song, Eric Maynard, Felipe Pessoto, Fred Storage Liu, Fredrik Klauss, Gengliang Wang, Gerhard Brueckl, Haejoon Lee, Hao Jiang, Jared Wang, Jiaheng Tang, Jing Wang, Johan Lasperas, Kaiqi Jin, Kam Cheung Ting, Lars Kroll, Li Haoyi, Lin Zhou, Lukas Rupprecht, Mark Jarvin, Max Gekk, Ming DAI, Nick Lanham, Ole Sasse, Paddy Xu, Patrick Leahey, Peter Toth, Prakhar Jain, Renan Tomazoni Pinzon, Rui Wang, Ryan Johnson, Sabir Akhadov, Scott Sandre, Serge Rielau, Shixiong Zhu, Tathagata Das, Thang Long Vu, Tom van Bussel, Venki Korukanti, Vitalii Li, Wei Luo, Wenchen Fan, Xin Zhao, jintao shen, panbingkun

How to use the preview release

Delta-Spark

Download Spark 3.5.0 from https://spark.apache.org/downloads.html

For this preview, we have published the artifacts to a staging repository. Here’s how you can use them:

spark-submit

Add --repositories https://oss.sonatype.org/content/repositories/iodelta-1133 to the command line arguments. Example:

spark-submit --packages io.delta:delta-spark_2.12:3.1.0 \
--repositories \
https://oss.sonatype.org/content/repositories/iodelta-1133 examples/examples.py

Currently, Spark shells (PySpark and Scala) do not accept the external repositories option. However, once the artifacts have been downloaded to the local cache, the shells can be run with Delta 3.1.0 by just providing the --packages io.delta:delta-spark_2.12:3.1.0 argument.

Spark-shell

bin/spark-shell --packages io.delta:delta-spark_2.12:3.1.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1133 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

Spark-SQL

bin/spark-sql --packages io.delta:delta-spark_2.12:3.1.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1133 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

Maven project

<repositories>
  <repository>
    <id>staging-repo</id>
    <url>https://oss.sonatype.org/content/repositories/iodelta-1133</url>
  </repository>
</repositories>
<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-spark_2.12</artifactId>
  <version>3.1.0</version>
</dependency>

SBT project

libraryDependencies += "io.delta" %% "delta-spark" % "3.1.0"
resolvers += "Delta" at "https://oss.sonatype.org/content/repositories/iodelta-1133"

Delta-spark PyPI:

Name: delta-spark
Version: 3.1.0
Summary: Python APIs for using Delta Lake with Apache Spark
Home-page: https://github.com/delta-io/delta/
Author: The Delta Lake Project Authors
Author-email: delta-users@googlegroups.com
License: Apache-2.0
Location: <user-home>/.conda/envs/delta-release/lib/python3.8/site-packages
Requires: importlib-metadata, pyspark


1. delta-hive-assembly_2.12-3.1.0.jar (19.29 MB)
2. delta-hive-assembly_2.12-3.1.0.jar.asc (833 B)
3. delta-hive-assembly_2.12-3.1.0.jar.asc.sha256 (105 B)
4. delta-hive-assembly_2.12-3.1.0.jar.sha256 (101 B)
5. delta-hive-assembly_2.13-3.1.0.jar (20.74 MB)
6. delta-hive-assembly_2.13-3.1.0.jar.asc (833 B)
7. delta-hive-assembly_2.13-3.1.0.jar.asc.sha256 (105 B)
8. delta-hive-assembly_2.13-3.1.0.jar.sha256 (101 B)
9. delta-spark-3.1.0.tar.gz (21.43 KB)
10. delta_spark-3.1.0-py3-none-any.whl (20.51 KB)
