2.4.0
Release date: 2022-10-20 20:53:25
Latest release of StarRocks/starrocks: 3.3.6 (2024-11-20 14:38:43)
New Features
- Supports creating a materialized view based on multiple base tables to accelerate queries with JOIN operations.
- Supports overwriting data via INSERT OVERWRITE.
- [Preview] Provides stateless Compute Nodes (CN) that can be horizontally scaled. You can use StarRocks Operator to deploy CN into your Kubernetes (K8s) cluster to achieve automatic horizontal scaling.
- Outer Join supports non-equi joins in which join items are related by comparison operators including <, <=, >, >=, and <>.
- Supports creating Iceberg catalogs and Hudi catalogs, which allow direct queries on data from Apache Iceberg and Apache Hudi.
- Supports querying ARRAY-type columns from Apache Hive™ tables in CSV format.
- Supports viewing the schema of external data via DESC.
- Supports granting a specific role or IMPERSONATE permission to a user via GRANT and revoking them via REVOKE, and supports executing an SQL statement with IMPERSONATE permission via EXECUTE AS.
- Supports FQDN access: you can now use a domain name, or the combination of hostname and port, as the unique identifier of a BE or an FE node. This prevents access failures caused by changing IP addresses.
- flink-connector-starrocks supports partial updates on the Primary Key model.
- Provides the following new functions:
  - array_contains_all: checks whether a specific array is a subset of another.
  - percentile_cont: calculates the percentile value with linear interpolation.
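
To make the two new functions concrete, the following Python sketch mirrors the behavior described above. It is not StarRocks code: the Python function names simply reuse the SQL names, the subset check assumes set semantics (duplicates ignored), and the percentile uses the common "interpolate between the two nearest ranks" definition. NULL handling and other engine-specific edge cases are not modeled.

```python
import math
from typing import Hashable, Sequence


def array_contains_all(arr: Sequence[Hashable], subset: Sequence[Hashable]) -> bool:
    """Return True if every element of `subset` also occurs in `arr`.

    Assumption: duplicates are ignored (set semantics).
    """
    return set(subset) <= set(arr)


def percentile_cont(values: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 <= p <= 1) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted data.
    pos = p * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    # Interpolate linearly between the two neighboring values.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
```

For example, `percentile_cont([1, 2, 3, 4], 0.5)` falls halfway between the second and third sorted values, and `array_contains_all([1, 2, 3], [3, 1])` holds because both 3 and 1 occur in the first array.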
Improvements
- The Primary Key model supports flushing VARCHAR-type primary key indexes to disks. From version 2.4.0, the Primary Key model supports the same data types for primary key indexes regardless of whether the persistent primary key index is turned on or not.
- Optimized the query performance on external tables.
  - Supports late materialization during queries on external tables in Parquet format to optimize the query performance on data lakes with small-scale filtering involved.
  - Small I/O operations can be merged to reduce the delay for querying data lakes, thereby improving the query performance on external tables.
- Optimized the performance of window functions.
- Optimized the performance of Cross Join by supporting predicate pushdown.
- Histograms are added to CBO statistics. Full statistics collection is further optimized.
- Adaptive multi-threading is enabled for tablet scanning to reduce the dependency of scanning performance on the tablet number. As a result, you can set the number of buckets more easily.
- Supports querying compressed TXT files in Apache Hive.
- Adjusted the mechanisms of default PageCache size calculation and memory consistency check to avoid OOM issues during multi-instance deployments.
- Improved the performance of large batch loads on the Primary Key model by up to two times by removing final_merge operations.
- Supports a Stream Load transaction interface that implements two-phase commit (2PC) for transactions that load data from external systems such as Apache Flink® and Apache Kafka®, improving the performance of highly concurrent stream loads.
- Functions:
  - You can use COUNT DISTINCT over multiple columns to calculate the number of distinct column combinations.
  - Window functions min() and max() support sliding windows.
  - Optimized the performance of the window_funnel function.
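
The two function-related improvements above can be illustrated with a small Python sketch. It only models the semantics, not how StarRocks evaluates them: `count_distinct_combinations` and `sliding_min` are hypothetical helper names, the window is assumed to be of the form "N rows preceding through the current row", and NULL handling is not modeled.

```python
from collections import deque
from typing import Deque, Iterable, List, Sequence, Tuple


def count_distinct_combinations(rows: Iterable[Tuple]) -> int:
    """Count distinct column combinations, one tuple per row."""
    return len(set(rows))


def sliding_min(values: Sequence[float], preceding: int) -> List[float]:
    """Minimum over the current row and up to `preceding` rows before it."""
    window: Deque[int] = deque()  # indices whose values are strictly increasing
    result: List[float] = []
    for i, value in enumerate(values):
        # Drop candidates that can never be the minimum again.
        while window and values[window[-1]] >= value:
            window.pop()
        window.append(i)
        # Drop indices that have slid out of the frame.
        while window[0] < i - preceding:
            window.popleft()
        result.append(values[window[0]])
    return result
```

With rows `("a", 1), ("a", 1), ("a", 2), ("b", 1)` the distinct-combination count is 3, since the duplicate pair is counted once. A sliding max() follows the same pattern with the comparison reversed.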
Bug Fixes
The following bugs are fixed:
- DECIMAL data types returned by DESC are different from those specified in the CREATE TABLE statement. #7309
- FE metadata management issues that affect the stability of FE. #6685 #9445 #7974 #7455
- Data load-related issues:
- Data Lake analytic-related issues:
- Metadata can be inconsistent between the Leader FE and Follower FE nodes. #11215
- BE crashes when BITMAP type data size is larger than 2GB. #11178
Behavior Change
- Page Cache is enabled by default. The default cache size is 20% of the system memory.
Others
- Announcing the stable release of Resource Group.
- Announcing the stable release of the JSON data type and its related functions.