2.5.0
Release date: 2023-01-23 09:19:47
Latest StarRocks/starrocks release: 3.3.6 (2024-11-20 14:38:43)
New Features
- Supports querying Merge On Read tables using Hudi catalogs and Hudi external tables. #6780
- Supports querying STRUCT and MAP data using Hive catalogs, Hudi catalogs, and Iceberg catalogs. #10677
- Provides Block Cache to improve access performance of hot data stored in external storage systems, such as HDFS. #11597
- Supports creating Delta Lake catalogs, which allow direct queries on data from Delta Lake. #11972
- Hive, Hudi, and Iceberg catalogs are compatible with AWS Glue. #12249
- Supports creating file external tables, which allow direct queries on Parquet and ORC files from HDFS and object stores. #13064
- Supports creating materialized views based on Hive, Hudi, and Iceberg catalogs, and on other materialized views. For more information, see Materialized view. #11116 #11873
- Supports conditional updates for tables that use the Primary Key model. For more information, see Change data through loading. #12159
- Supports Query Cache, which stores intermediate computation results of queries, improving the QPS and reducing the average latency of highly concurrent, simple queries. #9194
- Supports specifying the priority of Broker Load jobs. For more information, see BROKER LOAD. #11029
- Supports specifying the number of replicas for data loading for StarRocks native tables. For more information, see CREATE TABLE. #11253
- Supports query queues. #12594
- Supports isolating compute resources occupied by data loading, thereby limiting the resource consumption of data loading tasks. For more information, see Resource group. #12606
- Supports specifying the following data compression algorithms for StarRocks native tables: LZ4, Zstd, Snappy, and Zlib. For more information, see Data compression. #10097 #12020
- Supports user-defined variables. #10011
- Supports lambda expressions and the following higher-order functions: array_map, array_filter, array_sum, and array_sortby. #9461 #9806 #10323 #14034
- Provides the QUALIFY clause that filters the results of window functions. #13239
- Supports using the result returned by the uuid() and uuid_numeric() functions as the default value of a column when you create a table. For more information, see CREATE TABLE. #11155
- Supports the following functions: map_size, map_keys, map_values, max_by, sub_bitmap, bitmap_to_base64, host_name, and date_slice. #11299 #11323 #12243 #11776 #12634 #14225
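As a rough illustration of three of the SQL-level features listed above (user-defined variables, lambda expressions with higher-order array functions, and the QUALIFY clause), the statements below are a minimal sketch. The table sales_record and its columns are hypothetical and not part of these release notes, and the exact syntax supported should be checked against the documentation for your version.

```sql
-- User-defined variable: set it once, then read it back in the same session.
SET @top_n = 3;
SELECT @top_n;

-- Lambda expressions passed to higher-order array functions.
SELECT array_map(x -> x * 2, [1, 2, 3]);     -- applies the lambda to each element
SELECT array_filter(x -> x > 1, [1, 2, 3]);  -- keeps elements for which the lambda is true

-- QUALIFY filters rows by the result of a window function.
-- sales_record(city_id, item, sales) is a hypothetical table used for illustration only.
SELECT city_id, item, sales
FROM sales_record
QUALIFY ROW_NUMBER() OVER (PARTITION BY city_id ORDER BY sales DESC) <= 3;
```

Here QUALIFY plays the role that a wrapping subquery with a WHERE filter on the row number would otherwise play.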
Improvements
- Optimized the metadata access performance when you query external data using Hive catalogs, Hudi catalogs, and Iceberg catalogs. #11349
- Supports querying ARRAY data using Elasticsearch external tables. #9693
- Optimized the following aspects of materialized views:
  - Multi-table async refresh materialized views support automatic and transparent query rewrite based on the SPJG-type materialized views. For more information, see Materialized view. #13193
  - Multi-table async refresh materialized views support multiple async refresh mechanisms. For more information, see Materialized view. #12712 #13171 #13229 #12926
  - The efficiency of refreshing materialized views is improved. #13167
- StarRocks automatically sets an appropriate number of tablets when you create a table, eliminating the need for manual operations. For more information, see CREATE TABLE. #10614
- Optimized the following aspects of data loading:
  - Optimized loading performance in multi-replica scenarios and added support for single_leader_replication, which yields a one-fold increase in loading performance. For more information about single_leader_replication, see CREATE TABLE. #10138
  - Broker Load and Spark Load no longer need to depend on brokers for data loading when only one HDFS system or Kerberos is configured. For more information, see Load data from HDFS or cloud storage and Bulk load using Apache Spark™. #9049 #9228
  - Optimized the performance of Broker Load when a large number of small ORC files are loaded. #11380
  - Reduced the memory usage when you load data into tables of the Primary Key model.
- Optimized the information_schema database and the tables and columns tables within, and added a new table, table_config. For more information, see Information Schema. #10033
- Optimized data backup and restore:
  - Supports backing up and restoring data from multiple tables in a database at a time. For more information, see Backup and restore data. #11619
  - Supports backing up and restoring data from Primary Key tables. For more information, see Backup and restore. #11885
- Optimized the following functions:
  - Added an optional parameter for the time_slice function, which is used to determine whether the beginning or end of the time interval is returned. #11216
  - Added a new mode INCREASE for the window_funnel function to avoid computing duplicate timestamps. #10134
  - Supports specifying multiple arguments in the unnest function. #12484
  - lead() and lag() functions support querying HLL and BITMAP data. For more information, see Window function. #12108
  - The following ARRAY functions support querying JSON data: array_agg, array_sort, array_concat, array_slice, and reverse. #13155
  - Optimized the use of some functions. The current_date, current_timestamp, current_time, localtimestamp, and localtime functions can be executed without (). For example, you can directly run select current_date;. #14319
- Removed some redundant information from FE logs. #15374
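The function changes above can be sketched as follows. This is an illustrative example only: the timestamp literal is arbitrary, and the boundary keywords FLOOR and CEIL are given as I understand the time_slice reference, so verify them against the documentation for your version.

```sql
-- time_slice with the optional boundary parameter:
-- FLOOR returns the beginning of the 5-minute interval that contains the timestamp,
-- CEIL returns the end of that interval.
SELECT time_slice('2022-04-26 19:01:07', INTERVAL 5 MINUTE, FLOOR);
SELECT time_slice('2022-04-26 19:01:07', INTERVAL 5 MINUTE, CEIL);

-- Date and time functions that can now be called without parentheses.
SELECT current_date;
SELECT current_timestamp;
SELECT localtime;
```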
Bug Fixes
The following bugs are fixed:
- The append_trailing_char_if_absent() function may return an incorrect result when the first argument is empty. #13762
- After a table is restored using the RECOVER statement, the table does not exist. #13921
- The result returned by the SHOW CREATE MATERIALIZED VIEW statement does not contain the database and catalog specified in the query statement when the materialized view was created. #12833
- Schema change jobs in the waiting_stable state cannot be canceled. #12530
- Running the SHOW PROC '/statistic'; command on a Leader FE and non-Leader FE returns different results. #12491
- The position of the ORDER BY clause is incorrect in the result returned by SHOW CREATE TABLE. #13809
- When users use Hive Catalog to query Hive data, if the execution plan generated by FE does not contain partition IDs, BEs fail to query Hive partition data. #15486
Behavior Change
- Changed the default value of the AWS_EC2_METADATA_DISABLED parameter to False, which means that the metadata of Amazon EC2 is obtained to access AWS resources.
- Renamed session variable is_report_success to enable_profile, which can be queried using the SHOW VARIABLES statement.
- Added four reserved keywords: CURRENT_DATE, CURRENT_TIME, LOCALTIME, and LOCALTIMESTAMP. #14319
- Table and database names can now be up to 1023 characters long. #14929 #15020
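For scripts written against earlier versions, the renamed session variable and the new reserved keywords are the changes most likely to need attention. The statements below are a minimal sketch: the table name events and its column are hypothetical, and quoting identifiers with backticks is the usual MySQL-compatible convention, assumed here rather than stated in these notes.

```sql
-- The session variable formerly named is_report_success is now enable_profile.
SET enable_profile = true;
SHOW VARIABLES LIKE 'enable_profile';

-- A column whose name collides with a new reserved keyword must be quoted.
-- events is a hypothetical table with a column named current_date.
SELECT `current_date` FROM events;
```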
Upgrade Notes
- You can upgrade your cluster to 2.5.0 from 2.0.x, 2.1.x, 2.2.x, 2.3.x, or 2.4.x. However, if you need to perform a rollback, we recommend that you roll back only to 2.4.x.
- If you have a partitioned table that uses LIST partitioning, you must delete this table before the upgrade.