We are a South Korean company that processes more than 100 million records a day.
We are currently planning to migrate our ad hoc queries from Trino + Iceberg + S3 to StarRocks.
We also selected StarRocks as our near-real-time analysis engine after evaluating ClickHouse and Druid.
Our migration plan is as follows.
- Switch ad hoc queries on the existing Iceberg + HMS tables from Trino to StarRocks
- Migrate the data from Iceberg + HMS into StarRocks OLAP tables
While working on the first step, we ran into several issues, which we share below.
- Some MAP-type operations are not supported.
An error occurs when a map is subscripted with a dynamically supplied key (a column value) in the WHERE clause.
Query
WITH A AS (
    SELECT
        'a' AS event_key,
        'x' AS property_key
),
B AS (
    SELECT
        'a' AS event_key,
        map{'x': 1} AS props
)
SELECT *
FROM A
JOIN B ON A.event_key = B.event_key
WHERE props[property_key] = 1
Error Message
java.lang.IllegalStateException: null
at com.google.common.base.Preconditions.checkState(Preconditions.java:496) ~[spark-dpp-1.0.0.jar:?]
at com.starrocks.sql.optimizer.OptExpression.getOutputColumns(OptExpression.java:135) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.rule.tree.prunesubfield.PushDownSubfieldRule$PushDowner.generatePushDownProject(PushDownSubfieldRule.java:157) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.rule.tree.prunesubfield.PushDownSubfieldRule$PushDowner.visitLogicalJoin(PushDownSubfieldRule.java:292) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.rule.tree.prunesubfield.PushDownSubfieldRule$PushDowner.visitLogicalJoin(PushDownSubfieldRule.java:86) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.operator.logical.LogicalJoinOperator.accept(LogicalJoinOperator.java:205) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.rule.tree.prunesubfield.PushDownSubfieldRule$PushDowner.visitChildren(PushDownSubfieldRule.java:132) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.rule.tree.prunesubfield.PushDownSubfieldRule$PushDowner.visitLogicalProject(PushDownSubfieldRule.java:226) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.rule.tree.prunesubfield.PushDownSubfieldRule$PushDowner.visitLogicalProject(PushDownSubfieldRule.java:86) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.operator.logical.LogicalProjectOperator.accept(LogicalProjectOperator.java:103) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.rule.tree.prunesubfield.PushDownSubfieldRule$PushDowner.visitChildren(PushDownSubfieldRule.java:132) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.rule.tree.prunesubfield.PushDownSubfieldRule$PushDowner.visit(PushDownSubfieldRule.java:146) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.rule.tree.prunesubfield.PushDownSubfieldRule$PushDowner.visit(PushDownSubfieldRule.java:86) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.OptExpressionVisitor.visitLogicalTreeAnchor(OptExpressionVisitor.java:99) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.operator.logical.LogicalTreeAnchorOperator.accept(LogicalTreeAnchorOperator.java:52) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.rule.tree.prunesubfield.PushDownSubfieldRule.rewrite(PushDownSubfieldRule.java:59) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.Optimizer.pruneSubfield(Optimizer.java:619) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.Optimizer.logicalRuleRewrite(Optimizer.java:385) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.Optimizer.rewriteAndValidatePlan(Optimizer.java:572) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.Optimizer.optimizeByCost(Optimizer.java:195) ~[starrocks-fe.jar:?]
at com.starrocks.sql.optimizer.Optimizer.optimize(Optimizer.java:142) ~[starrocks-fe.jar:?]
at com.starrocks.sql.StatementPlanner.createQueryPlanWithReTry(StatementPlanner.java:249) ~[starrocks-fe.jar:?]
at com.starrocks.sql.StatementPlanner.plan(StatementPlanner.java:128) ~[starrocks-fe.jar:?]
at com.starrocks.sql.StatementPlanner.plan(StatementPlanner.java:87) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:486) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:394) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:588) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:872) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:69) ~[starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
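A rewrite we are experimenting with moves the dynamic map lookup into a projection before the filter (a sketch only; element_at is used in place of the subscript, and we have not yet confirmed whether this avoids the failing PushDownSubfieldRule path):

WITH A AS (
    SELECT 'a' AS event_key, 'x' AS property_key
),
B AS (
    SELECT 'a' AS event_key, map{'x': 1} AS props
),
joined AS (
    -- materialize the map lookup as a plain column first
    SELECT A.event_key, element_at(B.props, A.property_key) AS val
    FROM A
    JOIN B ON A.event_key = B.event_key
)
SELECT *
FROM joined
WHERE val = 1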
- When UNNEST is used, the memory required is far higher than expected.
WITH RAW AS (
    SELECT
        'a' AS first,
        map{'type_1': 1, 'type_2': 2, 'type_3': 3} AS types
)
SELECT
    first,
    ELEMENT_AT(types, type) AS val
FROM RAW
CROSS JOIN UNNEST(map_keys(types)) AS t(type)
We expected memory usage to grow with the number of values, but in practice it is multiplied by the number of keys.
e.g., 10 GB → 10 GB × key count
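A variant we are considering unnests the keys and values together, so each output row carries a plain (key, value) pair instead of the whole map plus an ELEMENT_AT lookup (a sketch; that StarRocks can then prune the map column is our assumption, not something we have verified):

WITH RAW AS (
    SELECT
        'a' AS first,
        map{'type_1': 1, 'type_2': 2, 'type_3': 3} AS types
)
-- UNNEST over two parallel arrays yields aligned (key, value) rows
SELECT
    first,
    t.type,
    t.val
FROM RAW
CROSS JOIN UNNEST(map_keys(types), map_values(types)) AS t(type, val)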
- The reduce function is not supported.
(See "Array functions and operators" in the Trino 439 documentation.)
We use Trino's reduce function for funnel calculations, and StarRocks does not currently support it.
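For reference, our Trino usage looks roughly like this (simplified; funnel_events and step_flags are illustrative stand-ins for our production schema):

-- count how many funnel steps each user completed
SELECT
    user_id,
    reduce(
        step_flags,                   -- ARRAY<BOOLEAN>, one flag per funnel step
        0,                            -- initial state: zero steps completed
        (s, flag) -> IF(flag, s + 1, s),
        s -> s                        -- output the final count
    ) AS steps_completed
FROM funnel_events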
- The window_funnel result is incorrect.
(See window_funnel in the StarRocks documentation.)
A query engine must guarantee that function results are accurate.
With window_funnel, the result does not match the documented result, so we cannot trust or use it.
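The query shape we tested looks like this (user_events and its columns are illustrative placeholders, and the condition list follows the documented window_funnel signature as we understand it):

-- max_step should be the deepest funnel step reached within a 1800-second window
SELECT
    user_id,
    window_funnel(1800, event_time, 0,
                  [event_type = 'view', event_type = 'cart', event_type = 'pay']) AS max_step
FROM user_events
GROUP BY user_id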
- The disk cache option for the external catalog (Iceberg) does not seem to work.
Session variables
SET GLOBAL enable_query_cache = true;
SET GLOBAL enable_scan_datacache = true;
SET GLOBAL enable_sort_aggregate = true;
fe.conf
enable_iceberg_metadata_disk_cache = true
enable_iceberg_custom_worker_thread = true
enable_background_refresh_connector_metadata = true
background_refresh_metadata_interval_millis = 60000
background_refresh_metadata_time_secs_since_last_access_secs = 1440
hive_meta_cache_refresh_interval_s = 60
hive_meta_cache_ttl_s = 3600
cn.conf
storage_root_path = /data/starrocks
datacache_enable = true
datacache_disk_size = 80%
datacache_disk_path = /data/starrocks/datacache
datacache_meta_path = /data/starrocks/meta
lake_enable_vertical_compaction_fill_data_cache = true
When running queries, the memory cache appears to work, but judging by disk usage, the disk cache does not seem to be working.
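For reference, before checking disk usage we confirm that the switches took effect as the FE sees them (this only verifies the variables, not whether the cache is actually being populated):

-- session/global variables related to the data cache
SHOW VARIABLES LIKE '%datacache%';
SHOW VARIABLES LIKE 'enable_query_cache';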
Is there anything else we need to configure?
Thank you for reading this long post.
We would like to adopt StarRocks successfully and run it in production.
We would appreciate your help.