StarRocks 3.4.1 slower than 3.3.9 in shared-data mode?

Hi all,

We have a deployment of StarRocks 3.3.9 with about 160GB of data (zstd compressed).

We run it in shared-data mode and we’re quite happy with it. Compared to shared-nothing it’s a bit slower, but with enough compute nodes it’s close enough to be acceptable, and it’s certainly a lot more flexible.

However, after upgrading to 3.4.1, queries appear to be at least twice as slow.

After reverting to 3.3.9, we get the faster performance back.

Is this expected? Is it perhaps already being worked on?

Could you get a profile to analyze the issue?

Hello,
It turns out it is already slower in version 3.3.11. Here’s the profile of a typical query with 3.3.9:

Query:
  Summary:
     - Query ID: 221634a1-134f-11f0-af79-cefa87bafdab
     - Start Time: 2025-04-07 01:25:04
     - End Time: 2025-04-07 01:25:06
     - Total: 1s739ms
     - Query Type: Query
     - Query State: Finished
     - StarRocks Version: 3.3.9-dfae8f9
     - User: root
     - Default Db: project_metrics
     - Sql Statement: /* ApplicationName=DBeaver 25.0.1 - SQLEditor <fsn3.sql> */ select date_trunc('day', time) as time, cluster, count(*) as count
from N_events group by 1, 2 order by 1 desc, 2
     - Variables: parallel_fragment_exec_instance_num=1,max_parallel_scan_instance_num=-1,pipeline_dop=0,enable_adaptive_sink_dop=true,enable_runtime_adaptive_dop=false,runtime_profile_report_interval=10,resource_group=default_wg
     - NonDefaultSessionVariables: {"sql_mode_v2":{"defaultValue":32,"actualValue":2097184},"sql_select_limit":{"defaultValue":9223372036854775807,"actualValue":200},"character_set_results":{"defaultValue":"utf8","actualValue":"NULL"},"query_timeout":{"defaultValue":300,"actualValue":3500},"enable_adaptive_sink_dop":{"defaultValue":false,"actualValue":true},"enable_profile":{"defaultValue":false,"actualValue":true}}
     - Collect Profile Time: 4ms
     - IsProfileAsync: true
  Planner:
     - -- Parser[1] 7ms
     - -- Total[1] 748ms
     -     -- Analyzer[1] 0
     -         -- Lock[1] 0
     -         -- AnalyzeDatabase[1] 0
     -         -- AnalyzeTemporaryTable[1] 0
     -         -- AnalyzeTable[1] 0
     -     -- Transformer[1] 0
     -     -- Optimizer[1] 175ms
     -         -- MVPreprocess[1] 0
     -         -- MVTextRewrite[1] 0
     -         -- RuleBaseOptimize[1] 122ms
     -         -- CostBaseOptimize[1] 43ms
     -         -- PhysicalRewrite[1] 7ms
     -         -- PlanValidate[1] 0
     -             -- InputDependenciesChecker[1] 0
     -             -- TypeChecker[1] 0
     -             -- CTEUniqueChecker[1] 0
     -             -- ColumnReuseChecker[1] 0
     -     -- ExecPlanBuild[1] 570ms
     - -- Pending[1] 0
     - -- Prepare[1] 13ms
     - -- Deploy[1] 152ms
     -     -- DeployLockInternalTime[1] 152ms
     -         -- DeploySerializeConcurrencyTime[3] 26ms
     -         -- DeployStageByStageTime[9] 0
     -         -- DeployWaitTime[9] 124ms
     -             -- DeployAsyncSendTime[7] 0
     - DeployDataSize: 6946613
    Reason:
  Execution:
     - Topology: {"rootId":6,"nodes":[{"id":6,"name":"MERGE_EXCHANGE","properties":{"sinkIds":[],"displayMem":true},"children":[5]},{"id":5,"name":"TOP_N","properties":{"sinkIds":[6],"displayMem":true},"children":[4]},{"id":4,"name":"AGGREGATION","properties":{"displayMem":true},"children":[3]},{"id":3,"name":"EXCHANGE","properties":{"displayMem":true},"children":[2]},{"id":2,"name":"AGGREGATION","properties":{"sinkIds":[3],"displayMem":true},"children":[1]},{"id":1,"name":"PROJECT","properties":{"displayMem":false},"children":[0]},{"id":0,"name":"OLAP_SCAN","properties":{"displayMem":false},"children":[]}]}
     - FrontendProfileMergeTime: 6.728ms
     - QueryAllocatedMemoryUsage: 16.458 GB
     - QueryCumulativeCpuTime: 45s771ms
     - QueryCumulativeNetworkTime: 29.044ms
     - QueryCumulativeOperatorTime: 927.920ms
     - QueryCumulativeScanTime: 88.110ms
     - QueryDeallocatedMemoryUsage: 16.424 GB
     - QueryExecutionWallTime: 955.284ms
     - QueryPeakMemoryUsagePerNode: 153.208 MB
     - QueryPeakScheduleTime: 250.296ms
     - QuerySpillBytes: 0.000 B
     - QuerySumMemoryUsage: 372.824 MB
     - ResultDeliverTime: 0ns
...

I’m cutting the profile here because the forum won’t let me post a longer message.

And here is the same query with 3.3.11:

Query:
  Summary:
     - Query ID: 9a7bd2ce-1351-11f0-9711-9e8a2311af49
     - Start Time: 2025-04-07 01:42:45
     - End Time: 2025-04-07 01:42:52
     - Total: 6s868ms
     - Query Type: Query
     - Query State: Finished
     - StarRocks Version: 3.3.11-bc77e6b
     - User: root
     - Default Db: project_metrics
     - Sql Statement: /* ApplicationName=DBeaver 25.0.1 - SQLEditor <fsn3.sql> */ select date_trunc('day', time) as time, cluster, count(*) as count
from N_events group by 1, 2 order by 1 desc, 2
     - Variables: parallel_fragment_exec_instance_num=1,max_parallel_scan_instance_num=-1,pipeline_dop=0,enable_adaptive_sink_dop=true,enable_runtime_adaptive_dop=false,runtime_profile_report_interval=10,resource_group=default_wg
     - NonDefaultSessionVariables: {"sql_mode_v2":{"defaultValue":32,"actualValue":2097184},"sql_select_limit":{"defaultValue":9223372036854775807,"actualValue":200},"character_set_results":{"defaultValue":"utf8","actualValue":"NULL"},"query_timeout":{"defaultValue":300,"actualValue":3500},"enable_adaptive_sink_dop":{"defaultValue":false,"actualValue":true},"enable_profile":{"defaultValue":false,"actualValue":true}}
     - Collect Profile Time: 3ms
     - IsProfileAsync: true
  Planner:
     - -- Parser[1] 0
     - -- Total[1] 770ms
     -     -- Analyzer[1] 0
     -         -- Lock[1] 0
     -         -- AnalyzeDatabase[1] 0
     -         -- AnalyzeTemporaryTable[1] 0
     -         -- AnalyzeTable[1] 0
     -     -- Transformer[1] 0
     -     -- Optimizer[1] 191ms
     -         -- MVPreprocess[1] 0
     -         -- MVTextRewrite[1] 0
     -         -- RuleBaseOptimize[1] 150ms
     -         -- CostBaseOptimize[1] 36ms
     -         -- PhysicalRewrite[1] 2ms
     -         -- PlanValidate[1] 0
     -             -- InputDependenciesChecker[1] 0
     -             -- TypeChecker[1] 0
     -             -- CTEUniqueChecker[1] 0
     -             -- ColumnReuseChecker[1] 0
     -     -- ExecPlanBuild[1] 577ms
     - -- Pending[1] 0
     - -- Prepare[1] 14ms
     - -- Deploy[1] 199ms
     -     -- DeployLockInternalTime[1] 199ms
     -         -- DeploySerializeConcurrencyTime[3] 54ms
     -         -- DeployStageByStageTime[9] 0
     -         -- DeployWaitTime[9] 143ms
     -             -- DeployAsyncSendTime[7] 0
     - DeployDataSize: 6946656
    Reason:
  Execution:
     - Topology: {"rootId":6,"nodes":[{"id":6,"name":"MERGE_EXCHANGE","properties":{"sinkIds":[],"displayMem":true},"children":[5]},{"id":5,"name":"TOP_N","properties":{"sinkIds":[6],"displayMem":true},"children":[4]},{"id":4,"name":"AGGREGATION","properties":{"displayMem":true},"children":[3]},{"id":3,"name":"EXCHANGE","properties":{"displayMem":true},"children":[2]},{"id":2,"name":"AGGREGATION","properties":{"sinkIds":[3],"displayMem":true},"children":[1]},{"id":1,"name":"PROJECT","properties":{"displayMem":false},"children":[0]},{"id":0,"name":"OLAP_SCAN","properties":{"displayMem":false},"children":[]}]}
     - FrontendProfileMergeTime: 8.382ms
     - QueryAllocatedMemoryUsage: 19.208 GB
     - QueryCumulativeCpuTime: 20s793ms
     - QueryCumulativeNetworkTime: 34.018ms
     - QueryCumulativeOperatorTime: 5s779ms
     - QueryCumulativeScanTime: 5s465ms
     - QueryDeallocatedMemoryUsage: 19.174 GB
     - QueryExecutionWallTime: 6s67ms
     - QueryPeakMemoryUsagePerNode: 63.181 MB
     - QueryPeakScheduleTime: 102.874ms
     - QuerySpillBytes: 0.000 B
     - QuerySumMemoryUsage: 166.695 MB
     - ResultDeliverTime: 0ns
...

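With 3.3.11 it took almost 4 times as long (Total: 6s868ms vs. 1s739ms). Most of the difference shows up in QueryCumulativeScanTime (5s465ms vs. 88.110ms), while QueryCumulativeCpuTime actually went down (20s793ms vs. 45s771ms).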
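For anyone who wants to check the ratios, here is a minimal Python sketch that converts the headline durations from the two profiles to seconds and divides them. The metric values are copied from the excerpts above. The helper names (`parse_duration`, `slowdown_ratios`) and the list of unit suffixes are my own assumptions, not part of any StarRocks tooling; only `s`, `ms` and `ns` actually appear in these excerpts.

```python
import re

# Headline metrics copied from the two profiles above: (3.3.9, 3.3.11).
PROFILE_METRICS = {
    "Total": ("1s739ms", "6s868ms"),
    "QueryExecutionWallTime": ("955.284ms", "6s67ms"),
    "QueryCumulativeOperatorTime": ("927.920ms", "5s779ms"),
    "QueryCumulativeScanTime": ("88.110ms", "5s465ms"),
    "QueryCumulativeCpuTime": ("45s771ms", "20s793ms"),
}

# Assumed unit suffixes; "h", "m" and "us" are guesses for longer/shorter values.
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}

# Two-letter units are listed first so "ms" is not read as "m" followed by "s".
PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")


def parse_duration(text):
    """Convert a profile duration such as '6s67ms' or '88.110ms' to seconds."""
    text = text.strip()
    if text == "0":  # the planner section prints a bare 0 for zero durations
        return 0.0
    parts = PART.findall(text)
    # Reject anything the pattern did not consume completely.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(num) * UNIT_SECONDS[unit] for num, unit in parts)


def slowdown_ratios(metrics):
    """Return {metric: new / old} with both durations converted to seconds."""
    return {
        name: parse_duration(new) / parse_duration(old)
        for name, (old, new) in metrics.items()
    }


if __name__ == "__main__":
    for name, ratio in slowdown_ratios(PROFILE_METRICS).items():
        print(f"{name}: {ratio:.2f}x")
```

A ratio above 1 means 3.3.11 spent more time on that metric than 3.3.9, and a ratio below 1 means it spent less.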

When testing, I first run the query a few times to make sure all caches are warm, then I pick the best result in both cases. The slowdown also reproduces across all kinds of queries on different tables, ranging from about 1.2 to 4 times slower.
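The measurement procedure described above (a few warm-up runs, then keep the best timing) can be sketched roughly like this in Python. This is only an illustration: `run_query` is a hypothetical callable that executes the query against the cluster with whatever client you use, and the default counts are arbitrary choices, not the exact numbers I used.

```python
import time


def best_of(run_query, warmups=3, runs=5, clock=time.perf_counter):
    """Run `run_query` a few times to warm caches, then return the best timing.

    `run_query` is a hypothetical zero-argument callable that executes the
    query; `clock` can be swapped out to make the function testable.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up runs: results and timings are discarded on purpose.
    for _ in range(warmups):
        run_query()

    timings = []
    for _ in range(runs):
        start = clock()
        run_query()
        timings.append(clock() - start)

    # Keep the fastest run, which is the least disturbed by background noise.
    return min(timings)


def slowdown(old_best, new_best):
    """How many times slower the new version is, based on best timings."""
    return new_best / old_best
```

Calling `best_of` once per version with the same query and passing the two results to `slowdown` gives a factor comparable to the 1.2 to 4 range mentioned above.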