StarRocks performs slower than Spark when running a groupBy query

Hello, I’m testing StarRocks as a potential replacement for Spark.

At first glance, StarRocks’ performance has been impressive. However, I encountered a specific query where StarRocks performs significantly slower.

Can someone point me out how to improve?

1) query pattern and etc

  • query pattern
    SELECT {dimension}, COUNT(*)
    FROM iceber.db.tab1
    GROUP BY {dimension}
    
  • Iceberg catalog with parquet file
  • tab1 has 1,413,914,103 (1.4B) rows
  • all fields are of type STRING
  • StarRocks version: 3.3.8-e3816ec

2) test result

case type dimension num output rows Spark (sec) StarRocks (sec)
case 1 high cardinality c,u 121,374,662 (1.2B) rows 44 36
case 2 low cardinality ec,f,g,a,d1,d2,...,d10 11,563 rows 28 12
case 3 high + low cardinality c,u,ec,f,g,a,d1,d2,...,d10 328,798,206 (3.2B) rows 111 178

3) problem

StarRocks outperforms Spark when grouping by either high-cardinality fields or low-cardinality fields individually (case 1 and case 2).

However, when grouping by a combination of “high + low cardinality” fields in a single query (case 3), StarRocks shows poor performance.

I don’t understand why this happen

4) question

Is there a way to optimize the query for case 3 to improve performance?

5) query profiles

  • case 1
    |     TotalTime: 35s890ms                                                                                                                           |
    |         ExecutionTime: 35s835ms [Scan: 1s472ms (4.11%), Network: 38.923ms (0.11%), ResultDeliverTime: 0ns (0.00%), ScheduleTime: 10s0ms (27.91%)] |
    |         CollectProfileTime: 1ms                                                                                                                   |
    |         FrontendProfileMergeTime: 17.345ms                                                                                                        |
    |     QueryPeakMemoryUsage: ?, QueryAllocatedMemoryUsage: 1.491 TB                                                                                  |
    |     Top Most Time-consuming Nodes:                                                                                                                |
    |         1. AGGREGATION (id=4) [finalize, merge]: 24s688ms (47.23%)                                                                           |
    |         2. AGGREGATION (id=2) [serialize, update]: 19s136ms (36.61%)                                                                         |
    |         3. EXCHANGE (id=3) [SHUFFLE]: 5s730ms (10.96%)                                                                                            |
    |         4. ICEBERG_SCAN (id=0) : 2s640ms (5.05%)                                                                                                  |
    |         5. PROJECT (id=1) : 60.589ms (0.12%)                                                                                                      |
    |         6. PROJECT (id=5) : 5.855ms (0.01%)                                                                                                       |
    |         7. AGGREGATION (id=6) [serialize, update]: 3.469ms (0.01%)                                                                                |
    |         8. EXCHANGE (id=7) [GATHER]: 1.102ms (0.00%)                                                                                              |
    |         9. RESULT_SINK: 113.661us (0.00%)                                                                                                         |
    |         10. AGGREGATION (id=8) [finalize, merge]: 82.216us (0.00%)                                                                                |
    |     Top Most Memory-consuming Nodes:                                                                                                              |
    |     NonDefaultVariables:                                                                                                                          |
    |         connector_sink_compression_codec: uncompressed -> snappy                                                                                  |
    |         enable_adaptive_sink_dop: false -> true                                                                                                   |
    |         enable_profile: false -> true                                                                                                             |
    |         enable_spill: false -> true                                                                                                               |
    |         query_mem_limit: 0 -> 6442450944                                                                                                          |
    |         query_timeout: 300 -> 3600                                                                                                                |
    |         spill_mode: auto -> force                                                                                                                 |
    
  • case 2
    |     TotalTime: 12s317ms                                                                                                                            |
    |         ExecutionTime: 12s249ms [Scan: 1s678ms (13.70%), Network: 4.231ms (0.03%), ResultDeliverTime: 0ns (0.00%), ScheduleTime: 9s998ms (81.62%)] |
    |         CollectProfileTime: 2ms                                                                                                                    |
    |         FrontendProfileMergeTime: 19.762ms                                                                                                         |
    |     QueryPeakMemoryUsage: ?, QueryAllocatedMemoryUsage: 398.331 GB                                                                                 |
    |     Top Most Time-consuming Nodes:                                                                                                                 |
    |         1. AGGREGATION (id=2) [serialize, update]: 8s884ms (74.05%)                                                                           |
    |         2. ICEBERG_SCAN (id=0) : 2s947ms (24.56%)                                                                               |
    |         3. PROJECT (id=1) : 119.493ms (1.00%)                                                                                                      |
    |         4. AGGREGATION (id=4) [finalize, merge]: 33.417ms (0.28%)                                                                                  |
    |         5. EXCHANGE (id=3) [SHUFFLE]: 10.724ms (0.09%)                                                                                             |
    |         6. EXCHANGE (id=7) [GATHER]: 802.820us (0.01%)                                                                                             |
    |         7. RESULT_SINK: 85.428us (0.00%)                                                                                                           |
    |         8. AGGREGATION (id=6) [serialize, update]: 63.621us (0.00%)                                                                                |
    |         9. AGGREGATION (id=8) [finalize, merge]: 50.800us (0.00%)                                                                                  |
    |         10. PROJECT (id=5) : 16.366us (0.00%)                                                                                                      |
    |     Top Most Memory-consuming Nodes:                                                                                                               |
    |     NonDefaultVariables:                                                                                                                           |
    |         connector_sink_compression_codec: uncompressed -> snappy                                                                                   |
    |         enable_adaptive_sink_dop: false -> true                                                                                                    |
    |         enable_profile: false -> true                                                                                                              |
    |         enable_spill: false -> true                                                                                                                |
    |         query_mem_limit: 0 -> 6442450944                                                                                                           |
    |         query_timeout: 300 -> 3600                                                                                                                 |
    
  • case 3
    |     TotalTime: 2m58s                                                                                                                            |
    |         ExecutionTime: 2m58s [Scan: 3s235ms (1.82%), Network: 9.028ms (0.01%), ResultDeliverTime: 0ns (0.00%), ScheduleTime: 234.810ms (0.13%)] |
    |         CollectProfileTime: 0                                                                                                                   |
    |         FrontendProfileMergeTime: 12.752ms                                                                                                      |
    |     QueryPeakMemoryUsage: ?, QueryAllocatedMemoryUsage: 5.194 TB                                                                                |
    |     Top Most Time-consuming Nodes:                                                                                                              |
    |         1. AGGREGATION (id=4) [finalize, merge]: 2m42s (69.59%)                                                                            |
    |         2. AGGREGATION (id=2) [serialize, update]: 44s999ms (19.23%)                                                         |
    |         3. EXCHANGE (id=3) [SHUFFLE]: 20s640ms (8.82%)                                                                                          |
    |         4. ICEBERG_SCAN (id=0) : 5s400ms (2.31%)                                                                                                |
    |         5. PROJECT (id=1) : 227.832ms (0.10%)                                                                                                   |
    |         6. PROJECT (id=5) : 31.266ms (0.01%)                                                                                                    |
    |         7. AGGREGATION (id=6) [serialize, update]: 14.065ms (0.01%)                                                                             |
    |         8. EXCHANGE (id=7) [GATHER]: 1.577ms (0.00%)                                                                                            |
    |         9. RESULT_SINK: 100.160us (0.00%)                                                                                                       |
    |         10. AGGREGATION (id=8) [finalize, merge]: 74.986us (0.00%)                                                                              |
    |     Top Most Memory-consuming Nodes:                                                                                                            |
    |     NonDefaultVariables:                                                                                                                        |
    |         connector_sink_compression_codec: uncompressed -> snappy                                                                                |
    |         enable_adaptive_sink_dop: false -> true                                                                                                 |
    |         enable_profile: false -> true                                                                                                           |
    |         enable_spill: false -> true                                                                                                             |
    |         query_mem_limit: 0 -> 6442450944                                                                                                        |
    |         query_timeout: 300 -> 3600                                                                                                              |
    |         spill_mode: auto -> force                                                                                                               |
    

Congrat. StarRocks 3.4 is out.

but showed same speed on my query.