Hello, I’m new to StarRocks.
It was great during the initial tests with my data.
I have a question about sub-queries on bitmap indexes.
table structure
tab1
has the following schema:
CREATE TABLE `tab1` (
...
`c_bitmap` varchar(65533) NULL COMMENT "",
`c_bloom_filter` varchar(65533) NULL COMMENT "",
`c_wo_index` varchar(65533) NULL COMMENT "",
...
INDEX idx (`c_bitmap`) USING BITMAP COMMENT ''
) ENGINE=OLAP
DUPLICATE KEY(`ymd`)
DISTRIBUTED BY RANDOM
ORDER BY(some random field)
PROPERTIES (
"bloom_filter_columns" = "c_bloom_filter",
"replication_num" = "1"
);
c_bitmap
,c_bloom_filter
, andc_wo_index
have the same value. The only difference is whether they are indexed or not.tab1
has 630M rows.- Currently, there is only 1 BE node with 8 cores.
- Shared-nothing architecture, data is saved on an SSD disk.
IN sub-query
I’m testing the following query to see if the bitmap index is used in the sub-query.
SELECT COUNT(*)
FROM tab1
WHERE <field-name> IN (SELECT c FROM tab2 LIMIT <limit>);
query speed
field name | LIMIT 100 |
LIMIT 200 |
LIMIT 1k |
LIMIT 10k |
LIMIT 100k |
LIMIT 500k |
LIMIT 2M |
---|---|---|---|---|---|---|---|
c_bitmap |
0.1 sec | 0.5 sec | 3.0 sec | 3.6 sec | 4.1 sec | 5.9 sec | 19 sec |
c_bloom_filter |
2.4 sec | 2.5 sec | 2.8 sec | 3.7 sec | 4.1 sec | 6 sec | 19 sec |
c_wo_index |
3.2 sec | 3.5 sec | 2.6 sec | 3.6 sec | 4.1 sec | 6.0 sec | 19 sec |
num of COUNT(*) |
29k | 35k | 206k | 1.7M | 17M | 84M | 340M |
WHERE c_bitmap IN (SELECT c FROM tab2 LIMIT 100)
: took only 0.1 sec. It seems that the bitmap index was used to run this query.- However,
WHERE c_bitmap IN (SELECT c FROM tab2 LIMIT 200)
or above took longer than I expected. It seems that the bitmap index was not used. INNER JOIN
showed the same performance.
I compared EXPLAIN ANALYZE
but I don’t see any differences between WHERE c_bitmap IN (SELECT c FROM tab2 LIMIT 100)
and WHERE c_bitmap IN (SELECT c FROM tab2 LIMIT 10k)
in terms of execution plan.
my question
How can I speed up WHERE c_bitmap IN (SELECT c FROM tab2 LIMIT more than 100)
?
I checked https://docs.starrocks.io/docs/sql-reference/System_variable/
to see if there is a threshold related to the optimizer but couldn’t find any.
Thanks.
Thanks.