How can I optimize IN subquery base on bitmap index

Hello, I’m new to StarRocks.

It was great during the initial tests with my data.

I have a question about sub-queries on bitmap indexes.

table structure

tab1 has the following schema:

CREATE TABLE `tab1` (
  ...
  `c_bitmap` varchar(65533) NULL COMMENT "",
  `c_bloom_filter` varchar(65533) NULL COMMENT "",
  `c_wo_index` varchar(65533) NULL COMMENT "",
  ...
  INDEX idx (`c_bitmap`) USING BITMAP COMMENT ''
) ENGINE=OLAP
DUPLICATE KEY(`ymd`)
DISTRIBUTED BY RANDOM
ORDER BY(some random field)
PROPERTIES (
  "bloom_filter_columns" = "c_bloom_filter",
  "replication_num" = "1"
);
  • c_bitmap, c_bloom_filter, and c_wo_index have the same value. The only difference is whether they are indexed or not.
  • tab1 has 630M rows.
  • Currently, there is only 1 BE node with 8 cores.
  • Shared-nothing architecture, data is saved on an SSD disk.

IN sub-query

I’m testing the following query to see if the bitmap index is used in the sub-query.

SELECT COUNT(*)
FROM tab1
WHERE <field-name> IN (SELECT c FROM tab2 LIMIT <limit>);

query speed

field name LIMIT 100 LIMIT 200 LIMIT 1k LIMIT 10k LIMIT 100k LIMIT 500k LIMIT 2M
c_bitmap 0.1 sec 0.5 sec 3.0 sec 3.6 sec 4.1 sec 5.9 sec 19 sec
c_bloom_filter 2.4 sec 2.5 sec 2.8 sec 3.7 sec 4.1 sec 6 sec 19 sec
c_wo_index 3.2 sec 3.5 sec 2.6 sec 3.6 sec 4.1 sec 6.0 sec 19 sec
num of COUNT(*) 29k 35k 206k 1.7M 17M 84M 340M
  • WHERE c_bitmap IN (SELECT c FROM tab2 LIMIT 100): took only 0.1 sec. It seems that the bitmap index was used to run this query.
  • However, WHERE c_bitmap IN (SELECT c FROM tab2 LIMIT 200) or above took longer than I expected. It seems that the bitmap index was not used.
  • INNER JOIN showed the same performance.

I compared EXPLAIN ANALYZE but I don’t see any differences between WHERE c_bitmap IN (SELECT c FROM tab2 LIMIT 100) and WHERE c_bitmap IN (SELECT c FROM tab2 LIMIT 10k) in terms of execution plan.

my question

How can I speed up WHERE c_bitmap IN (SELECT c FROM tab2 LIMIT more than 100)?

I checked https://docs.starrocks.io/docs/sql-reference/System_variable/ to see if there is a threshold related to the optimizer but couldn’t find any.

Thanks.

Thanks.

version info

version info
Version: 3.3.5
Git: 6d81f75
Build Info: StarRocks@localhost (CentOS Linux 7 (Core))
Build Time: 2024-10-23 10:40:33

after reading https://docs.starrocks.io/docs/table_design/indexes/Bitmap_index/#verify-whether-the-bitmap-index-accelerates-queries I found that bitmap_max_filter_ratio matters

query profile (default settings)

  • “LIMIT 100”
    - BitmapIndexFilterRows: 630.145M (630145193)
     - __MAX_OF_BitmapIndexFilterRows: 47.493M (47493021)
     - __MIN_OF_BitmapIndexFilterRows: 20.323M (20323451)
    
  • “LIMIT 10k”
    - BitmapIndexFilterRows: 0 <= not bitmap indexed scan
    

changing bitmap_max_filter_ratio

bitmap_max_filter_ratio=1000 is added to be/conf/be.conf (default: 1)

and BE restated

bitmap_max_filter_ratio LIMIT 100 LIMIT 10k
1 (default) 0.1 sec 3.6 sec
1000 0.1 sec 2 min 52.25 sec
num of COUNT(*) 29k 1.7M

query profile of LIMIT 10k

- SegmentInit: 1m22s
 - __MAX_OF_SegmentInit: 2m41s
 - __MIN_OF_SegmentInit: 2.517ms
 - BitmapIndexFilter: 1m22s
   - __MAX_OF_BitmapIndexFilter: 2m41s
   - __MIN_OF_BitmapIndexFilter: 2.305ms
 - BitmapIndexFilterRows: 87.024K (87024)
   - __MAX_OF_BitmapIndexFilterRows: 2.090K (2090)
   - __MIN_OF_BitmapIndexFilterRows: 0

I don’t know why this post and comment were marked as a spam.

2nd comment was not marked.

The only difference is that if it has a external url to https://docs_starrocks_io/.