Hi all
I’m trying to understand what the best practice is for ingesting data from Kafka into StarRocks. I’d prefer to use the same strategy for all table types (if possible). Our data comes from Kafka, and I’d like to ingest it into a few different table types: a Primary Key table, a Duplicate Key table, and an Aggregate table. From the docs I can see that, at least for the Aggregate table, the database creates a new data version for each loaded batch, and at read time all of these versions are read (unless compaction has occurred).
Is that the same for the other table types?
Does that mean my ingestion strategy should use large batches, so that the number of versions stays small?
What is the best practice for ingesting from Kafka? Is there anything in particular I should pay attention to?
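For context, I was planning to use Routine Load for this, something like the sketch below (the database, table, topic, and broker names are made up, and I haven’t tuned any of the properties yet):

```sql
-- Sketch of a Routine Load job consuming JSON from Kafka
-- into one of the tables (names here are placeholders).
CREATE ROUTINE LOAD my_db.events_load ON events_table
COLUMNS (event_id, event_time, payload)
PROPERTIES (
    "format" = "json",
    "desired_concurrent_number" = "3",
    -- larger batch interval = fewer, bigger batches = fewer versions?
    "max_batch_interval" = "20"
)
FROM KAFKA (
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "events_topic",
    "property.kafka_default_offsets" = "OFFSET_END"
);
```

If batching matters for the version count, I’m guessing properties like `max_batch_interval` are the knobs to turn, but I’d appreciate confirmation on that.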