Common Data (read and write) Patterns in StarRocks

I get this question a lot from various people: what is the best way to read and write data in StarRocks?

TL;DR: As of right now, if you care about open table formats, it’s more performant to have some other application write the Iceberg, Hudi, or Delta Lake data and then use StarRocks as the read-side query engine (most of the value of an OLAP database is its read query performance). If you don’t care about open table formats, use the StarRocks native internal format.

Scenario A: When using the default StarRocks storage format for your data

This is the default table format when you install StarRocks; a sketch of the workflow follows the table below.

| Use Case | Technique |
| --- | --- |
| INSERT/UPSERT individual record | mysql SQL statements (recommended); Stream Load or one of the StarRocks data loading tools (recommended) |
| INSERT/UPSERT bulk records | Insert methods that support micro-batching, such as SQL bulk insert, Stream Load, or one of the StarRocks data loading tools |
| SELECT | mysql SQL statements |
| CREATE | mysql SQL statements |
| DELETE | mysql SQL statements |
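
A minimal sketch of this workflow through the mysql interface, assuming a PRIMARY KEY table (table and column names are hypothetical, and exact DDL options such as bucketing vary by StarRocks version):

```sql
-- Create a native StarRocks table. With a PRIMARY KEY table, re-inserting an
-- existing key overwrites the row, which gives upsert semantics.
CREATE TABLE IF NOT EXISTS sales_orders (
    order_id   BIGINT NOT NULL,
    order_date DATE,
    amount     DECIMAL(10, 2)
)
PRIMARY KEY (order_id)
DISTRIBUTED BY HASH (order_id);

-- Individual or small-batch inserts/upserts.
INSERT INTO sales_orders VALUES
    (1, '2024-01-15', 99.50),
    (2, '2024-01-16', 12.00);

-- Reads and deletes are plain SQL.
SELECT order_id, amount FROM sales_orders WHERE order_date >= '2024-01-15';
DELETE FROM sales_orders WHERE order_id = 2;
```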

Note

If you need to import data for a one-off or POC from another database or from an open table format (data lake), you can use the external catalog feature to hook up a source and then CTAS, INSERT INTO SELECT, or INSERT INTO VALUES into a table within StarRocks.
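
For example, a rough sketch of that one-off import, assuming an Iceberg source behind a Hive metastore (the catalog name, metastore URI, and table names are all hypothetical):

```sql
-- Register the source as an external catalog.
CREATE EXTERNAL CATALOG iceberg_src
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- One-off copy into a native StarRocks table with CTAS ...
CREATE TABLE my_db.orders_copy AS
SELECT * FROM iceberg_src.sales_db.orders;

-- ... or append into an existing native table.
INSERT INTO my_db.orders_copy
SELECT * FROM iceberg_src.sales_db.orders
WHERE order_date >= '2024-01-01';
```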

Scenario B: When using Apache Iceberg for storing your data

We support Apache Iceberg via StarRocks’ External Catalog feature. Although you can insert records using the mysql interface, it was not designed to insert/upsert in bulk or to be fast for individual records. Generally speaking, the suggested pattern is to write data using Apache Spark or another tool and then read the data using StarRocks via SQL; see the sketch after the table below.

| Use Case | Technique |
| --- | --- |
| INSERT/UPSERT individual record | mysql SQL statements (recommended); Stream Load or one of the StarRocks data loading tools (recommended); Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Iceberg |
| INSERT/UPSERT bulk records | Insert methods that support micro-batching, such as SQL bulk insert, Stream Load, or one of the StarRocks data loading tools; Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Iceberg |
| SELECT | mysql SQL statements |
| CREATE | mysql SQL statements (limited); Apache Spark or Apache Spark SQL; another tool that can write Apache Iceberg |
| DELETE | mysql SQL statements; Apache Spark or Apache Spark SQL; another tool that can write Apache Iceberg |
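
A rough sketch of that write-with-Spark, read-with-StarRocks pattern (it assumes an Iceberg catalog named ice configured on the Spark side and the iceberg_src external catalog from the note above; all names are hypothetical):

```sql
-- In Spark SQL: create and populate the Iceberg table.
CREATE TABLE ice.sales_db.orders (
    order_id BIGINT,
    amount   DECIMAL(10, 2)
) USING iceberg;

INSERT INTO ice.sales_db.orders VALUES (1, 99.50), (2, 12.00);

-- In StarRocks: read the same table through the Iceberg external catalog.
SELECT order_id, SUM(amount) AS total
FROM iceberg_src.sales_db.orders
GROUP BY order_id;
```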

Scenario C: When using Apache Hudi for storing your data

We support Apache Hudi via StarRocks’ External Catalog feature. As of Jan 2024, StarRocks doesn’t support writing to Apache Hudi, so the suggested pattern is to write data using Apache Spark or another tool and then read the data using StarRocks via SQL; see the sketch after the table below.

| Use Case | Technique |
| --- | --- |
| INSERT/UPSERT individual record | Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hudi |
| INSERT/UPSERT bulk records | Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hudi |
| SELECT | mysql SQL statements |
| CREATE | Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hudi |
| DELETE | Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hudi |
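
A rough sketch of the Hudi version of that pattern (it assumes Spark has the Hudi extensions enabled; the metastore URI and all names are hypothetical):

```sql
-- In Spark SQL: create and populate the Hudi table.
CREATE TABLE hudi_db.orders (
    order_id BIGINT,
    amount   DECIMAL(10, 2)
) USING hudi
TBLPROPERTIES (primaryKey = 'order_id');

INSERT INTO hudi_db.orders VALUES (1, 99.50);

-- In StarRocks: expose the Hudi tables through a read-only external catalog.
CREATE EXTERNAL CATALOG hudi_src
PROPERTIES (
    "type" = "hudi",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

SELECT * FROM hudi_src.hudi_db.orders;
```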

Scenario D: When using Delta Lake for storing your data

We support Delta Lake via StarRocks’ External Catalog feature. As of Jan 2024, StarRocks doesn’t support writing to Delta Lake, so the suggested pattern is to write data using Apache Spark or another tool and then read the data using StarRocks via SQL; see the sketch after the table below.

| Use Case | Technique |
| --- | --- |
| INSERT/UPSERT individual record | Apache Spark or Apache Spark SQL (recommended); another tool that can write Delta Lake |
| INSERT/UPSERT bulk records | Apache Spark or Apache Spark SQL (recommended); another tool that can write Delta Lake |
| SELECT | mysql SQL statements |
| CREATE | Apache Spark or Apache Spark SQL (recommended); another tool that can write Delta Lake |
| DELETE | Apache Spark or Apache Spark SQL (recommended); another tool that can write Delta Lake |
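
A rough sketch of the Delta Lake version (it assumes Spark has Delta Lake enabled; the metastore URI and all names are hypothetical):

```sql
-- In Spark SQL: create and populate the Delta table.
CREATE TABLE delta_db.orders (
    order_id BIGINT,
    amount   DECIMAL(10, 2)
) USING delta;

INSERT INTO delta_db.orders VALUES (1, 99.50);

-- In StarRocks: expose the Delta tables through a read-only external catalog.
CREATE EXTERNAL CATALOG delta_src
PROPERTIES (
    "type" = "deltalake",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

SELECT * FROM delta_src.delta_db.orders;
```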

Scenario E: When using Apache Hive for storing your data

We support Apache Hive via StarRocks’ External Catalog feature; a sketch of the pattern follows the table below.

| Use Case | Technique |
| --- | --- |
| INSERT/UPSERT individual record | mysql SQL statements (recommended); Stream Load or one of the StarRocks data loading tools (recommended); Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hive |
| INSERT/UPSERT bulk records | Insert methods that support micro-batching, such as SQL bulk insert, Stream Load, or one of the StarRocks data loading tools; Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hive |
| SELECT | mysql SQL statements |
| CREATE | Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hive |
| DELETE | Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hive |
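
A rough sketch of the Hive version (the metastore URI and all names are hypothetical; whether the INSERT at the end works through the mysql interface depends on your StarRocks version):

```sql
-- In StarRocks: expose the Hive tables through an external catalog.
CREATE EXTERNAL CATALOG hive_src
PROPERTIES (
    "type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- Reads are plain SQL.
SELECT COUNT(*) FROM hive_src.hive_db.orders;

-- On versions that support Hive writes, inserts go through the same catalog.
INSERT INTO hive_src.hive_db.orders VALUES (1, 99.50);
```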