Common Data (read and write) Patterns in StarRocks

I get this question a lot from various people: what is the best way to read and write data in StarRocks?

TL;DR: As of right now, if you care about open table formats, it’s more performant to have some other application write the Iceberg, Hudi, or Delta Lake data and then use StarRocks as the read-side query engine (most of the value of an OLAP database is its read query performance). If you don’t care about open table formats, use the StarRocks native internal format.

Scenario A: When using the default StarRocks storage format for your data

This is the default table format when you install StarRocks; a sketch of the workflow follows the table below.

| Use Case | Technique |
| --- | --- |
| INSERT/UPSERT individual record | mysql SQL statements (recommended); Stream Load or one of the StarRocks data loading tools (recommended) |
| INSERT/UPSERT bulk records | Insert methods that support micro-batching, such as SQL bulk insert, Stream Load, or one of the StarRocks data loading tools |
| SELECT | mysql SQL statements |
| CREATE | mysql SQL statements |
| DELETE | mysql SQL statements |
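
A minimal sketch of this workflow through the mysql interface, assuming a PRIMARY KEY table (table and column names are hypothetical, and exact DDL options such as bucketing vary by StarRocks version):

```sql
-- Create a native StarRocks table. With a PRIMARY KEY table, re-inserting an
-- existing key overwrites the row, which gives upsert semantics.
CREATE TABLE IF NOT EXISTS sales_orders (
    order_id   BIGINT NOT NULL,
    order_date DATE,
    amount     DECIMAL(10, 2)
)
PRIMARY KEY (order_id)
DISTRIBUTED BY HASH (order_id);

-- Individual or small-batch inserts/upserts.
INSERT INTO sales_orders VALUES
    (1, '2024-01-15', 99.50),
    (2, '2024-01-16', 12.00);

-- Reads and deletes are plain SQL.
SELECT order_id, amount FROM sales_orders WHERE order_date >= '2024-01-15';
DELETE FROM sales_orders WHERE order_id = 2;
```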

Note

If you need to import data for a one-off or POC from another database or from an open table format (data lake), you can use the external catalog feature to hook up a source and then CTAS, INSERT INTO SELECT, or INSERT INTO VALUES into a table within StarRocks.
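
For example, a rough sketch of that one-off import, assuming an Iceberg source behind a Hive metastore (the catalog name, metastore URI, and table names are all hypothetical):

```sql
-- Register the source as an external catalog.
CREATE EXTERNAL CATALOG iceberg_src
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- One-off copy into a native StarRocks table with CTAS ...
CREATE TABLE my_db.orders_copy AS
SELECT * FROM iceberg_src.sales_db.orders;

-- ... or append into an existing native table.
INSERT INTO my_db.orders_copy
SELECT * FROM iceberg_src.sales_db.orders
WHERE order_date >= '2024-01-01';
```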

Scenario B: When using Apache Iceberg for storing your data

We support Apache Iceberg via StarRocks’ External Catalog feature. Although you can insert records using the mysql interface, it was not designed to insert/upsert in bulk or to be fast for individual records. Generally speaking, the suggested pattern is to write data using Apache Spark or another tool and then read the data using StarRocks via SQL; see the sketch after the table below.

| Use Case | Technique |
| --- | --- |
| INSERT/UPSERT individual record | mysql SQL statements (recommended); Stream Load or one of the StarRocks data loading tools (recommended); Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Iceberg |
| INSERT/UPSERT bulk records | Insert methods that support micro-batching, such as SQL bulk insert, Stream Load, or one of the StarRocks data loading tools; Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Iceberg |
| SELECT | mysql SQL statements |
| CREATE | mysql SQL statements (limited); Apache Spark or Apache Spark SQL; another tool that can write Apache Iceberg |
| DELETE | mysql SQL statements; Apache Spark or Apache Spark SQL; another tool that can write Apache Iceberg |
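
A rough sketch of that write-with-Spark, read-with-StarRocks pattern (it assumes an Iceberg catalog named ice configured on the Spark side and the iceberg_src external catalog from the note above; all names are hypothetical):

```sql
-- In Spark SQL: create and populate the Iceberg table.
CREATE TABLE ice.sales_db.orders (
    order_id BIGINT,
    amount   DECIMAL(10, 2)
) USING iceberg;

INSERT INTO ice.sales_db.orders VALUES (1, 99.50), (2, 12.00);

-- In StarRocks: read the same table through the Iceberg external catalog.
SELECT order_id, SUM(amount) AS total
FROM iceberg_src.sales_db.orders
GROUP BY order_id;
```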

Scenario C: When using Apache Hudi for storing your data

We support Apache Hudi via StarRocks’ External Catalog feature. As of Jan 2024, StarRocks doesn’t support writing to Apache Hudi, so the suggested pattern is to write data using Apache Spark or another tool and then read the data using StarRocks via SQL; see the sketch after the table below.

| Use Case | Technique |
| --- | --- |
| INSERT/UPSERT individual record | Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hudi |
| INSERT/UPSERT bulk records | Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hudi |
| SELECT | mysql SQL statements |
| CREATE | Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hudi |
| DELETE | Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hudi |
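
A rough sketch of the Hudi version of that pattern (it assumes Spark has the Hudi extensions enabled; the metastore URI and all names are hypothetical):

```sql
-- In Spark SQL: create and populate the Hudi table.
CREATE TABLE hudi_db.orders (
    order_id BIGINT,
    amount   DECIMAL(10, 2)
) USING hudi
TBLPROPERTIES (primaryKey = 'order_id');

INSERT INTO hudi_db.orders VALUES (1, 99.50);

-- In StarRocks: expose the Hudi tables through a read-only external catalog.
CREATE EXTERNAL CATALOG hudi_src
PROPERTIES (
    "type" = "hudi",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

SELECT * FROM hudi_src.hudi_db.orders;
```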

Scenario D: When using Delta Lake for storing your data

We support Delta Lake via StarRocks’ External Catalog feature. As of Jan 2024, StarRocks doesn’t support writing to Delta Lake, so the suggested pattern is to write data using Apache Spark or another tool and then read the data using StarRocks via SQL; see the sketch after the table below.

| Use Case | Technique |
| --- | --- |
| INSERT/UPSERT individual record | Apache Spark or Apache Spark SQL (recommended); another tool that can write Delta Lake |
| INSERT/UPSERT bulk records | Apache Spark or Apache Spark SQL (recommended); another tool that can write Delta Lake |
| SELECT | mysql SQL statements |
| CREATE | Apache Spark or Apache Spark SQL (recommended); another tool that can write Delta Lake |
| DELETE | Apache Spark or Apache Spark SQL (recommended); another tool that can write Delta Lake |
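
A rough sketch of the Delta Lake version (it assumes Spark has Delta Lake enabled; the metastore URI and all names are hypothetical):

```sql
-- In Spark SQL: create and populate the Delta table.
CREATE TABLE delta_db.orders (
    order_id BIGINT,
    amount   DECIMAL(10, 2)
) USING delta;

INSERT INTO delta_db.orders VALUES (1, 99.50);

-- In StarRocks: expose the Delta tables through a read-only external catalog.
CREATE EXTERNAL CATALOG delta_src
PROPERTIES (
    "type" = "deltalake",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

SELECT * FROM delta_src.delta_db.orders;
```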

Scenario E: When using Apache Hive for storing your data

We support Apache Hive via StarRocks’ External Catalog feature; a sketch of the pattern follows the table below.

| Use Case | Technique |
| --- | --- |
| INSERT/UPSERT individual record | mysql SQL statements (recommended); Stream Load or one of the StarRocks data loading tools (recommended); Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hive |
| INSERT/UPSERT bulk records | Insert methods that support micro-batching, such as SQL bulk insert, Stream Load, or one of the StarRocks data loading tools; Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hive |
| SELECT | mysql SQL statements |
| CREATE | Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hive |
| DELETE | Apache Spark or Apache Spark SQL (recommended); another tool that can write Apache Hive |
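
A rough sketch of the Hive version (the metastore URI and all names are hypothetical; whether the INSERT at the end works through the mysql interface depends on your StarRocks version):

```sql
-- In StarRocks: expose the Hive tables through an external catalog.
CREATE EXTERNAL CATALOG hive_src
PROPERTIES (
    "type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- Reads are plain SQL.
SELECT COUNT(*) FROM hive_src.hive_db.orders;

-- On versions that support Hive writes, inserts go through the same catalog.
INSERT INTO hive_src.hive_db.orders VALUES (1, 99.50);
```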