I get this question a lot: what is the best way to read and write data with StarRocks?
TL;DR: As of right now, if you care about open table formats, it is more performant to have some other application write Iceberg, Hudi, or Delta Lake and then use StarRocks as the read query engine (most of the value of an OLAP database is its read query performance). If you don't care about open table formats, use the StarRocks native internal format.
Scenario A: When using the default StarRocks format storage for storing your data
This is the default table format when you install StarRocks.
Use Case | Technique
---|---
INSERT/UPSERT individual record | mysql SQL statements (recommended); stream load or one of the StarRocks data loading tools (recommended)
INSERT/UPSERT bulk records | Insert methods that support micro batching, like SQL bulk insert, stream load, or one of the StarRocks data loading tools
SELECT | mysql SQL statements
CREATE | mysql SQL statements
DELETE | mysql SQL statements
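To make the native-format path concrete, here is a minimal sketch of an individual-record upsert against a Primary Key table via any mysql client. The table and column names are made up for illustration; on a Primary Key table, an INSERT on an existing key acts as an upsert.

```sql
-- Hypothetical Primary Key table in the StarRocks native format.
CREATE TABLE orders (
    order_id BIGINT NOT NULL,
    status   VARCHAR(32),
    amount   DECIMAL(10, 2)
)
PRIMARY KEY (order_id)
DISTRIBUTED BY HASH(order_id);

-- Individual-record upsert via a plain mysql SQL statement:
INSERT INTO orders VALUES (1, 'shipped', 42.50);
```

For bulk loads, the same table would typically be fed with stream load or one of the StarRocks data loading tools rather than row-at-a-time INSERTs.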
Note
If you need to import data as a one-off or for a POC from another database or from an open table format (data lake), you can use the external catalog feature to hook up a source and then CTAS, INSERT INTO SELECT, or INSERT INTO VALUES into a table within StarRocks.
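A sketch of that one-off import pattern is below. The catalog name, metastore URI, and table names are all hypothetical, and the exact PROPERTIES keys vary by StarRocks version and by which lake format and metastore you point at, so check the docs for your release.

```sql
-- Hook up a data lake source as an external catalog (names are made up):
CREATE EXTERNAL CATALOG lake_catalog
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- Then copy the external table into a native StarRocks table with CTAS:
CREATE TABLE sales_local AS
SELECT * FROM lake_catalog.sales_db.sales;
```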
Scenario B: When using Apache Iceberg for storing your data
We support Apache Iceberg via StarRocks' External Catalog feature. Although you can insert records through the mysql interface, it was not designed for bulk insert/upsert or for fast individual-record writes. Generally speaking, the suggested pattern is to write data using Apache Spark or another tool and then read the data using StarRocks via SQL.
Use Case | Technique
---|---
INSERT/UPSERT individual record | mysql SQL statements (recommended); stream load or one of the StarRocks data loading tools (recommended); Apache Spark or Apache Spark SQL (recommended); other tool that can write Apache Iceberg
INSERT/UPSERT bulk records | Insert methods that support micro batching, like SQL bulk insert, stream load, or one of the StarRocks data loading tools; Apache Spark or Apache Spark SQL (recommended); other tool that can write Apache Iceberg
SELECT | mysql SQL statements
CREATE | mysql SQL statements (limited); Apache Spark or Apache Spark SQL; other tool that can write Apache Iceberg
DELETE | mysql SQL statements; Apache Spark or Apache Spark SQL; other tool that can write Apache Iceberg
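The write-with-Spark, read-with-StarRocks pattern might look like the following sketch. It assumes Spark is already configured with an Iceberg catalog named `iceberg_cat` and that the same tables are exposed to StarRocks through an external catalog named `iceberg_catalog`; all of those names, plus the database and table names, are hypothetical.

```sql
-- In Spark SQL: bulk-write into an Iceberg table.
INSERT INTO iceberg_cat.db.events
SELECT * FROM staging_events;

-- In StarRocks, via the external catalog: read the same Iceberg table.
SELECT event_type, count(*)
FROM iceberg_catalog.db.events
GROUP BY event_type;
```

The point of the split is that each engine does what it is best at: Spark handles the heavy write path, StarRocks serves the fast read queries.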
Scenario C: When using Apache Hudi for storing your data
We support Apache Hudi via StarRocks' External Catalog feature. As of Jan 2024, StarRocks doesn't support writing to Apache Hudi, so when using Hudi, the suggested pattern is to write data using Apache Spark or another tool and then read the data using StarRocks via SQL.
Use Case | Technique
---|---
INSERT/UPSERT individual record | Apache Spark or Apache Spark SQL (recommended); other tool that can write Apache Hudi
INSERT/UPSERT bulk records | Apache Spark or Apache Spark SQL (recommended); other tool that can write Apache Hudi
SELECT | mysql SQL statements
CREATE | Apache Spark or Apache Spark SQL (recommended); other tool that can write Apache Hudi
DELETE | Apache Spark or Apache Spark SQL (recommended); other tool that can write Apache Hudi
Scenario D: When using Delta Lake for storing your data
We support Delta Lake via StarRocks' External Catalog feature. As of Jan 2024, StarRocks doesn't support writing to Delta Lake, so when using Delta Lake, the suggested pattern is to write data using Apache Spark or another tool and then read the data using StarRocks via SQL.
Use Case | Technique
---|---
INSERT/UPSERT individual record | Apache Spark or Apache Spark SQL (recommended); other tool that can write Delta Lake
INSERT/UPSERT bulk records | Apache Spark or Apache Spark SQL (recommended); other tool that can write Delta Lake
SELECT | mysql SQL statements
CREATE | Apache Spark or Apache Spark SQL (recommended); other tool that can write Delta Lake
DELETE | Apache Spark or Apache Spark SQL (recommended); other tool that can write Delta Lake
Scenario E: When using Apache Hive for storing your data
We support Apache Hive via StarRocks’ External Catalog feature.
Use Case | Technique
---|---
INSERT/UPSERT individual record | mysql SQL statements (recommended); stream load or one of the StarRocks data loading tools (recommended); Apache Spark or Apache Spark SQL (recommended); other tool that can write Apache Hive
INSERT/UPSERT bulk records | Insert methods that support micro batching, like SQL bulk insert, stream load, or one of the StarRocks data loading tools; Apache Spark or Apache Spark SQL (recommended); other tool that can write Apache Hive
SELECT | mysql SQL statements
CREATE | Apache Spark or Apache Spark SQL (recommended); other tool that can write Apache Hive
DELETE | Apache Spark or Apache Spark SQL (recommended); other tool that can write Apache Hive
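As a sketch of the Hive read path: an existing Hive metastore can be exposed to StarRocks as an external catalog and then queried directly. The catalog name, metastore host, and table names below are made up, and the exact PROPERTIES keys may differ between StarRocks versions.

```sql
-- Hypothetical setup: expose a Hive metastore to StarRocks.
CREATE EXTERNAL CATALOG hive_catalog
PROPERTIES (
    "type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- Switch to the catalog and query a Hive table with plain SQL:
SET CATALOG hive_catalog;
SELECT * FROM sales_db.web_logs LIMIT 10;
```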