What are my StarRocks data loading options?
Use Case
Below is a list of common use cases for loading data into StarRocks. Note that this is general advice; your environment and context may dictate a different solution.
I have my data in another SQL capable database, what data loading tool should I use?
Connect to an external catalog/table and then perform INSERT INTO SELECT into local tables, or export the data and use Stream Load. We are working on more sink connectors that will use Stream Load under the covers.
I have billions of rows in another SQL capable database, what data loading tool should I use?
Connect to an external catalog/table and then perform INSERT INTO SELECT into local tables.
I have source data in Kafka, what data loading tool should I use?
Routine Load, a Kafka-specific data loading tool, or the StarRocks Kafka Connector.
I have my data hosted in S3, what data loading tool should I use?
Broker Load. We are working on more sink connectors that will use Stream Load under the covers.
I have my data on my local machine, what data loading tool should I use?
Stream Load. We are working on more sink connectors that will use Stream Load under the covers.
I have my data in Spark, what data load tool should I use?
Spark Load, a Spark-specific data loading tool.
I only care about performance, what data loading tools should I use?
Connect to an external catalog/table and then perform INSERT INTO SELECT into local tables.
Data Loading Tooling Detail
Below is a list of tools available from StarRocks, along with some third-party tools that we have tested with StarRocks.
Connect to an external catalog/table and then perform INSERT INTO SELECT to local tables
Use case: One time load or batch
Performance: Fastest load performance
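A minimal sketch of this pattern, assuming the source is a MySQL database exposed through a JDBC catalog; the catalog, database, table, and credential names below are all placeholders, not part of the original guide:

```sql
-- Register the source database as an external JDBC catalog.
-- Placeholder names: mysql_src, etl_user, src_db, local_db, orders.
CREATE EXTERNAL CATALOG mysql_src
PROPERTIES (
    "type" = "jdbc",
    "user" = "etl_user",
    "password" = "******",
    "jdbc_uri" = "jdbc:mysql://mysql-host:3306",
    "driver_url" = "https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver"
);

-- Pull the source rows directly into a local StarRocks table.
INSERT INTO local_db.orders
SELECT * FROM mysql_src.src_db.orders;
```

Because the data moves server-to-server in one SQL statement, there is no intermediate export/import step, which is why this path tends to be the fastest for bulk migration.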
Stream Load
Pushes data into StarRocks via an HTTP/HTTPS endpoint.
Supports: CSV, JSON
Use case: One time load or batch
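A sketch of a Stream Load request using curl, assuming a local CSV file and a StarRocks FE reachable at fe_host:8030; the host, credentials, label, database (mydb), and table (orders) are placeholders:

```shell
# Push a local CSV file to the Stream Load HTTP endpoint.
# All names and hosts below are placeholders.
curl --location-trusted -u etl_user:password \
    -H "Expect:100-continue" \
    -H "label:orders_load_001" \
    -H "column_separator:," \
    -T /path/to/orders.csv \
    -XPUT http://fe_host:8030/api/mydb/orders/_stream_load
```

The label header makes the load idempotent: retrying the same label will not load the data twice.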
Broker Load
Loads data into StarRocks via a SQL job that pulls files from external storage. Alternative: use the FILES() SQL function.
Supports: Parquet, CSV, ORC
Across: HDFS, AWS S3, Google GCS, Microsoft Azure Storage, and compatible storage systems
Use case: One time load or batch
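A sketch of a Broker Load job pulling CSV files from S3; the bucket, paths, label, table name, region, and credentials are placeholders:

```sql
-- Asynchronous load job; check progress with SHOW LOAD.
-- Placeholder names: local_db, orders, my-bucket, credentials.
LOAD LABEL local_db.orders_20240101
(
    DATA INFILE("s3://my-bucket/orders/*.csv")
    INTO TABLE orders
    COLUMNS TERMINATED BY ","
    FORMAT AS "CSV"
)
WITH BROKER
(
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-east-1"
);
```

Broker Load runs asynchronously, so it suits large batches that would time out over a synchronous connection.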
Routine Load
Pulls data into StarRocks from Kafka.
Supports: CSV, JSON, Avro
Use case: Real time data streaming
Performance: Fastest load performance for streaming
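A sketch of a continuous Routine Load job consuming CSV messages from a Kafka topic; the broker addresses, topic, job, and table names are placeholders:

```sql
-- Continuously consume a Kafka topic into a local table.
-- Placeholder names: local_db, orders_stream, orders, orders_topic.
CREATE ROUTINE LOAD local_db.orders_stream ON orders
COLUMNS TERMINATED BY ","
PROPERTIES
(
    "format" = "CSV",
    "desired_concurrent_number" = "3"
)
FROM KAFKA
(
    "kafka_broker_list" = "kafka-host1:9092,kafka-host2:9092",
    "kafka_topic" = "orders_topic",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
```

Once created, the job keeps consuming in the background; monitor it with SHOW ROUTINE LOAD.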
StarRocks Kafka Connector
A sink connector that pushes data into StarRocks from Kafka via the Kafka Connect framework.
Supports: CSV, JSON, Avro
Use case: Real time data streaming
Performance: Fastest load performance for streaming
Spark Load
Performance: Very efficient if you already have a Spark environment. Spark Load uses Spark's resources to perform the load, so it does not affect StarRocks query performance as much.
Apache Flink
Use case: Real time data streaming from another database
Performance: Fastest load performance for streaming
Note: Uses Stream Load underneath
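A Flink SQL sketch of a StarRocks sink table, assuming the flink-connector-starrocks jar is on the Flink classpath; this runs in a Flink SQL client, not in StarRocks, and the hosts, credentials, and table names are placeholders:

```sql
-- Flink SQL (not StarRocks SQL): define a sink table backed by the
-- StarRocks connector. Placeholder hosts, names, and credentials.
CREATE TABLE orders_sink (
    order_id BIGINT,
    amount   DOUBLE
) WITH (
    'connector' = 'starrocks',
    'jdbc-url' = 'jdbc:mysql://fe-host:9030',
    'load-url' = 'fe-host:8030',
    'database-name' = 'local_db',
    'table-name' = 'orders',
    'username' = 'etl_user',
    'password' = '******'
);
```

Rows written to this table by a Flink job (e.g. INSERT INTO orders_sink SELECT ...) are delivered to StarRocks via Stream Load under the covers.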
Airbyte
Use the StarRocks Airbyte connector. Currently it supports StarRocks as a destination.
Use case: One time load or batch
Note: Uses Stream Load underneath