What are my StarRocks data loading options?
Use Case
Below is a list of common use cases for loading data into StarRocks. Note that this is general advice; your environment and context may dictate a different solution.
I have my data in another SQL capable database, what data loading tool should I use?
Connect to an external catalog/table and then perform INSERT INTO SELECT into local tables, or export the data and use Stream Load. We are working on more sink connectors that will use Stream Load under the covers.
I have billions of rows in another SQL capable database, what data loading tool should I use?
Connect to an external catalog/table and then perform INSERT INTO SELECT into local tables.
I have source data in Kafka, what data loading tool should I use?
Routine Load, a Kafka-specific data loading tool, or the StarRocks Kafka Connector.
I have my data hosted in S3, what data loading tool should I use?
Broker Load. We are working on more sink connectors that will use Stream Load under the covers.
I have my data on my local machine, what data loading tool should I use?
Stream Load. We are working on more sink connectors that will use Stream Load under the covers.
I have my data in Spark, what data load tool should I use?
Spark Load, a Spark-specific data loading tool.
I only care about performance, what data loading tools should I use?
Connect to an external catalog/table and then perform INSERT INTO SELECT into local tables.
Data Loading Tooling Detail
Below is a list of tools available from StarRocks, along with some third-party tools that we have tested with StarRocks.
Connect to an external catalog/table and then perform INSERT INTO SELECT to local tables
Use case: One time load or batch
Performance: Fastest load performance
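A minimal sketch of this pattern, assuming the source is a MySQL database exposed through a JDBC catalog; the catalog, database, table, and credential names below are all placeholders, not part of the original guide:

```sql
-- Register the source database as an external JDBC catalog.
-- Placeholder names: mysql_src, etl_user, src_db, local_db, orders.
CREATE EXTERNAL CATALOG mysql_src
PROPERTIES (
    "type" = "jdbc",
    "user" = "etl_user",
    "password" = "******",
    "jdbc_uri" = "jdbc:mysql://mysql-host:3306",
    "driver_url" = "https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver"
);

-- Pull the source rows directly into a local StarRocks table.
INSERT INTO local_db.orders
SELECT * FROM mysql_src.src_db.orders;
```

Because the data moves server-to-server in one SQL statement, there is no intermediate export/import step, which is why this path tends to be the fastest for bulk migration.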
Stream Load
Pushes data into StarRocks via an HTTP/HTTPS endpoint.
Supports: CSV, JSON
Use case: One time load or batch
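A sketch of a Stream Load request using curl, assuming a local CSV file and a StarRocks FE reachable at fe_host:8030; the host, credentials, label, database (mydb), and table (orders) are placeholders:

```shell
# Push a local CSV file to the Stream Load HTTP endpoint.
# All names and hosts below are placeholders.
curl --location-trusted -u etl_user:password \
    -H "Expect:100-continue" \
    -H "label:orders_load_001" \
    -H "column_separator:," \
    -T /path/to/orders.csv \
    -XPUT http://fe_host:8030/api/mydb/orders/_stream_load
```

The label header makes the load idempotent: retrying the same label will not load the data twice.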
Broker Load
Loads data into StarRocks via a SQL job that pulls files from external storage. Alternative: use the FILES() SQL function.
Supports: Parquet, CSV, ORC
Across: HDFS, AWS S3, Google GCS, Microsoft Azure Storage, and compatible storage systems
Use case: One time load or batch
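A sketch of a Broker Load job pulling CSV files from S3; the bucket, paths, label, table name, region, and credentials are placeholders:

```sql
-- Asynchronous load job; check progress with SHOW LOAD.
-- Placeholder names: local_db, orders, my-bucket, credentials.
LOAD LABEL local_db.orders_20240101
(
    DATA INFILE("s3://my-bucket/orders/*.csv")
    INTO TABLE orders
    COLUMNS TERMINATED BY ","
    FORMAT AS "CSV"
)
WITH BROKER
(
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-east-1"
);
```

Broker Load runs asynchronously, so it suits large batches that would time out over a synchronous connection.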
Routine Load
Pulls data into StarRocks from Kafka.
Supports: CSV, JSON, Avro
Use case: Real time data streaming
Performance: Fastest load performance for streaming
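A sketch of a continuous Routine Load job consuming CSV messages from a Kafka topic; the broker addresses, topic, job, and table names are placeholders:

```sql
-- Continuously consume a Kafka topic into a local table.
-- Placeholder names: local_db, orders_stream, orders, orders_topic.
CREATE ROUTINE LOAD local_db.orders_stream ON orders
COLUMNS TERMINATED BY ","
PROPERTIES
(
    "format" = "CSV",
    "desired_concurrent_number" = "3"
)
FROM KAFKA
(
    "kafka_broker_list" = "kafka-host1:9092,kafka-host2:9092",
    "kafka_topic" = "orders_topic",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
```

Once created, the job keeps consuming in the background; monitor it with SHOW ROUTINE LOAD.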
StarRocks Kafka Connector
A sink connector that pushes data into StarRocks from Kafka via the Kafka Connect framework.
Supports: CSV, JSON, Avro
Use case: Real time data streaming
Performance: Fastest load performance for streaming
Spark Load
Performance: Very efficient if you already have a Spark environment. Spark Load uses Spark's resources to perform the load, so it does not affect StarRocks query performance as much.
Apache Flink
Use case: Real time data streaming from another database
Performance: Fastest load performance for streaming
Note: Uses Stream Load underneath
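A Flink SQL sketch of a StarRocks sink table, assuming the flink-connector-starrocks jar is on the Flink classpath; this runs in a Flink SQL client, not in StarRocks, and the hosts, credentials, and table names are placeholders:

```sql
-- Flink SQL (not StarRocks SQL): define a sink table backed by the
-- StarRocks connector. Placeholder hosts, names, and credentials.
CREATE TABLE orders_sink (
    order_id BIGINT,
    amount   DOUBLE
) WITH (
    'connector' = 'starrocks',
    'jdbc-url' = 'jdbc:mysql://fe-host:9030',
    'load-url' = 'fe-host:8030',
    'database-name' = 'local_db',
    'table-name' = 'orders',
    'username' = 'etl_user',
    'password' = '******'
);
```

Rows written to this table by a Flink job (e.g. INSERT INTO orders_sink SELECT ...) are delivered to StarRocks via Stream Load under the covers.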
Airbyte
Use the StarRocks Airbyte connector. Currently it supports StarRocks as a destination.
Use case: One time load or batch
Note: Uses Stream Load underneath