Building connectors for StarRocks involves creating software components that allow other applications to interact with the StarRocks database. While there isn’t a one-size-fits-all approach, here’s a general guide to get you started:
1. Choose a Development Approach:
There are two main ways to build StarRocks connectors:
- From Scratch: This offers maximum control but requires a deep understanding of StarRocks’ communication protocols and APIs. You’ll need to handle data serialization, query execution, and error handling.
- Leveraging Existing Connectors: StarRocks already has a Kafka connector (StarRocks/starrocks-connector-for-kafka on GitHub). You can use it as a reference or even extend its functionality for your specific needs.
2. Understand StarRocks Communication:
StarRocks uses a Frontend (FE) and Backend (BE) architecture. The FE acts as the entry point for client applications and communicates with the BE nodes that store the data. Familiarize yourself with StarRocks' client-facing protocols, such as SQL over the MySQL wire protocol or its HTTP REST APIs, to establish communication between your connector and StarRocks.
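Since the FE speaks the MySQL wire protocol (port 9030 by default), a quick way to prototype the query path is with any MySQL client library. A minimal sketch in Python, assuming a local FE with default credentials:

```python
import pymysql  # any MySQL-wire-protocol client can talk to the FE

# Host, port, and credentials below are illustrative defaults, not fixed values.
conn = pymysql.connect(host="127.0.0.1", port=9030, user="root", password="")
try:
    with conn.cursor() as cur:
        cur.execute("SHOW DATABASES")  # any SQL the FE accepts works here
        for (name,) in cur.fetchall():
            print(name)
finally:
    conn.close()
```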
3. Design the Connector Functionalities:
Define the functionalities your connector will offer. Common functionalities include:
- Establishing connections: Define how applications will connect to StarRocks using your connector.
- Data manipulation: Allow reading, writing, and updating data in StarRocks tables.
- Querying data: Enable applications to send SQL queries to StarRocks and retrieve results.
- Authentication: Implement mechanisms for secure connections using usernames and passwords.
Here is a suggested requirements list:
- Be able to query
- Be able to create tables (see the CREATE TABLE statement in the StarRocks docs); a DDL sketch follows this list
  - Support primary key tables with defaults
  - Support duplicate key tables with defaults
  - Support aggregate key tables with defaults
- Be able to insert data
  - Support StarRocks Stream Load
  - Support SQL INSERT ... VALUES
- No requirement to be performant (e.g. your connector doesn't need to be faster than existing StarRocks tools; just integrate with or pass through to the existing StarRocks APIs and tools)
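As a sketch of the create-table requirement, here is one way to issue DDL over the same MySQL-protocol connection as above; the demo database, users table, and column layout are assumptions for illustration:

```python
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=9030, user="root", password="")
with conn.cursor() as cur:
    cur.execute("CREATE DATABASE IF NOT EXISTS demo")
    # A primary key table with defaults; for the other table models, swap
    # PRIMARY KEY for DUPLICATE KEY, or AGGREGATE KEY with aggregate
    # functions (SUM, REPLACE, ...) on the value columns.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS demo.users (
            id BIGINT NOT NULL,
            name VARCHAR(64),
            updated_at DATETIME
        )
        PRIMARY KEY (id)
        DISTRIBUTED BY HASH(id)
    """)
conn.close()
```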
4. Development Tools and Libraries:
The choice of tools depends on your chosen approach. If building from scratch, consider using libraries like:
- JDBC/ODBC drivers (MySQL-compatible): for SQL-based CRUD connectivity through the FE.
- HTTP/HTTPS client libraries: for bulk data loading via Stream Load (see the sketch after this list).
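For the bulk-loading path, Stream Load is an HTTP PUT against the FE, which replies with a 307 redirect to a BE. A minimal sketch with Python's requests, reusing the hypothetical demo.users table; the host, ports, label, and CSV payload are assumptions. The redirect is followed manually because some HTTP clients drop the auth header on cross-host redirects:

```python
import requests

url = "http://127.0.0.1:8030/api/demo/users/_stream_load"
headers = {
    "label": "users-load-001",   # must be unique per load job
    "column_separator": ",",
    "Expect": "100-continue",
}
data = "1,alice,2024-01-01 00:00:00\n2,bob,2024-01-02 00:00:00\n"

session = requests.Session()
session.auth = ("root", "")
# The FE replies 307 and points at a BE; re-issue the PUT ourselves so the
# auth header and body are sent again intact.
resp = session.put(url, headers=headers, data=data, allow_redirects=False)
if resp.status_code == 307:
    resp = session.put(resp.headers["Location"], headers=headers, data=data)
print(resp.json())  # inspect "Status" and "Message" in the result
```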
In addition, here are some issues you might hit when building your connector for StarRocks. Sling Data ran into the following while building theirs:
- Issues with lower and upper case table names: the generated CREATE TABLE statement used the upper case database/table name even when the lower case name was requested for the StarRocks target (slingdata-io/sling-cli#193)
- Column case sensitivity: when loading snowflake_to_starrocks, all rows were filled with null values (slingdata-io/sling-cli#180)
- VARCHAR(x) in StarRocks is measured in bytes, not the usual character length: "Local Parquet to StarRocks error" (slingdata-io/sling-cli#176); see the byte-length sketch after this list
- Random timeouts when using Stream Load (slingdata-io/sling-cli#159)
- Adding Stream Load URI support: "How do we set the fe_url in ENV variable or Python wrapper?" (slingdata-io/sling-cli#151)
- Support for primary key and aggregate key tables (slingdata-io/sling-cli#150)
- Supporting sort key and distribution hash key: "Handle Custom Column Keys on Target Table Creation" (slingdata-io/sling-cli#149)
- Cast from VARCHAR to VARBINARY failed when loading PostgreSQL to StarRocks (slingdata-io/sling-cli#145)
- Enhancing Sling to understand StarRocks INSERT OVERWRITE vs INSERT INTO: "table count (324) != stream count (300024). Records missing. Aborting" (slingdata-io/sling-cli#143)
- Having Sling reorder CREATE TABLE so that the first column is the sort key (slingdata-io/sling-cli#194), and the related JSON issue "Key columns must be the first few columns of the schema and the order of the key columns must be consistent with the order of the schema" (slingdata-io/sling-cli#195)
- Source custom SQL can't be terminated by ; (slingdata-io/sling-cli#197)
- StarRocks sink with a large Parquet file: "Client.Timeout exceeded while awaiting headers" (slingdata-io/sling-cli#204)
- Sling performance testing for StarRocks (slingdata-io/sling-cli#201)
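One pitfall from that list is worth a concrete illustration: VARCHAR(n) in StarRocks counts bytes, not characters, so a connector that sizes columns from source character lengths can reject or truncate multi-byte data. A hypothetical sizing helper:

```python
def varchar_size(values, headroom=1.25):
    """Pick a VARCHAR size from sample values by UTF-8 byte length.

    A hypothetical helper: StarRocks VARCHAR(n) is measured in bytes, so
    'héllo' and '世界' each need 6 bytes even though they are 5 and 2
    characters long.
    """
    max_bytes = max(len(v.encode("utf-8")) for v in values)
    return int(max_bytes * headroom)

print(varchar_size(["héllo", "世界"]))  # 7 (bytes plus headroom), not 5
```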
5. Testing and Deployment:
- Implement unit tests to ensure your connector functions as expected.
- Develop integration tests to verify how your connector interacts with StarRocks (a minimal sketch follows this list).
- Package your connector for easy deployment and distribution.
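As a sketch of an integration test, assuming a locally running StarRocks and the hypothetical demo.users table from the earlier examples (pytest shown):

```python
import pymysql
import pytest

@pytest.fixture
def conn():
    # Assumes a local FE with default credentials; adjust for your setup.
    c = pymysql.connect(host="127.0.0.1", port=9030, user="root", password="")
    yield c
    c.close()

def test_insert_and_read_back(conn):
    with conn.cursor() as cur:
        cur.execute("INSERT INTO demo.users VALUES (1, 'alice', now())")
        cur.execute("SELECT name FROM demo.users WHERE id = 1")
        assert cur.fetchone()[0] == "alice"
```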
Additional Resources:
- StarRocks Kafka Connector: StarRocks/starrocks-connector-for-kafka on GitHub
- Consider contributing to existing open-source connectors such as the Kafka connector above.
Remember, building connectors can be complex. If you’re new to this area, consider starting by studying the existing Kafka connector as a reference.