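Here’s an example of getting Ray (ray.io) working with StarRocks. The script connects over the MySQL protocol, reads a table with ray.data.read_sql, and prints the rows.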
Code
atwong@Alberts-MBP-3 sandbox % cat script.py
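import mysql.connector
import ray


def create_connection():
    # StarRocks speaks the MySQL protocol; 9030 is the default FE query port.
    return mysql.connector.connect(
        user="root",
        password="",
        host="localhost",
        port=9030,
        connection_timeout=30,
        database="demo",
    )


# Pass the factory itself (not an open connection); Ray calls it when it reads.
ds = ray.data.read_sql("SELECT * FROM sr_member", create_connection)
ds.show()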
Results
atwong@Alberts-MBP-3 sandbox % python script.py
2023-08-09 15:47:32,582 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
2023-08-09 15:47:33,316 INFO read_api.py:374 -- To satisfy the requested parallelism of 200, each read task output will be split into 200 smaller blocks.
2023-08-09 15:47:33,321 INFO dataset.py:2180 -- Tip: Use `take_batch()` instead of `take() / show()` to return records in pandas or numpy batch format.
2023-08-09 15:47:33,322 INFO streaming_executor.py:92 -- Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[ReadSQL->SplitBlocks(200)]
2023-08-09 15:47:33,322 INFO streaming_executor.py:93 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=None), locality_with_output=False, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)
2023-08-09 15:47:33,322 INFO streaming_executor.py:95 -- Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`
[dataset]: Run `pip install tqdm` to enable progress reporting.
{'sr_id': 1, 'name': 'tom', 'city_code': 100000, 'reg_date': datetime.date(2022, 3, 13), 'verified': 1}
{'sr_id': 6, 'name': 'mohammed', 'city_code': 300000, 'reg_date': datetime.date(2022, 3, 17), 'verified': 1}
{'sr_id': 4, 'name': 'ronaldo', 'city_code': 100000, 'reg_date': datetime.date(2022, 3, 15), 'verified': 0}
{'sr_id': 2, 'name': 'johndoe', 'city_code': 210000, 'reg_date': datetime.date(2022, 3, 14), 'verified': 0}
{'sr_id': 3, 'name': 'maruko', 'city_code': 200000, 'reg_date': datetime.date(2022, 3, 14), 'verified': 1}
{'sr_id': 5, 'name': 'pavlov', 'city_code': 210000, 'reg_date': datetime.date(2022, 3, 16), 'verified': 0}
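
A note on why the script passes create_connection rather than an open connection: ray.data.read_sql expects a zero-argument function that returns a Python DB-API 2.0 connection, presumably so that each read task can open its own connection instead of trying to ship a live socket between processes. The sketch below illustrates that contract without needing Ray or StarRocks. It is an assumption-laden stand-in, not Ray's actual implementation: sqlite3 replaces the StarRocks connection, read_rows is a hypothetical helper, and the two sample rows are copied from the output above with reg_date stored as text.

```python
import sqlite3

# Stand-in data (assumption): two rows from the sr_member output above,
# with reg_date kept as text because sqlite3 has no native DATE type.
ROWS = [
    (1, "tom", 100000, "2022-03-13", 1),
    (2, "johndoe", 210000, "2022-03-14", 0),
]


def create_connection():
    """Zero-argument factory that returns a fresh DB-API 2.0 connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sr_member ("
        "sr_id INTEGER, name TEXT, city_code INTEGER, "
        "reg_date TEXT, verified INTEGER)"
    )
    conn.executemany("INSERT INTO sr_member VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def read_rows(sql, connection_factory):
    """Hypothetical helper: open a connection via the factory, run the
    query, and return each row as a dict keyed by column name."""
    conn = connection_factory()  # opened here, inside the reader
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        conn.close()


rows = read_rows("SELECT * FROM sr_member ORDER BY sr_id", create_connection)
for row in rows:
    print(row)
```

Any driver that follows DB-API 2.0 should fit the same factory shape, which is why the mysql.connector factory in the script works against StarRocks unchanged.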