Loading and Querying Speedtest by Ookla Global Fixed and Mobile Network Performance data (Parquet) with StarRocks

Prerequisites
For this tutorial you need to:

  • Have Docker Desktop or podman container runtime installed.
  • Have a MySQL client.
  • A StarRocks or CelerData database.
  • Create a database, database table and query the data.
  • Visualize this dataset via Apache SuperSet.

Have Docker Desktop or podman container runtime installed

This is out of scope for the tutorial.

Have a MySQL client

This is out of scope for the tutorial.

A StarRocks or CelerData database cluster

This is out of scope for the tutorial.

Create a database, database table and query the data

Create the database.

create database demo;

Create the table based off of the data dictionary.

drop table performance_mobile_tiles;
create table performance_mobile_tiles (
     quadkey VARCHAR(16)     
  , tile STRING                          
  , avg_d_kbps int   
  , avg_u_kbps int                   
  , avg_lat_ms int                   
  , avg_lat_down_ms int         
  , avg_lat_up_ms int              
  , tests int                        
  , devices int            
)
ENGINE=OLAP
DUPLICATE KEY(`quadkey`)
DISTRIBUTED BY HASH(`quadkey`) BUCKETS 9;

Execute the load command to get data from S3 into StarRocks. The ookla data is from a public S3 bucket. Even if it’s a public S3 bucket, you still need to provide your access and secret key.

LOAD LABEL demo.ooklaupload
(
    DATA INFILE("s3a://ookla-open-data/parquet/performance/type=mobile/year=2023/quarter=1/2023-01-01_performance_mobile_tiles.parquet")
    INTO TABLE performance_mobile_tiles
    (quadkey, tile, avg_d_kbps, avg_u_kbps, avg_lat_ms, avg_lat_down_ms, avg_lat_up_ms, tests, devices )
)
WITH BROKER
(
    "aws.s3.access_key" = "xxxx",
    "aws.s3.secret_key" = "yyyy",
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.region" = "us-west-2"
);

See the status of the load. Keep on running the show load command until you see a success or failure.

use demo;
show load;

Finally query the data.

select count(*) from performance_mobile_tiles;
select * from performance_mobile_tiles;

And you will see this as a result.

StarRocks > select count(*) from performance_mobile_tiles;
+----------+
| count(*) |
+----------+
|  3730549 |
+----------+
1 row in set (0.07 sec)

Visualize this dataset via Apache SuperSet

Check out the StarRocks and Apache SuperSet tutorial at #23210 or StarRocks and preset.io tutorial at #24506.