Background
A detailed table was created to log API call details, and a materialized view was created based on the detailed table. The materialized view aggregates metrics like failed calls and average latency at the api + hour granularity. The detailed table contains a column event_time, which is partitioned by day. The materialized view is aggregated at the event_time hour level.
The table and materialized view are configured as follows:
Detailed Table Partition Configuration:
“dynamic_partition.time_unit” = “DAY”, – Partitioned by day
“dynamic_partition.start” = “-2”, – Retain data for the last 2 days
Materialized View TTL Setting:
“partition_ttl” = “30 DAY”,
Issue
The retention period of the materialized view aligns with that of the detailed table (2 days), rather than the configured 30-day TTL. The desired outcome is for the materialized view to retain data for a longer period (30 days) without being constrained by the base table’s retention period.
Simplified Table and Materialized View Models
Detailed Table (table_A):
CREATE TABLE table_A (
api_id bigint(20) NULL COMMENT “API ID”,
event_time datetime NULL COMMENT “Event Time”,
trace_id varchar(65533) NULL COMMENT “Trace ID”,
latency bigint(20) NULL COMMENT “”
) ENGINE=OLAP
DUPLICATE KEY(api_id, event_time)
PARTITION BY RANGE (event_time) (
PARTITION p20250110 VALUES LESS THAN (“2025-01-10”)
)
DISTRIBUTED BY HASH (trace_id) – Default distribution strategy by trace_id
PROPERTIES (
“replication_num” = “2”, – Set replication factor to 2
“dynamic_partition.enable” = “true”, – Enable dynamic partitioning
“dynamic_partition.time_unit” = “DAY”, – Partitioned by day
“dynamic_partition.start” = “-2”, – Retain data for the last 2 days
“dynamic_partition.end” = “1”, – Current day
“dynamic_partition.prefix” = “p” – Partition prefix
);
Materialized View (view_A):
CREATE MATERIALIZED VIEW view_A (api_id, event_hour, total_count, avg_latency)
PARTITION BY (event_hour)
DISTRIBUTED BY RANDOM
REFRESH ASYNC START(“2025-01-01 00:00:00”) EVERY(INTERVAL 10 SECOND)
PROPERTIES (
“replicated_storage” = “true”,
“partition_ttl” = “30 DAY”, – Retain aggregated data for 30 days
“replication_num” = “2”
)
AS SELECT
api_id,
date_trunc(‘hour’, event_time) AS event_hour,
count(*) AS total_count,
avg(latency) AS avg_latency
FROM table_A
GROUP BY api_id, event_hour;
Environment Details
• Cluster type: Compute and storage separation
• StarRocks version: 3.2.9
• Cluster size: FE (4 cores, 8 GB) x 1 + BE (4 cores, 8 GB) x 3
• Contact: (gaogeng581@gmail.com)