StarRocks Roadmap 2024

atwong · January 26, 2024, 5:43am

opened 11:08AM - 22 Jan 24 UTC

type/feature-request

> Refer to roadmap [2023](https://github.com/StarRocks/starrocks/issues/16445) …[2022](https://github.com/StarRocks/starrocks/issues/1244)

# Shared-data & StarOS

- Align all functionality with shared-nothing
  - [ ] Sync materialized view
  - [ ] Generated column
  - [ ] Partial update with column mode
  - [ ] Optimize table and manual compaction
- Better cache system
  - [ ] Multi-layer cache
  - [ ] Global cache
  - [ ] Cache auto warmup
  - [ ] Cache black/whitelist
  - [ ] Refine eviction algorithm
- StarOS internal optimization
  - [ ] Multi-replicas for shard management
  - [ ] Shard schedule optimization for large scale (more than 10M shards)
  - [ ] Local storage for StarOS
- [ ] Open API for StarRocks table format (sink and source)
- [ ] Time Travel
- [ ] Backup support

# Performance

- [ ] Full columnar JSON index
- [ ] Cost model with primary key and foreign key constraints
- [ ] Arm optimization for codecs
- [ ] Adaptive DOP and adaptive query engine
- [ ] Global dictionary encoding
- [ ] Enhance IO schedule framework
- [ ] JIT / Codegen

# Easy to use

- [ ] List partition optimization
- Improve `files` table function
  - [ ] Improve schema inference
  - [ ] CSV and JSON format support
  - [ ] Other formats: Avro, Arrow, Protobuf
  - [ ] Better read performance, predicate pushdown
- Insert statement improvement (on duplicate key, insert properties)
- Unified data ingestion with Pipe
  - [ ] Pipe for continuous ingestion from Kafka
  - [ ] Read from external stream table (Kafka)
  - [ ] Continuous data ingestion from SQS with Pipe
- [ ] Out-of-the-box parameters

# Data lake analytics

- Better lake format support

  Lake | Query | Insert | DDL | Update/Delete/Merge into | MV
  --- | --- | --- | --- | --- | ---
  Hive | 1.18 | 3.2 | | | 2.5
  Iceberg | 2.1 | 3.1 | 3.3 | 3.3 | 3.0
  Hudi | 2.2 | | | | 3.0
  Paimon | 3.0 | | | | 3.2
  Delta Lake | 3.0 | | | | 3.2

- Materialized view improvement
  - [ ] Improve partition mapping (list partition, expression partition)
  - [ ] Task scheduler DAG & Lineage
  - [ ] Better query rewrite
- [ ] JDBC catalog improvement
- [ ] Enhance JNI reader and implement JNI writer
- [ ] Text File format support
- [ ] Presto/Trino/Spark/Hive SQL compatibility
- [ ] Presto/Trino/Spark/Hive UDF compatibility
- [ ] Automatic cooldown to lake format
- [ ] Lake metadata optimization for Iceberg / Hudi

# Data warehousing (batch and streaming)

## Batch processing & ETL improvement

- [ ] Enable spilling by default globally
- [ ] Multi-statement transaction
- [ ] Temporary table
- [ ] Group execution
- [ ] Task auto retry

## Streaming processing & real-time update

- [ ] Schemaless partial update
- [ ] Merge into statement
- [ ] Binlog to Flink and Spark streaming
- [ ] Transaction-level incremental refresh in materialized view (Aggregation, Join, functions)
- [ ] Incremental refresh for Iceberg/Hudi/Paimon materialized view

## Metadata

- [ ] Fine-granularity FE lock (from db level to table level)
- [ ] Decoupled storage for FE (kv store)

# All-in-one scenarios

- [ ] Search: Optimize full text inverted index
- [ ] Row store: Optimize row store for high concurrent point lookup
- [ ] Time series db: Asof join, high concurrent ingestion
- [ ] Vector database: vector index

# Release

- 3.3 release plan
- 3.4 release plan
-
