opened 11:08AM - 22 Jan 24 UTC
type/feature-request
> Refer to roadmap [2023](https://github.com/StarRocks/starrocks/issues/16445) …[2022](https://github.com/StarRocks/starrocks/issues/1244)
# Shared-data & StarOS
- Align all functionality with shared-nothing
- [ ] Sync materialized view
- [ ] Generated column
- [ ] Partial update with column mode
- [ ] Optimize table and manual compaction
- Better cache system
- [ ] Multi-layer cache
- [ ] Global cache
- [ ] Cache auto warmup
- [ ] Cache black/whitelist
- [ ] Refine eviction algorithm
- StarOS internal optimization
- [ ] Multiple replicas for shard management
- [ ] Shard scheduling optimization at large scale (more than 10M shards)
- [ ] Local storage for StarOS
- [ ] Open API for StarRocks table format (sink and source)
- [ ] Time Travel
- [ ] Backup support
# Performance
- [ ] Full columnar JSON index
- [ ] Cost model with primary key and foreign key constraints
- [ ] ARM optimization for codecs
- [ ] Adaptive DOP and adaptive query engine
- [ ] Global dictionary encoding
- [ ] Enhance I/O scheduling framework
- [ ] JIT / Codegen
# Easy to use
- [ ] List partition optimization
- Improve `files` table function
- [ ] Improve schema inference
- [ ] CSV and JSON format support
- [ ] Other formats: Avro, Arrow, Protobuf
- [ ] Better read performance, predicate pushdown
- Insert statement improvement (on duplicate key, insert properties)
- Unified data ingestion with Pipe
- [ ] Pipe for continuous ingestion from Kafka
- [ ] Read from external stream table (Kafka)
- [ ] Continuous data ingestion from SQS with Pipe
- [ ] Out-of-the-box parameters
# Data lake analytics
- Better lake format support
Lake | Query | Insert | DDL | Update/Delete/Merge into | MV
--- | --- | --- | --- | --- | ---
Hive | 1.18 | 3.2 | | | 2.5
Iceberg | 2.1 | 3.1 | 3.3 | 3.3 | 3.0
Hudi | 2.2 | | | | 3.0
Paimon | 3.0 | | | | 3.2
Delta Lake | 3.0 | | | | 3.2
- Materialized view improvement
- [ ] Improve partition mapping (list partition, expression partition)
- [ ] Task scheduler DAG & Lineage
- [ ] Better query rewrite
- [ ] JDBC catalog improvement
- [ ] Enhance JNI reader and implement JNI writer
- [ ] Text file format support
- [ ] Presto/Trino/Spark/Hive SQL compatibility
- [ ] Presto/Trino/Spark/Hive UDF compatibility
- [ ] Automatic cooldown to lake format
- [ ] Lake metadata optimization for Iceberg / Hudi
# Data warehousing (batch and streaming)
## Batch processing & ETL improvement
- [ ] Enable spilling by default globally
- [ ] Multi-statement transaction
- [ ] Temporary table
- [ ] Group execution
- [ ] Task auto retry
## Streaming processing & real-time update
- [ ] Schemaless partial update
- [ ] Merge into statement
- [ ] Binlog to Flink and Spark Streaming
- [ ] Transaction-level incremental refresh in materialized views (aggregation, join, functions)
- [ ] Incremental refresh for Iceberg/Hudi/Paimon materialized views
## Metadata
- [ ] Fine-grained FE lock (from database level to table level)
- [ ] Decoupled storage for FE (KV store)
# All-in-one scenarios
- [ ] Search: Optimize full-text inverted index
- [ ] Row store: Optimize row store for high-concurrency point lookups
- [ ] Time-series database: ASOF join, high-concurrency ingestion
- [ ] Vector database: vector index
# Release
- 3.3 release plan
- 3.4 release plan