We have a problem: if one CN goes down, every query fails with the message
SQL Error [1064] [42000]: Backend node not found. Check if any backend node is down.backend: [starrocks-be-2.starrocks-be-search.starrocks.svc.cluster.local alive: false inBlacklist: false] [starrocks-be-0.starrocks-be-search.starrocks.svc.cluster.local alive: false inBlacklist: false] [starrocks-be-4.starrocks-be-search.starrocks.svc.cluster.local alive: false inBlacklist: false] [starrocks-be-5.starrocks-be-search.starrocks.svc.cluster.local alive: false inBlacklist: false] [starrocks-be-3.starrocks-be-search.starrocks.svc.cluster.local alive: false inBlacklist: false] [starrocks-be-1.starrocks-be-search.starrocks.svc.cluster.local alive: false inBlacklist: false] [starrocks-cn-7.starrocks-cn-search.starrocks.svc.cluster.local alive: true inBlacklist: false] [starrocks-cn-6.starrocks-cn-search.starrocks.svc.cluster.local alive: true inBlacklist: false] [starrocks-cn-5.starrocks-cn-search.starrocks.svc.cluster.local alive: true inBlacklist: false] [starrocks-cn-4.starrocks-cn-search.starrocks.svc.cluster.local alive: false inBlacklist: false] [starrocks-cn-3.starrocks-cn-search.starrocks.svc.cluster.local alive: true inBlacklist: false] [starrocks-cn-2.starrocks-cn-search.starrocks.svc.cluster.local alive: true inBlacklist: false] [starrocks-cn-9.starrocks-cn-search.starrocks.svc.cluster.local alive: true inBlacklist: true] [starrocks-cn-0.starrocks-cn-search.starrocks.svc.cluster.local alive: true inBlacklist: false] [starrocks-cn-8.starrocks-cn-search.starrocks.svc.cluster.local alive: true inBlacklist: false] [starrocks-cn-1.starrocks-cn-search.starrocks.svc.cluster.local alive: true inBlacklist: false]
Should we set replication_factor for the local table cache, e.g. to 2 or 3, so that queries keep working when one node goes down? Does replication_factor work with shared data? If a CN is stateless and can be replaced by any other CN, why can this error happen?
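
To make the node list in the error easier to read, here is a minimal Python sketch that pulls out each `[host alive: ... inBlacklist: ...]` entry and lists the nodes that are either not alive or blacklisted. The helper names `parse_node_status` and `unavailable` are hypothetical, and the regular expression assumes the entry layout is exactly as in the message pasted above. The sample string is a shortened excerpt of that message.

```python
import re

# One "[host alive: <bool> inBlacklist: <bool>]" entry, assuming the layout
# seen in the pasted error message.
NODE_RE = re.compile(r"\[(\S+) alive: (true|false) inBlacklist: (true|false)\]")


def parse_node_status(message):
    """Return (host, alive, in_blacklist) tuples for every entry in the message."""
    return [
        (host, alive == "true", blacklisted == "true")
        for host, alive, blacklisted in NODE_RE.findall(message)
    ]


def unavailable(nodes):
    """Return hosts that are not alive or are blacklisted."""
    return [host for host, alive, blacklisted in nodes if not alive or blacklisted]


# Shortened excerpt of the error message (four of the sixteen entries).
sample = (
    "Backend node not found. Check if any backend node is down.backend: "
    "[starrocks-be-2.starrocks-be-search.starrocks.svc.cluster.local alive: false inBlacklist: false] "
    "[starrocks-cn-7.starrocks-cn-search.starrocks.svc.cluster.local alive: true inBlacklist: false] "
    "[starrocks-cn-4.starrocks-cn-search.starrocks.svc.cluster.local alive: false inBlacklist: false] "
    "[starrocks-cn-9.starrocks-cn-search.starrocks.svc.cluster.local alive: true inBlacklist: true]"
)

nodes = parse_node_status(sample)
for host in unavailable(nodes):
    print(host)
```

Run against the full message, this kind of summary shows that all six starrocks-be-* nodes report alive: false, starrocks-cn-4 reports alive: false, and starrocks-cn-9 reports inBlacklist: true.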