FE Leader node fails after a few hours

I recently launched a StarRocks 3.3.0 cluster with 3 FE + 3 BE nodes, and at first it worked fine. The MySQL meta database was accessible through the mysql client, and SHOW PROC 'frontends'\g; showed the Leader node and the Followers with Alive:true.
However, after a few hours, the mysql client can no longer connect, even from the Leader node itself, so it's not a network issue.
I don't see any useful errors in the FE log and warn.log files.
I know this situation can be quickly repaired by deleting the meta directory and re-creating it, but then all cluster information is totally lost. It also looks like no Follower was elected as the new Leader: the failover I expected did not happen.
Then I tried with only 1 FE Leader node + 3 BE nodes. After a few hours, this FE Leader node failed and the entire cluster became unreachable.
Has anyone else experienced such frequent FE Leader node failures? If so, could you share your experience or solution?
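
For anyone trying to narrow down a similar failure, here is a minimal sketch of a first triage step: checking whether the FE ports still accept TCP connections when the mysql client cannot connect. It assumes the FE uses the default ports (http_port 8030, rpc_port 9020, query_port 9030, edit_log_port 9010); if fe.conf overrides them, the values need adjusting. The host list and the helper names `port_open` and `check_fe` are hypothetical and not part of StarRocks.

```python
import socket

# Assumed default StarRocks FE ports; adjust if fe.conf overrides them.
FE_PORTS = {
    "http_port": 8030,
    "rpc_port": 9020,
    "query_port": 9030,
    "edit_log_port": 9010,
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_fe(host, ports=FE_PORTS):
    """Map each named FE port to whether it currently accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # Hypothetical host list: replace with the FE node addresses.
    for host in ["127.0.0.1"]:
        for name, is_open in check_fe(host).items():
            print(f"{host} {name}: {'open' if is_open else 'closed'}")
```

A closed query_port would suggest that the FE process has exited or never finished starting, while an open port with failing logins points elsewhere. This only checks TCP reachability and does not replace reading the FE logs.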

You can follow the "Metadata Recovery" guide in the StarRocks documentation to recover the cluster.