Failures happen. Downtime doesn't.
The HA Scale makes your Tophan cluster resilient to hardware failures, network partitions, and maintenance windows. VMs restart automatically, storage rebuilds itself, and workloads migrate without downtime.
| Feature | Description | Status |
|---|---|---|
| VRRP Failover | Virtual IP failover between nodes. Sub-second switchover for critical services. | Beta |
| VM Auto-Restart | If a node fails, its VMs restart on surviving nodes automatically. Priority-based ordering. | Beta |
| Live Migration | Move running VMs between nodes with zero downtime for planned maintenance. | Beta |
| Storage Migration | Relocate VM storage between pools or nodes without shutting down the VM. | Planned |
| Maintenance Mode | Drain a node of all workloads before maintenance. One click, zero downtime. | Beta |
| Self-Healing | Automatic recovery from detected failures. Failed services restart, failed nodes are fenced. | Beta |
| Fencing | IPMI/iLO-based node fencing ensures failed nodes are definitively shut down before recovery begins. No split-brain data corruption. | Beta |
| Split-Brain Prevention | Quorum-based decisions prevent both halves of a partition from operating independently. Configurable quorum policies. | Beta |
| Health Monitoring | Continuous health checks across all nodes and services. Failure detection in seconds. | Beta |
| Affinity Rules | Control VM placement: keep VMs together, keep them apart, prefer specific nodes. | Planned |
The entire sequence — detection through recovery — completes in under a minute for most configurations. No human intervention required.
When you need to patch, upgrade, or replace hardware:
This is routine, not exceptional. Tophan is designed to be maintained without maintenance windows.