Beta

HA Scale

Failures happen. Downtime doesn't.

The HA Scale makes your Tophan cluster resilient to hardware failures, network partitions, and maintenance windows. VMs restart automatically, storage rebuilds itself, and workloads migrate without downtime.

Features

Feature	Description	Status
VRRP Failover	Virtual IP failover between nodes. Sub-second switchover for critical services.	Beta
VM Auto-Restart	If a node fails, its VMs restart on surviving nodes automatically. Priority-based ordering.	Beta
Live Migration	Move running VMs between nodes with zero downtime for planned maintenance.	Beta
Storage Migration	Relocate VM storage between pools or nodes without shutting down the VM.	Planned
Maintenance Mode	Drain a node of all workloads before maintenance. One click, zero downtime.	Beta
Self-Healing	Automatic recovery from detected failures. Failed services restart, failed nodes are fenced.	Beta
Fencing	IPMI/iLO-based node fencing ensures failed nodes are definitively shut down before recovery begins. No split-brain data corruption.	Beta
Split-Brain Prevention	Quorum-based decisions prevent both halves of a partition from operating independently. Configurable quorum policies.	Beta
Health Monitoring	Continuous health checks across all nodes and services. Failure detection in seconds.	Beta
Affinity Rules	Control VM placement: keep VMs together, keep them apart, prefer specific nodes.	Planned
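
The quorum idea behind Split-Brain Prevention can be illustrated with a strict-majority rule. This is a minimal sketch under stated assumptions, not Tophan's implementation: the function name `has_quorum` and the one-vote-per-node model are hypothetical, and the actual quorum policies are configurable.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """Return True only if a strict majority of all votes is reachable.

    Assumption: every node carries exactly one vote. With a strict
    majority, two disjoint halves of a partition can never both pass.
    """
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    return reachable_votes > total_votes // 2
```

Under this rule, a four-node cluster split two and two leaves neither side with quorum, which is one reason odd node counts are a common choice for quorum-based clusters.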

How Failover Works

1. Detection: Health monitoring detects a node failure (missed heartbeats, failed health checks).
2. Fencing: The failed node is fenced via IPMI/iLO to ensure it's truly down, with no assumptions.
3. Recovery: VMs from the failed node are restarted on surviving nodes based on priority and resource availability.
4. Storage: The Storage Scale automatically begins reconstructing any data that was on the failed node's disks.

The entire sequence — detection through recovery — completes in under a minute for most configurations. No human intervention required.
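
The fencing and recovery steps described above might be sketched as follows. Everything here is illustrative: the `VM` and `Node` types, the `plan_recovery` function, the `fence` callback, the "higher priority restarts first" convention, and the use of free memory as the only resource are assumptions made for the example, not Tophan's API or scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    priority: int   # assumed convention: higher value restarts first
    memory_gb: int


@dataclass
class Node:
    name: str
    free_memory_gb: int
    vms: list = field(default_factory=list)


def plan_recovery(failed, survivors, fence):
    """Fence the failed node, then place its VMs on surviving nodes.

    `fence` is a hypothetical callable that returns True once the node
    is confirmed powered off. Returns {vm_name: node_name or None}.
    """
    # Never restart VMs until the failed node is confirmed down;
    # otherwise two copies of a VM could write to the same storage.
    if not fence(failed.name):
        raise RuntimeError(f"could not fence {failed.name}; recovery aborted")

    placement = {}
    # Highest priority first, so critical VMs get resources first.
    for vm in sorted(failed.vms, key=lambda v: -v.priority):
        candidates = [n for n in survivors if n.free_memory_gb >= vm.memory_gb]
        if not candidates:
            placement[vm.name] = None  # no capacity: left unplaced
            continue
        # Simple heuristic: pick the node with the most free memory.
        target = max(candidates, key=lambda n: n.free_memory_gb)
        target.free_memory_gb -= vm.memory_gb
        target.vms.append(vm)
        placement[vm.name] = target.name
    return placement
```

In this sketch a VM that fits nowhere is recorded as unplaced rather than blocking lower-priority VMs, which is one possible policy among several.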

Maintenance Without Downtime

When you need to patch, upgrade, or replace hardware:

1. Put the node in maintenance mode.
2. All VMs live-migrate to other nodes (zero downtime).
3. Perform maintenance.
4. Exit maintenance mode.
5. VMs can migrate back, or stay where they are (the scheduler optimises placement).

This is routine, not exceptional. Tophan is designed to be maintained without maintenance windows.
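
Draining a node for maintenance can be pictured as planning a migration target for every VM on it before moving anything. The sketch below is a simplified model under stated assumptions: `drain_node`, the `placement` mapping of VM to node, and the `free_slots` capacity count are hypothetical names for illustration and do not reflect Tophan's scheduler.

```python
def drain_node(node, placement, free_slots):
    """Return a new placement with every VM moved off `node`.

    placement:  {vm_name: node_name}, the current VM locations
    free_slots: {node_name: int}, how many more VMs each node can take
    Raises RuntimeError, without changing anything, if a VM has no target.
    """
    slots = dict(free_slots)   # work on a copy so inputs stay untouched
    slots.pop(node, None)      # the draining node is never a target

    plan = {}
    for vm, host in placement.items():
        if host != node:
            continue
        # Choose the node with the most remaining capacity.
        target = max(slots, key=slots.get, default=None)
        if target is None or slots[target] <= 0:
            raise RuntimeError(f"no capacity to migrate {vm} off {node}")
        slots[target] -= 1
        plan[vm] = target

    # Apply the plan only after every VM has a confirmed target.
    return {**placement, **plan}
```

Planning first and applying afterwards means a node is either drained completely or left as it was, so maintenance never starts with some VMs still stranded on it.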