PRTG Manual: Clustering
Clustering is a high-availability feature to help you reach 100% uptime of your IT network infrastructure. With a cluster, you can ensure fail-safe monitoring that allows you to continuously collect data from your network. This way, you can avoid monitoring downtime caused by an internet outage at a PRTG server's location, by failing hardware, or by a software update for the operating system or PRTG itself.
A PRTG cluster consists of two or more installations of PRTG that work together to form a high-availability monitoring system. All PRTG on-premises licenses allow you to have a simple cluster, composed of two PRTG installations that work together.
This feature is not available in PRTG hosted by Paessler.
A PRTG cluster consists of at least two nodes: one primary master node and one or more failover nodes. You can have up to four failover nodes. Each cluster node is a full installation of PRTG that could perform all of the monitoring and alerting on its own.
Cluster nodes are connected to each other using two TCP/IP connections. They communicate in both directions and a single node only needs to connect to one other node to integrate into the cluster.
During normal operation, you configure devices, sensors, and all other monitoring objects on the primary master using the web interface or PRTG Desktop. The master node automatically distributes the configuration among all other nodes in real time.
All devices that you create on the Cluster Probe are monitored by all nodes in the cluster, so data from different perspectives is available and monitoring for these devices always continues, even if one of the nodes fails. If the primary master fails, one of the failover nodes takes over the master role and controls the cluster until the master node is back. This ensures fail-safe monitoring and continuous data collection.
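The takeover behavior described above can be sketched in a few lines of Python. This is only an illustration of the concept, not how PRTG implements it: the `Node` class, the `current_master` function, and the rule that the first reachable failover node in list order takes over are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of a cluster node (not a PRTG data structure)."""
    name: str
    is_primary: bool
    online: bool = True


def current_master(nodes):
    """Return the node that holds the master role, or None if all are down."""
    # The primary master holds the role whenever it is reachable.
    for node in nodes:
        if node.is_primary and node.online:
            return node
    # Assumption: otherwise the first reachable failover node takes over.
    for node in nodes:
        if not node.is_primary and node.online:
            return node
    return None


cluster = [
    Node("primary", is_primary=True),
    Node("failover-1", is_primary=False),
    Node("failover-2", is_primary=False),
]
```

When `cluster[0].online` is set to `False`, `current_master(cluster)` returns the first failover node. As soon as the primary master is reachable again, it holds the master role again.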
A PRTG cluster works in active-active mode. This means that all cluster nodes permanently monitor the network according to the common configuration received from the current master node, and each node stores the results in its own database. The storage of monitoring results is therefore also distributed across the cluster. You only need to install PRTG updates on one node. This node automatically deploys the new version to all other nodes.
If downtimes or threshold breaches are discovered by one or more nodes, only one installation, either the primary master or the failover master, sends out notifications (for example, via email, SMS text message, or push message). Because of this, you are not flooded with notifications from all cluster nodes when failures occur.
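As a rough sketch of this idea, the following Python snippet collects the failures that the individual nodes report and produces one notification per failed sensor, sent by the current master only. The function name, the data layout, and the sensor names are hypothetical and only illustrate the principle.

```python
def notifications_to_send(detections, current_master):
    """Return one (sender, sensor) pair per failed sensor.

    detections maps a node name to the set of sensors that this node
    currently sees as down (hypothetical layout for this sketch).
    """
    # A failure counts if one or more nodes discovered it.
    failed = set().union(*detections.values())
    # Only the current master sends, so each failure is reported once.
    return [(current_master, sensor) for sensor in sorted(failed)]


detections = {
    "primary": {"ping-router"},
    "failover-1": {"ping-router", "http-shop"},
}
outgoing = notifications_to_send(detections, current_master="primary")
```

Although two nodes see `ping-router` as down, `outgoing` contains only one entry for it, and every entry has the current master as its sender.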
During the outage of a node, it cannot collect monitoring data. The data of this single node shows gaps. However, monitoring data for this time span is still available on the other nodes. There is no functionality to actually fill those gaps with the data of other nodes.
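The following minimal Python sketch illustrates this behavior under simplified assumptions: each node keeps its own results, a node that is down stores nothing, and nothing is merged between nodes. The function names, the 60-second interval, and the placeholder value are made up for this example.

```python
def stored_results(timestamps, outage=None):
    """Return the results one node stored, keyed by timestamp.

    outage is an optional (start, end) pair. The node stores nothing
    for timestamps with start <= t < end.
    """
    stored = {}
    for t in timestamps:
        if outage is not None and outage[0] <= t < outage[1]:
            continue  # node is down: no data is collected, leaving a gap
        stored[t] = "ok"  # placeholder for a real measurement
    return stored


def gaps(stored, timestamps):
    """List the timestamps for which this node has no data."""
    return [t for t in timestamps if t not in stored]


timestamps = list(range(0, 300, 60))  # hypothetical 60-second scanning interval
node_a = stored_results(timestamps, outage=(60, 180))
node_b = stored_results(timestamps)
```

Here, `gaps(node_a, timestamps)` lists the two scans that node A missed, while `node_b` still holds results for the same time span. Each database remains separate, so the gap on node A stays visible when you review data on that node.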
Because the monitoring configuration is managed centrally, you can only change it on the master node, but you can review the monitoring results by logging in to the web interface of any of the failover nodes in read-only mode.
If you use remote probes in a cluster, each probe connects to each node of your cluster and sends the data to all cluster nodes, the current primary master as well as the failover nodes. You can define the Cluster Connectivity of each probe in the Probe Administrative Settings.
As a consequence of this concept, monitoring traffic and load on the network is multiplied by the number of used cluster nodes. Moreover, the devices on the cluster probe are monitored by all cluster nodes, so the monitoring load increases on these devices.
This is not a problem for most usage scenarios, but consider the Detailed System Requirements. As a rule of thumb, each additional node in the cluster results in dividing the number of sensors that you can use by two.
More than 5,000 sensors per cluster are not officially supported. Contact your presales team if you exceed this limit. For possible alternatives to a cluster, see the Knowledge Base: Are there alternatives to the PRTG cluster when running a large installation?
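The rule of thumb and the limits above can be turned into a small calculation. The following Python sketch rests on an assumption that the manual does not state explicitly: that "each additional node" is counted relative to a standalone installation, so a cluster with n nodes supports roughly the standalone capacity divided by 2^(n-1). The function names and the standalone capacity of 10,000 sensors used in the example are hypothetical.

```python
MAX_SUPPORTED_SENSORS = 5_000  # officially supported sensors per cluster
MIN_NODES = 2                  # one primary master plus one failover node
MAX_NODES = 5                  # one primary master plus four failover nodes


def estimated_sensor_capacity(standalone_capacity, nodes):
    """Estimate the usable number of sensors for a cluster.

    Assumption: every node beyond a standalone installation halves
    the number of sensors that you can use.
    """
    if not MIN_NODES <= nodes <= MAX_NODES:
        raise ValueError(f"a cluster has {MIN_NODES} to {MAX_NODES} nodes")
    estimate = standalone_capacity // 2 ** (nodes - 1)
    # More than 5,000 sensors per cluster are not officially supported.
    return min(estimate, MAX_SUPPORTED_SENSORS)


def monitoring_load_factor(nodes):
    """Every node monitors every cluster probe device, so load scales with nodes."""
    return nodes


two_nodes = estimated_sensor_capacity(10_000, nodes=2)
three_nodes = estimated_sensor_capacity(10_000, nodes=3)
```

With the hypothetical baseline of 10,000 sensors, `two_nodes` is 5,000 and `three_nodes` is 2,500, while the devices on the cluster probe receive two and three times the monitoring requests, respectively. Treat these numbers as a rough planning aid and check them against the Detailed System Requirements.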
For detailed information, see section Failover Cluster Configuration.
Knowledge Base: What's the Clustering Feature in PRTG?
Knowledge Base: In which web interface do I log in if the Master Node fails?
Knowledge Base: What are the bandwidth requirements for running a PRTG Cluster?
Knowledge Base: Are there alternatives to the PRTG cluster when running a large installation?
Paessler Website: How to connect PRTG through a firewall in 4 steps
Video Tutorial: PRTG – How to Set Up a PRTG Cluster