PRTG Manual: Clustering
Clustering is a high-availability feature to help you reach 100% uptime of your IT network infrastructure. With a cluster, you can ensure fail-safe monitoring that lets you continuously collect data from your network. This way, you can avoid monitoring downtime caused by an internet outage at a PRTG core server's location, by failing hardware, or by a software update for the operating system or for PRTG.
A cluster consists of two or more PRTG core servers that work together to form a high-availability monitoring system. With all PRTG on premises licenses, you can have a simple cluster that is composed of two PRTG core servers.
This feature is not available in PRTG hosted by Paessler.
A cluster consists of at least two cluster nodes: one Primary Master Node and one or more Failover Nodes. A cluster can have up to 4 failover nodes. Each cluster node is a full PRTG core server installation that could perform all of the monitoring and alerting on its own.
Cluster nodes are connected to each other using two TCP/IP connections. They communicate in both directions, and a single cluster node only needs to connect to one other cluster node to integrate into the cluster.
During normal operation, you configure devices, sensors, and all other monitoring objects on the primary master. The master node automatically distributes the configuration among all other cluster nodes in real time.
All devices that you create on the cluster probe are monitored by all cluster nodes, so data from different perspectives is available and monitoring always continues, even if one of the cluster nodes fails. If the primary master fails, one of the failover nodes takes over the master role and controls the cluster until the primary master is back. This ensures fail-safe monitoring and continuous data collection.
A cluster works in active-active mode. This means that all cluster nodes permanently monitor the network according to the common configuration received from the current master node, and each cluster node stores the results in its own database. The storage of monitoring results is also distributed among the cluster.
PRTG updates need to be installed on one cluster node only. This cluster node automatically deploys the new version to all other cluster nodes.
If downtime or threshold breaches are discovered by one or more cluster nodes, only one installation, either the primary master or the failover master, sends out notifications (for example, via email, SMS text message, or push message). This way, you are not flooded with notifications from all cluster nodes when failures occur.
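The notification behavior described above can be illustrated with a minimal Python sketch. It is not PRTG code. The `Node` class, the `current_master` and `notifications_for_alert` functions, and the rule that the first online failover node in list order takes over are all assumptions made for illustration. The manual only states that one of the failover nodes takes over the master role.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical model of a cluster node (not a PRTG data structure)."""
    name: str
    is_primary: bool
    online: bool = True


def current_master(nodes: List[Node]) -> Optional[Node]:
    """Return the node that currently holds the master role.

    Assumption: the primary master holds the role whenever it is online;
    otherwise the first online failover node in list order takes over.
    """
    for node in nodes:
        if node.is_primary and node.online:
            return node
    for node in nodes:
        if not node.is_primary and node.online:
            return node
    return None


def notifications_for_alert(nodes: List[Node], alert: str) -> List[str]:
    """All online nodes may detect the alert, but only the master notifies."""
    master = current_master(nodes)
    return [f"{node.name}: {alert}" for node in nodes if node is master]


cluster = [
    Node("primary", is_primary=True),
    Node("failover-1", is_primary=False),
    Node("failover-2", is_primary=False),
]

# Normal operation: the primary master sends the single notification.
normal = notifications_for_alert(cluster, "Ping sensor down")

# Primary master fails: a failover node becomes failover master and notifies.
cluster[0].online = False
during_outage = notifications_for_alert(cluster, "Ping sensor down")
```

In both cases the list contains exactly one entry, no matter how many cluster nodes detected the failure.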
During an outage, a cluster node cannot collect monitoring data, so the data of this single cluster node shows gaps. However, monitoring data for this time span is still available on the other cluster nodes. PRTG has no functionality to fill these gaps with the data of other cluster nodes.
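As a rough illustration of how such gaps arise, the following Python sketch models each cluster node with its own result store. The function names, the dictionary layout, and the sample values are hypothetical and do not reflect how PRTG stores data internally.

```python
from typing import Dict, List


def record_interval(
    stores: Dict[str, Dict[int, float]],
    online: Dict[str, bool],
    timestamp: int,
    value: float,
) -> None:
    """Each online node writes the result into its own store only."""
    for node, is_online in online.items():
        if is_online:
            stores[node][timestamp] = value


def gaps(store: Dict[int, float], timestamps: List[int]) -> List[int]:
    """Return the scanning intervals for which a node has no data."""
    return [t for t in timestamps if t not in store]


stores: Dict[str, Dict[int, float]] = {"primary": {}, "failover-1": {}}
timestamps = [0, 60, 120, 180]

for t in timestamps:
    # Assumed scenario: the failover node is offline at t=60 and t=120.
    failover_online = t not in (60, 120)
    record_interval(
        stores,
        {"primary": True, "failover-1": failover_online},
        t,
        12.5,
    )

failover_gaps = gaps(stores["failover-1"], timestamps)
primary_gaps = gaps(stores["primary"], timestamps)
```

The failover node's store is missing the two intervals of its outage, while the primary master's store is complete. Nothing copies the missing values from one store to the other, which mirrors the behavior described above.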
Because the monitoring configuration is managed centrally, you can only change it on the master node, but you can review the monitoring results by logging in to the PRTG web interface of any of the failover nodes in read-only mode.
If you use remote probes in a cluster, each probe connects to each cluster node and sends the data to all cluster nodes, the current primary master as well as the failover nodes. You can define the Cluster Connectivity of each probe in the Probe Administrative Settings.
As a consequence of this concept, monitoring traffic and load on the network are multiplied by the number of cluster nodes in use. Moreover, the devices on the cluster probe are monitored by all cluster nodes, so the monitoring load increases on these devices.
This is not a problem for most usage scenarios, but keep the Detailed System Requirements in mind. As a rule of thumb, each additional cluster node halves the number of sensors that you can use.
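The rule of thumb and the load multiplication can be worked through with a short Python sketch. The function names and the single-server baseline of 10,000 sensors are assumptions chosen only to make the arithmetic concrete. Take the real baseline for your installation from the Detailed System Requirements.

```python
MAX_CLUSTER_NODES = 5  # one primary master plus up to 4 failover nodes


def usable_sensors(single_node_sensors: int, cluster_nodes: int) -> int:
    """Apply the rule of thumb: each additional node halves the sensor count.

    `single_node_sensors` is an assumed baseline for one PRTG core server.
    """
    if not 1 <= cluster_nodes <= MAX_CLUSTER_NODES:
        raise ValueError("cluster_nodes must be between 1 and 5")
    return single_node_sensors // (2 ** (cluster_nodes - 1))


def requests_per_interval(sensors: int, cluster_nodes: int) -> int:
    """Every cluster node queries every sensor on the cluster probe."""
    return sensors * cluster_nodes


baseline = 10_000  # hypothetical single-server capacity, for illustration only

two_nodes = usable_sensors(baseline, 2)    # 10,000 / 2
three_nodes = usable_sensors(baseline, 3)  # 10,000 / 4

# 1,000 sensors on the cluster probe, monitored by 3 cluster nodes.
load = requests_per_interval(1_000, 3)
```

With this assumed baseline, a two-node cluster leaves 5,000 usable sensors and a three-node cluster leaves 2,500, while 1,000 sensors on a three-node cluster generate 3,000 monitoring requests per scanning interval.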
More than 5,000 sensors per cluster are not officially supported. Contact your presales team if you exceed this limit. For possible alternatives to a cluster, see the Knowledge Base: Are there alternatives to the cluster when running a large installation?
For detailed information, see section Failover Cluster Configuration.
Knowledge Base: What's the Clustering Feature in PRTG?
Knowledge Base: In which web interface do I log in if the Master Node fails?
Knowledge Base: What are the bandwidth requirements for running a Cluster?
Knowledge Base: Are there alternatives to the cluster when running a large installation?
Paessler Website: How to connect PRTG through a firewall in 4 steps
Video Tutorial: PRTG – How to Set Up a PRTG Cluster