PRTG Manual: Clustering
The main goal of any monitoring solution is to help you reach 100% availability of your IT and network infrastructure and avoid costly downtimes. To achieve this, the IT infrastructure must be monitored permanently, which means that the monitoring tool itself must reach true 100% uptime. This requires a high availability mechanism such as clustering.
PRTG Network Monitor not only allows you to monitor your entire infrastructure with a single tool, twenty-four hours a day, it also offers a high availability cluster out of the box. With clustering, monitoring uptime is no longer degraded by failed connections due to an internet outage at a PRTG server's location, by failing hardware, or by downtime caused by a software update for the operating system or PRTG itself.
A PRTG Cluster consists of two or more installations of PRTG that work together to form a high availability monitoring system. All PRTG on premises licenses allow you to have a simple cluster, composed of two PRTG installations working together.
This feature is not available in PRTG hosted by Paessler.
A PRTG cluster consists of at least two nodes: one Primary Master Node and one or more Failover Nodes, with up to four failover nodes possible. Each cluster node is simply a full installation of PRTG that can perform all monitoring and alerting on its own.
Cluster nodes are connected to each other using two TCP/IP connections. They communicate in both directions and a single node only needs to connect to one other node to integrate into the cluster.
During normal operation you configure devices, sensors, and all other monitoring objects on the Primary Master using the web interface or Enterprise Console. The master node automatically distributes the configuration to all other nodes in real time.
All devices that you create on the Cluster Probe are monitored by all nodes in the cluster, so data from different perspectives is available and monitoring for these devices always continues, even if one of the nodes fails. If the Primary Master fails, one of the Failover Nodes takes over the master role and controls the cluster until the master node is back. This ensures fail-safe monitoring with gapless data.
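The takeover behavior described above can be illustrated with a minimal Python sketch. This is not PRTG code; the class, function, and node names are invented for illustration only. It models one rule: the Primary Master controls the cluster while it is healthy, a Failover Node acts as master during an outage, and control returns to the Primary Master when it is back.

```python
# Illustrative sketch (not PRTG code): how a failover node takes over the
# master role when the Primary Master Node fails, and hands it back on return.

class ClusterNode:
    def __init__(self, name, is_primary=False):
        self.name = name
        self.is_primary = is_primary
        self.healthy = True

def current_master(nodes):
    """Return the acting master: the Primary Master if healthy,
    otherwise the first healthy Failover Node."""
    primary = next(n for n in nodes if n.is_primary)
    if primary.healthy:
        return primary
    for node in nodes:
        if node.healthy and not node.is_primary:
            return node
    raise RuntimeError("no healthy node available")

nodes = [ClusterNode("Primary Master", is_primary=True),
         ClusterNode("Failover 1"),
         ClusterNode("Failover 2")]

assert current_master(nodes).name == "Primary Master"
nodes[0].healthy = False                              # primary goes down
assert current_master(nodes).name == "Failover 1"     # failover takes over
nodes[0].healthy = True                               # primary is back
assert current_master(nodes).name == "Primary Master" # control returns
```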
A PRTG cluster works in active-active mode: all cluster nodes permanently monitor the network according to the common configuration received from the current master node, and each node stores its results in its own database, so the storage of monitoring results is also distributed across the cluster. PRTG updates only need to be installed on one node; this node automatically deploys the new version to all other nodes.
If downtimes or threshold breaches are discovered by one or more nodes, only one installation, either the Primary Master or the Failover Master, sends out notifications (for example, via email, SMS text message, or push message). This prevents the administrator from being flooded with notifications from all cluster nodes when failures occur.
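The deduplication behavior can be sketched in a few lines of Python. Again, this is not PRTG code; the function and the node names are invented for illustration. Every node may report the same detection, but only the acting master's report results in a notification.

```python
# Illustrative sketch (not PRTG code): all nodes may detect the same downtime,
# but only the acting master (Primary Master or Failover Master) sends the
# notification, so the administrator receives one alert instead of one per node.

def notifications_to_send(detections, acting_master):
    """detections: list of (node_name, alert) pairs reported by cluster nodes.
    Only the acting master's detections trigger outgoing notifications."""
    return [alert for node_name, alert in detections if node_name == acting_master]

detections = [("Primary Master", "Device X down"),
              ("Failover 1", "Device X down"),
              ("Failover 2", "Device X down")]

# Three nodes detected the failure, but only one notification goes out.
assert notifications_to_send(detections, "Primary Master") == ["Device X down"]
```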
While a node is down, it cannot collect monitoring data, so the data of this node will show gaps for that time span. However, monitoring data for this time span is still available on the other node(s). There is no functionality to fill other nodes' data into these gaps.
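A short Python sketch makes the per-node storage and the resulting gaps concrete. This is not PRTG code; the data structures and values are invented for illustration. Each node keeps its own result database, so a node that misses a scan has a permanent gap there, while the other nodes' databases for the same time span stay complete.

```python
# Illustrative sketch (not PRTG code): each cluster node stores monitoring
# results in its own database. A node that is down during a scan shows a gap
# in its local data, while the other nodes' data for that time span is complete.

def record_scan(databases, minute, value, down_nodes=()):
    """Every node that is up stores the scan result in its own database."""
    for node, db in databases.items():
        if node not in down_nodes:
            db[minute] = value

databases = {"Primary Master": {}, "Failover 1": {}}
record_scan(databases, minute=1, value=12)
record_scan(databases, minute=2, value=15, down_nodes=("Failover 1",))  # outage
record_scan(databases, minute=3, value=11)

assert 2 not in databases["Failover 1"]       # gap on the node that was down
assert databases["Primary Master"][2] == 15   # same time span exists elsewhere
```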
Because the monitoring configuration is managed centrally, you can change it only on the master node, but you can review the monitoring results by logging in to the web interface of any of the failover nodes in read-only mode.
If you use remote probes in a cluster, each probe connects to every node of your cluster and sends its data to all cluster nodes, the current primary master as well as the failover nodes. You can define the Cluster Connectivity of each probe in the Probe Administrative Settings.
As a consequence of this concept, monitoring traffic and load on the network are multiplied by the number of cluster nodes. Moreover, the devices on the cluster probe are monitored by all cluster nodes, so you will encounter an increase in monitoring load on these devices.
This is not a problem for most usage scenarios, but consider the Detailed System Requirements. As a rule of thumb, each additional node in the cluster halves the number of sensors that you can use.
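The rule of thumb above can be expressed as a small calculation. The function name and the sensor figures below are illustrative assumptions, not official capacity data; the only rule encoded is "each additional node halves the usable sensor count".

```python
# Illustrative sketch of the rule of thumb: each additional cluster node
# roughly halves the number of sensors a given setup can handle.
# The single-node limit used here is an example figure, not an official value.

def rough_sensor_limit(single_node_limit, node_count):
    """Halve the single-node sensor capacity once per additional node."""
    return single_node_limit // (2 ** (node_count - 1))

assert rough_sensor_limit(10000, 1) == 10000  # standalone installation
assert rough_sensor_limit(10000, 2) == 5000   # simple two-node cluster
assert rough_sensor_limit(10000, 3) == 2500   # one more failover node
```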
More than 5,000 sensors per cluster are not officially supported. Please contact your presales team if you exceed this limit and see this Knowledge Base article for possible alternatives to a cluster: Are there alternatives to the PRTG cluster when running a large installation?
For detailed information, see Failover Cluster Configuration.
Knowledge Base: What's the Clustering Feature in PRTG?
Knowledge Base: In which web interface do I log in if the Master Node fails?
Knowledge Base: Are there alternatives to the PRTG cluster when running a large installation?
Video Tutorial: Cluster in PRTG — This is how it works
Video Tutorial: How to set up a PRTG cluster