Nowadays, there are a lot of Content Delivery Network (CDN) providers available to choose from. When you're testing the performance of CDN Providers as part of your selection process, there are certain factors to be considered. Some of these factors might be beyond the control of both the provider and the customer:
Generally, the network conditions vary according to the geographical areas and are beyond direct control.
The conditions might vary depending on the peering relationship maturity levels operating in that geographical scope.
The timing and type of the test also play a prominent role. Major traffic-generating events such as significant news announcements or headline sports events tend to affect all CDNs, regardless of the PoP architecture.
However, there are some generic guidelines that can be followed. The following post aims to highlight the comprehensive framework of the CDNs, to uncover certain untold facts and also to aid you in choosing the right CDN provider for your specific situation.
CDN Core Values
The value of a CDN boils down to 4 main criteria;
Commonly, when discussing the speed of something, many terms are thrown into the mix, such as bandwidth, throughput, latency and jitter. But the concept of "speed" does not necessarily reveal which component is performing the "best". The "best" or "fastest" CDN comes down to what the users are actually seeing, not individual metrics. A CDN cannot be evaluated solely from the technical or "feature availability" standpoint. It must be assessed based on its operational and monitoring effectiveness. After all, it's not just about PoP design and technical features -- the performance does matter. For example, a DIY CDN may perform well in certain tests in particular geographical areas, but then fall apart operationally due to insufficient understanding of how to effectively manage a CDN.
So let's make these complex things plain sailing for you.
Key CDN Considerations
Different CDN Designs
When you examine the performance of a CDN, you need to consider the architecture, because the architecture plays a significant role in how that CDN responds to certain types of performance tests. There are two common CDN architectures:
1) Traditional design of largely dispersed smaller PoPs
2) Latest design class of Mega PoPs in the major Internet Exchange points (IXP)
The type of design plays an important role when analysing CDN performance metrics. Both designs have strengths and weaknesses, especially when undergoing performance testing in different geographical areas such as India, Latin America, North America and Europe.
Mega PoP designs are more suited to regions with mature and intelligent peering relationships. However, a smaller, distributed PoP design would be recommended for regions such as Latin America and India, where the peering relationships are less developed.
A widely dispersed smaller PoP design tends to look better in latency tests. However, its cache hit ratio is typically lower, as the smaller PoP cannot hold as much data as the larger Mega PoP, so more requests have to go back to the origin. For this reason, it is recommended to test with a variety of performance metrics, not only latency, even though latency is the first (and sometimes only) step for many testers.
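The effect of PoP size on cache hit ratio can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it assumes an LRU eviction policy, a made-up catalogue of 1,000 objects with skewed popularity, and arbitrary cache sizes of 50 and 500 objects. Real CDNs use their own eviction policies and see very different workloads.

```python
import random
from collections import OrderedDict


def simulate_hit_ratio(cache_size, requests):
    """Replay `requests` (object IDs) through an LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for obj in requests:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)  # mark as most recently used
        else:
            cache[obj] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used object
    return hits / len(requests)


# Hypothetical workload: 1,000 objects, popularity falling off with rank.
rng = random.Random(42)
catalogue = list(range(1000))
weights = [1 / (rank + 1) for rank in catalogue]
requests = rng.choices(catalogue, weights=weights, k=20000)

small_pop = simulate_hit_ratio(50, requests)   # stands in for a small PoP
mega_pop = simulate_hit_ratio(500, requests)   # stands in for a Mega PoP
print(f"small PoP hit ratio: {small_pop:.2%}")
print(f"Mega PoP hit ratio:  {mega_pop:.2%}")
```

Both caches see exactly the same request stream, so any difference in hit ratio comes from capacity alone.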
Essentially, it boils down to individual solution requirements and where your eyeball network is located. If your eyeball network is in one location, it might be simpler and advisable to connect to a small PoP in that particular region, as opposed to traversing to a Mega PoP at an IXP.
The performance of a CDN is impacted by the relationship the CDN has with the local Internet Service Provider (ISP). Big players such as Azure CDN or Akamai might look awful in some remote countries, as a local CDN provider may have better peering relationships with the local ISP.
Effective Performance Criteria
Effective performance testing is vital when it comes to CDN. So, let's look at the "not to miss" sectors, and then conclude with the recommended testing strategy. For adequate testing, the following sectors must be included:
1. Latency
2. Throughput and Bandwidth
3. DNS lookup times
4. TCP connection times
5. Content download
6. Cache hit times
7. Round trip times
8. Time-to-first byte
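Several of the measurements listed above (DNS lookup, TCP connection, time-to-first-byte and content download) can be separated out for a single request. The following is a minimal sketch using only the Python standard library. It assumes plain HTTP on port 80 (no TLS), and the function name `time_http_phases` and the URL in the usage note are placeholders, not part of any CDN tooling.

```python
import socket
import time
from urllib.parse import urlsplit


def time_http_phases(url, timeout=10.0):
    """Time the phases of one plain-HTTP GET request (all durations in seconds)."""
    parts = urlsplit(url)
    host, port = parts.hostname, parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    t_start = time.perf_counter()
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    t_dns = time.perf_counter()

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(addr)  # returns once the TCP handshake has completed
        t_connect = time.perf_counter()

        request = (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
                   "Connection: close\r\n\r\n")
        sock.sendall(request.encode("ascii"))

        first = sock.recv(1)  # blocks until the first response byte arrives
        t_first_byte = time.perf_counter()

        received = len(first)
        while True:  # read until the server closes the connection
            chunk = sock.recv(65536)
            if not chunk:
                break
            received += len(chunk)
        t_done = time.perf_counter()
    finally:
        sock.close()

    return {
        "dns_lookup": t_dns - t_start,
        "tcp_connect": t_connect - t_dns,
        "time_to_first_byte": t_first_byte - t_connect,
        "content_download": t_done - t_first_byte,
        "bytes_received": received,  # headers plus body
    }
```

Calling `time_http_phases("http://example.com/")` returns a dictionary with one entry per phase. A single call is only one sample, so it should be repeated over time and from several locations before drawing conclusions.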
Latency is the time it takes to send and receive a packet between two endpoints. It prevails everywhere, and everything contributes to latency. Latency is in the hardware stack, networking conditions and within the basic physics of the speed of light.
High latency degrades the performance and one should always aim for low latency communications between two endpoints. The only real way to decrease latency is to move endpoints closer together. When you talk about CDNs, everyone mentions latency. However, latency is a temperamental little beast and should never be the sole performance metric for evaluation. As mentioned earlier, a widely-dispersed smaller PoP design looks great from the latency perspective.
The 100ms mark is a common aim for latency. Below this threshold, the user feels everything is happening in real time and their thought process is not interrupted. Above it, the user becomes distracted and may move to a different task. One way to quickly measure latency is to issue the Ping command.
Ping is one of the most basic network diagnostic tools used to test the accessibility to a destination node. It is usually the starting point for diagnostic troubleshooting and performance testing. It does not stress the link bandwidth and sends only a very small amount of information - it sends a message to a server and waits for a reply. Ping sends an ICMP echo_request to the target destination and waits for an ICMP echo_reply. It measures both the round-trip delay time and any packet loss.
The Problem with Ping
However, ping does not represent a true CDN environment, as it only reports whether the destination is reachable and the RTT to it. It tells you that an endpoint can respond to messages, but doesn't provide the key performance criteria needed to determine whether a CDN is performing adequately.
Ping is ICMP-based, not TCP-based, so Ping results don't accurately represent testing as a TCP-based application. Many router Quality of Service (QoS) configurations give ping a low priority under congestion, and some administrators even block all ICMP messages. Therefore, it should only be used to test basic accessibility during initial stages.
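Because ping is not TCP-based, one option is to time the TCP handshake itself. The sketch below is a rough stand-in for ping that uses only the Python standard library. The names `tcp_ping` and `summarize` are made up for this example, and the default port of 80 is an assumption. Note that when a hostname is passed, the measured time also includes name resolution, so pass an IP address to time the handshake alone.

```python
import socket
import statistics
import time


def tcp_ping(host, port=80, count=5, timeout=2.0):
    """Time `count` TCP handshakes to host:port.

    Returns durations in milliseconds, with None for each failed attempt.
    """
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append((time.perf_counter() - start) * 1000)
        except OSError:  # refused, unreachable, timed out, ...
            samples.append(None)
    return samples


def summarize(samples):
    """Summarise probe results, loosely in the style of ping's closing statistics."""
    ok = [s for s in samples if s is not None]
    return {
        "sent": len(samples),
        "failed": len(samples) - len(ok),
        "min_ms": min(ok) if ok else None,
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }
```

For example, `summarize(tcp_ping("example.com", 80))` gives the number of failed attempts together with the minimum, median and maximum handshake time. Like ping, this says nothing about throughput or cache behaviour.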
Traceroute is another network diagnostic tool, used to discover the route a packet takes while travelling to its destination. It reports the packet's paths and RTT. It is implemented mostly with ICMP or UDP, depending on the device type that triggers the command. Some implementations use TCP.
Most Microsoft Windows deployments use ICMP echoes while Linux stations use both ICMP echoes and UDP. Most router platforms like Cisco IOS use UDP.
An ICMP-based traceroute sends ICMP echo_requests with increasing TTL values. Each intermediate router answers with an ICMP Time Exceeded message, and the final destination returns an ICMP echo_reply.
UDP-based traceroute sends UDP packets starting at port 33434.
TCP-based traceroute sends TCP SYN packets attempting to complete the TCP 3-way handshake with the final destination.
Throughput & Bandwidth
Throughput offers a benchmark for the actual capacity achievable between two endpoints. It is affected by the current traffic load or the delay that exists on the segment. Bandwidth, on the other hand, is the total capacity the segment supports. The maximum throughput between two endpoints is a key value.
To understand the difference between throughput and bandwidth more clearly, let's visualise the analogy of a motorway with the ability to carry traffic on four lanes. The four lanes refer to the bandwidth and any increase in bandwidth would involve increasing the number of lanes. Staying with the motorway analogy, let's address throughput. Throughput refers to the number of cars that can travel between the given endpoints over a period of time.
The throughput (number of cars between two endpoints) is influenced dynamically by an increase in traffic (number of cars) or delays (cars slowing down). The bandwidth cannot be dynamically altered by the amount of traffic, while throughput can.
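The gap between bandwidth and throughput can also be put into numbers. For a single TCP connection, throughput cannot exceed the window size divided by the round-trip time, whatever the link bandwidth. The sketch below works through that arithmetic. The 1 Gbit/s link, the 64 KiB window and the RTT values are illustrative assumptions, not measurements.

```python
def window_limit_bps(window_bytes, rtt_seconds):
    """Upper bound on one TCP connection's throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


def achievable_throughput_bps(bandwidth_bps, window_bytes, rtt_seconds):
    """Throughput is capped by whichever is lower: the link or the window limit."""
    return min(bandwidth_bps, window_limit_bps(window_bytes, rtt_seconds))


def measured_throughput_bps(bytes_transferred, elapsed_seconds):
    """Throughput actually observed for a completed transfer."""
    return bytes_transferred * 8 / elapsed_seconds


link = 1_000_000_000   # assumed 1 Gbit/s of bandwidth
window = 64 * 1024     # assumed 64 KiB TCP window

far = achievable_throughput_bps(link, window, 0.100)   # 100 ms RTT
near = achievable_throughput_bps(link, window, 0.010)  # 10 ms RTT
print(f"100 ms RTT: {far / 1e6:.2f} Mbit/s")   # about 5.24 Mbit/s
print(f" 10 ms RTT: {near / 1e6:.2f} Mbit/s")  # about 52.43 Mbit/s
```

The bandwidth is identical in both cases. Only the delay changes, which is why moving content closer to the user raises achievable throughput as well as lowering latency.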
Effective Testing Procedure
The following points list some of the most important steps to consider when testing the performance of a CDN provider:
- First and foremost, set a baseline performance metric value. Once a benchmark is in place, it is easier to consider what is acceptable.
- Use publicly available reports from a trusted Real User Monitoring (RUM) provider. Employing RUM as a testing metric offers a complete picture of what users are seeing.
- Synthetic testing should be deployed for customer-specific use cases. Catchpoint, for example, provides synthetic testing capabilities.
Synthetic type tests are used for corner case scenarios and to test specific environments. Synthetic measurements test from a client to a particular data centre and indicate what users may see in that specific environment. However, they won't provide comprehensive results of what is actually happening. In the real world, different users come from various ISPs, all with different peering relationships, so a synthetic test doesn't really match real conditions.
- Run the actual tests based on your solution requirements over a 30 - 60 day period.
Testing should never be limited to a single day or time. Specific sporting events or popular news broadcasts at a particular time of day will affect the latency for all providers, regardless of geography or PoP architecture.
- Analyse the results and dive deep into the specifics. Examine the performance results for each geographic region, file size and type, and times of day.
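The analysis step above could be sketched as a simple grouping of test samples by region (or by file type, or hour of day). The field names `region`, `hour` and `load_ms`, and all the sample values, are invented for illustration. The 95th percentile uses the nearest-rank method, which is one of several common definitions.

```python
import math
import statistics
from collections import defaultdict


def summarise_by(samples, key):
    """Group samples by `key` and report count, median and 95th percentile load time."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample["load_ms"])

    report = {}
    for name, values in groups.items():
        values.sort()
        rank = max(1, math.ceil(0.95 * len(values)))  # nearest-rank percentile
        report[name] = {
            "count": len(values),
            "median_ms": statistics.median(values),
            "p95_ms": values[rank - 1],
        }
    return report


# Made-up samples standing in for exported test results.
samples = [
    {"region": "europe", "hour": 9, "load_ms": 100},
    {"region": "europe", "hour": 20, "load_ms": 300},
    {"region": "europe", "hour": 14, "load_ms": 200},
    {"region": "india", "hour": 9, "load_ms": 400},
]

for region, stats in summarise_by(samples, "region").items():
    print(region, stats)
```

Passing `"hour"` instead of `"region"` as the key gives the same breakdown by time of day. Looking at the median alongside a high percentile helps show whether a provider is consistently fast or only fast on average.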
It may be the case that a CDN provider is not the right solution. Either the customer eyeball networks are close to the origin servers, or you do large point-to-point file transfers where you don't require a distributed architecture.
A note on Self-tests
If you are carrying out self-tests and have configured the origin server to point to the CDN Provider, it's important to run the tests a few times and to use browser monitoring tools to investigate the results. The files, either zip, media or image files, should be downloaded a few times, as there will always be an initial penalty for the first connecting user.
The content for the first user connection will not yet be on the PoP. As a result, the PoP will require additional time to fetch it from the origin. This additional relaying increases the RTT and degrades performance.
Individual PoPs are not connected; some may hold the content while others may not. Caching is independent from location to location, so the tests will need to be re-run in places that don't have the content cached. Examining the headers using the built-in browser tools can reveal cache hits and misses. If there is a hit, you will typically see additional information in the headers returned by the server.
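The same header check can be scripted. The sketch below classifies a response from its headers. The header names are assumptions, since they vary by provider: `X-Cache` is used by several CDNs, `CF-Cache-Status` by Cloudflare, and `Age` is a standard HTTP header set by caches. Check your provider's documentation for the headers it actually sends.

```python
def cache_status(headers):
    """Classify a response as 'hit', 'miss' or 'unknown' from its headers.

    `headers` is any mapping of header name to value, e.g. copied from the
    browser's network panel. Header names are provider-specific assumptions.
    """
    normalised = {name.lower(): str(value).strip().lower()
                  for name, value in headers.items()}

    for name in ("x-cache", "cf-cache-status"):
        value = normalised.get(name, "")
        if "hit" in value:
            return "hit"
        if "miss" in value:
            return "miss"

    # A positive Age means the response spent time in a cache before delivery.
    age = normalised.get("age", "")
    if age.isdigit() and int(age) > 0:
        return "hit"
    return "unknown"


print(cache_status({"X-Cache": "TCP_HIT"}))
print(cache_status({"X-Cache": "Miss from cloudfront"}))
print(cache_status({"Content-Type": "image/png"}))
```

Running the check on the first and then on a repeated download of the same file should show the initial miss turning into a hit at that PoP.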
Effective Monitoring Tools
Ultimately, it is the page loads that count. Therefore, you should be targeting the tools that evaluate the speed at which the page loads for your users. Period. It doesn't matter how fast the CDN is at time to first byte or during a synthetic test. These tests are good measurements but they have a major drawback -- they don't actually measure what the user is seeing. They can only be run at a limited frequency and capture only a subset of the actual conditions. They do not provide the complete picture.
So when you have real RUM measurements available, there are real ways to measure the performance of a CDN. The utility of the CDN is the endgame - it doesn't matter if you have a fast CDN if it loads your page at a snail's pace.
Cache hit rate is a great metric to observe. You can correlate it against performance, but there are certainly CDNs that have poor cache hit rates and still load your site faster. However, performance metrics such as latency, throughput, bandwidth, DNS lookup times, TCP connection times, content download, cache hit times, round trip times, time-to-first byte and other variables that lead to good RUM results are just part of the story. These are all good variables to consider, but they only matter insofar as they show up in how your website performs. Looking at CDN variables in a vacuum will only tell you how the CDN is performing with respect to that one variable.
The tool for monitoring the performance of a CDN has to be RUM. It has to be real time, and anything short of this in today's fast-paced world of 2017 would be a disservice to you since it won't do a very good job in measuring CDN performance.
Whatever type of CDN monitoring you use, it needs to blend into your business as you are buying or moving to a CDN to improve your business. These are the metrics that matter the most. There are lots of great metrics to consider but until you measure the impact of those metrics on your users, it's just going to be up in the air. If you want to measure the experience of your users, you ought to have relevant tools.
Cedexis has a free report at this link. It provides comparisons based on availability, latency, and throughput for a range of service types. Cloudharmony has a comprehensive CDN test page that tests both download rates and latency. For additional information on testing CDNs, Microsoft Azure has released a great video.
Want to know more?
- Matt Conran's Network Insight blog