How does a CDN work?

Last update: 20170628

Last Friday I was asked “how does a CDN work?” The first time I was asked the same question was in 1999. I have acquired a lot of CDN knowledge and experience since I joined the industry in 2012. CDN is not yet an everyday business in Asia, and I want to help, so here is a brief one-page explanation. Hope you like it.

What is a CDN?

A Content Delivery Network (CDN) is a collection of web servers distributed among multiple locations to deliver content to users more efficiently.[1]

What is the job to be done by CDN?

To send web content to end users faster. How? By moving data closer to the end users to reduce round trips and network latency.

Ilya Grigorik, web performance engineer at Google, says [2]:

  • Latency, not bandwidth, is the performance bottleneck for most websites.
  • Four major latency types: propagation delay, transmission delay, processing delay, queuing delay
  • To improve performance of our applications… we need to reduce round trips, move the data closer to the client.
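To get a feel for why distance matters, here is a rough back-of-the-envelope sketch of propagation delay only. The speed of light in fiber (about 200,000 km/s) is an approximation, and both distances are assumed values chosen for illustration; real round trips are longer because of routing, queuing, and processing.

```python
# Approximate speed of light in optical fiber (roughly two thirds of c).
FIBER_SPEED_KM_PER_S = 200_000


def round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip propagation delay over fiber, in milliseconds."""
    one_way_seconds = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_seconds * 1000


# Assumed distances, for illustration only.
far_origin_km = 5_600  # a distant origin, roughly London to New York
near_edge_km = 50      # a nearby CDN server

print(f"far origin: about {round_trip_ms(far_origin_km):.1f} ms per round trip")
print(f"near edge:  about {round_trip_ms(near_edge_km):.1f} ms per round trip")
```

A page load usually needs several round trips (DNS, TCP, TLS, HTTP), so the gap multiplies.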


Centralized Server vs CDN [3]

How does a CDN work?

In essence, CDN puts your content in many places at once, providing superior coverage to your users. For example, when someone in London accesses your US-hosted website, it is done through a local UK CDN server. This is much quicker than having the visitor’s requests, and your responses, travel the full width of the Atlantic and back.[4]

How??

By connecting users who request your website content to the CDN, instead of connecting them directly to your web server.

Using the traditional centralized server model, a website, say www.abc.com, will usually have an “A Record” in the DNS configuration (of abc.com) telling clients to connect to the web server at a specific IP address, say 1.1.1.1. We call this server the Customer Origin. When a user, John in Beijing, browses www.abc.com, the following steps happen behind the scenes:

  1. John’s computer asks its local DNS resolver for the IP address of www.abc.com.
  2. The resolver asks the root DNS server which DNS server is authoritative for abc.com. (In practice the root refers the resolver to the .com servers, which in turn point to the authoritative server of abc.com.)
  3. The resolver asks the authoritative DNS server of abc.com for the IP address of www.abc.com and gets the answer 1.1.1.1.
  4. John’s computer connects to 1.1.1.1 to get the web content of www.abc.com.
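The four steps above can be sketched with plain dictionaries standing in for the DNS servers. This is a toy model, not a real resolver: the name server "ns1.abc.com" is a hypothetical name, and the referral through the .com servers is folded into the root lookup.

```python
# Toy stand-ins for real DNS servers. Hostnames and addresses follow the
# example in the text; "ns1.abc.com" is a hypothetical name server.
ROOT_SERVER = {"abc.com": "ns1.abc.com"}
AUTHORITATIVE_SERVERS = {
    "ns1.abc.com": {"www.abc.com": ("A", "1.1.1.1")},
}


def resolve_a_record(hostname: str) -> str:
    """Play the role of the local DNS resolver (step 1 is the call itself)."""
    # Step 2: ask the root which server is authoritative for the zone.
    zone = ".".join(hostname.split(".")[-2:])  # "www.abc.com" -> "abc.com"
    authoritative = ROOT_SERVER[zone]
    # Step 3: ask that authoritative server for the hostname's record.
    record_type, value = AUTHORITATIVE_SERVERS[authoritative][hostname]
    if record_type != "A":
        raise ValueError(f"expected an A record, got {record_type}")
    return value


# Step 4: the browser would now connect to this address (the Customer Origin).
origin_ip = resolve_a_record("www.abc.com")
print(origin_ip)  # 1.1.1.1
```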

With a CDN, www.abc.com will instead have a “CNAME Record” in the DNS configuration telling clients to connect to a specific hostname, say abc.customer.cdn.com, provided by the CDN provider. When John browses www.abc.com, the first two steps are the same as above, but the subsequent steps become:

  1. The resolver asks the authoritative DNS server of abc.com for the IP address of www.abc.com and gets the answer abc.customer.cdn.com.
  2. The resolver then asks the authoritative DNS server of cdn.com for the IP address of abc.customer.cdn.com. Based on the IP address of John’s local DNS resolver and the CDN provider’s routing algorithm, the authoritative DNS server of cdn.com answers with, say, 2.2.2.2, the IP address of the CDN cache server located in or near Beijing that is best placed to deliver the content of www.abc.com to John.
  3. John’s computer connects to 2.2.2.2 to retrieve the web content of www.abc.com.

Say another user, Mary in Singapore, browses www.abc.com. The same steps apply, but the CDN provider will answer with, say, 3.3.3.3, the IP address of the CDN cache server located in or near Singapore that can serve Mary most efficiently.
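Here is a minimal sketch of the CNAME hand-off and the location-based answer for John and Mary. The resolver IP addresses are hypothetical (taken from documentation ranges), and the lookup table is a stand-in for the CDN provider’s routing algorithm, which in reality weighs much more than location.

```python
# Hypothetical resolver addresses mapped to where the CDN believes they are.
RESOLVER_LOCATIONS = {
    "192.0.2.53": "Beijing",       # John's local DNS resolver (assumed IP)
    "198.51.100.53": "Singapore",  # Mary's local DNS resolver (assumed IP)
}

# What the authoritative DNS server of abc.com returns.
ABC_ZONE = {"www.abc.com": ("CNAME", "abc.customer.cdn.com")}

# Cache servers the CDN can hand out for each customer hostname.
CDN_EDGE_SERVERS = {
    "abc.customer.cdn.com": {"Beijing": "2.2.2.2", "Singapore": "3.3.3.3"},
}


def cdn_authoritative_answer(cdn_hostname: str, resolver_ip: str) -> str:
    """Toy routing: pick the cache server in the resolver's location."""
    location = RESOLVER_LOCATIONS[resolver_ip]
    return CDN_EDGE_SERVERS[cdn_hostname][location]


def resolve_via_cdn(hostname: str, resolver_ip: str) -> str:
    record_type, value = ABC_ZONE[hostname]
    if record_type == "CNAME":
        # Follow the CNAME to the CDN's own authoritative DNS server.
        return cdn_authoritative_answer(value, resolver_ip)
    return value


print(resolve_via_cdn("www.abc.com", "192.0.2.53"))     # 2.2.2.2
print(resolve_via_cdn("www.abc.com", "198.51.100.53"))  # 3.3.3.3
```

Note that the answer depends on where the resolver is, not where the user is, which is why the text says “based on the IP address of John’s local DNS resolver”.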

Do I need to upload my website content to CDN?

Not necessarily. In most cases, the CDN will fetch content from the Customer Origin on demand.

When the CDN cache server receives an HTTP GET request for www.abc.com from an end user, it checks whether the requested content is already in its storage:

  • If yes, the cache server sends the stored content to the end user. We call this a “Cache Hit”.
  • If no, the cache server fetches the content from the Customer Origin, stores it, and sends it to the end user. We call the missing-content situation a “Cache Miss”, and the process of fetching content from the Customer Origin a “Cache Fill”. The CDN also performs a Cache Fill when it needs to refresh the stored content.
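The hit/miss logic fits in a few lines. This is a simplified sketch, not how any particular CDN is built: the class name EdgeCache and the fetch_from_origin callback are made up for illustration, and expiry is ignored here.

```python
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy CDN cache server: serve from storage, fill from origin on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._storage: Dict[str, bytes] = {}

    def get(self, url: str) -> Tuple[bytes, str]:
        if url in self._storage:
            return self._storage[url], "HIT"   # Cache Hit
        body = self._fetch_from_origin(url)    # Cache Miss, so Cache Fill
        self._storage[url] = body
        return body, "MISS"


origin_requests = []


def fetch_from_origin(url: str) -> bytes:
    # Stand-in for an HTTP request to the Customer Origin.
    origin_requests.append(url)
    return b"<html>hello</html>"


cache = EdgeCache(fetch_from_origin)
print(cache.get("http://www.abc.com/")[1])  # MISS
print(cache.get("http://www.abc.com/")[1])  # HIT
print(len(origin_requests))                 # 1
```

The second request never reaches the Customer Origin, which is the whole point.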

The higher the Cache Hit rate, the faster the content delivery. A high Cache Hit rate makes better use of the CDN and reduces the load on the Customer Origin.
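A small worked example of that relationship, under a deliberately simple model: every request pays the edge latency, and a miss additionally pays a trip to the Customer Origin. The request counts and latencies are made-up numbers.

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of requests answered from the cache."""
    total = hits + misses
    if total == 0:
        raise ValueError("no requests recorded")
    return hits / total


def average_latency_ms(hit_rate: float, edge_ms: float, origin_ms: float) -> float:
    """Expected latency when only misses pay the extra trip to the origin."""
    return edge_ms + (1 - hit_rate) * origin_ms


# Assumed numbers: 950 hits, 50 misses, 10 ms to the edge, 200 ms to the origin.
rate = cache_hit_rate(hits=950, misses=50)
print(f"hit rate: {rate:.0%}")
print(f"average latency: about {average_latency_ms(rate, 10, 200):.0f} ms")
print(f"requests reaching the origin: {1 - rate:.0%}")
```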

How long will the CDN keep my web content?

CDN providers usually honor the Cache-Control header set by the Customer Origin. During the Cache Fill process, the CDN cache server caches both the web content and the HTTP response headers, including the Cache-Control directives. The CDN then keeps the content for as long as the Cache-Control setting allows, or for as long as the customer’s CDN configuration specifies.
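Below is a rough sketch of how a cache server might turn a Cache-Control header into a time to live. It assumes a few simplifications: a customer override (the hypothetical override_ttl parameter) always wins, s-maxage is preferred over max-age because the CDN is a shared cache, and anything uncacheable or unspecified gets a TTL of 0. Real CDNs have their own defaults and handle many more directives.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split 'public, max-age=3600' into {'public': None, 'max-age': '3600'}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip().strip('"') or None
    return directives


def cache_ttl_seconds(header: str, override_ttl: Optional[int] = None) -> int:
    """How long (in seconds) this toy cache server may keep the content."""
    if override_ttl is not None:
        return override_ttl  # assumed: customer CDN configuration wins
    directives = parse_cache_control(header)
    # Simplification: treat anything that forbids or restricts shared caching as TTL 0.
    if any(name in directives for name in ("no-store", "private", "no-cache")):
        return 0
    for name in ("s-maxage", "max-age"):  # s-maxage targets shared caches
        value = directives.get(name)
        if value is not None and value.isdigit():
            return int(value)
    return 0  # assumed default when the origin gives no lifetime


print(cache_ttl_seconds("public, max-age=3600"))       # 3600
print(cache_ttl_seconds("max-age=60, s-maxage=600"))   # 600
print(cache_ttl_seconds("no-store"))                   # 0
print(cache_ttl_seconds("max-age=60", override_ttl=30))  # 30
```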

What else?

CDNs nowadays offer more features and services. Below are the key benefits of using a CDN:

  • Faster delivery of both cacheable and non-cacheable content.
  • Higher availability by leveraging the CDN’s global footprint and scale.
  • Better security by shielding the Customer Origin from public access.
  • Lower spending on Customer Origin resources (compute and connectivity).

That’s all.

References:

  1. High Performance Web Sites, Steve Souders
  2. High Performance Browser Networking, Ilya Grigorik
  3. CDN, Wikipedia
  4. Essential Guide to CDN, Incapsula
  5. Cache-Control header, Mozilla.org

Notes

I prepared this blog post on the new iPad Pro 10.5” with Smart Keyboard. It is GREAT. Looking forward to iOS 11!


[Update: 20170628] Apple iPad Pro 10.5” with iOS 11 beta
