Can you imagine what life would be like if you had to remember the phone numbers of all your friends, family and co-workers? No one does that, partly because smartphones have absolutely obviated the need to remember phone numbers. Instead, we store our contacts, and when we need to call someone we just touch that person’s name and tap send.
The best way I can explain DNS (Domain Name System) is to show its affinity to an address book. The two aren’t perfectly analogous but generally speaking: phone numbers are to contacts as IP addresses are to host names.
What does DNS do?
The cardinal purpose of DNS is to translate a human-friendly domain name such as fixedbyvonnie.com into a machine-friendly IP address: 220.127.116.11.
Without DNS you would have to remember the IP address not only of my website but of every device connected to the internet. You would have to keep track of the IP address of every network-connected printer, TV, Bluetooth device, laptop, print server, directory server, file server, web server, mail server or FTP server!
DNS alleviates this burden by translating, technically called resolving, names to IP addresses so the internet can be both practical and accessible.
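If you’re curious what resolving looks like from a program’s point of view, here’s a minimal Python sketch that uses the standard library’s socket.getaddrinfo to hand a name to your operating system’s resolver. The function name resolve_ipv4 is my own invention, and it only asks for IPv4 addresses to keep the output short.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Ask the operating system's resolver for a hostname's IPv4 addresses."""
    # getaddrinfo hands the name to the OS, which consults its own sources
    # (hosts file, caches, the configured DNS server) in its configured order.
    results = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        ip = sockaddr[0]  # for IPv4, sockaddr is an (address, port) tuple
        if ip not in addresses:  # the same IP repeats once per socket type
            addresses.append(ip)
    return addresses
```

On a connected machine, resolve_ipv4("fixedbyvonnie.com") returns whatever IPv4 addresses the domain currently publishes; a numeric string such as "127.0.0.1" comes straight back without any lookup at all.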
This ostensibly rudimentary job of resolving hostnames to IP addresses is complicated by the billions of names, yes billions, that need to be resolved at any given time and by the millions of people all over the world who are constantly registering new domain names, changing existing ones and swapping hosts.
DNS has a herculean task on its hands, especially when you realize DNS has to:
- Look up and return the IP addresses of billions of machines within a few milliseconds
- Be highly available and reliable, always ready to respond when people need it
I’ll unveil how DNS does that in a moment, but before I do, I think it’s important to realize that resolving hostnames isn’t the only purpose of DNS.
One fortuitous side effect of name resolution is that DNS keeps the physical location of the internet asset distinct from the name of that asset. This isn’t obvious at first so let me explain.
Let’s say you have a WordPress blog hosted on a GoDaddy web server that’s physically located in Phoenix, Arizona. After two years, you move the files on your web server from GoDaddy to HostGator, whose server is physically located in Provo, Utah.
After the site move, your web address remains http://www.mysupercoolwordpresssite.com/ even though the actual location of the hosting server changed. DNS makes stuff like this a reality.
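To make the idea concrete, here’s a toy Python sketch where a plain dictionary stands in for DNS. The zone dictionary, the address_for helper and both IP addresses are hypothetical (the addresses come from ranges reserved for documentation). The point is that visitors only ever use the name, so moving hosts means updating one record.

```python
# HYPOTHETICAL zone data standing in for DNS; addresses are documentation-range examples.
zone = {"www.mysupercoolwordpresssite.com": "203.0.113.10"}  # the old host


def address_for(name: str) -> str:
    """Return the address currently published for a name."""
    return zone[name]


site = "www.mysupercoolwordpresssite.com"
before = address_for(site)

# The site moves to a new host: only the record changes, the name stays the same.
zone[site] = "198.51.100.20"
after = address_for(site)

print(site, before, "->", after)
```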
But DNS doesn’t just apply to web addresses; email relies on it too. For example, when you email someone at fixedbyvonnie.com, your mail server has to discern how to get the message to the fixedbyvonnie domain. So it starts by asking a question, called a query. The query returns an almost instant reply with something like this:
fixedbyvonnie.com 1800 IN MX 5 mailman.fixedbyvonnie.com
fixedbyvonnie.com 1800 IN MX 10 mailgirl.fixedbyvonnie.com
This arcane piece of text tells the mail server that it should deliver the mail to a computer named mailman.fixedbyvonnie.com and, if mailman is down, use mailgirl.fixedbyvonnie.com as a backup. It knows to prioritize mailman over mailgirl because the former has the lower preference number (5 versus 10), and lower numbers are tried first. The 1800 is how many seconds the answer may be cached, and IN stands for Internet.
Next, the mail server needs the IP address of mailman.fixedbyvonnie.com so it can connect and send the message. It sends another DNS query to fetch that IP and then connects.
In this article I’m going to chronicle a brief history of DNS and explain how it works. After reading, you’ll have a firm understanding of why DNS is so important and what it does behind the scenes.
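Here’s a rough Python sketch of the mail server’s side of this exchange. It parses the two MX lines, sorts them by preference, falls back to the next server if the first is down, and then looks up the chosen server’s IP. The helper names and the A-record addresses are hypothetical, and a real mail server would make an actual second DNS query instead of reading a dictionary.

```python
from typing import Callable, NamedTuple, Optional


class MXRecord(NamedTuple):
    owner: str
    ttl: int  # seconds the answer may be cached
    preference: int  # lower values are tried first
    exchange: str  # hostname of the mail server


def parse_mx(line: str) -> MXRecord:
    """Parse '<owner> <ttl> IN MX <preference> <exchange>'."""
    owner, ttl, rclass, rtype, preference, exchange = line.split()
    if rclass != "IN" or rtype != "MX":
        raise ValueError(f"not an IN MX record: {line!r}")
    return MXRecord(owner, int(ttl), int(preference), exchange)


def delivery_order(records: list[MXRecord]) -> list[str]:
    """Mail servers in the order a sender should try them."""
    return [r.exchange for r in sorted(records, key=lambda r: r.preference)]


def pick_server(order: list[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable server, falling back down the list."""
    for host in order:
        if is_up(host):
            return host
    return None


# The two MX records from the reply, deliberately listed out of order.
answer = [
    "fixedbyvonnie.com 1800 IN MX 10 mailgirl.fixedbyvonnie.com",
    "fixedbyvonnie.com 1800 IN MX 5 mailman.fixedbyvonnie.com",
]
order = delivery_order([parse_mx(line) for line in answer])
print(order)  # ['mailman.fixedbyvonnie.com', 'mailgirl.fixedbyvonnie.com']

# HYPOTHETICAL A records standing in for the second DNS query.
a_records = {
    "mailman.fixedbyvonnie.com": "192.0.2.5",
    "mailgirl.fixedbyvonnie.com": "192.0.2.10",
}
# Pretend mailman is down, so the sender falls back to mailgirl.
target = pick_server(order, is_up=lambda host: host != "mailman.fixedbyvonnie.com")
print(target, a_records[target])  # mailgirl.fixedbyvonnie.com 192.0.2.10
```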
Origins of DNS and why we created it
Back in the embryonic stages of the internet, 13 years before DNS was even a zygote, a handful of computer scientists designed a small network for the United States Department of Defense called the Advanced Research Projects Agency Network (ARPANET). ARPANET became the precursor to the Internet: it was one of the world’s first operational packet-switched networks, and on January 1, 1983, it switched over to the TCP/IP protocol suite.
Computer names, known as hostnames, were managed through a centrally maintained file called HOSTS.txt, which mapped every hostname to its IP address.
The Stanford Research Institute (SRI) took responsibility for maintaining this list of hundreds of name-to-IP mappings.
Initially, using the HOSTS text file worked great because there were only a few hundred computers referencing it; however, as ARPANET became ascendant it was manifest that another solution was warranted.
First, the text file didn’t scale well. As it grew, it became increasingly difficult to manage. Secondly, name resolution performance began to decline as more computers referenced the file. After all, it was just a flat text file, which isn’t ideal for holding large numbers of records. Thirdly, the SRI had a hard time managing the ponderous HOSTS file, especially as people kept making changes and additions.
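To see why a flat file struggles, here’s a small Python sketch of hosts-file lookup. It assumes the modern hosts-file layout (an address followed by one or more names, as still found in /etc/hosts) rather than the exact format of SRI’s original file, and the hostnames and addresses are made up. Notice that every lookup walks the whole list from the top, which is fine for a few hundred entries and painful for millions.

```python
from typing import Optional

# HYPOTHETICAL hosts-file contents: an address, then one or more names.
HOSTS_TEXT = """\
# address   hostname(s)
10.0.0.1    alpha
10.0.0.2    beta beta-alias
10.0.0.3    gamma
"""


def parse_hosts(text: str) -> list[tuple[str, list[str]]]:
    """Turn hosts-file text into (address, [names]) pairs."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()
        entries.append((address, names))
    return entries


def find_in_hosts(entries: list[tuple[str, list[str]]], hostname: str) -> Optional[str]:
    """Linear scan: a flat file has no index, so check every entry in turn."""
    for address, names in entries:
        if hostname in names:
            return address
    return None


entries = parse_hosts(HOSTS_TEXT)
print(find_in_hosts(entries, "beta-alias"))  # 10.0.0.2
```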
That was 1979.
Let’s fast forward to 1983. Paul Mockapetris, a computer scientist at the University of Southern California’s Information Sciences Institute who had earned his doctorate at the University of California, Irvine, proposed an idea for a dynamic, distributed database of names and addresses. It became known as the Domain Name System and would forever change the landscape of the internet.
How DNS Works
The first thing I want to accentuate is that DNS isn’t one thing; in fact, it’s a hierarchy of databases that are distributed globally to avoid a single point of failure, maximize scalability and decrease the time it takes to resolve hostnames.
The names in the DNS database form a tree-like structure known as the Domain Namespace. The namespace manifests itself as a series of dot-separated names.
When you type http://www.fixedbyvonnie.com/ into Internet Explorer, Chrome or whatever browser you like, the first thing the browser does is look in the local DNS cache (special storage space on your computer for fast retrieval) to see if it has an existing record for the domain name. If the record is there, the site quickly resolves and loads. If not, either the record expired or it was never added, so the computer has to send out a DNS query to resolve the IP.
The DNS query goes to the DNS server of your internet service provider (ISP). If the ISP’s server has the record, it returns the IP and your browser loads the page. Otherwise, the ISP’s server sends a query to the Root Nameservers.
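The order of that lookup can be sketched in a few lines of Python. This is a simplification under my own assumptions: both caches are plain dictionaries, entries never expire, and fake_root is a hypothetical stand-in for the walk down from the Root Nameservers. The function name lookup_with_caches is also mine.

```python
from typing import Callable


def lookup_with_caches(
    name: str,
    local_cache: dict[str, str],
    isp_cache: dict[str, str],
    ask_root: Callable[[str], str],
) -> tuple[str, str]:
    """Return (ip, source), checking the closest, cheapest place first."""
    if name in local_cache:
        return local_cache[name], "local cache"
    if name in isp_cache:
        ip, source = isp_cache[name], "ISP name server"
    else:
        # Nobody nearby knows the name: start the full walk at the root.
        ip, source = ask_root(name), "root and down"
        isp_cache[name] = ip  # the ISP remembers it for all of its customers
    local_cache[name] = ip  # and so does your computer
    return ip, source


def fake_root(name: str) -> str:
    """HYPOTHETICAL stand-in for the root-to-authoritative walk."""
    return "192.0.2.80"


local, isp = {}, {}
print(lookup_with_caches("fixedbyvonnie.com", local, isp, fake_root))  # ('192.0.2.80', 'root and down')
print(lookup_with_caches("fixedbyvonnie.com", local, isp, fake_root))  # ('192.0.2.80', 'local cache')
```

A second computer on the same ISP, starting with an empty local cache, would get its answer from the ISP’s cache without anyone bothering the root.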
The Root Nameservers are the sine qua non for internet operations because they are the first step in resolving hostnames to IP addresses.
There are 13 logical Root Nameservers. (I say logical because there are actually many more than 13 physical Root Nameservers.) Almost all of them run on redundant hardware and are distributed across multiple geographic locations to avoid a single point of failure.
Remember, these servers are absolutely critical to the health of the internet, so it was imperative to maximize fault tolerance.
Think of the DNS namespace like the organizational tree of your company. Instead of having a CEO at the top, you have 13 co-CEOs (these are the Root Nameservers); underneath them you have the Top Level Domain (TLD) servers, and under those are the domain names.
If your ISP’s name server can’t answer a DNS query from what it already knows, in other words, it doesn’t have a record for the name, it goes straight to the top of the org chart and asks the root servers. The root servers don’t know the final answer, but they know which TLD servers to ask, and the TLD servers in turn know which servers are responsible for the domain.
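When a web address is getting resolved, the computer takes the hostname out of the URL and reads it from right to left, one dot-separated section at a time.
So http://www.fixedbyvonnie.com is seen as www.fixedbyvonnie.com. (note the trailing dot), read from right to left as the root (.), then com, then fixedbyvonnie, then www.
That trailing dot, the first one read, represents the root and is usually omitted, since all queries start with the Root Nameservers anyway.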
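Here’s a small Python sketch of that right-to-left reading. resolution_path is a hypothetical helper that pulls the hostname out of a URL with urllib.parse.urlsplit and lists the zones from the root down, writing each one with its trailing dot.

```python
from urllib.parse import urlsplit


def resolution_path(url_or_host: str) -> list[str]:
    """List the zones a lookup passes through, starting at the root."""
    # urlsplit finds the hostname in a full URL; a bare hostname has none,
    # so fall back to the input itself.
    hostname = urlsplit(url_or_host).hostname or url_or_host
    labels = hostname.rstrip(".").split(".")  # ['www', 'fixedbyvonnie', 'com']
    path = ["."]  # the root, written as a lone dot
    for i in range(len(labels) - 1, -1, -1):  # walk the labels right to left
        path.append(".".join(labels[i:]) + ".")
    return path


print(resolution_path("http://www.fixedbyvonnie.com/"))
# ['.', 'com.', 'fixedbyvonnie.com.', 'www.fixedbyvonnie.com.']
```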
Next, you have the Top Level Domain (TLD), which in this case is the .com part.
There are myriad TLDs for different kinds of organizations, such as .org for non-profit organizations or .mil for US military sites such as army.mil. The job of the TLD servers is to point to the Authoritative Servers responsible for a domain. In our example, that’s the server that knows the answers for all names that end in fixedbyvonnie.com.
Once the TLD server knows which Authoritative Servers are responsible for the domain, it replies to the ISP’s name server with their addresses (a referral), so the ISP’s name server can build another request to fixedbyvonnie.com’s name server and receive the IP.
It seems like an arduous, circuitous path to resolve names this way, but it all happens within a few milliseconds. Still, some people use public DNS servers to speed up query responses; I wrote a small section about how to use Google’s Public DNS in my article about speeding up Google Chrome.
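The referral chain can be modeled in Python with a few fake servers. Everything here is hypothetical: the server names, the single referral each one knows about and the address. The sketch follows the iterative style, where the resolver itself sends each query and follows each referral until some server returns an actual answer.

```python
# HYPOTHETICAL servers and addresses, made up for illustration only.
SERVERS = {
    "root": {"referrals": {"com.": "com-tld"}},
    "com-tld": {"referrals": {"fixedbyvonnie.com.": "fbv-auth"}},
    "fbv-auth": {"answers": {"www.fixedbyvonnie.com.": "192.0.2.80"}},
}


def query(server: str, name: str) -> tuple[str, str]:
    """One DNS round trip: an answer, a referral, or 'no such name'."""
    data = SERVERS[server]
    answers = data.get("answers", {})
    if name in answers:
        return "answer", answers[name]
    for zone, next_server in data.get("referrals", {}).items():
        if name == zone or name.endswith("." + zone):
            return "referral", next_server  # "I don't know, but ask them"
    return "nxdomain", ""


def resolve_iteratively(name: str, max_steps: int = 10) -> tuple[str, list[str]]:
    """Start at the root and follow referrals until a server answers."""
    server = "root"
    trail = [server]
    for _ in range(max_steps):  # guard against referral loops
        kind, value = query(server, name)
        if kind == "answer":
            return value, trail
        if kind == "referral":
            server = value
            trail.append(server)
            continue
        raise LookupError(f"{name} does not exist")
    raise RuntimeError("too many referrals")


print(resolve_iteratively("www.fixedbyvonnie.com."))
# ('192.0.2.80', ['root', 'com-tld', 'fbv-auth'])
```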
Zones and Records
Every authoritative nameserver has a text file that contains records for its part of the DNS namespace. The text file is called a zone file and contains all the data for a given zone, typically a domain. There are several record types, but the main ones I’m going to discuss are SOA, A, NS, CNAME and MX.
One of the records is simply called an A record. A records map a hostname to an IP address. It might look something like this:
fixedbyvonnie A 18.104.22.168
There are three columns here: the hostname followed by the record type, A, and the IP address.
The other records all have their purposes too:
SOA: Start of Authority is the first record in every zone file; it names the primary name server for the zone.
NS: Nameserver records list the authoritative name servers for the zone.
CNAME: Canonical Name, which is just a fancy term for a domain alias. Anyone looking up the alias in this record is pointed to the canonical name, whose A record supplies the IP address.
MX: Mail Exchanger records name the mail server that’s responsible for receiving mail for the recipient’s domain.
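To tie the record types together, here’s a Python sketch that parses a tiny zone and looks up an IP address, following a CNAME alias to its canonical name’s A record. The zone contents and addresses are hypothetical, and the layout is simplified to name, type and data, much like the three-column A record shown earlier (real zone files also carry a class such as IN and often a TTL).

```python
from typing import Optional

# HYPOTHETICAL zone contents in a simplified name / type / data layout.
ZONE_TEXT = """\
fixedbyvonnie.com.      SOA    ns1.fixedbyvonnie.com. admin.fixedbyvonnie.com. 1 7200 3600 1209600 1800
fixedbyvonnie.com.      NS     ns1.fixedbyvonnie.com.
fixedbyvonnie.com.      A      192.0.2.80
www.fixedbyvonnie.com.  CNAME  fixedbyvonnie.com.
fixedbyvonnie.com.      MX     5 mailman.fixedbyvonnie.com.
"""


def parse_zone(text: str) -> dict[tuple[str, str], list[str]]:
    """Group record data by (name, type)."""
    records: dict[tuple[str, str], list[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, rtype, data = line.split(None, 2)
        records.setdefault((name, rtype), []).append(data.strip())
    return records


def lookup_a(records: dict[tuple[str, str], list[str]], name: str, max_hops: int = 8) -> Optional[str]:
    """Find the IP for a name, following CNAME aliases to the canonical name."""
    for _ in range(max_hops):
        if (name, "A") in records:
            return records[(name, "A")][0]
        if (name, "CNAME") in records:
            name = records[(name, "CNAME")][0]  # follow the alias
            continue
        return None  # no A record and no alias
    raise RuntimeError("CNAME chain too long")


records = parse_zone(ZONE_TEXT)
print(lookup_a(records, "www.fixedbyvonnie.com."))  # 192.0.2.80, reached via the CNAME
```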
A Full Example
Maybe you’re still a little bemused about how this whole thing works (I know I was while researching this article), so let’s end with a practical example.
Let’s say you’ve never been to my site before and you enter fixedbyvonnie.com into your browser.
Your computer checks its DNS cache for an existing record but comes up empty-handed because you’ve never visited fixedbyvonnie, so it queries the Root Nameservers, which respond with a list of the Authoritative Servers for the .com TLD.
Your computer sends a query for fixedbyvonnie.com to one of those Authoritative Servers and receives a list of all the authoritative servers for fixedbyvonnie.com. It then sends another DNS query to one of those and gets the answer, which is the IP address: 22.214.171.124
And all this happens in less than a second. Whew.
Now, the IP for fixedbyvonnie.com gets cached on your local computer so subsequent requests for my site don’t take this serpentine path through all these servers; the browser simply pulls the data from the local system which has the dual effect of reducing the time to load my site and also reducing the load on the Domain Name Servers.
There’s also one thing I almost forgot to mention.
When DNS records are cached, they come with an expiration value set by the Authoritative Servers known as the Time to Live (TTL). Once the timer expires, the cached records are flushed, and subsequent queries for the same site need an Authoritative Server to deliver answers.
The shorter the TTL, the greater the load on the Authoritative Servers, but a short TTL is ideal when you’re about to change a web server or MX record because caches pick up the new address quickly, which minimizes downtime. On the flip side, if the TTL is too long, caches keep handing out stale entries, which can disrupt service because clients keep trying to reach an address that no longer hosts the site.
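Here’s a minimal Python sketch of a TTL-based cache. TTLCache is a hypothetical class, not any particular resolver’s implementation, and its clock is injectable so the example can fake the passage of time. As a quick bit of arithmetic, the 1800 in the MX records earlier is a TTL of 1800 seconds, or 30 minutes.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """A tiny DNS-style cache: each entry expires after its TTL (in seconds)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}

    def put(self, name: str, ip: str, ttl: int) -> None:
        # Store the answer together with the moment it stops being valid.
        self._entries[name] = (ip, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None  # never cached: a real resolver would query DNS
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: flush it and force a fresh query
            return None
        return ip


now = [0.0]  # fake clock, in seconds
cache = TTLCache(clock=lambda: now[0])
cache.put("fixedbyvonnie.com", "192.0.2.80", ttl=1800)
now[0] = 1799
print(cache.get("fixedbyvonnie.com"))  # 192.0.2.80
now[0] = 1800
print(cache.get("fixedbyvonnie.com"))  # None
```

Shrinking ttl makes entries expire sooner, which is exactly the trade-off described above: fresher data at the cost of more queries reaching the Authoritative Servers.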
The Bottom Line
Computers talk the language of numbers. Servers, switches and routers don’t know what microsoft.com or http://www.mysupercoolwordpresssite.com/ is. When computers are connected to a network, they communicate using IP addresses, but IP addresses are hard for people to remember. That’s why we have DNS.
DNS bridges IP addresses to memorable names so people can get the most out of the internet.