Category Archives: Protocols

How DNS works – part 1

Even though the DNS (Domain Name System) protocol isn’t hard to understand, there seems to be quite some confusion about it.

So I’m going to explain a little what it is, why we need it, and how it works. Hopefully I’m going to make it short.

What is DNS?

Roughly speaking, the domain name system is like a phone book, but for computers. It is a big directory for computers to find other computers.

Why do we need DNS?

Why would a computer need a directory to locate another computer, as they are all connected already through the internet, aren’t they?

Well, computers connect to each other in a way that is a bit analogous to the way telephones connect to each other. When making a phone call, you know who you want to talk to, their name, maybe where they physically are located, but you need to dial the right number to reach their phone, because those devices only work with numbers, not names or anything else. In the same way, if I type www.google.com in the address bar of my browser, my computer doesn’t really know to which one, among the millions (billions?) of other computers on the internet, it should connect to. It needs to have the numerical address of Google’s server, what is called its IP address.

When you came to read this particular blog post, your computer took the web address (called a URL) that was mentioned in the link you clicked, queried a DNS server to get the only piece of information that it can use to connect to the machine that hosts this blog, which is: the IP address for nicolascadou.com.

So the domain name system is about names, and their correspondence to numbers. It makes it easy for humans, who won’t remember those pesky numbers unless forced to.

It also eases the job of system administrators, who mention domain names in the configuration of the machines which need to connect to other ones, so that when one server gets moved to another network, they don’t have to reconfigure everything, they just put the new IP address in the right DNS server, and that’s it, no need to modify the configuration of the other machines.

How does DNS work?

First, let’s take a quick look at the anatomy of the names used with The DNS protocol, taking “www.google.com” as an example.

The whole thing is called a hostname. It is a name that refers to the IP address of a computer (the host for Google’s search page.) This hostname is composed of several parts, separated by dots. Technically those parts are called labels, and in our example there are three of them: “com”, “google”, and “www”.

Notice that I listed the labels in reverse order, from right to left.

That may seem counter-intuitive, but those parts represent a cascading, tree-like hierarchy of domains and subdomains which is read from right to left, with the topmost domain being the rightmost one: “com”. This topmost (rightmost) domain has a special name: it is called the TLD (Top Level Domain.)

I said the domain name system is like a phone book, but in reality, it is a huge, hierarchical collection of phone books. This is important, as when the time comes to change DNS records (which are contained in a DNS zone, served by a DNS server) one must find out who’s the authority for that domain, because this party is the only one with the power to do so.

So, for the sake of keeping this blog post short, let’s see right away how the act by which a computer will retrieve the IP address corresponding to a hostname, the act of resolving the hostname by performing a DNS query, is done. The resolver will ask a DNS server for the IP address corresponding to the hostname, wait for an answer, and behind the scenes, the DNS server will itself perform as many DNS queries as needed to get the answer, traversing the DNS hierarchy in this manner:

  1. First, it has to start somewhere, it will query the root DNS servers. All DNS servers must know in advance either the IP addresses of those root servers, or the IP address of another DNS server which know either the root servers, or another DNS server which does, etc. Otherwise it can’t work. The root servers know who is responsible for each TLD. So our DNS server picks one of the root servers and asks: “what is the IP address for www.google.com?
  2. The root server answers: “I don’t know. But I know who’s responsible for the “com” TLD, here are the names and IP addresses of their DNS servers. Pick one.”
  3. So our DNS server proceeds to send the same query to one of those servers: “what is the IP address for www.google.com?
  4. The DNS server responsible for the “com” TLD then answers: “I don’t know. But I know who’s responsible for the “google.com” domain, here are the names and IP addresses of their DNS servers.”
  5. Our DNS server then sends the exact same query to one of those DNS servers at Google: “what is the IP address for www.google.com?
  6. Finally, an answer in given. Google’s DNS server says: “Here’s the IP addresses for www.google.com, pick any one.”

When a DNS server hasn’t got the answer, it can proceed to query directly the next DNS server in the hierarchy, if the resolver asks for it. A DNS server with such capabilities is called a recursive name server.

Conclusion

I hope this summary information is helpful. I think it goes a long way explaining the basics of the domain name system.

I will post a second part in a few days; or weeks, who knows. It will cover domain registration, WHOIS information and others.

Share