August 12, 2020
How DNS Works
@anthonycorletti

It's easy to take for granted the ability to type google.com into a web browser, press Enter, and be served a powerful search engine.

The DNS, or Domain Name System, is responsible for a key first step in getting that data from somewhere on earth to your screen: working out which server to talk to.

So how does it work?

The idea is that we want to resolve domain names to IP addresses.

For example, if I go to youtube.com, then at some point I'm going to have to work out what the IP address of that server is.

So to do that there needs to be some system in place, because me just guessing an IP could take quite a long time.

So I'm going to type a URL into the address bar of my browser, or pass a host name to something like ping on the command line, and the domain name in it is going to have to be resolved to an IP so I can actually communicate with that server to do whatever it is I want to do.
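To make that concrete, here is a minimal Python sketch of asking the operating system to do the resolving for us. The function name resolve is my own, and it just hands the work to whatever resolver your machine is configured with, so treat it as an illustration rather than how a browser does it internally.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Ask the operating system's resolver for the IP addresses of hostname."""
    # getaddrinfo returns tuples of (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})
```

Calling resolve("youtube.com") would return whatever addresses your resolver hands back at that moment, which can differ by time and place.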

So let's imagine that I've gone to my browser and I've typed in https://google.com or something like that.

My computer will have a cache of domain names that I've visited recently, but those entries only last a limited time, maybe an hour or two.

So let's assume that we've just turned our computer on after a week or something like that, and so it doesn't know where google.com is.

It will have to make a DNS query to find out what the IP address of google.com is now.

The first point of contact will usually be a name server that belongs either to you, your organization, or your ISP, say cmu.edu or verizon.net, something like that; or you might have configured something like OpenDNS yourself!

But let's assume for the sake of argument that we're using our ISP's domain name server.

Now, we don't know what the IP address of google.com is, so we're going to have to ask some other computer that might know.

So our computer asks the name server that we've either configured ourselves or been given by our ISP. In this case let's say I'm going to connect to my ISP's server. Note that even if what I typed into the browser was https://google.com/?q=my-query, the DNS query only asks about the host name, google.com; the scheme, path, and query string are not part of the question.
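It's worth being precise about what actually goes into the DNS question: only the host name from the URL, not the whole URL. A small sketch using Python's standard urllib.parse (the helper name hostname_for_dns is my own invention):

```python
from urllib.parse import urlsplit


def hostname_for_dns(url: str) -> str:
    """Pull out the only part of a URL that DNS cares about: the host name."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host name found in {url!r}")
    return host


print(hostname_for_dns("https://google.com/?q=my-query"))  # google.com
```

The scheme, path, and query string only matter later, once we have an IP address and start talking HTTP to it.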

Now this name server might already know the answer, because it has a little cache of records that it can look through. If someone else has been to google.com recently, which is quite likely, it would give that information back quickly. But let's assume that it doesn't, and let's also assume that this server is set up to be what we would call a recursive resolver, which means it can not only answer DNS queries but also send DNS queries of its own on our behalf. So it's got to ask another machine that it thinks might know the answer.

Now our name server hasn't got the darndest idea, because there are a lot of different IP addresses. So what it's going to do is pick from a list of root name servers, whose addresses are hard coded into resolver software.
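For a sense of what that list looks like: the root name servers are published under thirteen names, a.root-servers.net through m.root-servers.net, and resolver software ships with their addresses in a "root hints" file. Here's a small sketch that builds the names and picks one. Picking at random is my own simplification; real resolvers have their own selection strategies.

```python
import random
import string

# The thirteen root server names, a.root-servers.net through m.root-servers.net.
ROOT_SERVER_NAMES = [
    f"{letter}.root-servers.net" for letter in string.ascii_lowercase[:13]
]


def pick_root_server(rng=random) -> str:
    """Pick one root server to ask (random choice is an assumption of this sketch)."""
    return rng.choice(ROOT_SERVER_NAMES)
```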

So let's say we ask a root name server, and it says something like, "I don't know what google.com's IP address is today", because who knows, maybe Google turned their computer off and on again and it's changed. So the root name server can't give us the IP address itself.

So what happens? Well, the root name server does know where to find the name servers for each top-level domain (TLD), so it helps us by sending us to a .com name server.

So now we're at a .com name server – we're kind of working right to left through the domain name.
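Here's a tiny sketch of that right-to-left walk. The function name zones_right_to_left is my own; it just lists the zones a resolver passes through on the way to a name, starting from the root (written as a single dot).

```python
def zones_right_to_left(domain: str) -> list[str]:
    """List the zones visited while resolving domain, starting from the root."""
    labels = domain.rstrip(".").split(".")
    zones = ["."]  # the root zone
    # Walk the labels from the rightmost one, growing the name each step.
    for i in range(len(labels) - 1, -1, -1):
        zones.append(".".join(labels[i:]) + ".")
    return zones


print(zones_right_to_left("google.com"))  # ['.', 'com.', 'google.com.']
```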

So next we put in a request to this .com name server and it will say, "I don't actually know, but I do know the next place you should ask." That will be the name server or servers that are responsible for this zone of domains; let's say something like ns1.google.com, ns2.google.com, ns3.google.com, and so on. Now we're getting closer, right?

Let's say hypothetically that these name servers are run by Google and that they know what their own IPs are. If they don't then it isn't going to work, because those specific name servers are the authoritative ones: they manage the records like CNAME, A, AAAA, MX, TXT (which is where SPF policies are published), and more for google.com.

So we put in a request to ns1.google.com and it's going to send back what we actually queried for, which is the IP of google.com.

So we finally have an IP address! Now we can actually send an HTTP GET request or ping or whatever it is we want to do to that IP and then we can get a response. So that's how DNS works in a nutshell!
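To tie the whole walk together, here is a toy simulation of it in Python. Nothing here touches the network: the SERVERS table, the server labels "root" and "com-tld", and the address 192.0.2.10 (from a range reserved for documentation) are all made up for illustration. Each pretend server either refers us to the next server or gives the final answer.

```python
REFERRAL, ANSWER = "referral", "answer"

# Hypothetical pretend internet: server -> {name suffix: (kind, value)}.
SERVERS = {
    "root": {"com": (REFERRAL, "com-tld")},
    "com-tld": {"google.com": (REFERRAL, "ns1.google.com")},
    "ns1.google.com": {"google.com": (ANSWER, "192.0.2.10")},
}


def lookup(server: str, name: str) -> tuple[str, str]:
    """Return what server knows about the longest matching suffix of name."""
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in SERVERS[server]:
            return SERVERS[server][suffix]
    raise LookupError(f"{server} knows nothing about {name}")


def resolve(name: str, max_hops: int = 10) -> tuple[str, list[str]]:
    """Follow referrals from the root until some server gives an answer."""
    server = "root"
    path = [server]
    for _ in range(max_hops):
        kind, value = lookup(server, name)
        if kind == ANSWER:
            return value, path
        server = value  # a referral: go and ask the next server
        path.append(server)
    raise LookupError(f"gave up on {name} after {max_hops} hops")


print(resolve("google.com"))
# ('192.0.2.10', ['root', 'com-tld', 'ns1.google.com'])
```

The path in the result is the same journey described above: root, then .com, then Google's own name server.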

Now the DNS service is a little bit more clever than this sort of overview because of some caching mechanisms.

So our own computer has a cache, the recursive resolver has a cache, and any other resolvers involved in a request will also have a cache.

Let's suppose that an ISP's name server is serving 10,000 customers, all of whom are going to google.com; doing what we talked about for every one of them is a bit of a waste of time, right? Especially given that Google is probably not changing its IP every hour.

So the first person in the morning that gets up and goes to google.com is going to have to wait fractions of a second longer, because the resolver is going to be doing everything I've described.

But then google.com will be put into this cache with a time to live, or TTL, and for that amount of time the resolver will just serve the cached answer straight back.

Which is quite powerful: it means being able to serve all the ISP's customers really, really quickly.
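A cache with a TTL is simple enough to sketch in a few lines of Python. The class name TTLCache, the injectable clock, and the example address are my own choices for illustration; a real resolver's cache is more involved, but the idea is the same.

```python
import time


class TTLCache:
    """A tiny name -> address cache whose entries expire after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries = {}   # name -> (address, expires_at)

    def put(self, name: str, address: str, ttl_seconds: float) -> None:
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name: str):
        """Return the cached address, or None if it is missing or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the next lookup does the full walk
            return None
        return address
```

The first customer of the morning gets None from get and pays for the full lookup; everyone after that, until the TTL runs out, gets the stored address straight away.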

This is basically a kind of distributed database. This name server is going to be sending lots and lots of requests, probably to hundreds of servers per second, because there will be people all over the internet saying I want to go to google.com, youtube.com, facebook.com, randomwebsite.xyz and such.

So if the resolver is going to be doing all this, how does it make sense of all the information that's coming back from other name servers?

Well, every DNS query carries a query ID, a 16-bit number that the resolver picks whenever it sends out a request to a name server, and the name server that responds will respond with the same query ID.

So all DNS requests are labeled and matched to their responses.
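To show where that ID lives, here is a sketch that builds a bare-bones DNS query message by hand, following the standard wire format: a 12-byte header that starts with the 16-bit ID, then the question. The function names are my own, and the matches check is deliberately simplified to compare only the IDs; real resolvers check more than that.

```python
import secrets
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, query_id=None) -> bytes:
    """Build a query for the A record of name, with a random 16-bit ID by default."""
    if query_id is None:
        query_id = secrets.randbits(16)
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", 1, 1)  # type A, class IN
    return header + question


def query_id_of(message: bytes) -> int:
    """Read the 16-bit ID from the first two bytes of a DNS message."""
    return struct.unpack("!H", message[:2])[0]


def matches(query: bytes, response: bytes) -> bool:
    """Simplified check: does the response carry the same ID as the query?"""
    return query_id_of(query) == query_id_of(response)
```

Because the ID is only 16 bits, there are just 65,536 possible values to pick from.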

So that allows the resolver to sort through the mess of requests a little bit better, but it leads to a slightly interesting quirk: if I send a name server a response that it didn't ask me for, but I guess the query ID correctly, it may accept it. That's a topic for a different post on DNS cache poisoning. For the most part that poisoning doesn't happen, because the resolver really has queried an authoritative name server, and attackers have only a few milliseconds to get the fake reply in before the real reply from the authoritative name server arrives. Never discount speed! 🏎💨