traceroute is a hack! (the meandering portion of Week 8 of Hacker School)
Like many of my weeks in Hacker School, this one started out meandering and got more focused towards the end. Some of my fellow students have told me they fall into the exact opposite pattern — they start out strong and then peter out — but I tend to get my wind at the end of the week and try to do as much as I can over the weekend.
That’s not to say that I in any way regret puttering around. One of my favorite things to do at Hacker School is to get into a conversation with someone about something I know nothing about, and then end up working on the project with them for a little while. Often I end up following some tangent that turns into a project idea for later in the summer, one that I never would have thought of otherwise.
For instance, on Monday morning, my fellow Hacker Schooler Sunil Abraham got into a long conversation across the table from me with Jessica McKellar, our resident for the week. One of her specialties is networking, and Sunil was having her explain a few things, because one of his goals for the summer was to learn more about networking and the protocols governing the internet. One of his other goals was to learn a language like OCaml or Haskell, so he figured he would follow Jessica’s suggestion to re-implement the Unix utility ping, but he wanted to do it in OCaml.
So I figured I’d help him out, since I knew a little bit about SML, a close relative of OCaml in the ML family, and since I’d never even thought of implementing a network utility until that morning and was curious to see if it could be done.
Two mornings later, after having switched from trying to implement ping to trying to implement traceroute instead (for reasons too obscure to go into in this blog post), we wisely abandoned this project, but I came out of it knowing quite a few little tidbits I never knew before, like:
- How to get OCaml running in Vim and piped to a REPL using the utility tmux, which I’d never used before,
- How to use the OCaml package manager (opam),
- How to access C networking utilities using the OCaml socket library,
- and last but not least, that traceroute is a total hack!
Let me explain that last point.
Before this week, I had a big misconception about what traceroute actually does. I assumed that it traces a specific route, meaning that a packet leaves my computer and sends back messages from every node along the way until it reaches its target, but that’s not what happens at all. What actually happens is that three packets are sent toward the target address with a special field called a TTL (Time To Live) set to 1, so that they expire at the very first hop, and the node there sends a message back to the originator. Then three more packets are sent out toward the target, but set to expire after two hops, at which point the nodes two hops away send back messages. Then three more, set to expire after three hops, and so on, until the target is reached or a node is encountered which blocks traceroute (this happens a lot). So the path may or may not represent a path that any particular packet actually took, but it’s likely to be a close approximation, unless something major happens to the internet in between when the originator sends one set of packets and when it sends the next.
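Here is a minimal sketch, in Python, of the probing loop described above. It does not touch a real network: the `HOPS` table, the node names, and the `send_probe` and `trace` functions are all made up for illustration, and the second hop is assumed to be load-balanced across two routers so that the three probes for one TTL can come back from different places.

```python
import random

# Hypothetical network: for each hop distance, the nodes a packet might
# pass through. Hop 2 is load-balanced across two routers. Names are made up.
HOPS = [
    ["home-router"],
    ["isp-a", "isp-b"],
    ["backbone"],
    ["target"],
]
TARGET = "target"


def send_probe(ttl, rng):
    """Return the name of the node that answers a probe sent with this TTL.

    Each probe is routed independently, so two probes with the same TTL
    can expire at different routers.
    """
    # The packet expires at hop number `ttl`, or arrives if the target is nearer.
    hop_index = min(ttl, len(HOPS)) - 1
    return rng.choice(HOPS[hop_index])


def trace(max_hops=64, probes_per_hop=3, seed=0):
    """Collect the responders for TTL 1, 2, 3, ... until the target answers."""
    rng = random.Random(seed)
    result = []
    for ttl in range(1, max_hops + 1):
        responders = [send_probe(ttl, rng) for _ in range(probes_per_hop)]
        result.append((ttl, responders))
        if TARGET in responders:
            break
    return result


if __name__ == "__main__":
    for ttl, responders in trace():
        print(ttl, " ".join(responders))
```

Each line of the result is assembled from three separate probes, which is why the list as a whole is only an approximation of a route rather than a record of one packet's journey.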
In case you’ve never seen a traceroute output, this is what it looks like. Notice that some hops list more than one address, meaning that the three packets did not all pass through the same node, but that there are always three times listed for each hop number:
Harrington-MacBook-Pro:~ richardharrington$ traceroute nytimes.com
traceroute: Warning: nytimes.com has multiple addresses; using 220.127.116.11
traceroute to nytimes.com (18.104.22.168), 64 hops max, 52 byte packets
 1  10.0.1.1 (10.0.1.1)  1.044 ms  0.901 ms  0.704 ms
 2  10.32.0.1 (10.32.0.1)  12.072 ms  9.925 ms  13.181 ms
 3  gig-0-3-0-8-nycmnya-rtr2.nyc.rr.com (22.214.171.124)  12.111 ms  13.007 ms  14.814 ms
 4  bun101.nycmnytg-rtr001.nyc.rr.com (126.96.36.199)  15.769 ms  20.855 ms  19.957 ms
 5  bun6-nycmnytg-rtr002.nyc.rr.com (188.8.131.52)  18.767 ms  20.741 ms  13.818 ms
 6  ae-4-0.cr0.nyc30.tbone.rr.com (184.108.40.206)  12.291 ms  16.625 ms  20.841 ms
 7  ae-1-0.pr0.nyc30.tbone.rr.com (220.127.116.11)  14.191 ms  15.702 ms
    18.104.22.168 (22.214.171.124)  14.615 ms
 8  xe-4-2-0.edge4.frankfurt1.level3.net (126.96.36.199)  17.473 ms  13.600 ms
    ae9.edge3.newark1.level3.net (188.8.131.52)  13.725 ms
 9  ae-21-52.car1.newark1.level3.net (184.108.40.206)  15.267 ms
    ae-11-51.car1.newark1.level3.net (220.127.116.11)  11.700 ms
    ae-21-52.car1.newark1.level3.net (18.104.22.168)  14.945 ms
10  new-york-ti.car1.newark1.level3.net (22.214.171.124)  17.421 ms  19.263 ms  14.790 ms
11  126.96.36.199 (188.8.131.52)  13.693 ms  17.985 ms  16.313 ms
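To make the "three times per hop" observation concrete, here is a rough Python sketch that pulls the hop number, addresses, and timings out of one hop entry. The `parse_hop` function and its regular expression are my own assumptions about the format, based only on the sample output above; it does not handle the `*` entries that traceroute prints when a probe gets no reply.

```python
import re

# Hop 7 from the sample output above, joined into one string.
HOP_LINE = (
    "7 ae-1-0.pr0.nyc30.tbone.rr.com (220.127.116.11) 14.191 ms 15.702 ms "
    "18.104.22.168 (22.214.171.124) 14.615 ms"
)

# Each responder looks like "host (address)" followed by one or more "N ms" timings.
RESPONDER = re.compile(r"(\S+)\s+\(([^)]+)\)((?:\s+[\d.]+\s+ms)+)")


def parse_hop(line):
    """Split a hop entry into (hop_number, [(host, address, [times_ms, ...]), ...])."""
    hop_number, rest = line.split(None, 1)
    responders = []
    for host, address, timings in RESPONDER.findall(rest):
        times = [float(t) for t in re.findall(r"([\d.]+)\s+ms", timings)]
        responders.append((host, address, times))
    return int(hop_number), responders


if __name__ == "__main__":
    hop, responders = parse_hop(HOP_LINE)
    total_probes = sum(len(times) for _, _, times in responders)
    print(hop, len(responders), "responders,", total_probes, "timings")
```

However many addresses a hop lists, the timings across them add up to the number of probes sent for that TTL.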
Interestingly, its use in traceroute is not the main purpose of the TTL field, which was intended as a way to keep packets from circulating around the network forever, for instance when they get caught in a routing loop.
Communication protocols that guarantee delivery of packets, like TCP (the protocol layer that underlies web pages and email, among other things), do so by acknowledging packets and resending the ones that get lost, so the originator finds out fairly quickly when a packet has not reached its destination. But there are other protocols, such as UDP (used for streaming video, among other things) and ICMP (used mainly for error and control messages), which are much cheaper and more efficient by virtue of the fact that they’re not actually guaranteed to reach their destination. All of these ride on top of IP, though, and the TTL lives in the IP header, so every packet has one no matter which protocol it carries. The TTL is normally set to a fairly high number (the default is 64 on many systems, I believe) and decremented by one at each hop. If the TTL reaches zero at any node short of its destination, the node drops the packet and sends an ICMP “Time To Live exceeded in transit” message back to the originator of the packet.
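The decrement-and-drop rule described above can be sketched in a few lines of Python. This is a toy model, not real networking code: the `Packet` class, the `forward` function, and the node names in the usage example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    source: str
    destination: str
    ttl: int = 64  # the default mentioned in the text above


def forward(packet, route):
    """Walk a packet along `route`, a list of node names ending at the destination.

    Returns ("delivered", node) if the packet arrives, or
    ("time-exceeded", node) naming the router that dropped it.
    """
    for node in route:
        if node == packet.destination:
            return ("delivered", node)
        # Every router along the way decrements the TTL before forwarding.
        packet.ttl -= 1
        if packet.ttl <= 0:
            # This router drops the packet and reports back to the sender.
            return ("time-exceeded", node)
    return ("unreachable", None)


if __name__ == "__main__":
    route = ["router-1", "router-2", "server"]
    for ttl in (1, 2, 3):
        print(ttl, forward(Packet("laptop", "server", ttl=ttl), route))
```

With the default TTL a packet on a short route like this one simply arrives; only an artificially small TTL makes an intermediate router speak up.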
So traceroute takes advantage of this system by sending out messages with TTLs of 1, then 2, then 3, etc., corresponding to the line numbers above. The contents of the message are irrelevant; only the TTL fields themselves matter.
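For what it's worth, setting the TTL on an outgoing packet takes very little code. The sketch below uses Python's standard `socket` module to create a UDP socket with a chosen TTL; the `make_probe_socket` and `send_probe` helpers are hypothetical names, and the choice of UDP with port 33434 follows the traditional Unix traceroute convention as I understand it. Reading the ICMP replies usually requires a raw socket and elevated privileges, so that half is left out.

```python
import socket


def make_probe_socket(ttl):
    """Create a UDP socket whose outgoing packets carry the given TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    return sock


def send_probe(sock, address, port=33434):
    """Send an empty datagram; the payload is irrelevant, only the TTL matters."""
    sock.sendto(b"", (address, port))


if __name__ == "__main__":
    # Only inspects the socket option; nothing is sent over the network here.
    for ttl in (1, 2, 3):
        with make_probe_socket(ttl) as sock:
            print(ttl, sock.getsockopt(socket.IPPROTO_IP, socket.IP_TTL))
```

A full implementation would call `send_probe` three times per TTL and listen for the "Time To Live exceeded" replies in between.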
So it’s a hack, but as my fellow Hacker Schooler Martin Törnwall points out, it’s a very elegant hack. And something I may try to implement in Clojure or ClojureScript later in the summer. If I have time.