Ping is Bad (Sometimes)
Ok, maybe ping isn't in and of itself "bad". But 2 or 3 times per week, I hear someone say something like "it must be down, I can't ping it". I have to say, this is a bit of a pet peeve for me - in our modern world of firewalls, it's very likely that ping (ICMP echo requests and echo replies, ICMP Types 8 and 0) is blocked by one firewall or another in the path between the requester and the server or service being tested.
Even more dangerously, I'll hear "the link / server / network / internet is slow - look at my ping times!". Using ping as a measure of RTT (Round Trip Time) performance is no longer a good way to go. Many ISPs now depress the priority of ICMP packets, so that they'll transport them, but they'll give priority to "real" traffic like HTTP, HTTPS or SMTP. Network administrators will often also use PING to measure performance of corporate WANs. This can be *very* misleading, as on most such networks, the protocols deemed important are prioritized at various levels, and protocols such as ICMP that have no QOS (Quality of Service) policy defined for them will be transported at the default priority, on a best-effort basis. So using ICMP to measure networks that carry VOIP (Voice over IP), Video over IP, or any other traffic governed by QOS tells you how the default queue is doing, not how the prioritized traffic is doing.
So for a lot of reasons, PING is simply a bad test in many situations. Either it shows things are down when they're up, or if you are using it as a measure of performance, it's not measuring what you think it's measuring.
What should people do? Well, first, test hosts for up/down status on the protocols and ports that they actually listen and reply on. So a webserver should probably be tested using tcp/80, not icmp echo and echo reply. Similarly, RTT (Round Trip Time) performance of networks should be measured using the protocols that we actually wish to measure, such as tcp/80 (http), tcp/443 (https), or tcp/445 (Server Message Block (SMB) over IP, Microsoft-DS).
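If all you have handy is a stock python install, the same idea can be sketched with nothing but the standard library. This is a minimal illustration, not one of the tools covered below: the function name tcp_check and its defaults are made up for this example, and the measured time includes the name lookup plus local overhead, so treat it as an up/down test with a rough RTT rather than a precise measurement.

```python
import socket
import time


def tcp_check(host, port, timeout=2.0):
    """Return (is_up, rtt_ms) for one TCP connect attempt to host:port.

    rtt_ms is the time taken to complete the connect (name lookup plus
    the three-way handshake), or None if the attempt failed or timed out.
    """
    start = time.perf_counter()
    try:
        # create_connection only returns once the handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            rtt_ms = (time.perf_counter() - start) * 1000.0
            return True, rtt_ms
    except OSError:
        # Refused, unreachable, timed out, or the name did not resolve.
        return False, None
```

Calling tcp_check("www.google.ca", 80) returns a tuple such as (True, <milliseconds>) when the port answers, and (False, None) when it is closed, filtered or unreachable.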
How do we do this? Well, there are several tools to test exactly this way. Let's cover a few of them:
HPING3:
HPING3 is a nifty little packet crafter, available for source or sometimes binary install on most linux/*nix distros.
Let's test a common internet destination:
robv@robv-desktop:~$ hping3 -p 80 -c 2 -S www.google.ca
HPING www.google.ca (eth0 74.125.115.104): S set, 40 headers + 0 data bytes
len=46 ip=74.125.115.104 ttl=128 id=45928 sport=80 flags=SA seq=0 win=64240 rtt=19.6 ms
len=46 ip=74.125.115.104 ttl=128 id=45929 sport=80 flags=SA seq=1 win=64240 rtt=19.0 ms
--- www.google.ca hping statistic ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 19.0/19.3/19.6 ms
Let's use HPING3 to test a DNS server:
robv@robv-desktop:~# hping3 8.8.8.8 --udp -V -p 53
using eth0, addr: 192.168.169.145, MTU: 1500
HPING 8.8.8.8 (eth0 8.8.8.8): udp mode set, 28 headers + 0 data bytes
^C
--- 8.8.8.8 hping statistic ---
6 packets transmitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms
Wait, what happened there? We sent udp/53 packets to a DNS server, and didn't get a reply - is it down? Nope, it's not down, it just won't reply unless it gets a properly formatted DNS request. This is common in many UDP services - there isn't a 3 way handshake that we can take advantage of to test up/down status of a service. A packet trace shows us exactly what's going on here (wireshark properly sees this as packets with a DNS destination that are not properly formed DNS packets).
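To get a reply out of a DNS server, the probe has to look like a real query. As a rough sketch of what "properly formatted" means here, the python below hand-builds a minimal A-record query and treats any matching response as "up". The names build_dns_query and dns_check, the default query name and the fixed query ID are all assumptions made for this illustration.

```python
import socket
import struct
import time


def build_dns_query(name, query_id=0x1234):
    """Build a minimal DNS query (type A, class IN) for `name`."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, rest 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.strip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def dns_check(server, name="www.google.ca", port=53, timeout=2.0,
              query_id=0x1234):
    """Return (is_up, rtt_ms) based on whether `server` answers a query."""
    query = build_dns_query(name, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Timed out or the send/receive failed: no usable answer.
            return False, None
        rtt_ms = (time.perf_counter() - start) * 1000.0
    if len(reply) < 12:
        return False, None
    # Accept the reply only if it echoes our ID and has the response bit set.
    reply_id, flags = struct.unpack("!HH", reply[:4])
    if reply_id == query_id and flags & 0x8000:
        return True, rtt_ms
    return False, None
```

Note that any response counts as "up" here, including an error response such as NXDOMAIN, since the goal is only to confirm that the service is answering.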
NPING
But what if you're on Windows? NMAP now comes with NPING, a much more flexible echo tool than the traditional PING we've all used for years. Let's test response time to that web server again. I'm using the "-q" option to reduce the output of the command.
C:\>nping --tcp -p 80 -q www.google.ca
Starting Nping 0.5.51 ( http://nmap.org/nping ) at 2011-07-20 21:04 Eastern Daylight Time
Raw packets sent: 5 (270B) | Rcvd: 8 (368B) | Lost: 0 (0.00%)
Tx time: 4.39000s | Tx bytes/s: 61.50 | Tx pkts/s: 1.14
Rx time: 5.39000s | Rx bytes/s: 68.27 | Rx pkts/s: 1.48
Nping done: 1 IP address pinged in 6.02 seconds
If you need to explicitly set the QOS values in the test, nping will also do that (a handy reference for TOS - DSCP - binary/decimal/hex flag values can be found here ==> http://www.cisco.com/en/US/docs/voice_ip_comm/bts/4.1/command/reference/93PktCbl.pdf )
This example will send UDP packets on port 17000 (a port in the RTP range normally used for VOIP calls). The QOS value shown here, TOS 184, is DSCP EF, or IP precedence 5 (CRITICAL) - the 2 QOS markings normally assigned to VOIP.
C:\>nping --udp -g 17000 -p 17000 -q --tos 184 172.17.1.209
Starting Nping 0.5.51 ( http://nmap.org/nping ) at 2011-08-07 21:54 Eastern Daylight Time
Raw packets sent: 5 (210B) | Rcvd: 0 (0B) | Lost: 5 (100.00%)
Tx time: 4.00100s | Tx bytes/s: 52.49 | Tx pkts/s: 1.25
Rx time: 5.00100s | Rx bytes/s: 0.00 | Rx pkts/s: 0.00
Nping done: 1 IP address pinged in 6.27 seconds
But UDP testing problems strike again - note that nping tells us that we have 100% loss on this test. If you do a packet trace, you'll see that the UDP packets are sent, but nothing comes back at all - RTP is the right protocol, but the session needs to be negotiated properly before you'll see traffic. The packet trace below shows that there are no return packets.
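As an aside on where the "--tos 184" in the nping command above comes from: the 6-bit DSCP value sits in the top six bits of the 8-bit TOS byte, so the TOS value is the DSCP value shifted left by two. A small python sketch of the arithmetic (the helper names are made up for this example):

```python
def dscp_to_tos(dscp):
    """Convert a 6-bit DSCP value to the 8-bit TOS byte (low 2 bits left at 0)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def tos_to_precedence(tos):
    """Return the legacy 3-bit IP precedence held in the top bits of the TOS byte."""
    return (tos >> 5) & 0x7


EF = 46  # DSCP Expedited Forwarding

tos = dscp_to_tos(EF)
print(tos, hex(tos), format(tos, "08b"))  # 184 0xb8 10111000
print(tos_to_precedence(tos))             # 5
```

So DSCP EF (46) becomes TOS 184, and a device that only understands IP precedence reads the same byte as precedence 5.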
Cisco Routers IP SLA
As a side note, Cisco routers have an "IP SLA" feature, which will send test TCP or UDP packets on any port from the requesting router, with the router at the other end replying back on the same protocol. This neatly solves the "how can I measure my WAN QOS?" problem, but what it doesn't do is measure from a real client to a real destination, so this method won't tell you if the webserver in your datacenter is up or not. I won't show an example here, the product documentation does a good job of that.
PYTHON / SCAPY
Finally, what if you don't have these tools, or can't install tools, or can't change the config on your routers? You can do all of this in a short python script using the scapy library (it's python month for me). Note that the overhead of an interpreted language like python will throw off any RTT times, plus, while Scapy is just about the coolest thing ever, it isn't a speed demon (I think the majority of the delay is in Scapy actually). This method is not a good way to test performance, but it will accurately give you up/down status through ACLs.
The nice thing about using python for this is that it is so portable - if you can't install a tool but have python, you can generally throw your own tool together in short order (I put TCPING and UDPING together during a lunch break at SANSFIRE), especially if you can google for similar examples or documentation.
Here's an example TCPING run (note the high echo time due to the overhead of this method - over 1 second!):
robv@robv-desktop:~# python tcping.py www.google.ca 80
WARNING: No route found for IPv6 destination :: (no default route?)
RECV 1: IP / TCP 74.125.226.49:www > 192.168.169.145:55154 SA / Padding
RECV 1: IP / TCP 74.125.226.49:www > 192.168.169.145:46152 SA / Padding
RECV 1: IP / TCP 74.125.226.49:www > 192.168.169.145:31151 SA / Padding
RECV 1: IP / TCP 74.125.226.49:www > 192.168.169.145:18754 SA / Padding
RECV 1: IP / TCP 74.125.226.49:www > 192.168.169.145:39331 SA / Padding
Sent 5 packets, received 5 packets. 100.0% hits.
Host is up , approximate RTT is 1002.07920074 ms
I've attached the tcping.py script, as well as a companion udping.py (with the same syntax). They're certainly not the finest python coding you'll ever see, but feel free to review them, and mod them to fit your own requirements if you find them useful. Again, take care when testing UDP services.
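For anyone who just wants the general shape of a Scapy SYN probe without opening the attachments, here is a minimal sketch. To be clear, this is not the tcping.py script itself: the function names and the wording of the results are made up for this illustration, and it sends a single SYN rather than five. It assumes scapy is installed and that you have raw-socket privileges (typically root).

```python
SYN_ACK = 0x12  # SYN and ACK bits both set
RST = 0x04      # RST bit


def classify_flags(flags):
    """Map the TCP flags of a reply (or None for no reply) to a result."""
    if flags is None:
        return "no reply (filtered or host down)"
    if (flags & SYN_ACK) == SYN_ACK:
        return "open"
    if flags & RST:
        return "closed (host is up)"
    return "unexpected reply"


def syn_probe(host, port, timeout=2.0):
    """Send one TCP SYN with scapy and classify whatever comes back."""
    # Imported here so the helper above can be used without scapy installed.
    from scapy.all import IP, TCP, RandShort, sr1

    syn = IP(dst=host) / TCP(sport=RandShort(), dport=port, flags="S")
    reply = sr1(syn, timeout=timeout, verbose=0)
    if reply is None or not reply.haslayer(TCP):
        return classify_flags(None)
    return classify_flags(int(reply[TCP].flags))
```

A SYN/ACK means the port is open, and a RST means the host is reachable but nothing is listening on that port. Either answer tells you the host is up, which is more than a blocked ping will.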
I hope this is useful. If you use PING frequently, I hope this sheds some light on why PING might be a good test in some cases, but not in others, and what tools you might use to deal with reachability and QOS issues.
As always, your comments are welcome!
===============
Rob VandenBrink
Metafore