Auditing a Network for VOIP Call Quality Metrics
What I'm seeing in the security field is a focus on vulnerabilities and exploits; in short, the "cool stuff". In "real life", we see a much stronger focus on operations and cost.
But what we rarely see is a focus on Audit. Audit differs from the day-to-day round of penetration tests, log reviews and the like in that it compares a configuration or a set of parameters to a known standard, for a yes/no or exceeds/deficient compliance result.
In this diary, I'll give a short description of auditing a WAN link for metrics key to VOIP (Voice over IP) call quality. Just a short proviso: this is not a complete guide to VOIP call quality or to auditing VOIP metrics. It's meant as a starting point that you can take to your own environment and tailor to your own needs and toolset.
So, why would you want to audit a WAN link for VOIP call quality metrics?
1/ To assess whether your edge routers are properly re-marking the TOS or DSCP bits in the right packets for delivery to the WAN (commonly done with PBR, Policy Based Routing)
2/ To assess whether your WAN provider is honoring your QOS settings and delivering the appropriate QOS to your various types of traffic
I'll assume that there's at least one Cisco device at each end of the WAN link we're assessing (the commands described are available on IOS switches and routers), but the functions I'm describing are certainly available in most of the other name-brand network platforms.
So first of all, what will we audit in this setup?
Delay - how long does it take a packet to make a round-trip from one end to the other?
Jitter - how much does Delay change during any given call? (zero would be ideal)
MOS (Mean Opinion Score) - a mathematical distillation of overall call quality to a single value on a scale of 1 to 5, with 5 being perfect fidelity.
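To make the Delay and Jitter definitions above concrete, here is a minimal Python sketch that takes the per-packet delays (in ms) from one simulated call and reports the maximum delay and the average absolute change in delay between consecutive packets. This is a simplified illustration of the idea only, not Cisco's exact IP SLA jitter calculation (which tracks positive and negative jitter separately in each direction); the function name and sample delays are hypothetical.

```python
from statistics import mean


def delay_and_jitter(delays_ms):
    """Return (max delay, average jitter) for one call, both in ms.

    Jitter here is the mean absolute difference between consecutive
    packet delays: a simplified definition, not the IP SLA algorithm.
    """
    if len(delays_ms) < 2:
        raise ValueError("need at least two packets to measure jitter")
    changes = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return max(delays_ms), mean(changes)


# A steady call: the delay never changes, so jitter is zero (the ideal).
print(delay_and_jitter([40, 40, 40, 40]))
# A bumpy call: similar typical delay, but it varies packet to packet.
print(delay_and_jitter([40, 55, 38, 60]))
```

Both calls have comparable delay, but only the second would sound choppy, which is why the audit tracks the two metrics separately.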
Let's look at the configuration. On the core router, we'll create an IP SLA operation that sends test packets to IP SLA Responders at the remote sites. These commands simulate actual VOIP traffic and report on the results. Again, these are Cisco commands (routers or switches), but there are analogous functions on other network platforms.
ip sla 1
udp-jitter 192.168.4.249 5001 codec g711alaw codec-numpackets 100 advantage-factor 10
tos 160
timeout 10000
threshold 10000
history enhanced interval 3600 buckets 12
ip sla schedule 1 life forever start-time now
snmp-server community complex.string RO ACL_SNMP
ip access-list standard ACL_SNMP
permit host snmp.monitor.host.1
permit host snmp.monitor.host.2
What this does is:
- Simulates a voice conversation of 100 packets using the G.711 a-law codec, with a TOS (Type of Service) byte of 160.
- The conversation is repeated every 60 seconds (the default), and enhanced history aggregates the statistics into one-hour intervals, keeping the 12 most recent intervals.
- The "ip sla schedule" line does exactly what it looks like: it starts the operation now, with no end time. The snmp-server setup allows us to monitor the statistics using our Network Management System (or from the command line, which we'll get to).
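The "tos 160" value is the whole IP TOS byte. As a quick worked check, here is a small Python sketch that splits that byte into its IP Precedence (top 3 bits) and DSCP (top 6 bits). 160 works out to IP Precedence 5, DSCP 40 (CS5). The EF class commonly used for voice is DSCP 46, which is a TOS byte of 184, so it's worth confirming which value your own QOS policy actually matches on. The helper names are hypothetical.

```python
def decode_tos(tos_byte):
    """Split an IP TOS byte (0-255) into (IP Precedence, DSCP)."""
    if not 0 <= tos_byte <= 255:
        raise ValueError("TOS is a single byte (0-255)")
    precedence = tos_byte >> 5  # top 3 bits: legacy IP Precedence
    dscp = tos_byte >> 2        # top 6 bits: DiffServ code point
    return precedence, dscp


def tos_for_dscp(dscp):
    """Return the TOS byte value that carries a given DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp << 2  # DSCP sits above the 2 ECN bits


print(decode_tos(160))   # (5, 40): Precedence 5, DSCP 40 (CS5)
print(tos_for_dscp(46))  # 184: TOS byte for DSCP EF
```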
Next, at the remote end we'll set up a responder, which answers the test packets from the core. Basically, it takes each UDP packet and sends it back where it came from. Note that the responder's listener port (5001) has to match the target port in the SLA configuration on the core router (above).
ip sla responder
ip sla responder udp-echo port 5001
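Now, to monitor this we'll use simple SNMP queries against the core router, which runs the IP SLA operation and holds its results. The OIDs (Object IDs) to monitor on Cisco devices are part of the CISCO-RTTMON-MIB and are documented at: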
http://tools.cisco.com/Support/SNMP/do/BrowseOID.do?local=en&translate=Translate&objectInput=1.3.6.1.4.1.9.9.42.1.5.2.1.42#oidContent
This is a really handy link if you need an explanation of what any particular OID or group of OIDs is about.
So, to monitor MOS from the command line (the trailing ".1" on each OID is the IP SLA operation number, matching "ip sla 1" above):
C:\>snmpget 192.168.2.237 1.3.6.1.4.1.9.9.42.1.5.2.1.42.1 -ccomplex.string
SNMP++ Get to 192.168.2.237 SNMPV1 Retries=1 Timeout=100ms Community=complex.string
Oid = 1.3.6.1.4.1.9.9.42.1.5.2.1.42.1
Value = 434
Divide that value by 100 to get an MOS value of 4.34.
To get Jitter, we'll query the rttMonLatestJitterOperAvgJitter parameter, which is defined as "The average of positive and negative jitter values in SD and DS direction for latest operation":
C:\>snmpget 192.168.2.237 1.3.6.1.4.1.9.9.42.1.5.2.1.46.1 -ccomplex.string
SNMP++ Get to 192.168.2.237 SNMPV1 Retries=1 Timeout=100ms Community=complex.string
Oid = 1.3.6.1.4.1.9.9.42.1.5.2.1.46.1
Value = 1
And for delay, we'll query the maximum RTT (Round Trip Time) value, in milliseconds, for the latest conversation (rttMonLatestJitterOperRTTMax):
C:\>snmpget 192.168.2.237 1.3.6.1.4.1.9.9.42.1.5.2.1.5.1 -ccomplex.string
SNMP++ Get to 192.168.2.237 SNMPV1 Retries=1 Timeout=100ms Community=complex.string
Oid = 1.3.6.1.4.1.9.9.42.1.5.2.1.5.1
Value = 4
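If you'd rather script these checks than read them by eye, the output above is easy to parse. This is a minimal Python sketch, assuming the SNMP++ "Oid = ..." / "Value = ..." format shown above (other snmpget tools format their output differently). It maps the three rttMonLatestJitterOper columns used here (5 = RTTMax, 46 = AvgJitter, 42 = MOS) to readable names and divides the raw MOS by 100. The function and dictionary names are hypothetical.

```python
import re

# rttMonLatestJitterOperEntry; the column number follows this prefix and
# the trailing ".1" is the IP SLA operation number.
BASE_OID = "1.3.6.1.4.1.9.9.42.1.5.2.1"
COLUMNS = {5: "max_rtt_ms", 46: "avg_jitter_ms", 42: "mos"}


def parse_snmpget(output):
    """Parse SNMP++-style snmpget text into (metric name, value)."""
    oid_match = re.search(r"^Oid = ([\d.]+)\s*$", output, re.MULTILINE)
    value_match = re.search(r"^Value = (-?\d+)\s*$", output, re.MULTILINE)
    if not oid_match or not value_match:
        raise ValueError("no Oid/Value lines found in snmpget output")
    oid = oid_match.group(1)
    raw = int(value_match.group(1))
    name = oid  # fall back to the raw OID for columns we don't know
    if oid.startswith(BASE_OID + "."):
        column = int(oid[len(BASE_OID) + 1:].split(".")[0])
        name = COLUMNS.get(column, oid)
    if name == "mos":
        return name, raw / 100  # MOS is reported x 100 (434 -> 4.34)
    return name, raw


sample = """SNMP++ Get to 192.168.2.237 SNMPV1 Retries=1 Timeout=100ms Community=complex.string
Oid = 1.3.6.1.4.1.9.9.42.1.5.2.1.42.1
Value = 434"""
print(parse_snmpget(sample))  # ('mos', 4.34)
```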
I would expect that in most cases you'd plug these OIDs into your Network Management System and graph them over time. But in a pinch, you can collect them with a Windows CMD file and graph the values in Excel. The batch file QOS.CMD below appends one row per minute to a CSV file (qos.out) that you can read directly into most spreadsheets:
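@echo off
rem QOS.CMD - append one CSV row (date, time, delay, jitter, MOS x 100) to qos.out every 60 seconds
:LOOPSTART
date /t > qos.tmp
time /t >> qos.tmp
rem maximum round trip delay (ms) on last conversation
snmpget 192.168.2.237 1.3.6.1.4.1.9.9.42.1.5.2.1.5.1 -ccomplex.string | find "Value" >> qos.tmp
rem average jitter (ms) on last conversation
snmpget 192.168.2.237 1.3.6.1.4.1.9.9.42.1.5.2.1.46.1 -ccomplex.string | find "Value" >> qos.tmp
rem MOS x 100 on last conversation
snmpget 192.168.2.237 1.3.6.1.4.1.9.9.42.1.5.2.1.42.1 -ccomplex.string | find "Value" >> qos.tmp
rem strip the "Value = " labels and join the lines into one comma-separated row
type qos.tmp | sed "s/Value = //" | tr "\n" "," | tr -d "\r" >> qos.out
echo.>>qos.out
rem sleep is not built into cmd.exe; "timeout /t 60 /nobreak" is a built-in alternative on newer Windows
sleep 60
goto LOOPSTART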
I use the GNU utilities for sed and tr, mostly because I can bundle everything into a few EXE and DLL files that run on any version of Windows. But the "Services for Unix" (SFU) tools available for Windows these days work well too, and are probably a better way to go in most cases; that tr weirdness, for instance, might be better handled with SFU. There are also several "snmpget" utilities floating around, each with slightly different syntax.
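For anyone who would rather not depend on the GNU utilities, here is a rough Python equivalent of QOS.CMD. It's a sketch under assumptions: it shells out to an SNMP++-style snmpget with the same argument order used above (other snmpget tools take different options), reuses this diary's example router address and community string, writes ISO-style dates, and, like the batch file, records MOS as the raw value x 100. The function names are hypothetical.

```python
import csv
import subprocess
import time
from datetime import datetime

# Example values from this diary; adjust for your router and community.
ROUTER = "192.168.2.237"
COMMUNITY = "complex.string"
OIDS = [
    "1.3.6.1.4.1.9.9.42.1.5.2.1.5.1",   # max RTT (ms)
    "1.3.6.1.4.1.9.9.42.1.5.2.1.46.1",  # average jitter (ms)
    "1.3.6.1.4.1.9.9.42.1.5.2.1.42.1",  # MOS x 100
]


def snmpget_value(oid):
    """Run snmpget (SNMP++-style arguments) and return the integer value."""
    out = subprocess.run(
        ["snmpget", ROUTER, oid, "-c" + COMMUNITY],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        if line.startswith("Value = "):
            return int(line[len("Value = "):])
    raise ValueError(f"no Value line in snmpget output for {oid}")


def collect_row(get=snmpget_value, now=datetime.now):
    """Build one CSV row: date, time, max RTT, avg jitter, MOS x 100."""
    stamp = now()
    values = [get(oid) for oid in OIDS]
    return [stamp.strftime("%Y-%m-%d"), stamp.strftime("%H:%M")] + values


def poll_forever(path="qos.out", interval_s=60):
    """Append one row per interval to a CSV file, like the QOS.CMD loop."""
    while True:
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow(collect_row())
        time.sleep(interval_s)
```

Passing the SNMP lookup and clock into collect_row as parameters keeps the row-building logic testable without a router on the other end.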
To meet our standard for "good quality voice", the target values we are auditing against are:
Delay
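When listening to speech, the human ear normally accepts up to about 150 ms of delay without noticing it (discussed in the ITU-T G.114 standard). Once the delay exceeds 150 ms, a conversation becomes akin to speaking on a walkie-talkie, with awkward pauses; in high-delay environments, people tend to wait for their partner to finish speaking before talking. Keep in mind that the G.114 figure is one-way (mouth-to-ear) delay, which also includes codec and jitter-buffer delay, while the value we collect above is the network round-trip time. On a symmetric path, half the RTT is a rough estimate of the one-way network delay.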
Jitter
Jitter values should be low; zero is the ideal. Values over 20-30 ms will degrade voice quality, resulting in choppy-sounding conversations or sometimes "echo". Jitter can be compensated for in the end devices (jitter buffers on the phones, for instance) or reduced by proper prioritization and queuing of voice traffic. Excessive jitter can indicate a problem in the queuing and forwarding algorithms on the WAN, exceeding the EF (Expedited Forwarding) budget on the WAN, or non-uniform delays imposed by queuing, encapsulation, encryption or any other forwarding operation in the path.
MOS
MOS is a simple, overall measure of call quality. If you graph one statistic for Management, this should be it, with the caption "Greater than 4 is Good". The formal matrix for MOS definitions is:
MOS | Quality | Impairment
5 | Excellent | Imperceptible
4 | Good | Perceptible but not annoying
3 | Fair | Slightly annoying
2 | Poor | Annoying
1 | Bad | Very annoying
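Since the point of this exercise is a yes/no audit, here is a minimal Python sketch that checks one set of readings against the targets above. The thresholds are assumptions drawn from this diary rather than a formal standard: one-way delay estimated as half the RTT (assuming a symmetric path) against G.114's 150 ms, average jitter at or under 30 ms (the top of the 20-30 ms range), and MOS of 4.0 or better ("Good" in the table). All names are hypothetical.

```python
from dataclasses import dataclass

MOS_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


@dataclass(frozen=True)
class Targets:
    max_one_way_delay_ms: float = 150.0  # ITU-T G.114 guideline
    max_jitter_ms: float = 30.0          # upper end of the 20-30 ms range
    min_mos: float = 4.0                 # "Good" or better


def mos_label(mos):
    """Map a MOS value (1.0-5.0) to its quality band from the table."""
    band = min(5, max(1, int(mos)))  # truncate to the band it falls in
    return MOS_LABELS[band]


def audit(max_rtt_ms, avg_jitter_ms, mos, targets=Targets()):
    """Return {metric: True/False} compliance for one measurement."""
    one_way_ms = max_rtt_ms / 2  # assumes a symmetric path
    return {
        "delay": one_way_ms <= targets.max_one_way_delay_ms,
        "jitter": avg_jitter_ms <= targets.max_jitter_ms,
        "mos": mos >= targets.min_mos,
    }


# The readings from the snmpget examples: RTT 4 ms, jitter 1 ms, MOS 4.34
print(audit(4, 1, 4.34), mos_label(4.34))
# {'delay': True, 'jitter': True, 'mos': True} Good
```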
I hope this short description helps you in assessing your network for VOIP readiness, or in troubleshooting VOIP issues. More importantly, I hope it highlights Audit (the "A" in SANS) as an essential part of your security program.
===============
Rob VandenBrink
Metafore