A TCP/IP mystery (solved)

Published: 2006-03-13
Last Updated: 2006-03-14 04:03:30 UTC
by Jim Clausing (Version: 1)
For a little background: the problem described below actually began for a client I used to work with about a year ago, but it wasn't until Friday that I was able to get enough data to figure out what was really going on, so bear with me as I walk you through the tale.

In the beginning

The client I was working for is a multinational Fortune 500 company headquartered in the US. About a year ago, we started getting complaints that some individuals in one of the lines of business were no longer able to exchange e-mail with a supplier. We got the parties involved on a conference call to troubleshoot, but the supplier's networking folks weren't all that knowledgeable. Our conclusion was that their responses (the SYN/ACK packets) to the SYN packets from our external mail relays were not reaching our firewall. After some finger-pointing back and forth, the supplier finally mentioned that they had recently "upgraded their firewall." They decided to roll back to the previous version, and suddenly e-mail was flowing again. We tried to find out what had changed between versions, but never got an answer. Not an entirely satisfactory resolution from my point of view, but the client and their supplier were happy, so we chalked that one up to experience and moved on.

As you might expect, a few months later they had the same problem with another business partner. Again, rolling back to the previous version solved the problem, and again no one was able to show us what had changed so that we could figure out where the real problem was. Out of this second situation we did learn a little more, namely that the firewall was actually Linux/iptables-based. That should have made it easier, since I have been using Linux and iptables for my home firewall for several years, but we still weren't able to hook up with anyone on the partner side who understood Linux well enough to get a dump of the rules or anything.

Fast forward to last month. The problem rears its ugly head again. This time, the third party is able to get us packet captures of the traffic (both working and failing) and a dump of the rules for the failure case. Finally, we might have enough data to solve the mystery. I should probably mention that while I've moved on to a different position in the company, I'm still in contact with my former team, which is still working with this client. After looking things over for a while they were still confused, so they gave me a call on Friday.

Now maybe we're getting somewhere

I decided to see if I could tell what was different between the capture where the traffic succeeded (all the firewall rules were turned off, so the firewall was essentially acting as a router) and the one where the traffic was failing. So I sat down and looked at the two captures. Now, I'd like to say that I was 'man enough' to figure it out just by looking at it with tcpdump -x, but I didn't even try. I went to ethereal almost immediately because I didn't have Stevens (TCP/IP Illustrated) handy at the time (I was working from home and the book was on my shelf at the office).

First, the traffic when the communication succeeded.

jac@leibnitz[513]$ tcpdump -r nofw.pcap -c4
reading from file nofw.pcap, link-type EN10MB (Ethernet)
19:15:24.492173 IP client.com.28680 > partner.com.smtp: S 604326096:604326096(0) win 65535 <mss 1460,nop,nop,sackOK>
19:15:24.492242 IP partner.com.smtp > client.com.28680: S 150399351:150399351(0) ack 604326097 win 5840 <mss 1460>
19:15:24.541465 IP client.com.28680 > partner.com.smtp: . ack 1 win 65535
19:15:24.550951 IP partner.com.smtp > client.com.28680: P 1:95(94) ack 1 win 5840

And when it failed:

jac@leibnitz[514]$ tcpdump -r failed.pcap -c4
reading from file failed.pcap, link-type EN10MB (Ethernet)
19:08:51.231906 IP client.com.21644 > partner.com.smtp: S 2389181400:2389181400(0) win 65535 <mss 1460,nop,nop,sackOK>
19:08:51.231978 IP partner.com.smtp > client.com.21644: S 4040255024:4040255024(0) ack 2389181401 win 5840 <mss 1460>
19:08:54.391432 IP client.com.21644 > partner.com.smtp: S 2389181400:2389181400(0) win 65535 <mss 1460,nop,nop,sackOK>
19:08:54.391638 IP partner.com.smtp > client.com.21644: S 4040255024:4040255024(0) ack 2389181401 win 5840 <mss 1460>
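The failing capture already tells the story at the TCP level: the client sent the same SYN (same initial sequence number, 2389181400) a second time about three seconds later, which is what a TCP stack does when the SYN/ACK never arrives. A small Python sketch that flags that pattern in tcpdump's text output might look like this. The regular expression assumes the one-line format shown above (tcpdump's output format varies between versions), and the function name `syn_retransmissions` is hypothetical.

```python
import re
from datetime import datetime

# Lines copied from the failing capture above (tcpdump -r failed.pcap -c4).
FAILED = """\
19:08:51.231906 IP client.com.21644 > partner.com.smtp: S 2389181400:2389181400(0) win 65535 <mss 1460,nop,nop,sackOK>
19:08:51.231978 IP partner.com.smtp > client.com.21644: S 4040255024:4040255024(0) ack 2389181401 win 5840 <mss 1460>
19:08:54.391432 IP client.com.21644 > partner.com.smtp: S 2389181400:2389181400(0) win 65535 <mss 1460,nop,nop,sackOK>
19:08:54.391638 IP partner.com.smtp > client.com.21644: S 4040255024:4040255024(0) ack 2389181401 win 5840 <mss 1460>
"""

# Groups: timestamp, source, destination, sequence number, optional " ack" (SYN/ACK)
LINE = re.compile(r"^(\S+) IP (\S+) > (\S+): S (\d+):\d+\(0\)( ack)?")

def syn_retransmissions(text):
    """Return (source, isn, seconds_since_first) for each SYN resent with the same ISN."""
    first_seen = {}
    found = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m or m.group(5):  # skip non-SYN lines and SYN/ACKs
            continue
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        key = (m.group(2), m.group(4))
        if key in first_seen:
            gap = (ts - first_seen[key]).total_seconds()
            found.append((m.group(2), int(m.group(4)), gap))
        else:
            first_seen[key] = ts
    return found

for src, isn, gap in syn_retransmissions(FAILED):
    print(f"{src} resent SYN {isn} after {gap:.3f}s")
    # client.com.21644 resent SYN 2389181400 after 3.160s
```

Run against the working capture, the same function finds nothing, because the client's ACK follows the first SYN/ACK within about 50 ms.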


Okay, so the partner's server is sending the SYN/ACK back, but sniffing on the firewall at the client end showed that it wasn't making it all the way back, so it was being dropped somewhere upstream of that firewall. So I decided to look more closely at the SYN/ACK packets from the two captures with ethereal (actually tethereal), and since I suspected a router was dropping them somewhere upstream of my firewall, I concentrated on layer 3 (the IP header). Why? Because that is as far into the packet as a router should be looking. So, first the one that works:

Frame 2 (58 bytes on wire, 58 bytes captured)
    Arrival Time: Mar  8, 2006 19:15:24.492242000
    Time delta from previous packet: 0.000069000 seconds
    Time since reference or first frame: 0.000069000 seconds
    Frame Number: 2
    Packet Length: 58 bytes
    Capture Length: 58 bytes
    Protocols in frame: eth:ip:tcp
Ethernet II, Src: Intel_f3:78:c3 (00:0c:f1:f3:78:c3), Dst: Intel_a8:7d:0a (00:d0:b7:a8:7d:0a)
    Destination: Intel_a8:7d:0a (00:d0:b7:a8:7d:0a)
    Source: Intel_f3:78:c3 (00:0c:f1:f3:78:c3)
    Type: IP (0x0800)
Internet Protocol, Src: aa.bb.cc.dd (aa.bb.cc.dd), Dst: ee.ff.gg.hh (ee.ff.gg.hh)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 44
    Identification: 0x0000 (0)
    Flags: 0x04 (Don't Fragment)
        0... = Reserved bit: Not set
        .1.. = Don't fragment: Set
        ..0. = More fragments: Not set
    Fragment offset: 0
    Time to live: 64
    Protocol: TCP (0x06)
    Header checksum: 0x0a19 [correct]
        Good: True
        Bad : False
    Source: aa.bb.cc.dd (aa.bb.cc.dd)
    Destination: ee.ff.gg.hh (ee.ff.gg.hh)

And then the one that doesn't:

Frame 2 (58 bytes on wire, 58 bytes captured)
    Arrival Time: Mar  8, 2006 19:08:51.231978000
    Time delta from previous packet: 0.000072000 seconds
    Time since reference or first frame: 0.000072000 seconds
    Frame Number: 2
    Packet Length: 58 bytes
    Capture Length: 58 bytes
    Protocols in frame: eth:ip:tcp
Ethernet II, Src: Intel_f3:78:c3 (00:0c:f1:f3:78:c3), Dst: Intel_a8:7d:0a (00:d0:b7:a8:7d:0a)
    Destination: Intel_a8:7d:0a (00:d0:b7:a8:7d:0a)
    Source: Intel_f3:78:c3 (00:0c:f1:f3:78:c3)
    Type: IP (0x0800)
Internet Protocol, Src: aa.bb.cc.dd (aa.bb.cc.dd), Dst: ee.ff.gg.hh (ee.ff.gg.hh)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x04 (DSCP 0x01: Unknown DSCP; ECN: 0x00)
        0000 01.. = Differentiated Services Codepoint: Unknown (0x01)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 44
    Identification: 0x0000 (0)
    Flags: 0x04 (Don't Fragment)
        0... = Reserved bit: Not set
        .1.. = Don't fragment: Set
        ..0. = More fragments: Not set
    Fragment offset: 0
    Time to live: 64
    Protocol: TCP (0x06)
    Header checksum: 0x0a15 [correct]
        Good: True
        Bad : False
    Source: aa.bb.cc.dd (aa.bb.cc.dd)
    Destination: ee.ff.gg.hh (ee.ff.gg.hh)
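Apart from the DS field, the only other difference between the two headers is the checksum (0x0a19 versus 0x0a15), and that follows directly from the DS change. A short Python sketch, assuming the standard RFC 791 ones-complement header checksum, shows why it moves by exactly 4: the old TOS byte is the low-order byte of the first 16-bit word, so setting it to 0x04 adds 4 to the sum and subtracts 4 from its complement. The addresses in the dumps are masked, so the sketch substitutes hypothetical documentation addresses (192.0.2.10 and 198.51.100.20); the absolute checksum values therefore differ from the capture, but the difference between them does not.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """Ones-complement of the ones-complement sum of the header's 16-bit words."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total > 0xFFFF:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_header(tos: int, src: bytes, dst: bytes) -> bytes:
    """20-byte IPv4 header using the field values from the tethereal dumps."""
    # version/IHL 0x45, TOS, total length 44, ID 0, DF flag (0x4000),
    # TTL 64, protocol 6 (TCP), checksum 0 for now, source, destination
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, tos, 44, 0, 0x4000, 64, 6, 0, src, dst)
    return hdr[:10] + struct.pack("!H", ipv4_checksum(hdr)) + hdr[12:]

SRC = bytes([192, 0, 2, 10])     # hypothetical stand-in for aa.bb.cc.dd
DST = bytes([198, 51, 100, 20])  # hypothetical stand-in for ee.ff.gg.hh

working = build_header(0x00, SRC, DST)
failing = build_header(0x04, SRC, DST)
c_ok = struct.unpack("!H", working[10:12])[0]
c_bad = struct.unpack("!H", failing[10:12])[0]
print(hex(c_ok), hex(c_bad), c_ok - c_bad)  # 0x4e7a 0x4e76 4
```

Both headers still verify (the checksum over the full header comes out to zero), which matches tethereal reporting "[correct]" for each: the partner's firewall recomputed the checksum when it rewrote the byte.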


Aha!! There is something different in the "Differentiated Services Field" (previously known as the TOS byte). A quick search on Google turned up RFC 791, which defined the original use of this byte, but I also found RFCs 1812, 2474, 2780, 3154, and 3168, which supersede 791 with respect to the usage of that byte in the IP header. Interesting. This also pointed me at http://www.iana.org/assignments/dscp-registry, which describes the valid DSCP codepoints. As ethereal had already told me, 0x01 is unknown (i.e., not defined in the registry). Okay, now a quick look at the rules from the firewall, and I see the following:

# Completed on Tue Mar  7 15:58:27 2006
# Generated by iptables-save v1.2.9 on Tue Mar  7 15:58:27 2006
*mangle
:PREROUTING ACCEPT [64977797:109928599664]
:INPUT ACCEPT [64929633:109926287792]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [68495493:123011914416]
:POSTROUTING ACCEPT [68435927:123006680932]
-A PREROUTING -i eth0 -p tcp -m tcp --sport 110 -j TOS --set-tos 0x04
-A PREROUTING -i eth0 -p tcp -m tcp --sport 1110 -j TOS --set-tos 0x04
-A PREROUTING -i eth0 -p tcp -m tcp --sport 465 -j TOS --set-tos 0x04
-A PREROUTING -i eth0 -p tcp -m tcp --sport 993 -j TOS --set-tos 0x04
-A PREROUTING -i eth0 -p tcp -m tcp --sport 995 -j TOS --set-tos 0x04
-A PREROUTING -i eth0 -p tcp -m tcp --sport 20 -j TOS --set-tos 0x08
-A PREROUTING -i eth0 -p tcp -m tcp --sport 21 -j TOS --set-tos 0x08
-A PREROUTING -i eth0 -p tcp -m tcp --sport 22 -j TOS --set-tos 0x10
-A PREROUTING -i eth0 -p tcp -m tcp --sport 25 -j TOS --set-tos 0x10
-A PREROUTING -i eth0 -p tcp -m tcp --sport 53 -j TOS --set-tos 0x10
-A PREROUTING -i eth0 -p tcp -m tcp --sport 80 -j TOS --set-tos 0x10
-A PREROUTING -i eth0 -p tcp -m tcp --sport 443 -j TOS --set-tos 0x10
-A PREROUTING -i eth0 -p tcp -m tcp --sport 512:65535 -j TOS --set-tos 0x04
-A PREROUTING -p tcp -m tcp --sport 443 -j TOS --set-tos 0x08
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 110 -j TOS --set-tos 0x04
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 1110 -j TOS --set-tos 0x04
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 465 -j TOS --set-tos 0x04
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 993 -j TOS --set-tos 0x04
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 995 -j TOS --set-tos 0x04
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 20 -j TOS --set-tos 0x08
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 21 -j TOS --set-tos 0x08
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 22 -j TOS --set-tos 0x10
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 25 -j TOS --set-tos 0x10
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 53 -j TOS --set-tos 0x10
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 80 -j TOS --set-tos 0x10
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 443 -j TOS --set-tos 0x10
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 512:65535 -j TOS --set-tos 0x04
COMMIT
# Completed on Tue Mar  7 15:58:27 2006


Aha, the new version of the firewall is setting the TOS byte. Since the client is initiating the connection, the partner's firewall is setting it to 0x04 via the following rule (if the partner initiated, it would be set to 0x10 instead):

-A POSTROUTING -o eth0 -p tcp -m tcp --dport 512:65535 -j TOS --set-tos 0x04
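To double-check which TOS value a given outbound packet ends up with, the POSTROUTING rules can be walked with a quick script. The sketch below copies those rule lines from the iptables-save output above and matches only on the destination port; it assumes the `-o eth0 -p tcp` parts of each rule are already satisfied. Because the TOS target does not stop rule traversal, it lets a later matching rule overwrite an earlier one, although for these particular rules that makes no difference, since no port is assigned two different values. The function name `tos_for_dport` is hypothetical.

```python
import re

# POSTROUTING lines copied from the partner's iptables-save output above.
RULES = """\
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 110 -j TOS --set-tos 0x04
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 1110 -j TOS --set-tos 0x04
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 465 -j TOS --set-tos 0x04
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 993 -j TOS --set-tos 0x04
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 995 -j TOS --set-tos 0x04
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 20 -j TOS --set-tos 0x08
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 21 -j TOS --set-tos 0x08
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 22 -j TOS --set-tos 0x10
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 25 -j TOS --set-tos 0x10
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 53 -j TOS --set-tos 0x10
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 80 -j TOS --set-tos 0x10
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 443 -j TOS --set-tos 0x10
-A POSTROUTING -o eth0 -p tcp -m tcp --dport 512:65535 -j TOS --set-tos 0x04
"""

# Groups: low port, optional high port of a range, TOS value
RULE = re.compile(r"--dport (\d+)(?::(\d+))? -j TOS --set-tos (0x[0-9a-fA-F]+)")

def tos_for_dport(dport, rules=RULES):
    """Return the TOS byte the POSTROUTING rules would leave on a packet, or None."""
    tos = None
    for line in rules.splitlines():
        m = RULE.search(line)
        if not m:
            continue
        low = int(m.group(1))
        high = int(m.group(2)) if m.group(2) else low
        if low <= dport <= high:
            tos = int(m.group(3), 16)  # non-terminating target: keep walking
    return tos

print(hex(tos_for_dport(21644)))  # SYN/ACK back to the client's port 21644 -> 0x4
print(hex(tos_for_dport(25)))     # partner-initiated SYN to port 25 -> 0x10
```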

Okay, now we're getting somewhere, but why would that one bit cause the packet to be dropped, and where was it being dropped?
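The link between the two interpretations is just bit positions. RFC 2474 redefines the upper six bits of the old TOS byte as the DSCP, and RFC 3168 assigns the lower two bits to ECN, so a TOS value reads as a DSCP after shifting right by two. A minimal Python sketch applying that to the three TOS values that appear in the partner's ruleset (the names are the RFC 1349 meanings, which iptables also accepts as aliases for these values):

```python
# RFC 1349 meanings of the TOS values used in the partner's rules.
TOS_NAMES = {0x04: "Maximize-Reliability", 0x08: "Maximize-Throughput", 0x10: "Minimize-Delay"}

def split_ds_field(byte):
    """Split the old TOS byte into (DSCP, ECN): top six bits, bottom two bits."""
    return byte >> 2, byte & 0x03

for tos, name in TOS_NAMES.items():
    dscp, ecn = split_ds_field(tos)
    print(f"TOS 0x{tos:02x} ({name}) -> DSCP {dscp}, ECN {ecn}")
# TOS 0x04 (Maximize-Reliability) -> DSCP 1, ECN 0
# TOS 0x08 (Maximize-Throughput) -> DSCP 2, ECN 0
# TOS 0x10 (Minimize-Delay) -> DSCP 4, ECN 0
```

So "maximize reliability" in the old scheme is exactly DSCP 1 in the new one, which is what tethereal displayed as the unknown codepoint 0x01.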

So, can we look upstream?

At this point, I asked my old team if they could take a look at the configs of the border router just outside the client firewall and see if we could see anything there that might be causing problems.  They got me a copy of the configs and a couple of things jumped out at me right away.

 policy-map mark-inbound-http-hacks
   class http-hacks
    set ip dscp 1
 !


Hmm.... so the border router is setting the DSCP value to 1 for things that it believes are "http-hacks". That looks pretty ominous. And a little further down:

route-map null_policy_route permit 10 
match ip address 106 
set interface Null0

access-list 106 permit ip any any dscp 1


And there we seem to have it: anything with a DSCP value of 0x01 (which is exactly how a TOS byte of 0x04 reads under the DSCP interpretation) is getting null-routed at our border router. I talked to some of the network guys, and they remembered putting this policy-map in. In fact, this comes directly from Cisco (see http://www.cisco.com/warp/public/63/nbar_acl_codered.shtml). So that has been in the border routers for a long time (if not actually since Code Red, then not too long after). By the way, for those who don't remember, Code Red was in 2001. So what we seem to have is the devices at either end of the conversation using different interpretations of that TOS/DSCP byte in the IP header. Also, that Cisco doc contains the following sentence: "This document uses a DSCP value of 1 (in decimal) since it is unlikely that any other network traffic is carrying this value," which would probably be true if no one were still using the old TOS interpretation of that byte.
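Putting the two halves together: the partner's firewall writes TOS 0x04 onto the SYN/ACK, and the border router's route-map sends anything matching access-list 106 (any IP packet with DSCP 1) to Null0. The Python sketch below models only that one match, not the rest of the router configuration or the NBAR class that sets DSCP 1 in the first place; the function name `border_router_action` is hypothetical.

```python
def border_router_action(ds_byte: int) -> str:
    """Model of route-map null_policy_route: ACL 106 permits ip any any dscp 1."""
    dscp = ds_byte >> 2  # DSCP is the top six bits of the byte
    return "Null0" if dscp == 1 else "forward"

# DS bytes seen (or implied by the rules) on the partner's packets toward the client
cases = {
    "SYN/ACK, firewall rules off (working capture)": 0x00,
    "SYN/ACK, new firewall rules (failing capture)": 0x04,
    "SYN to port 25 if the partner initiates": 0x10,
}
for label, ds in cases.items():
    print(f"{label}: DS 0x{ds:02x} -> {border_router_action(ds)}")
# ... working capture): DS 0x00 -> forward
# ... failing capture): DS 0x04 -> Null0
# ... partner initiates: DS 0x10 -> forward
```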

So how do we finally solve it?

Well, my cursory reading of RFC 2474 suggests that we shouldn't be calling this the TOS byte anymore; that usage has been superseded. Further, it appears that DSCP values shouldn't really be propagated beyond the bounds of a single organization's administrative control. Cisco has a couple of documents for service providers that recommend the DSCP values be re-marked (the term used in RFC 2474) at administrative boundaries (lest someone try to run their traffic through the service provider's network with an artificially high priority). So one conclusion is that we should be sure we are clearing these bits when traffic leaves or enters our administrative control. The client is waiting on change control approval and debating whether to remove the policy-map completely or change the DSCP value to 3 (which is also undefined in the IANA registry mentioned above and doesn't conflict with any old TOS uses of that byte). I'm sure there are lots of other conclusions that could be drawn from this, but for the moment I'm not going to draw any more. I'll leave that to you, our faithful readers.
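Re-marking at a boundary amounts to overwriting the six DSCP bits while leaving the two ECN bits alone. A minimal Python sketch of that byte operation, plus a check of the DSCP 3 option against the TOS values the partner's rules actually set; the names `remark` and `PARTNER_TOS_VALUES` are hypothetical, and the sketch does not recompute the IP header checksum, which a real device would also have to do after changing the byte.

```python
PARTNER_TOS_VALUES = (0x04, 0x08, 0x10)  # values set by the partner's iptables rules

def remark(ds_byte: int, new_dscp: int = 0) -> int:
    """Overwrite the DSCP (top six bits), preserving the ECN bits (bottom two)."""
    if not 0 <= new_dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    return (new_dscp << 2) | (ds_byte & 0x03)

# Clearing the DSCP turns the failing SYN/ACK's 0x04 back into 0x00.
print(hex(remark(0x04)))                      # 0x0
# DSCP 3 corresponds to a DS byte of 0x0c ...
print(hex(remark(0x00, 3)))                   # 0xc
# ... and none of the partner's TOS values reads as DSCP 3.
print([t >> 2 for t in PARTNER_TOS_VALUES])   # [1, 2, 4]
```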


----------------------------

Jim Clausing, jclausing --at-- isc.sans.org
