DNS Option 15: Debugging DNSSEC Errors.
Last Updated: 2022-09-27 13:41:11 UTC
by Johannes Ullrich (Version: 1)
DNSSEC has had a rough ride so far. I usually say that the mistake made with DNSSEC was that security came first in the design, ahead of usability. The result is that the implementation of DNSSEC is usually compliance-driven and not widespread. There are two parts to implementing DNSSEC:
- DNSSEC Validation: This is done by resolvers. A resolver may check if a record it receives is correctly signed before forwarding it. This is pretty easy to implement in resolvers, and many large public resolvers validate DNSSEC, and by doing so, they protect their users. You typically just need to enable the feature and ensure the current root zone key is configured correctly.
- DNSSEC Signing: By digitally signing zone data (adding RRSIG and the necessary DNSKEY records), a particular zone is protected with DNSSEC. This is tricky as these signatures need to be maintained, and the DS records verifying the keys need to be kept current with your registrar/parent zone. If you mess up, your zone will no longer resolve for any resolver that validates the signatures.
One important issue is the "DS" record. Let's quickly review how DNSSEC works:
- For each zone ("domain"), you create one or more "Key Signing Keys".
- You use the Key Signing Key to sign one or more "Zone Signing Keys".
- The "Zone Signing Key" is used to create signatures for individual records.
Why the two keys? This is to optimize the tradeoff between key strengths, speed, and the need to maintain and rotate the keys.
The critical record here is the "DS" record: How does a resolver know that a zone is signed with DNSSEC, and which key to trust? The authoritative name servers for the parent zone will offer a DS record. This record is a hash of a Key Signing Key used by the zone. And this is why there are two keys:
Key Signing Key (KSK): a longer key that does not change often, as it can be a pain to update the corresponding DS record with the parent zone.
Zone Signing Key (ZSK): a shorter key. This way, the crypto is simpler/faster, and we can rotate the key more often. It is signed using the KSK.
So, in short: no DS record -> no DNSSEC. If there is a DS record, you had better make sure DNSSEC works for your zone, or you end up with a self-inflicted DoS (I have done this a few times, as you will see).
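To make the relationship between the DS record and the KSK more concrete, here is a minimal Python sketch of how a DS digest and key tag are derived from a DNSKEY record, following RFC 4034 (SHA-256, digest type 2). The key material is a made-up placeholder, algorithm 13 is just an illustrative choice, and the function names are hypothetical. In practice you would take the DS record from your signing tool rather than compute it by hand.

```python
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Encode a domain name in canonical (lowercase) DNS wire format."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """Build DNSKEY RDATA: flags (2 bytes), protocol, algorithm, key material."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def key_tag(rdata: bytes) -> int:
    """Key tag over the DNSKEY RDATA (RFC 4034, Appendix B)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += (byte << 8) if i % 2 == 0 else byte
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_digest_sha256(owner: str, rdata: bytes) -> str:
    """DS digest type 2: SHA-256 over owner name (wire format) + DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()


# Placeholder key material for illustration only (NOT a real key).
example_key = bytes(range(64))
# flags=257 marks a key with the SEP bit set, i.e. a KSK.
rdata = dnskey_rdata(flags=257, protocol=3, algorithm=13, public_key=example_key)
print("sti-admin.com. IN DS", key_tag(rdata), 13, 2,
      ds_digest_sha256("sti-admin.com.", rdata))
```

The parent zone publishes only this digest, which is why changing the KSK means touching the parent, while ZSK rotation stays entirely within your own zone.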
I recently set up a new zone for internal use: sti-admin.com. I usually host my zones with Google, and Google makes DNSSEC pretty easy: You check a checkbox, and you are good to go. But, I wanted to do it the hard way because I needed a bit more flexibility for this domain to do dynamic updates and such. Now... I didn't want to skip DNSSEC as I somewhat believe in doing things securely no matter the pain. As I say: My shadow IT has its own shadow security stack.
So I experimented with a BIND9 feature to automatically sign the zones. In the past, you had to run a script to add signature records to your zone file. This meant that you had to do some scripting to rotate keys and re-sign the zone every so often, and it didn't work well with dynamic updates. BIND introduced a feature to have the nameserver itself create the signature records "on the fly". All you do is add a "dnssec-policy" option to the zone. You may create your own policy if you do not like the default policy. But once you have done that, you are good to go. Reload the configuration and named will automatically create keys and signature records.
But you still have to provide your registrar with the DS records so they can be signed by the parent zone. For Google, you do that via a simple form on their website.
Initially, I actually got it right, and it worked (amazing!). See below the graph from dnsviz.net, a great site to verify DNSSEC deployments.
But... why stop here? You learn when things go wrong!
So I wasn't happy with some of the configurations and started experimenting with different policies. Sadly, that broke things. And as I said: That is where you start learning. I did observe that the public resolver at '1.1.1.1' answers in a peculiar way if DNSSEC is broken:
% dig sti-admin.com @1.1.1.1
; <<>> DiG 9.10.6 <<>> sti-admin.com @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 52403
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; OPT=15: 00 0a 66 61 69 6c 65 64 20 74 6f 20 76 65 72 69 66 79 20 73 69 67 6e 61 74 75 72 65 73 20 66 6f 72 20 73 74 69 2d 61 64 6d 69 6e 2e 63 6f 6d 2e 20 6f 70 74 2d 6f 75 74 20 70 72 6f 6f 66 ("..failed to verify signatures for sti-admin.com. opt-out proof")
;; QUESTION SECTION:
;sti-admin.com. IN A
;; Query time: 124 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Sep 27 08:42:13 EDT 2022
;; MSG SIZE rcvd: 108
It uses an EDNS "Option 15" to provide a human-readable error message! Who would have thought of a user-friendly feature like that? The "dig" utility even displays this nicely for you.
I have not seen this from other nameservers, so this may be a Cloudflare special. But thanks, Cloudflare! :) The option is defined in RFC 8914 ("Extended DNS Errors"). The option isn't just useful for DNSSEC, but could be used any time a query fails and the DNS server would like to return more details than just a simple "ServFail" error.
Here is the complete packet:
08:41:40.176092 IP (tos 0x0, ttl 58, id 49070, offset 0, flags [DF], proto UDP (17), length 136)
1.1.1.1.53 > 10.5.1.126.56671: [udp sum ok] 52403 ServFail q: A? sti-admin.com. 0/0/1 ar: . OPT UDPsize=1232 (108)
0x0000: 4500 0088 bfae 4000 3a11 7332 0101 0101 E.....@.:.s2....
0x0010: 0a05 017e 0035 dd5f 0074 7319 ccb3 8182 ...~.5._.ts.....
0x0020: 0001 0000 0000 0001 0973 7469 2d61 646d .........sti-adm
0x0030: 696e 0363 6f6d 0000 0100 0100 0029 04d0 in.com.......)..
0x0040: 0000 0000 0042 000f 003e 000a 6661 696c .....B...>..fail
0x0050: 6564 2074 6f20 7665 7269 6679 2073 6967 ed.to.verify.sig
0x0060: 6e61 7475 7265 7320 666f 7220 7374 692d natures.for.sti-
0x0070: 6164 6d69 6e2e 636f 6d2e 206f 7074 2d6f admin.com..opt-o
0x0080: 7574 2070 726f 6f66 ut.proof
The option data starts at offset 0x0046 in the dump above:
000f - Option-Code (15)
003e - Option-Length (62 bytes)
000a - Info-Code. 10 = RRSIGs Missing
followed by the text description of the error.
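The breakdown above can be checked with a few lines of Python. This is a minimal sketch, not a full EDNS parser: it decodes a single option laid out as in RFC 8914 (option-code, length, info-code, text), using the bytes copied from the packet dump starting at offset 0x0046. The function name and the abbreviated info-code table are just for illustration.

```python
import struct

# A few info-codes from RFC 8914; the full registry is maintained by IANA.
EDE_INFO_CODES = {
    6: "DNSSEC Bogus",
    7: "Signature Expired",
    9: "DNSKEY Missing",
    10: "RRSIGs Missing",
}


def decode_ede_option(option: bytes):
    """Decode one EDNS option and return (info_code, extra_text)."""
    code, length = struct.unpack("!HH", option[:4])
    if code != 15:
        raise ValueError(f"not an Extended DNS Error option (code {code})")
    data = option[4:4 + length]
    if len(data) != length or length < 2:
        raise ValueError("truncated option")
    (info_code,) = struct.unpack("!H", data[:2])
    return info_code, data[2:].decode("utf-8", errors="replace")


# Option bytes copied from the hex dump, starting at offset 0x0046.
raw = bytes.fromhex(
    "000f 003e 000a 6661 696c "
    "6564 2074 6f20 7665 7269 6679 2073 6967 "
    "6e61 7475 7265 7320 666f 7220 7374 692d "
    "6164 6d69 6e2e 636f 6d2e 206f 7074 2d6f "
    "7574 2070 726f 6f66"
)
info_code, text = decode_ede_option(raw)
print(info_code, EDE_INFO_CODES.get(info_code, "unknown"), repr(text))
```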
Next step: A snort signature to alert me if one of my zones is badly signed, triggering this error :). But maybe a cron job that resolves them via 1.1.1.1 will be an easier way to detect errors.
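A rough sketch of what such a cron job could look like, using only the Python standard library: it sends an A query with an EDNS0 OPT record to the resolver and reports any non-zero response code or Extended DNS Error option that comes back. All function names are hypothetical, and this is deliberately minimal: no retries, no TCP fallback, and no check that the response ID matches the query.

```python
import socket
import struct
import sys


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build an A query with recursion desired and an empty EDNS0 OPT record."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    opt_rr = b"\x00" + struct.pack("!HHIH", 41, 1232, 0, 0)  # root name, OPT
    return header + question + opt_rr


def skip_name(msg: bytes, pos: int) -> int:
    """Return the offset just past the (possibly compressed) name at pos."""
    while True:
        length = msg[pos]
        if length == 0:
            return pos + 1
        if (length & 0xC0) == 0xC0:  # a compression pointer ends the name
            return pos + 2
        pos += 1 + length


def parse_response(msg: bytes):
    """Return (rcode, [(info_code, text), ...]) from a DNS response."""
    _id, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    pos = 12
    for _ in range(qd):
        pos = skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    errors = []
    for _ in range(an + ns + ar):
        pos = skip_name(msg, pos)
        rtype, _cls, _ttl, rdlen = struct.unpack("!HHIH", msg[pos:pos + 10])
        pos += 10
        rdata = msg[pos:pos + rdlen]
        pos += rdlen
        if rtype != 41:  # only the OPT pseudo-record carries EDNS options
            continue
        off = 0
        while off + 4 <= len(rdata):
            code, length = struct.unpack("!HH", rdata[off:off + 4])
            data = rdata[off + 4:off + 4 + length]
            off += 4 + length
            if code == 15 and len(data) >= 2:
                info_code = struct.unpack("!H", data[:2])[0]
                errors.append((info_code, data[2:].decode("utf-8", errors="replace")))
    return flags & 0x000F, errors


def check_zone(name: str, resolver: str = "1.1.1.1", timeout: float = 3.0):
    """Query the resolver over UDP and return the parsed result."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver, 53))
        reply, _ = sock.recvfrom(4096)
    return parse_response(reply)


if __name__ == "__main__":
    # Zones to check are passed as command-line arguments.
    for zone in sys.argv[1:]:
        rcode, errors = check_zone(zone)
        if rcode != 0 or errors:
            print(f"{zone}: rcode={rcode} errors={errors}")
```

Since the script prints only when something is wrong, cron's usual behavior of mailing any output doubles as the alert.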
Johannes B. Ullrich, Ph.D. , Dean of Research, SANS.edu