Cleaning Up After the Leak: Hiding exposed web content

Published: 2013-04-08
Last Updated: 2013-04-09 00:12:56 UTC
by Johannes Ullrich (Version: 1)
1 comment(s)

Just this weekend, a user notified us of a company leaking sensitive information on its website. The information was readily available via Google, which is how the reader found it. The news outlets also talked about a case where the secret firmware key used to sign BIOS firmware from motherboard vendor MSI leaked due to an open FTP server, essentially invalidating the security of modern UEFI motherboards.

So what do you do? Someone notifies you "hey, I found this document on your website, and I don't think it should be there". First thing would be to verify the leak ("Identification"). Don't forget to send back a big thank you.

Next we need to contain the incident. You are probably looking for a quick fix first. Something to stop the bleeding. Lets assume you don't have an actual "breach", so your systems are not compromised, just someone didn't use proper care when they published the documents.

Here are some quick fix options:

- setup a web application firewall rule to block access to the documents if you can identify common properties ("all PDFs", "all Excel spreadsheets in the /accounting directory", "all documents that contain the string 'SECRET' in the header).

- if you don't have a web application firewall, you may be able to do something similar with your web server configuration, but sometimes you are less flexible when it comes to that

- remove the documents from the web server. You probably don't just want to delete them. Either move them out of the document root (minimum) or to a different system, tape, CD or some other medium

This may be part of the identification step, but I suggest you first remove access to the content before you check your web logs to figure out who accessed the documents. Who needs to be notified of the leak internally or externally? 

Next plan the real fix (Eradication)

- who needs access to the documents?
- do we already have an authentication system we can leverage?
- how critical are the documents? What is an appropriate authentication scheme for them?

Don't rush this part! It can be hard to come up with correct access control rules after the fact, and it will take some time to get this right.

Finally, don't forget the cleanup of external copies. Remember: Once it is online, it is online for ever

- check search engines for cached copies of the content, and ask them to remove it
- while "robots.txt" is not a security feature, blocking access via robots.txt can speed up search engine removal
- search for other copies online of the content (Google, Bing, Pastebin, Twitter...) and try to remove these copies

It may be very hard, or impossible, to remove all copies. 

Once the fix is tested, you probably want to make the documents available, or in some cases, the real solution may be not to offer the documents online in the form in which you had them online. ("Recovery").

Lastly, don't forget the "Lessons Learned" part. In particular, don't forget to look at other spots where you made the same mistake, and try to fix the process used to make content live on your website. It is hardly ever the fault of an individual, but instead, a failure in the content management process, that leads to leaks like this.

Johannes B. Ullrich, Ph.D.
SANS Technology Institute

Keywords: web app sec
1 comment(s)


Amazing article i just have one question i didn't get the part about the robots.txt how is it not a security feature ?

Diary Archives