Silently Profiling Unknown Malware Samples
Yesterday I came across a suspicious binary whose MD5 and SHA hashes didn't match any sample in public malware repositories, such as VirusTotal (VT) and TotalHash, or in known sandboxes.
An obvious first option would be to submit the dubious binary to VT and have it analyzed by 60+ engines to find out more. However, depending on the case, publicly exposing a file may not be a good choice for several reasons, such as alerting the adversary that he has been found or simply publishing the file's contents.
That wasn't my case: I was almost sure my sample was a common WannaCry variant, and I would have had no problem publicly exposing it. Even so, I decided to study alternative ways to identify it before going deep into static and dynamic analysis.
Comparing characteristics
It is common for samples of the same family to share some similarities, and fortunately there are known approaches and tools that help us identify a specific binary by comparing it with known samples.
IMPHASH
One of those is Imphash [1], which computes a fingerprint of the binary's IAT (Import Address Table). In a PE (Portable Executable) file, the IAT contains the list of dynamically linked libraries and functions a given binary needs to run. The idea is therefore simple: if two binaries have the same imphash, there is a high chance they have similar objectives.
Computing the imphash of my suspicious sample, we get:
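To make the fingerprint concrete, here is a minimal sketch of how the hash is derived, modeled on what pefile's get_imphash() does for functions imported by name. Each import becomes a lowercased "library.function" string, with a .dll, .ocx or .sys extension stripped from the library name. The strings are joined with commas in import order and the result is MD5-hashed. The helper name imphash_from_imports and its input format are hypothetical. The sketch does not parse PE files, and it ignores imports by ordinal, which pefile resolves differently.

```python
import hashlib
from typing import Iterable, Tuple

# Extensions that pefile strips from library names before hashing.
STRIPPED_EXTENSIONS = ("dll", "ocx", "sys")


def imphash_from_imports(imports: Iterable[Tuple[str, str]]) -> str:
    """Return an imphash-style MD5 for (library, function) pairs.

    Hypothetical helper: assumes the imports are already extracted,
    listed in import-table order, and all imported by name.
    """
    parts = []
    for library, function in imports:
        lib = library.lower()
        # "KERNEL32.dll" -> "kernel32"; other extensions are kept as-is.
        pieces = lib.rsplit(".", 1)
        if len(pieces) > 1 and pieces[1] in STRIPPED_EXTENSIONS:
            lib = pieces[0]
        parts.append(f"{lib}.{function.lower()}")
    # Order matters: the same imports in a different order hash differently.
    return hashlib.md5(",".join(parts).encode()).hexdigest()


if __name__ == "__main__":
    sample = [("KERNEL32.dll", "CreateFileA"), ("KERNEL32.dll", "WriteFile")]
    print(imphash_from_imports(sample))
```

Because the string is built in import-table order, two binaries that import exactly the same functions in a different order end up with different imphashes.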
--
In [1]: import pefile
In [2]: file=pefile.PE('tasksche.exe')
In [3]: file.get_imphash()
Out[3]: '68f013d7437aa653a8a98a05807afeb1'
--
As seen, the Python "pefile" module makes it easy to calculate the imphash. Now let's compare it.
To this end, I'll use #totalhash [2], a malware analysis database that, among other features, supports imphash searches.
The search returned 17 binaries with different hashes that matched my file's imphash. When I chose one of them and looked at its details, my initial suspicion began to be confirmed.
As a bonus, I discovered that it is possible to use the imphash function inside Yara rules.
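If you keep a local collection of samples, a rough offline analogue of that search is to bucket files by imphash. The helpers below are hypothetical and only show the grouping logic. They take the hash function as a parameter, so you could pass something like lambda p: pefile.PE(p).get_imphash(), reusing the pefile call from the session above, or any other implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_imphash(
    paths: Iterable[str], imphash_of: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Bucket sample paths by imphash; imphash_of computes one hash per path."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[imphash_of(path)].append(path)
    return dict(groups)


def same_imphash_as(
    target: str, paths: Iterable[str], imphash_of: Callable[[str], str]
) -> List[str]:
    """Return the samples other than target that share target's imphash."""
    wanted = imphash_of(target)
    return [p for p in paths if p != target and imphash_of(p) == wanted]


if __name__ == "__main__":
    # Made-up hashes standing in for pefile.PE(path).get_imphash().
    fake = {"a.exe": "1111", "b.exe": "2222", "c.exe": "1111"}
    print(group_by_imphash(fake, fake.__getitem__))
    # prints {'1111': ['a.exe', 'c.exe'], '2222': ['b.exe']}
```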
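YARA's "pe" module exposes a pe.imphash() function, so a rule that matches the imphash computed above can be written directly. The small Python helper below only assembles the rule text. The helper name and the rule name are hypothetical, and compiling or scanning with the rule is left to yara or yara-python.

```python
import re

RULE_TEMPLATE = """import "pe"

rule {name}
{{
    meta:
        description = "Samples sharing imphash {imphash}"
    condition:
        pe.imphash() == "{imphash}"
}}
"""


def imphash_yara_rule(name: str, imphash: str) -> str:
    """Build the text of a YARA rule matching PE files with this imphash."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid YARA rule name: {name!r}")
    if not re.fullmatch(r"[0-9a-fA-F]{32}", imphash):
        raise ValueError(f"not an MD5 hex digest: {imphash!r}")
    # The string comparison in the condition is case-sensitive, so emit
    # lowercase hex, the same form pefile's get_imphash() returns.
    return RULE_TEMPLATE.format(name=name, imphash=imphash.lower())


if __name__ == "__main__":
    print(imphash_yara_rule("wannacry_imphash", "68f013d7437aa653a8a98a05807afeb1"))
```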
SSDEEP
To step up the analysis, I obtained one of those 17 files and performed an additional check with ssdeep [3], which computes context-triggered piecewise hashes (fuzzy hashes) that can match files sharing sequences of identical bytes in the same order.
I copied both files (my sample and the one I obtained) to a test directory and ran ssdeep, as seen in Figure 3.
The ssdeep score ranges from 0 to 100: the higher the number, the more homologies the files share.
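To get a feel for what ssdeep measures, here is a toy version of context-triggered piecewise hashing, the technique behind it. A rolling hash over the last 7 bytes decides where each chunk ends. Every chunk contributes one base64 character, and two signatures are scored from 0 to 100 by how many characters they share in order. The rolling hash and piece hash loosely follow ssdeep's design. The block-size handling and the LCS-based score, however, are simplifications, so the output is not compatible with real ssdeep hashes or scores. All names are hypothetical.

```python
import random

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7                 # rolling-hash window size
MASK = 0xFFFFFFFF          # emulate 32-bit unsigned arithmetic
FNV_PRIME = 0x01000193     # piece-hash constants (FNV-style)
FNV_INIT = 0x28021967


class RollingHash:
    """Hash of the last WINDOW bytes, updated one byte at a time."""

    def __init__(self) -> None:
        self.window = [0] * WINDOW
        self.h1 = self.h2 = self.h3 = 0
        self.n = 0

    def update(self, c: int) -> int:
        self.h2 = (self.h2 - self.h1 + WINDOW * c) & MASK
        self.h1 = (self.h1 + c - self.window[self.n % WINDOW]) & MASK
        self.window[self.n % WINDOW] = c
        self.n += 1
        self.h3 = ((self.h3 << 5) ^ c) & MASK
        return (self.h1 + self.h2 + self.h3) & MASK


def fuzzy_signature(data: bytes) -> str:
    """Toy CTPH signature: "<block size>:<one base64 char per chunk>"."""
    block = 3
    while block * 64 < len(data):  # aim for at most about 64 chunks
        block *= 2
    roll, piece, chars = RollingHash(), FNV_INIT, []
    for c in data:
        piece = ((piece * FNV_PRIME) ^ c) & MASK
        # A chunk ends wherever the rolling hash hits the trigger value,
        # so boundaries depend on content, not on absolute offsets.
        if roll.update(c) % block == block - 1:
            chars.append(B64[piece % 64])
            piece = FNV_INIT
    if piece != FNV_INIT:  # trailing partial chunk
        chars.append(B64[piece % 64])
    return f"{block}:{''.join(chars)}"


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def similarity(sig_a: str, sig_b: str) -> int:
    """Score 0-100; a simplified stand-in for ssdeep's comparison score."""
    block_a, _, body_a = sig_a.partition(":")
    block_b, _, body_b = sig_b.partition(":")
    total = len(body_a) + len(body_b)
    if block_a != block_b or total == 0:
        return 0  # the toy only compares signatures with the same block size
    return 200 * lcs_length(body_a, body_b) // total


if __name__ == "__main__":
    rng = random.Random(1)
    original = bytes(rng.getrandbits(8) for _ in range(20000))
    patched = original[:10000] + b"\x90" * 16 + original[10016:]
    unrelated = bytes(rng.getrandbits(8) for _ in range(20000))
    sig = fuzzy_signature(original)
    print(similarity(sig, fuzzy_signature(patched)))    # high: local change
    print(similarity(sig, fuzzy_signature(unrelated)))  # low: chance matches
```

A local modification only changes the characters of the chunks it touches, so a patched copy keeps most of its signature. Unrelated data shares characters only by chance. This is why the score reflects homology rather than exact equality.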
RADIFF2
So far, we have strong indicators that our suspicious file is a WannaCry variant, but it is possible to check additional characteristics.
There is a tool called “radiff2”, part of the Radare2 project [4], which disassembles and compares binaries. Using it, I compared the main function of my binary with that of the sample I obtained, using the following command:
$ radiff2 -g main tasksche.exe wannacry-sample1.exe | xdot -
This generated a graph, as seen in Figure 3.
Although the figure's contents are not legible at this size, radiff2 and xdot use colors to represent differences: grey is a perfect match, yellow indicates that some offsets don’t match, and red shows a substantial difference [4].
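When there are several candidate samples, the same comparison can be scripted. The sketch below is a hypothetical helper, not part of Radare2. For each candidate, it runs the same radiff2 -g main command shown above. It then saves the graph that radiff2 prints (the DOT output that xdot renders) to a .dot file, which can be opened later with xdot. The sketch assumes radiff2 is on the PATH, and the output naming scheme is arbitrary.

```python
import subprocess
from pathlib import Path
from typing import Dict, Iterable, List


def radiff2_graph_cmd(symbol: str, sample: str, reference: str) -> List[str]:
    """Argument list equivalent to: radiff2 -g <symbol> <sample> <reference>."""
    return ["radiff2", "-g", symbol, sample, reference]


def dot_path(out_dir: str, symbol: str, reference: str) -> Path:
    """Where the graph for one reference sample is saved (naming is arbitrary)."""
    return Path(out_dir) / f"{symbol}_vs_{Path(reference).stem}.dot"


def diff_against(
    sample: str,
    references: Iterable[str],
    symbol: str = "main",
    out_dir: str = "graphs",
) -> Dict[str, Path]:
    """Run radiff2 once per reference and save each DOT graph to a file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved = {}
    for ref in references:
        # check=True raises CalledProcessError if radiff2 exits with an error.
        result = subprocess.run(
            radiff2_graph_cmd(symbol, sample, ref),
            capture_output=True,
            text=True,
            check=True,
        )
        path = dot_path(out_dir, symbol, ref)
        path.write_text(result.stdout)
        saved[ref] = path
    return saved
```

For example, diff_against("tasksche.exe", ["wannacry-sample1.exe"]) would write graphs/main_vs_wannacry-sample1.dot, which xdot can open afterwards. Saving the graphs instead of piping them straight into xdot keeps a record of each comparison.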
Conclusion
Discovering with a high degree of certainty that my “unknown” sample is a regular WannaCry variant was enough for my scenario. Of course, depending on the case, further analysis may be required to make sure the dissimilarities do not represent malware modifications with important implications for scoping the incident.
References:
[1] https://www.fireeye.com/blog/threat-research/2014/01/tracking-malware-import-hashing.html
[2] https://totalhash.cymru.com/
[3] https://ssdeep-project.github.io/ssdeep/index.html
[4] https://r2wiki.readthedocs.io/en/latest/tools/radiff2/
--
Renato Marinho
Morphus Labs