Decoding malware

Published: 2006-08-23
Last Updated: 2006-08-23 10:33:53 UTC
by Daniel Wesemann (Version: 1)
When ISC handler Bojan Zdrnja mentioned a "pretty interesting piece of malware" he had found, those of us who like to analyze and reverse-engineer such critters immediately jumped onto it.

The malware was talking to a handful of servers over HTTP to fetch additional content, and only by faking the user agent header to look exactly the way the malware set it was Bojan able to retrieve the additional files. The files he got were "big strings" of ASCII character sequences, which Bojan quickly figured out how to decode from a Caesar-style substitution cipher into URLs. But when requesting one of these URLs, all he got was another messy big string, and one whose encoding was different:

FOEJIDBDBABDBDBDBHBDBDBDOMOMBDBDKLBDBDBDBDBDBDBDFDBDBDBDBDBDBDBDBDBDBDBDBDBDBDBD
BDBDBDBDBDBDBDBDBDBDBDBDBDBDBDBDBDBDBDBDNLBDBDBDBNAMKJBNBDKHBKNODCKLBCFPNODCEHHL
HKGADDGDGBHMHEGBHCHODDHAHCHNHNHMGHDDHBHGDDGBGGHNDDHKHNDDFHFMEADDHOHMHHHGDNBOBOBJ
DHBDBDBDBDBDBDBDBGMODBNBAMBABABEFOELELFDFGEKFHFCEKFFFNFBEKFFFAFCELACANBGBHBAEKBE
AMBEBDBDBDBDBDBDBDBDBDBDBDBDBDBDBDBDBDBDFCKPFPICBDBDBDBDBDBDBDBDBDBDBDBDBDBDBDBD
EDFGBDBDFPBCBABDKKGNNFFHBDBDBDBDBDBDBDBDPDBDBMBCBIBCBFBDBDJDBDBDBDADBDBDBDEDBCBD
[...]
KHBKNODCKLBCFPEHHLHKGADDGDGBGMOIOMOMHMHEGBHCHODDHAHCHNHNHMGHDDHBHGDDGBGGHNDDHKBB
FHFMEADDHOHMHCOMNCONHHHGDNBOBOBJDHFAGEPFNGHDCAJELICABANLCEMMOOFLIILECACBBEKDIILG
CACCMIILLCCACEJMCOOFMLLMBMHDLHKBBEHJLHKLBMCAJELJNCNFIFNOCAKMECILCLDEKOCECPKBOEKC
KNBEEBHKHAHLEEKIEDFGHMJEGOKIFPBCOGLDGNNFFHAAPDNODCBIBCAOKIBGKKBFBKLFADBHAAGJOEHP
OFKKMEBHADBAAABKADBMAOKBIIGOMKBEBDDDBGDEDJBBBDEKKBHGHMBBBEBFDNOPFCFFMIBAFMKHPOAD
BGBDHPBIAGKFBIIDHHFLBBANNBBNMIOPDNGHHGGLGHCLONIDBCILODHNONMMIIBDBHPDDNHHHCGHHCGL
KJBAOIOPAMNIIFBABDDANDEAHLHCGBHGHHIHIOPPKLDHCGCLMDBHDDDEKCNKPOPOODDNDGHPHMHAMILN

(broken up for readability here - the original was one single long line with no CR/LF)

At first, we were convinced that the long string we were looking at was just another collection of URLs. But all the pattern matching we could cook up did not turn up anything looking like an encoded URL. So it was time to try a different approach - statistical analysis. Counting characters and character sequences can frequently tell something about the code or cipher used.

We started by counting how many different single characters were in the cipher text:

daniel@debian:~$ cat bigstring.txt | perl -ne 's/\s//g; s/(.)/$seen{$1}++/eg; foreach $c (keys %seen) {print "$seen{$c} $c\n"}' | wc -l
16


Hmm. Sixteen different chars. Let's see how many different two-character sequences we have:

daniel@debian:~$ cat bigstring.txt | perl -ne 's/\s//g; s/(..)/$seen{$1}++/eg; foreach $c (keys %seen) {print "$seen{$c} $c\n"}' | wc -l
256


Well well, another power of two. This can't be coincidence :-)

daniel@debian:~$ cat bigstring.txt | perl -ne 's/\s//g; s/(....)/$seen{$1}++/eg; foreach $c (keys %seen) {print "$seen{$c} $c\n"}' | wc -l
13160


Four-character sequences, on the other hand, don't seem to be anything special, what with 13160 different ones in the file. So most likely what we are dealing with here is a code that translates each of the two hexadecimal digits of a byte into a character from a different 16-character alphabet. Let's see the 16-char alphabet and related frequency:

daniel@debian:~$ cat bigstring.txt | perl -ne 's/\s//g; s/(.)/$seen{$1}++/eg; foreach $c (keys %seen) {print "$seen{$c} $c\n"}'
4077 A
3523 F
3415 J
3496 O
3380 N
4108 P
3361 K
6790 B
3332 E
4334 H
3338 M
4718 C
6530 D
3623 I
3730 G
3781 L
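
As an aside, the same counting can be sketched in Python. This is only a rough equivalent of the Perl one-liners above, not what we actually ran; the function name ngram_counts and the short sample string (the start of the blob shown earlier) are chosen purely for illustration.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count non-overlapping n-character chunks, ignoring all whitespace."""
    stripped = "".join(text.split())
    # Walk the text in strides of n, as a global s/(..)/.../ substitution
    # would; a trailing chunk shorter than n is dropped.
    chunks = (stripped[i:i + n] for i in range(0, len(stripped) - n + 1, n))
    return Counter(chunks)


if __name__ == "__main__":
    # Illustration only: the first few characters of the blob.
    sample = "FOEJIDBDBABDBDBDBHBDBDBDOMOMBDBDKLBDBDBDBDBDBDBD"
    for n in (1, 2, 4):
        counts = ngram_counts(sample, n)
        print(n, "-character chunks:", len(counts), "distinct")
    # Frequency table for single characters, most common first.
    for char, count in ngram_counts(sample, 1).most_common():
        print(count, char)
```

On the full file, len(ngram_counts(text, 1)) and len(ngram_counts(text, 2)) correspond to the "wc -l" results of the one- and two-character runs.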


Hmm. The frequencies don't help much, but these are the first 16 letters of the alphabet. Maybe someone was lazy and did a simple substitution of the 16 hex digits into the first 16 letters of the alphabet, which would mean that an "A" is 0, a "B" is 1, and so on until "P", which would equal 0xF (15 in decimal). Testing this hypothesis meant converting each of the sixteen characters found in the file into its corresponding value. Done quick and dirty in Perl, this meant subtracting 65 from the ASCII code of each character (65 is the ASCII code of "A", so ascii(A)-65 equals 0, as desired):

daniel@debian:~$ cat bigstring.txt | perl -ne 's/(.)/printf "%x",ord($1)-65/ge' > stage1.txt

The resulting "stage1" file looked something like this:

5e4983131013131317131313ecec1313ab1313131313131353131313131313131313131313131313
1313131313131313131313131313131313131313db1313131d0ca91d13a71ade32ab125fde32477b
7a603363617c7461727e3370727d7d7c673371763361667d337a7d33575c40337e7c77763d1e1e19
371313131313131316ce31d10c1010145e4b4b53564a57524a555d514a5550524b020d1617104a14
0c1413131313131313131313131313131313131352af5f8213131313131313131313131313131313
435613135f121013aa6dd5571313131313131313f3131c1218121513139313131303131313431213
[...]
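
The A=0 ... P=15 mapping above can also be written out in Python. This is a minimal sketch of the same hypothesis; the function name letters_to_hex is made up for illustration, and unlike the Perl one-liner it skips whitespace and rejects characters outside A-P.

```python
def letters_to_hex(blob: str) -> str:
    """Map the letters 'A'..'P' to the hex digits 0..f, skipping whitespace."""
    digits = []
    for ch in blob:
        if ch.isspace():
            continue
        value = ord(ch) - ord("A")  # 'A' -> 0, 'B' -> 1, ..., 'P' -> 15
        if not 0 <= value <= 15:
            raise ValueError(f"unexpected character {ch!r}")
        digits.append(format(value, "x"))
    return "".join(digits)


if __name__ == "__main__":
    # First 16 characters of the blob shown earlier.
    print(letters_to_hex("FOEJIDBDBABDBDBD"))  # 5e49831310131313
```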


These were still hex values. In order to translate them into the corresponding characters, another line of Perl-fu had to be applied:

daniel@debian:~$ cat stage1.txt | perl -pe 's/(..)/chr(hex($1))/ge' > stage2.bin

This line takes the hex codes from the "stage1" file and converts them into one-byte characters.
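
For comparison, a Python sketch of the same hex-to-bytes step, using the standard bytes.fromhex method. The helper name hex_to_bytes is hypothetical; when writing the result to disk, the file would need to be opened in binary mode ("wb").

```python
def hex_to_bytes(hex_text: str) -> bytes:
    """Turn a string of two-digit hex codes into the raw bytes they encode."""
    # Drop any whitespace first so line-wrapped input is handled as well.
    compact = "".join(hex_text.split())
    return bytes.fromhex(compact)


if __name__ == "__main__":
    # First four bytes of the "stage1" output shown above.
    print(hex_to_bytes("5e498313"))
```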

Taking a look at the resulting "stage2.bin" file with a hex-dumper, we got:

daniel@debian:~$ hexdump -C stage2.bin | more

00000000  5e 49 83 13 10 13 13 13  17 13 13 13 ec ec 13 13  |^I..........ìì..|
00000010  ab 13 13 13 13 13 13 13  53 13 13 13 13 13 13 13  |«.......S.......|
00000020  13 13 13 13 13 13 13 13  13 13 13 13 13 13 13 13  |................|
00000030  13 13 13 13 13 13 13 13  13 13 13 13 db 13 13 13  |............Û...|
00000040  1d 0c a9 1d 13 a7 1a de  32 ab 12 5f de 32 47 7b  |..©..§.Þ2«._Þ2G{|
00000050  7a 60 33 63 61 7c 74 61  72 7e 33 70 72 7d 7d 7c  |z`3ca|tar~3pr}}||
00000060  67 33 71 76 33 61 66 7d  33 7a 7d 33 57 5c 40 33  |g3qv3af}3z}3W\@3|
00000070  7e 7c 77 76 3d 1e 1e 19  37 13 13 13 13 13 13 13  |~|wv=...7.......|
00000080  16 ce 31 d1 0c 10 10 14  5e 4b 4b 53 56 4a 57 52  |.Î1Ñ....^KKSVJWR|
00000090  4a 55 5d 51 4a 55 50 52  4b 02 0d 16 17 10 4a 14  |JU]QJUPRK.....J.|
000000a0  0c 14 13 13 13 13 13 13  13 13 13 13 13 13 13 13  |................|
000000b0  13 13 13 13 52 af 5f 82  13 13 13 13 13 13 13 13  |....R¯_.........|
[...]
000001c0  46 43 4b 23 13 13 13 13  13 43 12 13 13 03 13 13  |FCK#.....C......|
000001d0  13 13 13 13 13 17 13 13  13 13 13 13 13 13 13 13  |................|
000001e0  13 13 13 13 93 13 13 f3  46 43 4b 22 13 13 13 13  |.......óFCK"....|
000001f0  13 93 13 13 13 73 12 13  13 69 13 13 13 17 13 13  |.....s...i......|
00000200  13 13 13 13 13 13 13 13  13 13 13 13 53 13 13 f3  |............S..ó|
00000210  3d 61 60 61 70 13 13 13  13 03 13 13 13 f3 12 13  |=a`ap........ó..|
00000220  13 11 13 13 13 6d 13 13  13 13 13 13 13 13 13 13  |.....m..........|
00000230  13 13 13 13 53 13 13 d3  13 13 13 13 13 13 13 13  |....S..Ó........|
00000240  13 13 13 13 13 13 13 13  13 13 13 13 13 13 13 13  |................|

While this might still look like gibberish to some of you, folks who have looked at malware binaries in a hex dump before will notice the same thing we did: this sure does have the same structure as a UPX-compressed EXE binary, with the difference that normal binaries don't have a file header full of "0x13" but rather "0x00" in those places, and that "normal" EXEs also start with the tell-tale "MZ" byte sequence and not with "^I".
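
The "header full of 0x13" observation does not have to be made by eye. A small Python sketch (the function name most_common_byte is invented here, and the sample is just the first 24 bytes of the dump above) counts which byte value dominates:

```python
from collections import Counter


def most_common_byte(data: bytes) -> int:
    """Return the byte value that occurs most often in data."""
    if not data:
        raise ValueError("no data to analyze")
    return Counter(data).most_common(1)[0][0]


if __name__ == "__main__":
    # First 24 bytes of stage2.bin, taken from the hex dump above.
    header = bytes.fromhex("5e4983131013131317131313ecec1313ab13131313131313")
    print(hex(most_common_byte(header)))  # 0x13
```

In a regular EXE header we would expect this filler byte to be 0x00, so a different dominant value is a strong hint at how the file was disguised.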

The simplest trick in the book to get to "0x00" from "0x13" is a bitwise XOR operation. XOR-ing a byte with the same value twice in a row yields the original byte again, so let's try an XOR with 0x13 to get from 0x13 back to 0x00 (the /s modifier makes "." match newline bytes as well, so that every byte in the file gets XOR-ed):

daniel@debian:~$ cat stage2.bin | perl -pe 's/(.)/chr(ord($1)^0x13)/gse' > stage3.bin
daniel@debian:~$ file stage3.bin
stage3.bin: MS-DOS executable (EXE), OS/2 or MS Windows


Yee-Hah! The resulting decoded file is indeed a UPX-packed Windows binary.
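
The XOR step is easy to reproduce in Python as well. The following is a sketch rather than the command we used; xor_bytes is a made-up helper name, and the sample inputs are byte sequences copied from the hex dump above.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


if __name__ == "__main__":
    key = 0x13
    # First four bytes of stage2.bin: they should decode to an "MZ" header.
    print(xor_bytes(bytes.fromhex("5e498313"), key)[:2])  # b'MZ'
    # The "FCK#" string at offset 0x1c0 decodes to a UPX section name.
    print(xor_bytes(b"FCK#", key))  # b'UPX0'
```

Because this works on raw bytes rather than on lines of text, newline bytes need no special treatment, and applying xor_bytes twice with the same key returns the original data.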

Looks like the days are over when a running malware foolishly gave away its presence by trying to download additional components in EXE form.  First, we had EXEs, then EXEs with JPG extension, then EXEs with JPG header - and now plain ASCII blobs. The task of your perimeter (proxy) anti-virus filter has just gotten a couple notches more daunting.

-- Daniel Wesemann
