Microsoft's Rich Signature (undocumented)
11/11/2010: corrected the part about the meaning of the @comp.id value high bits.
In this article I'm going to try to provide documentation for the undocumented(*) Rich Signature produced by Microsoft compilers. I'm not completely sure when this signature was introduced, I wrongly believed that I had been introduced with Visual Studio 2003, but I was shown that it is present even in VC++ 6 executables. So, I guess this signature has been introduced with that compiler. Information about this topic is non-existent (seems strange, but it's a fact). Thus, most readers probably don't know what I'm talking about. Let's look at the inital bytes of a normal executable compiled with Visual Studio 2005:
00000000 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 MZ.........
00000010 B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 .......@.......
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000030 00 00 00 00 00 00 00 00 00 00 00 00 F8 00 00 00 ...............
00000040 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 ..!L!Th
00000050 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F is.program.canno
00000060 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 t.be.run.in.DOS.
00000070 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 mode....$.......
00000080 E7 B3 9D E7 A3 D2 F3 B4 A3 D2 F3 B4 A3 D2 F3 B4 糝
00000090 60 DD AC B4 A8 D2 F3 B4 60 DD AE B4 BE D2 F3 B4 `ݬ`ݮ
000000A0 A3 D2 F2 B4 F8 D0 F3 B4 84 14 8E B4 BA D2 F3 B4
000000B0 84 14 9E B4 3A D2 F3 B4 84 14 9D B4 3F D2 F3 B4 :?
000000C0 84 14 81 B4 B3 D2 F3 B4 84 14 8F B4 A2 D2 F3 B4
000000D0 84 14 8B B4 A2 D2 F3 B4 52 69 63 68 A3 D2 F3 B4 Rich
000000E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000000F0 00 00 00 00 00 00 00 00 50 45 00 00 4C 01 04 00 ........PE..L.
00000100 01 2F 9A 46 00 00 00 00 00 00 00 00 E0 00 03 01 /F.........
00000110 0B 01 08 00 00 10 05 00 00 F0 04 00 00 00 00 00 .........
00000120 A5 A1 03 00 00 10 00 00 00 20 05 00 00 00 40 00 .........@.
00000130 00 10 00 00 00 10 00 00 04 00 00 00 00 00 00 00 .............
00000140 04 00 00 00 00 00 00 00 00 50 0A 00 00 10 00 00 ........P.....
00000150 D8 4B 07 00 02 00 00 00 00 00 10 00 00 10 00 00 K..........
Ok, we can easily identify the Dos Header
followed by the PE Header (also known as NT Headers). But do you also notice
something unusual? I'm referring to this:
00000070 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00
00 00 00 00 mode....$.......
00000080 E7 B3 9D E7 A3 D2 F3 B4 A3 D2
F3 B4 A3 D2 F3 B4 糝
00000090 60 DD AC B4 A8 D2 F3 B4 60 DD
AE B4 BE D2 F3 B4 `ݬ`ݮ
000000A0 A3 D2 F2 B4 F8 D0 F3 B4 84 14
8E B4 BA D2 F3 B4
000000B0 84 14 9E B4 3A D2 F3 B4 84 14
9D B4 3F D2 F3 B4 : ?
000000C0 84 14 81 B4 B3 D2 F3 B4 84 14
8F B4 A2 D2 F3 B4
000000D0 84 14 8B B4 A2 D2 F3 B4 52 69
63 68 A3 D2 F3 B4 Rich
000000E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000000F0 00 00 00 00 00 00 00 00 50 45 00 00 4C 01 04 00 ........PE..L .
This seems pretty strange, doesn't it? What's this data between the dos stub and the PE Header?
As you can see, the highlighted block of data contains the word Rich. This why it is called Rich Signature, it surely isn't its internal official name. Not that any serious paper has ever mentioned it as Rich Signature, but in a couple of forums it came up with this name. That's all what is known about this block of data. That it contains the word "Rich" and that it is stored between the Dos Header and the PE Header. Also, what should be noted, is that this signature is only present in VC++ executables, it can't be found in .NET assemblies.
Since I had no information at all available, I started analyzing the data itself. After a bit of time I started noticing something interesting. A costant pattern of identic dwords in the block. Also, other dwords had similarities with the identical dwords. I marked the identical dwords with red and the similarities with yellow.
00000080 E7 B3 9D E7
A3 D2 F3 B4 A3 D2 F3 B4 A3 D2 F3 B4
糝
00000090 60 DD AC B4 A8 D2 F3 B4
60 DD AE
B4
BE
D2 F3 B4
`ݬ`ݮ
000000A0 A3 D2 F2 B4
F8
D0 F3 B4
84 14 8E
B4
BA
D2 F3 B4
000000B0 84 14 9E B4
3A
D2 F3 B4
84 14 9D
B4
3F
D2 F3 B4
:?
000000C0 84 14 81 B4
B3
D2 F3 B4
84 14 8F
B4 A2 D2 F3 B4
000000D0 84 14 8B B4
A2 D2 F3 B4
52 69 63 68
A3 D2 F3 B4
Rich
What made me thinking was that dword following
the word Rich. Why put this kind of dword after the word "Rich"? Anyway, I
xored the block of data from the beginning to the word "Rich" with this dword
and got a very convincing result:
00000080 44 61 6E 53 00 00 00 00 00 00 00 00 00 00 00 00 DanS............
00000090 C3 0F 5F 00 0B 00 00 00 C3 0F 5D 00 1D 00 00 00 _. ... ]. ...
000000A0 00 00 01 00 5B 02 00 00 27 C6 7D 00 19 00 00 00 .. .[ ..'}. ...
000000B0 27 C6 6D 00 99 00 00 00 27 C6 6E 00 9C 00 00 00 'm....'n....
000000C0 27 C6 72 00 10 00 00 00 27 C6 7C 00 01 00 00 00 'r. ...'|. ...
000000D0 27 C6 78 00 01 00 00 00 52 69 63 68
'x. ...Rich
This data seems very consistent. Also because it starts with the dword DanS. In fact, the data seemed to me such consistent that I wrote a CFF Explorer script to decrypt further signatures. Don't worry, my intuition was right and the following script is 100% correct.
- Dowload the Rich Signature Decrypter script
-- Rich Signature
Decrypter
-- 2008 Daniel Pistelli
filename = GetOpenFile()
if filename == null
then
return
end
hFile = OpenFile(filename)
if hFile == null
then
return
end
nSignDwords = 0
for i = 0, 100 do
dw = ReadDword(hFile, 0x80 + (i * 4))
if dw == null
then
return
end
-- is this the "Rich" terminator?
if dw == 0x68636952
then
nSignDwords =
i
break
end
end
if nSignDwords == 0
then
return
end
-- read xor mask
mask = ReadDword(hFile, 0x80 + ((nSignDwords + 1) * 4))
-- decrypt signature
for i = 0,
nSignDwords - 1
do
dw = ReadDword(hFile, 0x80 + (i * 4))
WriteDword(hFile, 0x80 + (i * 4),
dw ^ mask)
end
-- write new mask for decrypted signature
WriteDword(hFile, 0x80 + ((nSignDwords + 1) * 4), 0xFFFFFFFF)
-- save file
if SaveFile(hFile)
== true then
OpenWithCFFExplorer(filename)
end
It's not that I'm trying to spam around the CFF Explorer scripting language (which basically is a modified Lua with 0-based arrays, in case you're not familiar with it), but it's time saving to me and I use it whenever possible: writing the same thing in C/C++ or any other language would take me longer. As I said, this script decrypts every Rich Signature, corrects the xor mask (the magic dword which follows the "Rich" word) and opens the patched file in the CFF Explorer.
My guess was that this signature was produced by the linker, which is, of course, the most obvious choice. So, I disassembled the linker and looked for the "DanS" word. I found it at once. The function which creates the signature is: CbBuildProdidBlock. What follows is the disassembly of this function and all the comments I added to the code. After the disassembly, I'll sum up everything this function does and add further explanation about some points.
.text:004650E0
; unsigned int __stdcall IMAGE__CbBuildProdidBlock(unsigned int Arg1, int Arg2)
.text:004650E0
?CbBuildProdidBlock@IMAGE@@AAEKPAPAX@Z proc near
.text:004650E0
; CODE XREF: BuildImage+1057p
.text:004650E0
.text:004650E0
var_14
= dword ptr
-14h
.text:004650E0
var_4
= dword ptr
-4
.text:004650E0
Arg1
= dword ptr
4
.text:004650E0
Arg2
= dword ptr
8
.text:004650E0
.text:004650E0
mov eax,
?hheap@Heap@@2PAXA ; void * Heap::hheap
.text:004650E5
sub esp,
14h
.text:004650E8
push ebx
.text:004650E9
push ebp
.text:004650EA
push esi
.text:004650EB
push edi
.text:004650EC
push
0Ch
; dwBytes: 12
.text:004650EE
push
0
; dwFlags
.text:004650F0
push eax
; hHeap
.text:004650F1
call ds:
__imp__HeapAlloc@12
; allocates the space for the first (last)
.text:004650F1
; item of the linked list built to create the
.text:004650F1
; signature
.text:004650F7
test eax, eax
.text:004650F9
jnz short
MemoryOk
.text:004650FB
.text:004650FB
FatalError
:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+BEj
.text:004650FB
push
44Eh
; unsigned int
.text:00465100
push
0
; unsigned __int16 *
.text:00465102
call
?Fatal@@YAXPBGIZZ ; Fatal(ushort const *,uint,...)
.text:00465107
; ---------------------------------------------------------------------------
.text:00465107
.text:00465107
MemoryOk
:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+19j
.text:00465107
mov edx, [esp+
24h
+
Arg1
]
.text:0046510B
mov ecx,
1
.text:00465110
mov ebp, eax
; from here a linked list is created
.text:00465110
; every item of the list is 12 bytes big:
.text:00465110
; 2 dwords of data and a pointer to the next
.text:00465110
; element of the list. The structure would be:
.text:00465110
; pointer (dword)
.text:00465110
; data1 (dword)
.text:00465110
; data2 (dword)
.text:00465112
mov dword ptr [eax],
0
; pointer = 0
.text:00465118
mov dword ptr [eax+
4
],
78C627h
; data1 = 78C627h
.text:0046511F
mov [eax+
8
], ecx
; data2 = 1
.text:0046511F
; the last dwords of a Rich signature are always
.text:0046511F
; 0078C627 and 00000001h
.text:00465122
mov eax, [edx+
23Ch
]
; it's a pointer
.text:00465122
; we'll call this pointer XS
.text:00465122
; (edx contains a pointer to the
.text:00465122
; Microsoft Linker Database)
.text:00465128
mov [esp+
24h
+
var_14
], ecx
.text:0046512C
xor edi, edi
.text:0046512E
mov [esp+
24h
+
var_4
], eax
.text:00465132
.text:00465132
loc_465132:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+76j
.text:00465132
test edi, edi
.text:00465134
jnz short loc_46513C
.text:00465136
mov edi, [esp+
24h
+
var_4
]
.text:0046513A
jmp short loc_46513F
.text:0046513C
; ---------------------------------------------------------------------------
.text:0046513C
.text:0046513C
loc_46513C:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+54j
.text:0046513C
mov edi, [edi+
44h
]
.text:0046513F
.text:0046513F
loc_46513F:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+5Aj
.text:0046513F
test edi, edi
.text:00465141
jz short
buildxormask
; jumps to the xor mask creation
.text:00465143
xor ebx, ebx
.text:00465145
.text:00465145
NextItem
:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+D7j
.text:00465145
; IMAGE::CbBuildProdidBlock(void * *)+DCj
.text:00465145
test ebx, ebx
; ebx == 0 ?
.text:00465147
jnz short loc_46514E
; ebx is 0 only in the first time the loop is executed
.text:00465149
mov ebx, [edi+
40h
]
; ebx = [XS+40h] (only first execution)
.text:00465149
; now ebx contains another pointer
.text:00465149
; let's call this pointer YS
.text:0046514C
jmp short loc_465154
.text:0046514E
; ---------------------------------------------------------------------------
.text:0046514E
.text:0046514E
loc_46514E:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+67j
.text:0046514E
mov ebx, [ebx+
0A0h
]
; ebx = YS pointer
.text:00465154
.text:00465154
loc_465154:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+6Cj
.text:00465154
test ebx, ebx
.text:00465156
jz short loc_465132
.text:00465158
mov esi, [ebx+
74h
]
; esi = [YS+74h] (new data1)
.text:0046515B
test esi,
0FFFF0000h
; high word is zero?
.text:00465161
jnz short loc_465179
; if not, jump to new alloc
.text:00465163
mov dl, [ebx+
0ACh
]
.text:00465169
shr dl,
6
.text:0046516C
test dl,
1
.text:0046516F
jz short loc_465179
.text:00465171
mov eax,
?g_pmodCIL@@3PAUMOD@@A ; MOD * g_pmodCIL
.text:00465176
mov esi, [eax+
74h
]
; seems to be a fixed value
.text:00465176
; esi = new data1 (fixed)
.text:00465179
.text:00465179
loc_465179:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+81j
.text:00465179
; IMAGE::CbBuildProdidBlock(void * *)+8Fj
.text:00465179
test ebp, ebp
.text:0046517B
mov eax, ebp
.text:0046517D
jz short loc_46518B
.text:0046517F
nop
.text:00465180
.text:00465180
loc_465180:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+A9j
.text:00465180
cmp [eax+
4
], esi
; old data1 == new data1 ?
.text:00465183
jz short
IncrementData2
.text:00465185
mov eax, [eax]
; eax = linked list pointer
.text:00465187
test eax, eax
; is the pointer to the next item null?
.text:00465189
jnz short loc_465180
; if not, repeats the check
.text:00465189
; this basically checks if one of the
.text:00465189
; other data1 elements in the list
.text:00465189
; equals the new data1 value
.text:00465189
;
.text:00465189
; if none of the items is equal,
.text:00465189
; it allocates the memory for the new item in the list
.text:00465189
;
.text:00465189
; if an old item data1 equals the new data1,
.text:00465189
; then the data2 dword of that old item is incremented
.text:0046518B
.text:0046518B
loc_46518B:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+9Dj
.text:0046518B
mov ecx,
?hheap@Heap@@2PAXA ; void * Heap::hheap
.text:00465191
push
0Ch
; dwBytes (the usual 12 bytes)
.text:00465193
push
0
; dwFlags
.text:00465195
push ecx
; hHeap
.text:00465196
call ds:
__imp__HeapAlloc@12
; HeapAlloc(x,x,x)
.text:0046519C
test eax, eax
.text:0046519E
jz
FatalError
.text:004651A4
mov ecx,
1
.text:004651A9
add [esp+
24h
+
var_14
], ecx
; increments the linked list count dword
.text:004651AD
mov [eax], ebp
; new item pointer
.text:004651AF
mov [eax+
4
], esi
; new data1
.text:004651B2
mov [eax+
8
], ecx
; data2 = 1 (for now)
.text:004651B5
mov ebp, eax
.text:004651B7
jmp short
NextItem
.text:004651B9
; ---------------------------------------------------------------------------
.text:004651B9
.text:004651B9
IncrementData2
:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+A3j
.text:004651B9
add [eax+
8
], ecx
; increment data2
.text:004651BC
jmp short
NextItem
.text:004651BE
; ---------------------------------------------------------------------------
.text:004651BE
.text:004651BE
buildxormask
:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+61j
.text:004651BE
mov ecx, [esp+
24h
+
Arg1
]
.text:004651C2
mov edx, [ecx+
25Ch
]
; initial xor mask value (0x80)
.text:004651C2
; 0x80 is also the value of the
.text:004651C2
; loops in the first xor mask loop
.text:004651C2
; (remember 0x80 is also the offset
.text:004651C2
; where the Rich Signature is stored)
.text:004651C8
xor eax, eax
.text:004651CA
test edx, edx
; is initial mask (counter) != 0 ?
.text:004651CC
mov esi, edx
.text:004651CE
jbe short
xoormaskloop2cond
; if so, skip the first xor mask loop
.text:004651D0
mov ebx, [ecx+
258h
]
; ebx = pointer to the linked executable
.text:004651D0
; I'll call this pointer "PointerToPE"
.text:004651D0
; the following loop basically creates the xor mask from
.text:004651D0
; a checksum of the first bytes of our executable
.text:004651D6
.text:004651D6
xormaskloop1
:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+105j
.text:004651D6
movzx edi, byte ptr [ebx+eax]
; edi = (BYTE) PointerToPE[eax]
.text:004651DA
mov cl, al
; low byte of loop counter in cl
.text:004651DC
rol edi, cl
; rotates left the current byte of PointerToPE
.text:004651DC
; with the low byte of the loop counter
.text:004651DE
add eax,
1
; increment eax
.text:004651E1
add esi, edi
; adds the result of the rol to the xor mask
.text:004651E3
cmp eax, edx
; is counter < initial xor mask value?
.text:004651E5
jb short
xormaskloop1
; if so, goes on with the loop
.text:004651E7
.text:004651E7
xoormaskloop2cond
:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+EEj
.text:004651E7
test ebp, ebp
; if the initial ebp value is 0,
.text:004651E7
; it doesn't execute the second xor mask loop
.text:004651E9
mov eax, ebp
; eax = initial value for the loop
.text:004651EB
jz short
alloc_sign_memory
.text:004651ED
lea ecx, [ecx+
0
]
; ecx = first linked list item
.text:004651ED
; the second xor mask loop adds the a checksum
.text:004651ED
; of the linked list items to the already existing
.text:004651ED
; xor mask value
.text:004651F0
.text:004651F0
xormaskloop2
:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+11Ej
.text:004651F0
mov edx, [eax+
4
]
; edx = data1
.text:004651F3
mov cl, [eax+
8
]
; cl = (BYTE) data2
.text:004651F6
mov eax, [eax]
; eax = next list item
.text:004651F8
rol edx, cl
; rotates left edx with cl
.text:004651FA
add esi, edx
; adds the result of the rol to the xor mask
.text:004651FC
test eax, eax
; pointer = 0?
.text:004651FE
jnz short
xormaskloop2
; if not, goes on with the loop
.text:00465200
.text:00465200
alloc_sign_memory
:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+10Bj
.text:00465200
xor edx, edx
.text:00465202
mov eax, esi
; puts the xor mask in eax
.text:00465204
shr eax,
5
; shifts the xor mask 5 bits right
.text:00465207
mov ecx,
3
.text:0046520C
div ecx
; divides the result by 3
.text:0046520E
add edx, [esp+
24h
+
var_14
]
; adds the mod of the division to the
.text:0046520E
; count of the linked list items
.text:0046520E
; this means the number of data1 and data2
.text:0046520E
; in the linked list item
.text:00465212
lea ebx, ds:
20h
[edx*8]
; multiplies the whole thing and adds 20h
.text:00465212
; ebx now contains the size of the rich signature
.text:00465212
; what just happened might appear strange and
.text:00465212
; I'll explain later the meaning of this whole
.text:00465212
; operation since it's quite interesting
.text:00465219
mov edx,
?hheap@Heap@@2PAXA ; void * Heap::hheap
.text:0046521F
push ebx
; dwBytes
.text:00465220
push
8
; dwFlags
.text:00465222
push edx
; hHeap
.text:00465223
mov [esp+
30h
+
Arg1
], ebx
.text:00465227
call ds:
__imp__HeapAlloc@12
; HeapAlloc(x,x,x)
.text:0046522D
test eax, eax
.text:0046522F
jnz short
buildsign
.text:00465231
push
44Eh
; unsigned int
.text:00465236
push eax
; unsigned __int16 *
.text:00465237
call
?Fatal@@YAXPBGIZZ ; Fatal(ushort const *,uint,...)
.text:0046523C
; ---------------------------------------------------------------------------
.text:0046523C
.text:0046523C
buildsign
:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+14Fj
.text:0046523C
mov ecx, [esp+
24h
+
Arg2
]
.text:00465240
mov [ecx], eax
; stores the ptr of the allocated memory
.text:00465240
; for the signature at the address
.text:00465240
; specified by the function's second argument
.text:00465242
mov edx, esi
; moves the xor mask into edx
.text:00465244
lea edi, [eax+
8
]
.text:00465247
xor edx,
536E6144h
; xors the "DanS" Dword with the xor mask
.text:0046524D
mov [eax], edx
; stores the "DanS" word into the signature
.text:0046524F
mov [eax+
4
], esi
; stores the xor mask value three times
.text:00465252
mov [edi], esi
.text:00465254
mov [edi+
4
], esi
.text:00465257
add edi,
8
; increments the signature pointer
.text:0046525A
test ebp, ebp
.text:0046525C
mov eax, ebp
.text:0046525E
jz short
NoMoreData
.text:00465260
mov ebp, ds:
__imp__HeapFree@12
; HeapFree(x,x,x)
.text:00465266
.text:00465266
WriteSignatureDwords
:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+1A9j
.text:00465266
mov ecx, [eax+
4
]
; gets data1
.text:00465269
xor ecx, esi
; xors it with the mask
.text:0046526B
mov [edi], ecx
; stores it in the signature
.text:0046526D
mov edx, [eax+
8
]
; gets data2
.text:00465270
xor edx, esi
; xors it with the mask
.text:00465272
push eax
.text:00465273
mov [edi+
4
], edx
; stores it in the signature
.text:00465276
mov ebx, [eax]
.text:00465278
mov eax,
?hheap@Heap@@2PAXA ; void * Heap::hheap
.text:0046527D
push
0
; dwFlags
.text:0046527F
push eax
; hHeap
.text:00465280
add edi,
8
.text:00465283
call ebp
;
HeapFree(x,x,x)
; = current item in the
.text:00465283
; list is being freed
.text:00465285
test ebx, ebx
; next element pointer == 0?
.text:00465287
mov eax, ebx
.text:00465289
jnz short
WriteSignatureDwords
; (this loop stores all the
.text:00465289
; "secret" data into the signature)
.text:00465289
; if the current item pointer is == 0,
.text:00465289
; stop the loop
.text:0046528B
mov ebx, [esp+
24h
+
Arg1
]
.text:0046528F
.text:0046528F
NoMoreData
:
; CODE XREF: IMAGE::CbBuildProdidBlock(void * *)+17Ej
.text:0046528F
mov [edi+
4
], esi
; stores the xor mask at the end of the signature
.text:00465292
mov dword ptr [edi],
68636952h
; stores the "Rich" Dword at the end
.text:00465292
; of the crypted data
.text:00465292
; just before the xor mask
.text:00465298
pop edi
.text:00465299
pop esi
.text:0046529A
pop ebp
.text:0046529B
mov eax, ebx
; ebx contains the size of the rich signature
.text:0046529B
; this is the return value of the function
.text:0046529D
pop ebx
.text:0046529E
add esp,
14h
.text:004652A1
retn
8
.text:004652A1
?CbBuildProdidBlock@IMAGE@@AAEKPAPAX@Z endp
The comments in the code give a general idea, but much more clarification is needed. Before I explain anything, let me tell you that if you have in mind to debug the linker, you'll need a kernel debugger: you absolutely cannot debug the linker with ollydbg or other user mode debuggers (don't take this too literally, in theory it's possible, but I being very old school was unable to do so). I did most of the code analyzing just with the disassembler, but at some point I needed to see the values of some variables and in some cases the memory pointed to by these values. So, it took me about a day (I had some troubles) setting up a VM with SoftIce and Visual Studio, which was the best choice in this case, because I needed to set a system breakpoint to attach the debugger to the linker process which is executed by the Visual Studio IDE when building a solution. Another thing: this code is the same as for x64 executables and itanium ones. So, what I'm disclosing here is valid for other platforms as well.
Brief summary of what this function does. The first part of the function creates a linked list of structures containing (not counting the linking pointer) two dwords which I called "data1" and "data2". This list contains one fixed item (data1=0x78C627 and data2=1) which is going to be the last item of the list. Other items are retrieved through a not so clear loop block from the "Microsoft Linker Database". This is simplified, of course, I have analyzed the loop and have a quite good knowledge of what it does, but it's not interesting for us right now, since we don't know the nature of the data. What should be noted in this loop, is that the data2 dword of one linked list item is being incremented everytime the dword data1 of the same item is encountered by the loop walking through the linker's internal data structures.
The linked list is, of course, terminated by the first null pointer. Keep in mind that the last item of this list has a fixed value of data1=0x78C627 and data2=1. After having created the linked list, the function creates the xor mask for the signature. This value should interest us for many reasons, but mainly because we couldn't know before analyzing the function what data the xor mask was made of. As I discovered dissassembling the code, the xor mask is completely harmless, since it's just a result of two checksums. The first checksum is made on the first 0x80 bytes of our PE (remember that the Rich Signature is stored exactly after those first 0x80 bytes), meanwhile the second one is made on the linked list data1 and data2 values. Also, the initial value of the xor mask is always 0x80. So, what information can be retrieved from the xor mask? Only that our Rich Signature or the first 0x80 bytes of our PE haven't been modified. The final part of the function allocates the memory for the signature (I'll talk about that later). Then the dword "DanS" is xored and stored at the beginning of the newly allocated memory. What follows is three times the xor mask value. Then a loop is started which walks through the linked list and xors every data1 and data2 values and stores them into the signature. When the loop is over the dword "Rich" is stored followed by the xor mask. The function returns the size of the new signature and the pointer of the newly allocated memory for the signature is stored through an argument pointer of the function.
This is all pretty clear, the only thing that
should be discussed further is the memory allocation process for the signature.
This process will decide the effective size of the signature. Let's look at this
signature:
00000070 6D 6F 64 65 2E 0D 0D 0A 24 00
00 00 00 00 00 00 mode....$.......
00000080 44 61 6E 53 00 00 00 00 00 00
00 00 00 00 00 00 DanS............
00000090 C3 0F 5F 00 0B 00 00 00 C3 0F
5D 00 1D 00 00 00 _. ... ]. ...
000000A0 00 00 01 00 5B 02 00 00 27 C6
7D 00 19 00 00 00 .. .[ ..'}. ...
000000B0 27 C6 6D 00 99 00 00 00 27 C6
6E 00 9C 00 00 00 'm....'n....
000000C0 27 C6 72 00 10 00 00 00 27 C6
7C 00 01 00 00 00 'r. ...'|. ...
000000D0 27 C6 78 00 01 00 00 00 52 69
63 68 FF FF FF FF 'x. ...Rich
000000E0 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 ................
000000F0 00 00 00 00 00 00 00 00
50 45 00 00 4C 01 04 00 ........PE..L
.
The yellow marked data (well, not the entire yellow block, but we'll see that later) between our signature and the PE header is random padding produced by the memory allocation process for the signature. If we consider the code:
.text:00465200
xor edx, edx
.text:00465202
mov eax, esi
; puts the xor mask in eax
.text:00465204
shr eax,
5
; shifts the xor mask 5 bits right
.text:00465207
mov ecx,
3
.text:0046520C
div ecx
; divides the result by 3
.text:0046520E
add edx, [esp+
24h
+
var_14
]
; adds the mod of the division to the
.text:0046520E
; count of the linked list items
.text:0046520E
; this means the number of data1 and data2
.text:0046520E
; in the linked list item
.text:00465212
lea ebx, ds:
20h
[edx*8]
; multiplies the whole thing and adds 20h
This would look something like this in C/C++:
SignSize = ((((XorMask >> 5) % 3) + nListItems) * 8) + 0x20;
Since the mod of the xor mask is added to the
number of items in the linked list and then multiplied by 8 (size of data1 +
data2), the random padding will always be a multiple of 8; and it can be either
0, 8 or 16 bytes. This seems not to be the case in the signature above (the
yellow marked data is 24 bytes. But this can be easily explained if you consider
the addition of the 0x20 value. To give a clear idea of what I mean, here's the
data marked with three different colors:
00000080
44 61 6E 53 00 00 00 00 00 00 00 00 00
00 00 00 DanS............
00000090 C3 0F 5F 00 0B 00 00 00 C3 0F
5D 00 1D 00 00 00 _. ... ]. ...
000000A0 00 00 01 00 5B 02 00 00 27 C6
7D 00 19 00 00 00 .. .[ ..'}. ...
000000B0 27 C6 6D 00 99 00 00 00 27 C6
6E 00 9C 00 00 00 'm....'n....
000000C0 27 C6 72 00 10 00 00 00 27 C6
7C 00 01 00 00 00 'r. ...'|. ...
000000D0 27 C6 78 00 01 00 00 00
52 69 63 68 FF FF FF FF
'x. ...Rich
000000E0 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
................
000000F0 00 00 00 00 00 00 00 00
50 45 00 00 4C 01 04 00 ........PE..L
.
The data in red represents the 0x20 value, the turquoise data is our linked list and the yellow data is the random padding. So, to sum up here's our Rich Signature structure diagram:
Still, we don't know yet what the linked list data is. To discover the nature of the linked list data we have to go back to the linked list's generation loop. The initial memory value from what it all starts is stored here:
.text:00465122
mov eax, [edx+
23Ch
]
; it's a pointer
.text:00465122
; we'll call this pointer XS
.text:00465122
; (edx contains a pointer to the
.text:00465122
; Microsoft Linker Database)
As stated in the comments, edx contains a pointer to the Microsoft Linker Database, which is a structure in memory which starts with this string (so that's what it is, I guess). Since 23Ch is a pretty uncommon number, I looked for other references in the disassembly for that number and collected some interesting findings. Most of the time what I encountered was something like this:
.text:00456DC9So, my guess is that the pointer referenced in our loop was the "struct LIBS *" kind. I admit this was only a guess, but the data in memory couldn't tell me very much and this kind of code references seemed to me solid enough to give this theory a try.
struct
MicrosoftLinkerDB
{
BYTE Pad[0x23C];
// unknown data
struct LIBS *pLibs;
// our pointer
};
This would be our current structure. I analyzed the linked list loop long enough to have an idea of how the LIBS struct is represented in memory, but I won't disclose it here, because it doesn't add anything to what is necessary to know for what I'm going to do now. As I said, I started from the theory that somehow libraries were involved. So, I took one data1 element from our linked list and searched it in all library files in my SDK. Since I didn't have a tool to search for hex bytes in all files in a directory (all tools I have on my computer work with strings, not bytes), I wrote a little CFF Explorer script to accomplish this task.
-- data to search
data = { 0xC3, 0x0F, 0x5F, 0x00 }
str = GetDirectory("Select
Directory...")
n = 0
if str
then
hSearchHandle =
InitFindFile(str ..
"\\*.*")
if hSearchHandle
then
FName = FindFile(hSearchHandle)
while FName
do
off = SearchBytes(str
.. "\\" .. FName, 0,
data)
if off
!= null then
n = n + 1
-- print filename
MsgBox(FName)
end
FName = FindFile(hSearchHandle)
end
end
end
MsgBox("Number of results: "
.. n)
The result that I got searching for the value 0x005F0FC3 where:
ADSIid.lib
Bits.lib
bufferoverflowu.lib
certidl.lib
ComMode.obj
Fci.lib
Fdi.lib
GlAux.lib
ksuser.lib
MMC.lib
MqOA.lib
MsXml2.lib
ScrnSave.lib
ScrnSavW.lib
Shell32.lib
strsafe.lib
Svcguid.lib
unicows.lib
Uuid.lib
WbemUuid.lib
WiaGuid.lib
WinStrm.lib
WS2_32.lib
Number of results: 23
If I had suspected such a long list, I would
have used the log functions of the scripting. Anyway, I took one of these
libraries and opened with the CFF Explorer hex editor. I chose Shell32, because
I knew I had included it in my project. So, I opened the library and searched
for the value. This is where I found it:
0000FFC0 12 4D 69 63 72 6F 73 6F 66 74
20 28 52 29 20 4C Microsoft.(R).L
0000FFD0 49 4E 4B 00 00 00 00 00 00 00 00 00 40 63 6F 6D INK.........@com
0000FFE0 70 2E 69 64 C3 0F 5D 00
FF FF 00 00 03 00 00 00 p.id ]...
...
0000FFF0 00 00 04 00 00 00 00 00 00 00 02 00 00 00 02 00 .. ....... ... .
As you can see, our value comes right after the
string "@comp.id". And this is true for every library: I checked every data1
value of the linked list. Every value could be found in one or more input
libraries used by our test executable, and those data1 values were always
preceded by this string in the libraries. One library can have many different
"@comp.id" values, meanwhile object files have only one. To learn why this is
so, we have to take a look at the format of both libraries and object files.
Well, to tell the truth, it's the same format: a library contains one or more
object files. Before I say anything, you should know that the library and the
object file formats aren't as popular as the PE format, so there is less written
material about them. Telling a long story short, a library starts with a 8 byte
string, followed by a IMAGE_ARCHIVE_MEMBER_HEADER structure. After this
structure comes a dword (in big endian, most values in a library are in big
endian format) which tells us the number of symbols contained in the library.
This dword is followed by an array of big endian dwords (the size was given by
the symbols' number value) with the offsets of the symbols' data and after this
array comes an array of zero terminated strings, one for each symbol. Data
associated with the symbol can be IMAGE_ARCHIVE_MEMBER_HEADER followed by an
object file. This brings us to the format of an object file. This is very
simple, it's just a File Header structure followed by a Section Headers array.
In the hex data below two object files are visible. I marked the Archive Member
Header structure in gray, the File Header in yellow and the Sections Header
array in red. Notice that I marked the "@comp.id" string in orange.
00000790 43 48 45 44 32 30 2E 64 6C 6C
2F 20 20 20 39 34 CHED20.dll/...94
000007A0 32 34 34 39 35 34 38 20 20 20
20 20 20 20 20 20 2449548.........
000007B0 20 20 20 20 20 20 30 20 20 20
20 20 20 20 32 34 ......0.......24
000007C0 34 20 20 20 20 20 20 20 60 0A
4C 01 02 00 8C A3 4.......`.L
.
000007D0 2C 38 B3 00 00 00 02 00 00 00
00 00 00 01 2E 64
,8... ......
.d
000007E0 65 62 75 67 24 53 00 00 00 00
00 00 00 00 3B 00 ebug$S........;.
000007F0 00 00 64 00 00 00 00 00 00 00
00 00 00 00 00 00 ..d.............
00000800 00 00 40 00 10 42 2E 69 64 61
74 61 24 33 00 00 ..@. B.idata$3..
00000810 00 00 00 00 00 00 14 00 00 00
9F 00 00 00 00 00 ...... ........
00000820 00 00 00 00 00 00 00 00 00 00
40 00 30 C0 01 00 ..........@.0
.
00000830 00 00 13 00 09 00 00 00 00 00 0C 52 49 43 48 45 .. ....... RICHE
00000840 44 32 30 2E 64 6C 6C 20 00 01 00 03 07 00 08 18 D20.dll.. . .
00000850 4D 69 63 72 6F 73 6F 66 74 20 4C 49 4E 4B 20 35 Microsoft.LINK.5
00000860 2E 31 32 2E 39 30 34 39 00 00 00 00 00 00 00 00 .12.9049........
00000870 00 00 00 00 00 00 00 00 00 00 00 00 00
40 63 6F .............@co
00000880 6D 70 2E 69 64 59 23 13
00 FF FF 00 00 03 00 00 mp.idY#
... ..
00000890 00 00 00 04 00 00 00 00 00 00 00 02 00 00 00 02 ... ....... ...
000008A0 00 1D 00 00 00 5F 5F 4E 55 4C 4C 5F 49 4D 50 4F . ...__NULL_IMPO
000008B0 52 54 5F 44 45 53 43 52 49 50 54 4F 52 00 52 49 RT_DESCRIPTOR.RI
000008C0 43 48 45 44 32 30 2E 64 6C 6C
2F 20 20 20 39 34 CHED20.dll/...94
000008D0 32 34 34 39 35 34 38 20 20 20
20 20 20 20 20 20 2449548.........
000008E0 20 20 20 20 20 20 30 20 20 20
20 20 20 20 32 37 ......0.......27
000008F0 33 20 20 20 20 20 20 20 60 0A
4C 01 03 00 8C A3 3.......`.L
.
00000900 2C 38 CF 00 00 00 02 00 00 00
00 00 00 01 2E 64
,8... ......
.d
00000910 65 62 75 67 24 53 00 00 00 00
00 00 00 00 3B 00 ebug$S........;.
00000920 00 00 8C 00 00 00 00 00 00 00
00 00 00 00 00 00 ...............
00000930 00 00 40 00 10 42 2E 69 64 61
74 61 24 35 00 00 ..@. B.idata$5..
00000940 00 00 00 00 00 00 04 00 00 00
C7 00 00 00 00 00 ...... ........
00000950 00 00 00 00 00 00 00 00 00 00
40 00 30 C0 2E 69 ..........@.0.i
00000960 64 61 74 61 24 34 00 00 00 00
00 00 00 00 04 00 data$4........ .
00000970 00 00 CB 00 00 00 00 00 00 00
00 00 00 00 00 00 ...............
00000980 00 00 40 00 30 C0 01 00
00 00 13 00 09 00 00 00 ..@.0
... .....
00000990 00 00 0C 52 49 43 48 45 44 32 30 2E 64 6C 6C 20 .. RICHED20.dll.
000009A0 00 01 00 03 07 00 08 18 4D 69 63 72 6F 73 6F 66 . . . Microsof
000009B0 74 20 4C 49 4E 4B 20 35 2E 31 32 2E 39 30 34 39 t.LINK.5.12.9049
000009C0 00 00 00 00 00 00 00 00 00 40
63 6F 6D 70 2E 69 .........@comp.i
000009D0 64 59 23 13 00 FF FF 00
00 03 00 00 00 00 00 04 dY# ...
.....
000009E0 00 00 00 00 00 00 00 02 00 00 00 02 00 1E 00 00 ....... ... . ..
000009F0 00 7F 52 49 43 48 45 44 32 30 5F 4E 55 4C 4C 5F .RICHED20_NULL_
So, as you can see, every object file has only one "@comp.id" string, and a library can have more than one, because it embeds more than just one object file. Before thinking about the nature of the value following the "@comp.id" string, I wanted to know how to get to this value in the first place. To analyze both the File Header and the Sections Header array properly I needed the CFF Explorer. Since the CFF Explorer doesn't support object files (yet), I overwrote the File Header and the Sections Header array of a Portable Executable file. Let's take a look at the data we're analyzing in a simpler way:
Nevermind the SizeOfOptionalHeader. This field is 0 in an object file. I changed it because it is necessary in a PE file (it tells where the Section Headers array is located). And the sections:
What I tried first was checking if the string
was contained in a section. The size of a section is given by the Raw Size
field, whereas the location is given by the Raw Address. To obtain the file
offset of the section, one has to add the File Header offset to the Raw Address
field. In this object I marked the data of the last section:
000008C0 43 48 45 44 32 30 2E 64 6C 6C
2F 20 20 20 39 34 CHED20.dll/...94
000008D0 32 34 34 39 35 34 38 20 20 20 20 20 20 20 20 20 2449548.........
000008E0 20 20 20 20 20 20 30 20 20 20 20 20 20 20 32 37 ......0.......27
000008F0 33 20 20 20 20 20 20 20 60 0A 4C 01 03 00 8C A3 3.......`.L .
00000900 2C 38 CF 00 00 00 02 00 00 00 00 00 00 01 2E 64 ,8... ...... .d
00000910 65 62 75 67 24 53 00 00 00 00 00 00 00 00 3B 00 ebug$S........;.
00000920 00 00 8C 00 00 00 00 00 00 00 00 00 00 00 00 00 ...............
00000930 00 00 40 00 10 42 2E 69 64 61 74 61 24 35 00 00 ..@. B.idata$5..
00000940 00 00 00 00 00 00 04 00 00 00 C7 00 00 00 00 00 ...... ........
00000950 00 00 00 00 00 00 00 00 00 00 40 00 30 C0 2E 69 ..........@.0.i
00000960 64 61 74 61 24 34 00 00 00 00 00 00 00 00 04 00 data$4........ .
00000970 00 00 CB 00 00 00 00 00 00 00 00 00 00 00 00 00 ...............
00000980 00 00 40 00 30 C0 01 00 00 00 13 00 09 00 00 00 ..@.0 ... .....
00000990 00 00 0C 52 49 43 48 45 44 32 30 2E 64 6C 6C 20 .. RICHED20.dll.
000009A0 00 01 00 03 07 00 08 18 4D 69 63 72 6F 73 6F 66 . . . Microsof
000009B0 74 20 4C 49 4E 4B 20 35 2E 31 32 2E 39 30 34 39 t.LINK.5.12.9049
000009C0 00 00 00 00 00 00 00 00 00
40 63 6F 6D 70 2E 69 .........@comp.i
000009D0 64 59 23 13 00 FF FF 00 00 03 00 00 00 00 00 04 dY# ... .....
As you can see the "@comp.id" string comes right after the last section's data. Our data is not contained in a section as it turned out. I gave another look at the File Header structure and noticed the PointerToSymbolTable field, which in this case is 0xCF. Look at what happens if I mark 0xCF bytes starting from the File Header (which is always the base offset in object files):
000008F0 33 20 20 20 20 20 20 20 60 0A
4C 01 03 00 8C A3 3.......`.L
.
00000900 2C 38 CF 00 00 00 02 00 00 00
00 00 00 01 2E 64 ,8... ...... .d
00000910 65 62 75 67 24 53 00 00 00 00
00 00 00 00 3B 00 ebug$S........;.
00000920 00 00 8C 00 00 00 00 00 00 00
00 00 00 00 00 00 ...............
00000930 00 00 40 00 10 42 2E 69 64 61
74 61 24 35 00 00 ..@. B.idata$5..
00000940 00 00 00 00 00 00 04 00 00 00
C7 00 00 00 00 00 ...... ........
00000950 00 00 00 00 00 00 00 00 00 00
40 00 30 C0 2E 69 ..........@.0.i
00000960 64 61 74 61 24 34 00 00 00 00
00 00 00 00 04 00 data$4........ .
00000970 00 00 CB 00 00 00 00 00 00 00
00 00 00 00 00 00 ...............
00000980 00 00 40 00 30 C0 01 00 00 00
13 00 09 00 00 00 ..@.0 ... .....
00000990 00 00 0C 52 49 43 48 45 44 32
30 2E 64 6C 6C 20 .. RICHED20.dll.
000009A0 00 01 00 03 07 00 08 18 4D 69
63 72 6F 73 6F 66 . . . Microsof
000009B0 74 20 4C 49 4E 4B 20 35 2E 31
32 2E 39 30 34 39 t.LINK.5.12.9049
000009C0 00 00 00 00 00 00 00 00 00
40 63 6F 6D 70 2E 69 .........@comp.i
000009D0 64 59 23 13 00 FF FF 00 00 03 00 00 00 00 00 04 dY# ... .....
The field PointerToSymbolTable seems to bring us directly to the "@comp.id" string. However, it's not always that easy, sometimes this field brings us slightly before the string. Intrestingly, we can get to the string by multiples of 0x12 bytes, and after the string often come section headers. This seemed to me as a sort of table (and the field PointerToSymbolTable would suggest that I'm right). I remembered a very handy utility shipped along with Microsoft's compilers called dumpbin. I ran it on the library we analyzed above, which, by the way, is the Riched32.lib. Here is an interesting part of the output:
Dump of file Riched32.lib File Type: LIBRARY Archive member name at 8: / 382CA38C time/date Sat Nov 13 00:32:28 1999 uid gid 0 mode 26C size correct header end 21 public symbols 566 __IMPORT_DESCRIPTOR_RICHED20 78E __NULL_IMPORT_DESCRIPTOR 8BE RICHED20_NULL_THUNK_DATA AFE _IID_IRichEditOle AFE __imp__IID_IRichEditOle B6E _IID_IRichEditOleCallback B6E __imp__IID_IRichEditOleCallback A8A _CreateTextServices@12 A8A __imp__CreateTextServices@12 CC0 _IID_ITextServices CC0 __imp__IID_ITextServices BE6 _IID_ITextHost BE6 __imp__IID_ITextHost C52 _IID_ITextHost2 C52 __imp__IID_ITextHost2 A0C ?REExtendedRegisterClass@@YGHXZ A0C __imp_?REExtendedRegisterClass@@YGHXZ DA8 _RichEditANSIWndProc@16 DA8 __imp__RichEditANSIWndProc@16 D30 _RichEdit10ANSIWndProc@16 D30 __imp__RichEdit10ANSIWndProc@16 Archive member name at 2B0: / 382CA38C time/date Sat Nov 13 00:32:28 1999 uid gid 0 mode 27A size correct header end 13 offsets 1 566 2 78E 3 8BE 4 AFE 5 B6E 6 A8A 7 CC0 8 BE6 9 C52 A A0C B DA8 C D30 D 0 21 public symbols A ?REExtendedRegisterClass@@YGHXZ 6 _CreateTextServices@12 4 _IID_IRichEditOle 5 _IID_IRichEditOleCallback 8 _IID_ITextHost 9 _IID_ITextHost2 7 _IID_ITextServices C _RichEdit10ANSIWndProc@16 B _RichEditANSIWndProc@16 1 __IMPORT_DESCRIPTOR_RICHED20 2 __NULL_IMPORT_DESCRIPTOR A __imp_?REExtendedRegisterClass@@YGHXZ 6 __imp__CreateTextServices@12 4 __imp__IID_IRichEditOle 5 __imp__IID_IRichEditOleCallback 8 __imp__IID_ITextHost 9 __imp__IID_ITextHost2 7 __imp__IID_ITextServices C __imp__RichEdit10ANSIWndProc@16 B __imp__RichEditANSIWndProc@16 3 RICHED20_NULL_THUNK_DATA Archive member name at 566: RICHED20.dll/ 382CA38C time/date Sat Nov 13 00:32:28 1999 uid gid 0 mode 1EB size correct header end FILE HEADER VALUES 14C machine (x86) 3 number of sections 382CA38C time date stamp Sat Nov 13 00:32:28 1999 107 file pointer to symbol table 8 number of symbols 0 size of optional header 100 characteristics 32 bit word machine SECTION HEADER #1 .debug$S name 0 physical address 0 virtual address 3B size of raw data 8C file pointer to raw data (0000008C to 000000C6) 0 file pointer to relocation table 0 file pointer to line numbers 0 number of relocations 0 number of line numbers 42100040 flags Initialized Data Discardable 1 byte align Read Only RAW DATA #1 00000000: 01 00 00 00 13 00 09 00 00 00 00 00 0C 52 49 43 .............RIC 00000010: 48 45 44 32 30 2E 64 6C 6C 20 00 01 00 03 07 00 HED20.dll ...... 00000020: 08 18 4D 69 63 72 6F 73 6F 66 74 20 4C 49 4E 4B ..Microsoft LINK 00000030: 20 35 2E 31 32 2E 39 30 34 39 00 5.12.9049. SECTION HEADER #2 .idata$2 name 0 physical address 0 virtual address 14 size of raw data C7 file pointer to raw data (000000C7 to 000000DA) DB file pointer to relocation table 0 file pointer to line numbers 3 number of relocations 0 number of line numbers C0300040 flags Initialized Data 4 byte align Read Write RAW DATA #2 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000010: 00 00 00 00 .... RELOCATIONS #2 Symbol Symbol Offset Type Applied To Index Name -------- ---------------- ----------------- -------- ------ 0000000C DIR32NB 00000000 3 .idata$6 00000000 DIR32NB 00000000 4 .idata$4 00000010 DIR32NB 00000000 5 .idata$5 SECTION HEADER #3 .idata$6 name 0 physical address 0 virtual address E size of raw data F9 file pointer to raw data (000000F9 to 00000106) DB file pointer to relocation table 0 file pointer to line numbers 0 number of relocations 0 number of line numbers C0200040 flags Initialized Data 2 byte align Read Write RAW DATA #3 00000000: 52 49 43 48 45 44 32 30 2E 64 6C 6C 00 00 RICHED20.dll.. COFF SYMBOL TABLE 000 00132359 ABS notype Static | @comp.id 001 00000000 SECT2 notype External | __IMPORT_DESCRIPTOR_RICHED20 002 C0000040 SECT2 notype Section | .idata$2 003 00000000 SECT3 notype Static | .idata$6 004 C0000040 UNDEF notype Section | .idata$4 005 C0000040 UNDEF notype Section | .idata$5 006 00000000 UNDEF notype External | __NULL_IMPORT_DESCRIPTOR 007 00000000 UNDEF notype External | RICHED20_NULL_THUNK_DATA
|
Dumpbin classifies our data as part of the COFF SYMBOL TABLE. What I thought was: if it is in the symbol table, it has to be a symbol. Thus, I searched for a symbol structure in the Winnt.h. I assumed that it was impossible that something defined so clearly was not given a definition in that header. Turns out I was right, here's everything we need to interpret the data correctly:
//
// Symbol format.
//
typedef struct
_IMAGE_SYMBOL {
union {
BYTE ShortName[8];
struct {
DWORD Short;
// if 0, use LongName
DWORD Long;
// offset into string table
} Name;
DWORD LongName[2];
// PBYTE [2]
} N;
DWORD Value;
SHORT SectionNumber;
WORD Type;
BYTE StorageClass;
BYTE
NumberOfAuxSymbols;
} IMAGE_SYMBOL;
typedef IMAGE_SYMBOL
UNALIGNED *PIMAGE_SYMBOL;
#define IMAGE_SIZEOF_SYMBOL
18
//
// Section values.
//
// Symbols have a section number of the section in which
they are
// defined. Otherwise, section numbers have the following
meanings:
//
#define IMAGE_SYM_UNDEFINED
(SHORT)0 // Symbol
is undefined or is common.
#define IMAGE_SYM_ABSOLUTE
(SHORT)-1 // Symbol
is an absolute value.
#define IMAGE_SYM_DEBUG
(SHORT)-2 // Symbol
is a special debug item.
#define IMAGE_SYM_SECTION_MAX
0xFEFF // Values 0xFF00-0xFFFF are special
The size of the structure is 18 bytes which is 0x12 in hex, the number noted earlier. Let's consider the "@comp.id" data layout given the structure above:
ShortName = @comp.id
Value = 0x00132359
SectionNumber = 0xFFFF (IMAGE_SYM_ABSOLUTE)
Type = 0x0000
StorageClass = 0x03
NumberOfAuxSymbols = 0x00
Exactly what dumpbin told us. From what I could understand the number of auxiliar symbols (NumberOfAuxSymbols) is the value which should be multiplied by 0x12 to get the next symbol in the table. So, if I have 1 auxiliar symbol, another 0x12 bytes belong to the current symbol before the next one comes. This is how you can enumerate the symbols in the table. This goes a little bit beyond the scope of this article, but now we know where the data comes from and how we can retrieve it from the library format.
On person made me notice that the low word of the comp.id value was the same as part of the version number of his VC++ compiler. Let's analyze for a second the fixed value inserted in the Rich Signature and let's consider its low word 0xC627 (50727). Can you spot it in the image?
The high word contains the value of an internal enum and represents a product identifier. When I first wrote this paper I wasn't aware of this and it was later pointed out by someone who had access to the sources. So my guess about the meaning of the high word was incorrect. What follows is a short script to display the content of the Rich Signature.
- Download Rich Signature Displayer
-- Rich Signature
Displayer
-- 2008 Daniel Pistelli
filename = GetOpenFile()
if filename ==
null then
return
end
hFile = OpenFile(filename)
if hFile ==
null then
return
end
nSignDwords = 0
for i = 0, 100
do
dw = ReadDword(hFile,
0x80 + (i * 4))
if dw ==
null then
return
end
-- is this the "Rich" terminator?
if dw ==
0x68636952 then
nSignDwords = i
break
end
end
if nSignDwords
== 0 then
return
end
-- read xor mask
mask = ReadDword(hFile,
0x80 + ((nSignDwords + 1) * 4))
-- init string
strInfo = "VC++ tools
used:\r\n"
-- decrypt versions
for i = 4,
nSignDwords - 1, 2 do
dw = ReadDword(hFile,
0x80 + (i * 4)) ^ mask
id = dw >> 16
minver = dw &
0xFFFF
vnum = ReadDword(hFile,
0x80 + ((i + 1) * 4)) ^ mask
strInfo = strInfo
.. "\r\n" .. "Id: " ..
id
.. " Version: " ..
minver .. " Times: " ..
vnum
end
-- show versions
MsgBox(strInfo,
"Rich Signature Info",
MB_ICONINFORMATION)
Here's a screenshot with the output of this script:
In the end, the value of @comp.id is quite harmless, compared to what people thought it could be.
Anyway, harmless or not, I'm now going to present you a neat little script which you may be pleased with just like I am. It does a lot of things. The main thing it does is to remove the DOS stub and the Rich Signature. It also strips the debug information (if any) present in the PE. As you might already know, current VC++ editions leave debug info even in release executable; contained in this debug info is the absolute path to the executable's pdb file, which gives away your project's current path. I hate that. It can be disabled from the compiler, but sometimes it comes handy to make sure. Finally, the script does all the rebuilding necessary to have a 100% working PE again, it even updates the checksum.
-- 2008 Daniel
Pistelli. All rights reserved.
-- Header Pack Script Version: 1.0.0.1
-- This neat little script does the following:
--
-- packs the dos header + PE header + section headers
-- removes useless things like the Rich Signature
-- removes linker references inside the PE header
-- strips the debug information (if any) from the PE
-- if it's a .NET, removes Strong Name Signature
-- updates checksum
-- shows an invalid PE Msgbox
function InvPEMsg()
MsgBox("The current file
seems to be an invalid PE. Cannot proceed!",
strScriptName,
MB_ICONEXCLAMATION)
end
-- --------------------------------------------------
-- the main code starts here
-- --------------------------------------------------
strScriptName = "Header Pack
Script 1.0.0.1 - by Daniel Pistelli"
local filename
= GetOpenFile("Select a
PE...",
"All\n*.*\nExe Files\n*.exe\nDll Files\n*.dll\n")
--[[ -- if it is a fixed file name, write: filename =
@"C:\...\Release\App.exe" ]]
if filename ==
null then
return
end
local hPE =
OpenFile(filename)
if hPE ==
null then
MsgBox("Couldn't open
file.", "Error",
MB_ICONEXCLAMATION)
return
end
local OptHdrOffset
= GetOffset(hPE,
PE_OptionalHeader)
if OptHdrOffset
== null then
InvPEMsg()
return
end
local bPE64 =
IsPE64(hPE)
local bDotNET =
IsDotNET(hPE)
-- --------------------------------------------------
-- START PROCESSING
-- --------------------------------------------------
-- PACK HEADERS
do
local
SecrHdrsOffset = GetOffset(hOriginalPE,
PE_SectionHeaders)
local
SizeOfSections = GetNumberOfSections(hPE)
* IMAGE_SIZEOF_SECTION_HEADER
local FileHdrOffset
= GetOffset(hPE,
PE_FileHeader)
local
SizeOfOptionalHdr = ReadWord(hPE,
FileHdrOffset + 16)
local
SizeOfPEHeader = 4 + 20 + SizeOfOptionalHdr;
local PEHdrOffset
= GetOffset(hPE,
PE_NtHeaders)
-- we assume that the PE header comes right after
-- the Dos header, this is a normal PE
FillBytes(hPE,
0x40, PEHdrOffset - 0x40, 0)
-- move headers
local HeadersSize
= SizeOfPEHeader +
SizeOfSections
local PEHdrAndSects
= ReadBytes(hPE,
PEHdrOffset, HeadersSize)
FillBytes(hPE,
PEHdrOffset, HeadersSize,
0)
WriteBytes(hPE,
0x40, PEHdrAndSects,
HeadersSize)
-- e_lfanew
WriteDword(hPE,
0x3C, 0x40)
OptHdrOffset = GetOffset(hPE,
PE_OptionalHeader)
end
-- REMOVE LINKER REF
do
WriteWord(hPE,
OptHdrOffset + 2, 0x0000)
end
-- REMOVE DBG INFO
RemoveDebugDirectory(hPE)
-- REMOVE SNS IF .NET
if bDotNET ==
true then
RemoveStrongNameSignature(hPE)
end
-- UPDATE CHECKSUM
UpdateChecksum(hPE)
-- SAVE FILE
SaveFile(hPE)
The header produced by this script comes, as I said, without DOS stub: I don't think it will be missing in 2008. The most efficient way to use this script is to execute it automatically after every linking. The PE header could be packed even more (for example one could reduce the data directory entries), but this goes beyond what I wanted to do: I just wanted my executables to be garbage clean.
Time to end this article. I enjoyed getting an insight into the libraries' file format. I might add support for it in the next version of the CFF Explorer, although I wouldn't consider this to be a priority, since most people wouldn't make use of it anyway.
Daniel Pistelli
* Apparently there had been already some information about the rich signature before I wrote this article. I did some google research before writing, but couldn't find anything about this topic. The prior information was signalled to me only after my paper became public. Here's a link to it. I mention it, because it's only fair, given my claim of the topic being undocumented.