Internals – NTCore

Time Travel: Running Python 3.7 on XP

To restart my career as a technical writer, I chose a light topic. Namely, running applications compiled with new versions of Visual Studio on Windows XP. I didn’t find any prior research on the topic, but I also didn’t search much. There’s no real purpose behind this article, beyond the fact that I wanted to know what could prevent a new application to run on XP. Our target application will be the embedded version of Python 3.7 for x86.

If we try to start any new application on XP, we’ll get an error message informing us that it is not a valid Win32 application. This happens because of some fields in the Optional Header of the Portable Executable.

Most of you probably already know that you need to adjust these fields as follows:

MajorOperatingSystemVersion: 5
MinorOperatingSystemVersion: 0
MajorSubsystemVersion: 5
MinorSubsystemVersion: 0

Fortunately, it’s enough to adjust the fields in the executable we want to start (python.exe), there’s no need to adjust the DLLs as well.

If we try run the application now, we’ll get an error message due to a missing API in kernel32. So let’s turn our attention to the imports.

We have a missing vcruntime140.dll, then a bunch of “api-ms-win-*” DLLs, then only python37.dll and kernel32.dll.

The first thing which comes to mind is that in new applications we often find these “api-ms-win-*” DLLs. If we search for the prefix in the Windows directory, we’ll find a directory both in System32 and SysWOW64 called “downlevel”, which contains a huge list of these DLLs.

As we’ll see later, these DLLs aren’t actually used, but if we open one with a PE viewer, we’ll see that it contains exclusively forwarders to APIs contained in the usual suspects such as kernel32, kernelbase, user32 etc.

There’s a MSDN page documenting these DLLs.

Interestingly, in the downlevel directory we can’t find any of the files imported by python.exe. These DLLs actually expose C runtime APIs like strlen, fopen, exit and so on.

If we don’t have any prior knowledge on the topic and just do a string search inside the Windows directory for such a DLL name, we’ll find a match in C:\Windows\System32\apisetschema.dll. This DLL is special as it contains a .apiset section, whose data can easily be identified as some sort of format for mapping “api-ms-win-*” names to others.

Offset     0  1  2  3  4  5  6  7    8  9  A  B  C  D  E  F     Ascii   

00013AC0  C8 3A 01 00 20 00 00 00   73 00 74 00 6F 00 72 00     .:......s.t.o.r.
00013AD0  61 00 67 00 65 00 75 00   73 00 61 00 67 00 65 00     a.g.e.u.s.a.g.e.
00013AE0  2E 00 64 00 6C 00 6C 00   65 00 78 00 74 00 2D 00     ..d.l.l.e.x.t.-.
00013AF0  6D 00 73 00 2D 00 77 00   69 00 6E 00 2D 00 73 00     m.s.-.w.i.n.-.s.
00013B00  78 00 73 00 2D 00 6F 00   6C 00 65 00 61 00 75 00     x.s.-.o.l.e.a.u.
00013B10  74 00 6F 00 6D 00 61 00   74 00 69 00 6F 00 6E 00     t.o.m.a.t.i.o.n.
00013B20  2D 00 6C 00 31 00 2D 00   31 00 2D 00 30 00 00 00     -.l.1.-.1.-.0...
00013B30  00 00 00 00 00 00 00 00   00 00 00 00 44 3B 01 00     ............D;..
00013B40  0E 00 00 00 73 00 78 00   73 00 2E 00 64 00 6C 00     ....s.x.s...d.l.
00013B50  6C 00 00 00 65 00 78 00   74 00 2D 00 6D 00 73 00     l...e.x.t.-.m.s.

Searching on the web, the first resource I found on this topic were two articles on the blog of Quarkslab (Part 1 and Part 2). However, I quickly figured that, while useful, they were too dated to provide me with up-to-date structures to parse the data. In fact, the second article shows a version number of 2 and at the time of my writing the version number is 6.

Offset     0  1  2  3  4  5  6  7    8  9  A  B  C  D  E  F     Ascii   

00000000  06 00 00 00                                           ....

Just for completeness, after the publication of the current article, I was made aware of an article by deroko about the topic predating those of Quarkslab.

Anyway, I searched some more and found a code snippet by Alex Ionescu and Pavel Yosifovich in the repository of Windows Internals. I took the following structures from there.

typedef struct _API_SET_NAMESPACE {
	ULONG Version;
	ULONG Size;
	ULONG Flags;
	ULONG Count;
	ULONG EntryOffset;
	ULONG HashOffset;
	ULONG HashFactor;
} API_SET_NAMESPACE, *PAPI_SET_NAMESPACE;

typedef struct _API_SET_HASH_ENTRY {
	ULONG Hash;
	ULONG Index;
} API_SET_HASH_ENTRY, *PAPI_SET_HASH_ENTRY;

typedef struct _API_SET_NAMESPACE_ENTRY {
	ULONG Flags;
	ULONG NameOffset;
	ULONG NameLength;
	ULONG HashedLength;
	ULONG ValueOffset;
	ULONG ValueCount;
} API_SET_NAMESPACE_ENTRY, *PAPI_SET_NAMESPACE_ENTRY;

typedef struct _API_SET_VALUE_ENTRY {
	ULONG Flags;
	ULONG NameOffset;
	ULONG NameLength;
	ULONG ValueOffset;
	ULONG ValueLength;
} API_SET_VALUE_ENTRY, *PAPI_SET_VALUE_ENTRY;

The data starts with a API_SET_NAMESPACE structure.

Count specifies the number of API_SET_NAMESPACE_ENTRY and API_SET_HASH_ENTRY structures. EntryOffset points to the start of the array of API_SET_NAMESPACE_ENTRY structures, which in our case comes exactly after API_SET_NAMESPACE.

Every API_SET_NAMESPACE_ENTRY points to the name of the “api-ms-win-*” DLL via the NameOffset field, while ValueOffset and ValueCount specify the position and count of API_SET_VALUE_ENTRY structures. The API_SET_VALUE_ENTRY structure yields the resolution values (e.g. kernel32.dll, kernelbase.dll) for the given “api-ms-win-*” DLL.

With this information we can already write a small script to map the new names to the actual DLLs.

import os
from Pro.Core import *
from Pro.PE import *

def main():
    c = createContainerFromFile("C:\\Windows\\System32\\apisetschema.dll")
    pe = PEObject()
    if not pe.Load(c):
        print("couldn't load apisetschema.dll")
        return
    sect = pe.SectionHeaders()
    nsects = sect.Count()
    d = None
    for i in range(nsects):
        if sect.Bytes(0) == b".apiset\x00":
            cs = pe.SectionData(i)[0]
            d = CFFObject()
            d.Load(cs)
            break
        sect = sect.Add(1)
    if not d:
        print("could find .apiset section")
        return
    n, ret = d.ReadUInt32(12)
    offs, ret = d.ReadUInt32(16)
    for i in range(n):
        name_offs, ret = d.ReadUInt32(offs + 4)
        name_size, ret = d.ReadUInt32(offs + 8)
        name = d.Read(name_offs, name_size).decode("utf-16")
        line = str(i) + ") " + name + " ->"
        values_offs, ret = d.ReadUInt32(offs + 16)
        value_count, ret = d.ReadUInt32(offs + 20)
        for j in range(value_count):
            vname_offs, ret = d.ReadUInt32(values_offs + 12)
            vname_size, ret = d.ReadUInt32(values_offs + 16)
            vname = d.Read(vname_offs, vname_size).decode("utf-16")
            line += " " + vname
            values_offs += 20
        offs += 24
        print(line)
     
main()

This code can be executed with Cerbero Profiler from command line as “cerpro.exe -r apisetschema.py”. These are the first lines of the produced output:

0) api-ms-onecoreuap-print-render-l1-1-0 -> printrenderapihost.dll
1) api-ms-onecoreuap-settingsync-status-l1-1-0 -> settingsynccore.dll
2) api-ms-win-appmodel-identity-l1-2-0 -> kernel.appcore.dll
3) api-ms-win-appmodel-runtime-internal-l1-1-3 -> kernel.appcore.dll
4) api-ms-win-appmodel-runtime-l1-1-2 -> kernel.appcore.dll
5) api-ms-win-appmodel-state-l1-1-2 -> kernel.appcore.dll
6) api-ms-win-appmodel-state-l1-2-0 -> kernel.appcore.dll
7) api-ms-win-appmodel-unlock-l1-1-0 -> kernel.appcore.dll
8) api-ms-win-base-bootconfig-l1-1-0 -> advapi32.dll
9) api-ms-win-base-util-l1-1-0 -> advapi32.dll
10) api-ms-win-composition-redirection-l1-1-0 -> dwmredir.dll
11) api-ms-win-composition-windowmanager-l1-1-0 -> udwm.dll
12) api-ms-win-core-apiquery-l1-1-0 -> ntdll.dll
13) api-ms-win-core-appcompat-l1-1-1 -> kernelbase.dll
14) api-ms-win-core-appinit-l1-1-0 -> kernel32.dll kernelbase.dll
...

Going back to API_SET_NAMESPACE, its field HashOffset points to an array of API_SET_HASH_ENTRY structures. These structures, as we’ll see in a moment, are used by the Windows loader to quickly index a “api-ms-win-*” DLL name. The Hash field is effectively the hash of the name, calculated by taking into consideration both HashFactor and HashedLength, while Index points to the associated API_SET_NAMESPACE_ENTRY entry.

The code which does the hashing is inside the function LdrpPreprocessDllName in ntdll:

77EA1DAC mov       ebx, dword ptr [ebx + 0x18]      ; HashFactor in ebx 
77EA1DAF mov       esi, eax                         ; esi = dll name length                    
77EA1DB1 movzx     eax, word ptr [edx]              ; one unicode character into eax
77EA1DB4 lea       ecx, dword ptr [eax - 0x41]      ; ecx = character - 0x41
77EA1DB7 cmp       cx, 0x19                         ; compare to 0x19
77EA1DBB jbe       0x77ea2392                       ; if below or equal, bail out
77EA1DC1 mov       ecx, ebx                         ; ecx = HashFactor
77EA1DC3 movzx     eax, ax
77EA1DC6 imul      ecx, edi                         ; ecx *= edi
77EA1DC9 add       edx, 2                           ; edx += 2
77EA1DCC add       ecx, eax                         ; ecx += eax
77EA1DCE mov       edi, ecx                         ; edi = ecx
77EA1DD0 sub       esi, 1                           ; len -= 1
77EA1DD3 jne       0x77ea1db1                       ; if not zero repeat from 77EA1DB1

Or more simply in C code:

const char *p = dllname;
int HashedLength = 0x23;
int HashFactor = 0x1F;
int Hash = 0;
for (int i = 0; i < HashedLength; i++, p++)
    Hash = (Hash * HashFactor) + *p;

As a practical example, let's take the DLL name "api-ms-win-core-processthreads-l1-1-2.dll". Its hash would be 0x445B4DF3. If we find its matching API_SET_HASH_ENTRY entry, we'll have the Index to the associated API_SET_NAMESPACE_ENTRY structure.

Offset     0  1  2  3  4  5  6  7    8  9  A  B  C  D  E  F     Ascii   

00014DA0                                        F3 4D 5B 44                 .M[D
00014DB0  5B 00 00 00                                           [...

So, 0x5b (or 91) is the index. By going back to the output of mappings, we can see that it matches.

91) api-ms-win-core-processthreads-l1-1-3 -> kernel32.dll kernelbase.dll

By inspecting the same output, we can also notice that all C runtime DLLs are resolved to ucrtbase.dll.

167) api-ms-win-crt-conio-l1-1-0 -> ucrtbase.dll
168) api-ms-win-crt-convert-l1-1-0 -> ucrtbase.dll
169) api-ms-win-crt-environment-l1-1-0 -> ucrtbase.dll
170) api-ms-win-crt-filesystem-l1-1-0 -> ucrtbase.dll
171) api-ms-win-crt-heap-l1-1-0 -> ucrtbase.dll
172) api-ms-win-crt-locale-l1-1-0 -> ucrtbase.dll
173) api-ms-win-crt-math-l1-1-0 -> ucrtbase.dll
174) api-ms-win-crt-multibyte-l1-1-0 -> ucrtbase.dll
175) api-ms-win-crt-private-l1-1-0 -> ucrtbase.dll
176) api-ms-win-crt-process-l1-1-0 -> ucrtbase.dll
177) api-ms-win-crt-runtime-l1-1-0 -> ucrtbase.dll
178) api-ms-win-crt-stdio-l1-1-0 -> ucrtbase.dll
179) api-ms-win-crt-string-l1-1-0 -> ucrtbase.dll
180) api-ms-win-crt-time-l1-1-0 -> ucrtbase.dll
181) api-ms-win-crt-utility-l1-1-0 -> ucrtbase.dll

I was already resigned at having to figure out how to support the C runtime on XP, when I noticed that Microsoft actually supports the deployment of the runtime on it. The following excerpt from MSDN says as much:

If you currently use the VCRedist (our redistributable package files), then things will just work for you as they did before. The Visual Studio 2015 VCRedist package includes the above mentioned Windows Update packages, so simply installing the VCRedist will install both the Visual C++ libraries and the Universal CRT. This is our recommended deployment mechanism. On Windows XP, for which there is no Universal CRT Windows Update MSU, the VCRedist will deploy the Universal CRT itself.

Which means that on Windows editions coming after XP the support is provided via Windows Update, but on XP we have to deploy the files ourselves. We can find the files to deploy inside C:\Program Files (x86)\Windows Kits\10\Redist\ucrt\DLLs. This path contains three sub-directories: x86, x64 and arm. We're obviously interested in the x86 one. The files contained in it are many (42), apparently the most common "api-ms-win-*" DLLs and ucrtbase.dll. We can deploy those files onto XP to make our application work. We are still missing the vcruntime140.dll, but we can take that DLL from the Visual C++ installation. In fact, that DLL is intended to be deployed, while the Universal CRT (ucrtbase.dll) is intended to be part of the Windows system.

This satisfies our dependencies in terms of DLLs. However, Windows introduced many new APIs over the years which aren't present on XP. So I wrote a script to test the compatibility of an application by checking the imported APIs against the API exported by the DLLs on XP. The command line for it is "cerpro.exe -r xpcompat.py application_path". It will check all the PE files in the specified directory.

import os, sys
from Pro.Core import *
from Pro.PE import *

xp_system32 = "C:\\Users\\Admin\\Desktop\\system32"
apisetschema = { "OMITTED FOR BREVITY" }
cached_apis = {}
missing_result = {}

def getAPIs(dllpath):
    apis = {}
    c = createContainerFromFile(dllpath)
    dll = PEObject()
    if not dll.Load(c):
        print("error: couldn't load dll")
        return apis
    ordbase = dll.ExportDirectory().Num("Base")
    functions = dll.ExportDirectoryFunctions()
    names = dll.ExportDirectoryNames()
    nameords = dll.ExportDirectoryNameOrdinals()
    n = functions.Count()
    it = functions.iterator()
    for x in range(n):
        func = it.next()
        ep = func.Num(0)
        if ep == 0:
            continue
        apiord = str(ordbase + x)
        n2 = nameords.Count()
        it2 = nameords.iterator()
        name_found = False
        for y in range(n2):
            no = it2.next()
            if no.Num(0) == x:
                name = names.At(y)
                offs = dll.RvaToOffset(name.Num(0))
                name, ret = dll.ReadUInt8String(offs, 500)
                apiname = name.decode("ascii")
                apis[apiname] = apiord
                apis[apiord] = apiname
                name_found = True
                break
        if not name_found:
            apis[apiord] = apiord
    return apis
    
def checkMissingAPIs(pe, ndescr, dllname, xpdll_apis):
    ordfl = pe.ImportOrdinalFlag()
    ofts = pe.ImportThunks(ndescr)
    it = ofts.iterator()
    while it.hasNext():
        ft = it.next().Num(0)
        if (ft & ordfl) != 0:
            name = str(ft ^ ordfl)
        else:
            offs = pe.RvaToOffset(ft)
            name, ret = pe.ReadUInt8String(offs + 2, 400)
            if not ret:
                continue
            name = name.decode("ascii")
        if not name in xpdll_apis:
            print("       ", "missing:", name)
            temp = missing_result.get(dllname, set())
            temp.add(name)
            missing_result[dllname] = temp

def verifyXPCompatibility(fname):
    print("file:", fname)
    c = createContainerFromFile(fname)
    pe = PEObject()
    if not pe.Load(c):
        return
    it = pe.ImportDescriptors().iterator()
    ndescr = -1
    while it.hasNext():
        descr = it.next()
        ndescr += 1
        offs = pe.RvaToOffset(descr.Num("Name"))
        name, ret = pe.ReadUInt8String(offs, 400)
        if not ret:
            continue
        name = name.decode("ascii").lower()
        if not name.endswith(".dll"):
            continue
        fwdlls = apisetschema.get(name[:-4], [])
        if len(fwdlls) == 0:
            print("   ", name)
        else:
            fwdll = fwdlls[0]
            print("   ", name, "->", fwdll)
            name = fwdll
        if name == "ucrtbase.dll":
            continue
        xpdll_path = os.path.join(xp_system32, name)
        if not os.path.isfile(xpdll_path):
            continue
        if not name in cached_apis:
            cached_apis[name] = getAPIs(xpdll_path)
        checkMissingAPIs(pe, ndescr, name, cached_apis[name])
    print()
        
def main():
    if os.path.isfile(sys.argv[1]):
        verifyXPCompatibility(sys.argv[1])
    else:
        files = [os.path.join(dp, f) for dp, dn, fn in os.walk(sys.argv[1]) for f in fn]
        for fname in files:
            with open(fname, "rb") as f:
                if f.read(2) == b"MZ":
                    verifyXPCompatibility(fname)
            
    # summary
    n = 0
    print("\nsummary:")
    for rdll, rapis in missing_result.items():
        print("   ", rdll)
        for rapi in rapis:
            print("       ", "missing:", rapi)
            n += 1
    print("total of missing APIs:", str(n))

main()

I had to omit the contents of the apisetschema global variable for the sake of brevity. You can download the full script from here. The system32 directory referenced in the code is the one of Windows XP, which I copied to my desktop.

And here are the relevant excerpts from the output:

file: python-3.7.0-embed-win32\python37.dll
    version.dll
    shlwapi.dll
    ws2_32.dll
    kernel32.dll
        missing: GetFinalPathNameByHandleW
        missing: InitializeProcThreadAttributeList
        missing: UpdateProcThreadAttribute
        missing: DeleteProcThreadAttributeList
        missing: GetTickCount64
    advapi32.dll
    vcruntime140.dll
    api-ms-win-crt-runtime-l1-1-0.dll -> ucrtbase.dll
    api-ms-win-crt-math-l1-1-0.dll -> ucrtbase.dll
    api-ms-win-crt-locale-l1-1-0.dll -> ucrtbase.dll
    api-ms-win-crt-string-l1-1-0.dll -> ucrtbase.dll
    api-ms-win-crt-stdio-l1-1-0.dll -> ucrtbase.dll
    api-ms-win-crt-convert-l1-1-0.dll -> ucrtbase.dll
    api-ms-win-crt-time-l1-1-0.dll -> ucrtbase.dll
    api-ms-win-crt-environment-l1-1-0.dll -> ucrtbase.dll
    api-ms-win-crt-process-l1-1-0.dll -> ucrtbase.dll
    api-ms-win-crt-heap-l1-1-0.dll -> ucrtbase.dll
    api-ms-win-crt-conio-l1-1-0.dll -> ucrtbase.dll
    api-ms-win-crt-filesystem-l1-1-0.dll -> ucrtbase.dll

[...]

file: python-3.7.0-embed-win32\_socket.pyd
    ws2_32.dll
        missing: inet_ntop
        missing: inet_pton
    kernel32.dll
    python37.dll
    vcruntime140.dll
    api-ms-win-crt-runtime-l1-1-0.dll -> ucrtbase.dll

[...]

summary:
    kernel32.dll
        missing: InitializeProcThreadAttributeList
        missing: GetTickCount64
        missing: GetFinalPathNameByHandleW
        missing: UpdateProcThreadAttribute
        missing: DeleteProcThreadAttributeList
    ws2_32.dll
        missing: inet_pton
        missing: inet_ntop
total of missing APIs: 7

We're missing 5 APIs from kernel32.dll and 2 from ws2_32.dll, but the Winsock APIs are imported just by _socket.pyd, a module which is loaded only when a network operation is performed by Python. So, in theory, we can focus our efforts on the missing kernel32 APIs for now.

My plan was to create a fake kernel32.dll, called xernel32.dll, containing forwarders for most APIs and real implementations only for the missing ones. Here's a script to create C++ files containing forwarders for all APIs of common DLLs on Windows 10:

import os, sys
from Pro.Core import *
from Pro.PE import *

xpsys32path = "C:\\Users\\Admin\\Desktop\\system32"
sys32path = "C:\\Windows\\SysWOW64"

def getAPIs(dllpath):
    pass # same code as above
    
def isOrdinal(i):
    try:
        int(i)
        return True
    except:
        return False
    
def createShadowDll(name):
    xpdllpath = os.path.join(xpsys32path, name + ".dll")
    xpapis = getAPIs(xpdllpath)
    dllpath = os.path.join(sys32path, name + ".dll")
    apis = sorted(getAPIs(dllpath).keys())
    if len(apis) != 0:
        with open(name + ".cpp", "w") as f:
            f.write("#include \n\n")
            for a in apis:
                comment = " // XP" if a in xpapis else ""
                if not isOrdinal(a):
                    f.write("#pragma comment(linker, \"/export:" + a + "=" + name + "." + a + "\")" + comment + "\n")
                #
            print("created", name + ".cpp")
    
def main():
    dlls = ("advapi32", "comdlg32", "gdi32", "iphlpapi", "kernel32", "ole32", "oleaut32", "shell32", "shlwapi", "user32", "uxtheme", "ws2_32")
    for dll in dlls:
        createShadowDll(dll)

main()

It creates files like the following kernel32.cpp:

#include 

#pragma comment(linker, "/export:AcquireSRWLockExclusive=kernel32.AcquireSRWLockExclusive")
#pragma comment(linker, "/export:AcquireSRWLockShared=kernel32.AcquireSRWLockShared")
#pragma comment(linker, "/export:ActivateActCtx=kernel32.ActivateActCtx") // XP
#pragma comment(linker, "/export:ActivateActCtxWorker=kernel32.ActivateActCtxWorker")
#pragma comment(linker, "/export:AddAtomA=kernel32.AddAtomA") // XP
#pragma comment(linker, "/export:AddAtomW=kernel32.AddAtomW") // XP
#pragma comment(linker, "/export:AddConsoleAliasA=kernel32.AddConsoleAliasA") // XP
#pragma comment(linker, "/export:AddConsoleAliasW=kernel32.AddConsoleAliasW") // XP
#pragma comment(linker, "/export:AddDllDirectory=kernel32.AddDllDirectory")
[...]

The comment on the right ("// XP") indicates whether the forwarded API is present on XP or not. We can provide real implementations exclusively for the APIs we want. The Windows loader doesn't care whether we forward functions which don't exist as long as they aren't imported.

The APIs we need to support are the following:

GetTickCount64: I just called GetTickCount, not really important
GetFinalPathNameByHandleW: took the implementation from Wine, but had to adapt it slightly
InitializeProcThreadAttributeList: took the implementation from Wine
UpdateProcThreadAttribute: same
DeleteProcThreadAttributeList: same

I have to be grateful to the Wine project here, as it provided useful implementations, which saved me the effort.

I called the attempt at a support runtime for older Windows versions "XP Time Machine Runtime" and you can find the repository here. I compiled it with Visual Studio 2013 and cmake.

So that we have now our xernel32.dll, the only thing we have to do is to rename the imported DLL inside python37.dll.

Let's try to start python.exe.

Awesome.

Of course, we're still not completely done, as we didn't implement the missing Winsock APIs, but perhaps this and some more could be the content of a second part to this article.

Ctor conflicts

Perhaps the content of this post is trivial and widely known(?), but I just spent some time fixing a bug related to the following C++ behavior.

Let’s take a look at this code snippet:

// main.cpp ------------------------------

#include 

void bar();
void foo();

int main(int argc, const char *argv[])
{
    bar();
    foo();
    return 0;
}

// a.cpp ---------------------------------

#include 

struct A
{
    A() { printf("apple\n"); }
};

void bar()
{
    new A;
}

// b.cpp ---------------------------------

#include 

struct A
{
    A() { printf("orange\n"); }
};

void foo()
{
    new A;
}

The output of the code above is:

apple
apple

Whether we compile it with VC++ or g++, the result is the same.

The problem is that although the struct or class is declared locally the name of the constructor is considered a global symbol. So while the allocation size of the struct or class is correct, the constructor being invoked is always the first one encountered by the compiler, which in this case is the one which prints ‘apple’.

The problem here is that the compiler doesn’t warn the user in any way that the wrong constructor is being called and in a large project with hundreds of files it may very well be that two constructors collide.

Since namespaces are part of the name of the symbol, the code above can be fixed by adding a namespace:

namespace N
{
struct A
{
    A() { printf("orange\n"); }
};
}
using namespace N;

void foo()
{
    new A;
}

Now the correct constructor will be called.

I wrote a small (dumb) Python script to detect possible ctor conflicts. It just looks for struct or class declarations and reports duplicate symbol names. It’s far from perfect.

# ctor_conflicts.py
import os, sys, re

source_extensions = ["h", "hxx", "hpp", "cpp", "cxx"]

symbols = { }
psym = re.compile("(typedef\\s+)?(struct|class)\\s+([a-zA-Z_][a-zA-Z0-9_]*)(\\s+)?([{])?")

def processSourceFile(fname):
    with open(fname) as f:
        content = f.readlines()
    n = len(content)
    i = 0
    while i < n:
        m = psym.search(content[i])
        i += 1
        if m == None:
            continue
        symname = m.group(3)
        # exclude some recurring symbols in different projects
        if symname == "Dialog" or symname == "MainWindow":
            continue
        # make sure a bracket is present
        if m.group(5) == None and (i >= n or content[i].startswith("{") == False):
            continue
        loc = fname + ":" + str(i)
        if symname in symbols:
            # found a possible collision
            print("Possible collision of '" + symname + "' in:")
            print(symbols[symname])
            print(loc)
            print("")
        else:
            symbols[symname] = loc

def walkFiles(path):
    for root, dirs, files in os.walk(path):
        for f in files:
            # skip SWIG wrappers
            if f.find("PyWrapWin") != -1:
                continue
            # skip Qt ui files
            if f.startswith("ui_"):
                continue
            fname = root + os.sep + f
            ext = os.path.splitext(fname)[1]
            if ext != None and len(ext) > 1 and ext[1:] in source_extensions:
                processSourceFile(fname)


if __name__ == '__main__':
    nargs = len(sys.argv)
    if nargs < 2:
        path = os.getcwd()
    else:
        path = sys.argv[1]
    walkFiles(path)

In my opinion this could be handled better on the compiler side, at least by giving a warning.

ADDENDUM: Myria ‏(@Myriachan) explained the compiler internals on this one on twitter:

I'm just surprised that it doesn't cause a "duplicate symbol" linker error. Symbol flagged "weak" from being inline, maybe? [...] Member functions defined inside classes like that are automatically "inline" by C++ standard. [...] The "inline" keyword has two meanings: hint to compiler that inlining machine code may be wise, and making symbol weak. [...] Regardless of whether the compiler chooses to inline machine code within calling functions, the weak symbol part still applies. [...] It is as if all inline functions (including functions defined inside classes) have __declspec(selectany) on them, in MSVC terms. [...] Without this behavior, if you ever had a class in a header with functions defined, the compiler would either have to always inline the machine code, or you'd have to use #ifdef nonsense to avoid more than one .cpp defining the function.

The explanation is the correct one. And yes, if we define the ctor outside of the class the compiler does generate an error.

The logic mismatch here is that local structures in C do exist, local ctors in C++ don't. So, the correct struct is allocated but the wrong ctor is being called. Also, while the symbol is weak for the reasons explained by Myria, the compiler could still give an error if the ctor code doesn't match across files.

So the rule here could be: if you have local classes, avoid defining the ctor inside the class. If you already have a conflict as I did and don't want to change the code, you can fix it with a namespace as shown above.

Creating undetected malware for OS X

This article was originally published on cerbero-blog.com on October the 7th, 2013.

While this PoC is about static analysis, it’s very different than applying a packer to a malware. OS X uses an internal mechanism to load encrypted Apple executables and we’re going to exploit the same mechanism to defeat current anti-malware solutions.

OS X implements two encryption systems for its executables (Mach-O). The first one is implemented through the LC_ENCRYPTION_INFO loader command. Here’s the code which handles this command:

            case LC_ENCRYPTION_INFO:
                if (pass != 3)
                    break;
                ret = set_code_unprotect(
                    (struct encryption_info_command *) lcp,
                    addr, map, slide, vp);
                if (ret != LOAD_SUCCESS) {
                    printf("proc %d: set_code_unprotect() error %d "
                           "for file \"%s\"\n",
                           p->p_pid, ret, vp->v_name);
                    /* Don't let the app run if it's
                     * encrypted but we failed to set up the
                     * decrypter */
                     psignal(p, SIGKILL);
                }
                break;

This code calls the set_code_unprotect function which sets up the decryption through text_crypter_create:

    /* set up decrypter first */
    kr=text_crypter_create(&crypt_info, cryptname, (void*)vpath);

The text_crypter_create function is actually a function pointer registered through the text_crypter_create_hook_set kernel API. While this system can allow for external components to register themselves and handle decryption requests, we couldn’t see it in use on current versions of OS X.

The second encryption mechanism which is actually being used internally by Apple doesn’t require a loader command. Instead, it signals encrypted segments through a flag.

The ‘PROTECTED‘ flag is checked while loading a segment in the load_segment function:

if (scp->flags & SG_PROTECTED_VERSION_1) {
    ret = unprotect_segment(scp->fileoff,
                scp->filesize,
                vp,
                pager_offset,
                map,
                map_addr,
                map_size);
} else {
    ret = LOAD_SUCCESS;
}

The unprotect_segment function sets up the range to be decrypted, the decryption function and method. It then calls vm_map_apple_protected.

#define APPLE_UNPROTECTED_HEADER_SIZE   (3 * PAGE_SIZE_64)

static load_return_t
unprotect_segment(
    uint64_t    file_off,
    uint64_t    file_size,
    struct vnode        *vp,
    off_t               macho_offset,
    vm_map_t    map,
    vm_map_offset_t     map_addr,
    vm_map_size_t       map_size)
{
    kern_return_t       kr;
    /*
     * The first APPLE_UNPROTECTED_HEADER_SIZE bytes (from offset 0 of
     * this part of a Universal binary) are not protected...
     * The rest needs to be "transformed".
     */
    if (file_off <= APPLE_UNPROTECTED_HEADER_SIZE &&
        file_off + file_size <= APPLE_UNPROTECTED_HEADER_SIZE) {
        /* it's all unprotected, nothing to do... */
        kr = KERN_SUCCESS;
    } else {
        if (file_off <= APPLE_UNPROTECTED_HEADER_SIZE) {
            /*
             * We start mapping in the unprotected area.
             * Skip the unprotected part...
             */
            vm_map_offset_t     delta;
            delta = APPLE_UNPROTECTED_HEADER_SIZE;
            delta -= file_off;
            map_addr += delta;
            map_size -= delta;
        }
        /* ... transform the rest of the mapping. */
        struct pager_crypt_info crypt_info;
        crypt_info.page_decrypt = dsmos_page_transform;
        crypt_info.crypt_ops = NULL;
        crypt_info.crypt_end = NULL;
#pragma unused(vp, macho_offset)
        crypt_info.crypt_ops = (void *)0x2e69cf40;
        kr = vm_map_apple_protected(map,
                        map_addr,
                        map_addr + map_size,
                        &crypt_info);
    }
    if (kr != KERN_SUCCESS) {
        return LOAD_FAILURE;
    }
    return LOAD_SUCCESS;
}

Two things about the code above. The first 3 pages (0x3000) of a Mach-O can't be encrypted/decrypted. And, as can be noticed, the decryption function is dsmos_page_transform.

Just like text_crypter_create even dsmos_page_transform is a function pointer which is set through the dsmos_page_transform_hook kernel API. This API is called by the kernel extension "Dont Steal Mac OS X.kext", allowing for the decryption logic to be contained outside of the kernel in a private kernel extension by Apple.

Apple uses this technology to encrypt some of its own core components like "Finder.app" or "Dock.app". On current OS X systems this mechanism doesn't provide much of a protection against reverse engineering in the sense that attaching a debugger and dumping the memory is sufficient to retrieve the decrypted executable.

However, this mechanism can be abused by encrypting malware which will no longer be detected by the static analysis technologies of current security solutions.

To demonstrate this claim we took a known OS X malware:

Since this is our public disclosure, we will say that the detection rate stood at about 20-25.

And encrypted it:

After encryption has been applied, the malware is no longer detected by scanners at VirusTotal. The problem is that OS X has no problem in loading and executing the encrypted malware.

The difference compared to a packer is that the decryption code is not present in the executable itself and so the static analysis engine can't recognize a stub or base itself on other data present in the executable, since all segments can be encrypted. Thus, the scan engine also isn't able to execute the encrypted code in its own virtual machine for a more dynamic analysis.

Two other important things about the encryption system is that the private key is the same and is shared across different versions of OS X. And it's not a chained encryption either: but per-page. Which means that changing data in the first encrypted page doesn't affect the second encrypted page and so on.

Our flagship product, Cerbero Profiler, which is an interactive file analysis infrastructure, is able to decrypt protected executables. To dump an unprotected copy of the Mach-O just perform a “Select all” (Ctrl+A) in the main hex view and then click on “Copy into new file” like in the screen-shot below.

The saved file can be executed on OS X or inspected with other tools.

Of course, the decryption can be achieved programmatically through our Python SDK as well. Just load the Mach-O file, initialize it (ProcessLoadCommands) and save to disk the stream returned by the GetStream.

A solution to mitigate this problem could be one of the following:

Implement the decryption mechanism like we did.
Check the presence of encrypted segments. If they are present, trust only executables with a valid code signature issued by Apple.
3. Check the presence of encrypted segments. If they are present, trust only executables whose cryptographic hash matches a trusted one.

This kind of internal protection system should be avoided in an operating system, because it can be abused.

After we shared our internal report, VirusBarrier Team at Intego sent us the following previous research about Apple Binary Protection:

http://osxbook.com/book/bonus/chapter7/binaryprotection/
http://osxbook.com/book/bonus/chapter7/tpmdrmmyth/
https://github.com/AlanQuatermain/appencryptor

The research talks about the old implementation of the binary protection. The current page transform hook looks like this:

  if (v9 == 0x2E69CF40) // this is the constant used in the current kernel
  {
    // current decryption algo
  }
  else
  {
    if (v9 != 0xC2286295)
    {
      // ...
      if (!some_bool)
      {
        printf("DSMOS++: WARNING -- Old Kernel\n");
        ++some_bool;
      }
    }
    // old decryption algo
  }

VirusBarrier Team also reported the following code by Steve Nygard in his class-dump utility:

https://bitbucket.org/nygard/class-dump/commits/5908ac605b5dfe9bfe2a50edbc0fbd7ab16fd09c

This is the correct decryption code. In fact, the kernel extension by Apple, just as in the code above provided by Steve Nygard, uses the OpenSSL implementation of Blowfish.

We didn't know about Nygard's code, so we did our own research about the topic and applied it to malware. We would like to thank VirusBarrier Team at Intego for its cooperation and quick addressing of the issue. At the time of writing we're not aware of any security solution for OS X, apart VirusBarrier, which isn't tricked by this technique. We even tested some of the most important security solutions individually on a local machine.

The current 0.9.9 version of Cerbero Profiler already implements the decryption of Mach-Os, even though it's not explicitly written in the changelist.

We didn't implement the old decryption method, because it didn't make much sense in our case and we're not aware of a clean way to automatically establish whether the file is old and therefore uses said encryption.

These two claims need a clarification. If we take a look at Nygard's code, we can see a check to establish the encryption method used:

#define CDSegmentProtectedMagic_None 0
#define CDSegmentProtectedMagic_AES 0xc2286295
#define CDSegmentProtectedMagic_Blowfish 0x2e69cf40

            if (magic == CDSegmentProtectedMagic_None) {
                // ...
            } else if (magic == CDSegmentProtectedMagic_Blowfish) {
                // 10.6 decryption
                // ...
            } else if (magic == CDSegmentProtectedMagic_AES) {
                // ...
            }

It checks the first dword in the encrypted segment (after the initial three non-encrypted pages) to decide which decryption algorithm should be used. This logic has a problem, because it assumes that the first encrypted block is full of 0s, so that when encrypted with AES it produces a certain magic and when encrypted with Blowfish another one. This logic fails in the case the first block contains values other than 0. In fact, some samples we encrypted didn't produce a magic for this exact reason.

Also, current versions of OS X don't rely on a magic check and don't support AES encryption. As we can see from the code displayed at the beginning of the article, the kernel doesn't read the magic dword and just sets the Blowfish magic value as a constant:

        crypt_info.crypt_ops = (void *)0x2e69cf40;

So while checking the magic is useful for normal cases, security solutions can't rely on it or else they can be easily tricked into using the wrong decryption algorithm.

MUI files under the hood

Have you ever copied after Vista a system file like notepad.exe onto the desktop and tried to execute it? Have you ever tried after Vista to modify the resources of a system file like regedit.exe? It’s most likely that neither of the two was a successful operation.

This will be very brief because the topic is very limited and because of my lack of time: bear with me. 🙂

If you try to copy, for instance, notepad.exe onto the desktop and run it in a debugger you will notice that it fails in its initialization routine when trying to load its accelerators. You take a look at the HINSTANCE passed to LoadAccelerators and notice that it’s NULL. You open notepad.exe in a resource viewer and notice that it doesn’t contain accelerator resources. Thus, you realize that the global instance is associated to some external resource as well. Go back to the system folder where you took the system executable and you’ll notice language directories such as “en-US”. Just copy the one which identifies the language of your system to the same directory of notepad.exe. You’ll notice that now notepad.exe runs correctly.

Vista introduced the separation between binary and language dependent resources to allow a single Windows image to contain more than just one language. You can obtain more information about the development aspects on MSDN.

The language directory contains files with names such as “notepad.exe.mui”, one for every file they provide resources for (including dlls). These are very basic PE files which contain only a resource directory and are loaded into the address space of the process as they are.

These files are associated to the main file in two ways:

1) By name: just rename the notepad to test.exe and the MUI file accordingly and it still works.
2) Via resource, as we’ll see.

If you open both notepad.exe and its MUI file with a resource viewer, you’ll see they both contain a “MUI” resource. What this data contains can be roughly understood from the MSDN or SDK:

//
// Information about a MUI file, used as input/output in GetFileMUIInfo
// All offsets are relative to start of the structure. Offsets with value 0 mean empty field.
//

typedef struct _FILEMUIINFO {
    DWORD       dwSize;                 // Size of the structure including buffer size [in]
    DWORD       dwVersion;              // Version of the structure [in]
    DWORD       dwFileType;             // Type of the file [out]
    BYTE        pChecksum[16];          // Checksum of the file [out]
    BYTE        pServiceChecksum[16];   // Checksum of the file [out]
    DWORD       dwLanguageNameOffset;   // Language name of the file [out]
    DWORD       dwTypeIDMainSize;       // Number of TypeIDs in main module [out]
    DWORD       dwTypeIDMainOffset;     // Array of TypeIDs (DWORD) in main module [out]
    DWORD       dwTypeNameMainOffset;   // Multistring array of TypeNames in main module [out]
    DWORD       dwTypeIDMUISize;        // Number of TypeIDs in MUI module [out]
    DWORD       dwTypeIDMUIOffset;      // Array of TypeIDs (DWORD) in MUI module [out]
    DWORD       dwTypeNameMUIOffset;    // Multistring array of TypeNames in MUI module [out]
    BYTE        abBuffer[8];             // Buffer for extra data [in] (Size 4 is for padding)
} FILEMUIINFO, *PFILEMUIINFO;

You’ll find this structure in WinNls.h. However, this structure is for GetFileMUIInfo, it doesn’t match the physical data.

Offset     0  1  2  3  4  5  6  7    8  9  A  B  C  D  E  F     Ascii   

00000000  CD FE CD FE C8 00 00 00   00 00 01 00 00 00 00 00     ................
00000010  12 00 00 00 00 00 00 00   00 00 00 00 EC 6C C4 C4     .............l..
00000020  FF 7C C9 CC F8 03 C7 B3   8C 8A 67 51 11 72 DC 72     .|........gQ.r.r
00000030  80 73 67 9E AB 20 3D FC   AA D4 2F 04 00 00 00 00     .sg...=.../.....
00000040  00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00     ................
00000050  00 00 00 00 88 00 00 00   0E 00 00 00 98 00 00 00     ................
00000060  20 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00     ................
00000070  00 00 00 00 B8 00 00 00   0C 00 00 00 00 00 00 00     ................
00000080  00 00 00 00 00 00 00 00   4D 00 55 00 49 00 00 00     ........M.U.I...
00000090  00 00 00 00 00 00 00 00   02 00 00 00 03 00 00 00     ................
000000A0  04 00 00 00 05 00 00 00   06 00 00 00 09 00 00 00     ................
000000B0  0E 00 00 00 10 00 00 00   65 00 6E 00 2D 00 55 00     ........e.n.-.U.
000000C0  53 00 00 00 00 00 00 00                               S.......

The first DWORD is clearly a signature. If you change it, the MUI is invalidated and notepad won’t run. It is followed by another DWORD describing the size of the structure (including the signature).

Offset     0  1  2  3  4  5  6  7    8  9  A  B  C  D  E  F     Ascii   

00000010                                        EC 6C C4 C4                 .l..
00000020  FF 7C C9 CC F8 03 C7 B3   8C 8A 67 51 11 72 DC 72     .|........gQ.r.r
00000030  80 73 67 9E AB 20 3D FC   AA D4 2F 04                 .sg...=.../.

These are the two checksums:

  BYTE  pChecksum[16];
  BYTE  pServiceChecksum[16];

These two checksums are probably in the same order of the structure. They both match the ones contained in the MUI file and if you change the second one, the application won’t run.

There are no other association criteria: I changed both the main file and the MUI file (by using a real DLL and just replacing the resource directory with the one of the MUI file) and it still worked.

About the second matter mentioned in the beginning: modification of resources. If you try to add/replace an icon to/in notepad.exe you will most likely not succeed. This is because as mentioned in the MSDN:

There are some restrictions on resource updates in files that contain Resource Configuration(RC Config) data: LN files and the associated .mui files. Details on which types of resources are allowed to be updated in these files are in the Remarks section for the UpdateResource function.

Basically, UpdateResource doesn’t work if the PE file contains a MUI resource. Now, prepare for an incredibly complicated and technically challenging hack to overcome this limitation… Ready? Rename the “MUI” resource to “CUI” or whatever, now try again and it works. Restore the MUI resource name and all is fine.

The new build of the CFF Explorer handles this automatically for your comfort.

This limitation probably broke most of the resource editors for Win32. Smart.

Filter Monitor 1.0.1

This week, after months of development of bigger projects, I found some time to windbg “ntoskrnl.exe” and write a utility. It is called Filter Monitor and shows some key filters installed by kernel mode components.

“As you probably all know the Service Descriptor Table has been a playground on x86 for all sorts of things: rootkits, anti-viruses, system monitors etc. On x64 modifying the Service Descriptor Table is no longer possible, at least not without subverting the Patch Guard technology.

Thus, programs have now to rely on the filtering/notification technologies provided by Microsoft. And that’s why I wrote this little utility which monitors some key filters.

Since I haven’t signed the driver of my utility, you have to press F8 at boot time and then select the “Disable Driver Signature Enforcement” option. If you have a multiple boot screen like myself, then you can take your time. Otherwise you have to press F8 frenetically to not miss right moment.

A disclaimer: the boot process can be a bit annoying, but the utility should be used on virtualized systems anyway, as I haven’t fully tested it yet. I doubt that it will crash your system, I guess the worst scenario is that it won’t list some filters. It should work on any Windows system starting from Vista RTM and I have provided an x86 version and an x64 version. But the truth is that I have tested only the x64 version on Windows 7 RTM. Last but not least, I can’t guarantee that this utility will work on future versions of Windows, it relies heavily on system internals.

Now, let’s run it. The supported filters/notifications at the time are these: Registry, Create Process, Create Thread and Load Image. “Registry” stands for CmRegisterCallback filters. “Create Process” for PsSetCreateProcessNotifyRoutine callbacks. “Create Thread” for PsSetCreateThreadNotifyRoutine callbacks. And “Load Image” for PsSetLoadImageNotifyRoutine callbacks.

The “Additional Info” in the list view provides internal information like the address of the callback function.

There are some default filters registered by system components, but, as you can notice, there are also Kaspersky components. That’s because some filters (like the registry filter) are not used by system components and I needed a tool which would make use of these filters for my little demonstration.

The version of Kaspersky I have installed is the latest one available on the internet which is: 9.0.0.463.

I created for this demonstration a little executable called “k-test” (what you see on the desktop are three copies of the same executable) which copies itself in a directory called “borda” in the “Roaming” directory of the operating system. It then creates a value in the Run key of the registry to execute itself at each start-up. Finally, it launches itself from the “Roaming” directory and ends.

This is a typical malware behavior. Beware that the signature of the application itself is not contained in the databases of Kaspersky as I have written it on the fly, but it detects the suspicious behavior, stops execution and deletes the file. And it does this every time I launch the test application.

Now let’s get to the part where I show an additional functionality of the Filter Monitor which is the ability to remove registered filters and see what happens if I remove the filters installed by klif.sys, which is the “Kaspersky Lab Interceptor and Filter” driver. As the name suggests, this driver intercepts and filters: it installs all four of typologies of filters listed by the Filter Monitor. On x86 instead of calling CmRegisterCallback it additionally hooks about 60 functions of the Service Descriptor Table (which is a lot), but that’s no longer possible on x64.

So, let’s remove the filters and re-launch k-test. It works now.

Final disclaimer: It is not my intent to comment on security features of anti-viruses, I just wanted to present my new tool and show its functionalities. I was already familiar with the internals of Kaspersky before writing this utility.

I hope you enjoyed the presentation.”

P.S. A huge thanks goes to Alessandro Gario for providing me with all the different versions of ntoskrnl.exe.