Time Travel: Running Python 3.7 on XP

To restart my career as a technical writer, I chose a light topic. Namely, running applications compiled with new versions of Visual Studio on Windows XP. I didn’t find any prior research on the topic, but I also didn’t search much. There’s no real purpose behind this article, beyond the fact that I wanted to know what could prevent a new application to run on XP. Our target application will be the embedded version of Python 3.7 for x86.

If we try to start any new application on XP, we’ll get an error message informing us that it is not a valid Win32 application. This happens because of some fields in the Optional Header of the Portable Executable.

Most of you probably already know that you need to adjust these fields as follows:

MajorOperatingSystemVersion: 5
MinorOperatingSystemVersion: 0
MajorSubsystemVersion: 5
MinorSubsystemVersion: 0

Fortunately, it’s enough to adjust the fields in the executable we want to start (python.exe), there’s no need to adjust the DLLs as well.

If we try run the application now, we’ll get an error message due to a missing API in kernel32. So let’s turn our attention to the imports.

We have a missing vcruntime140.dll, then a bunch of “api-ms-win-*” DLLs, then only python37.dll and kernel32.dll.

The first thing which comes to mind is that in new applications we often find these “api-ms-win-*” DLLs. If we search for the prefix in the Windows directory, we’ll find a directory both in System32 and SysWOW64 called “downlevel”, which contains a huge list of these DLLs.

As we’ll see later, these DLLs aren’t actually used, but if we open one with a PE viewer, we’ll see that it contains exclusively forwarders to APIs contained in the usual suspects such as kernel32, kernelbase, user32 etc.

There’s a MSDN page documenting these DLLs.

Interestingly, in the downlevel directory we can’t find any of the files imported by python.exe. These DLLs actually expose C runtime APIs like strlen, fopen, exit and so on.

If we don’t have any prior knowledge on the topic and just do a string search inside the Windows directory for such a DLL name, we’ll find a match in C:\Windows\System32\apisetschema.dll. This DLL is special as it contains a .apiset section, whose data can easily be identified as some sort of format for mapping “api-ms-win-*” names to others.

Searching on the web, the first resource I found on this topic were two articles on the blog of Quarkslab (Part 1 and Part 2). However, I quickly figured that, while useful, they were too dated to provide me with up-to-date structures to parse the data. In fact, the second article shows a version number of 2 and at the time of my writing the version number is 6.

Just for completeness, after the publication of the current article, I was made aware of an article by deroko about the topic predating those of Quarkslab.

Anyway, I searched some more and found a code snippet by Alex Ionescu and Pavel Yosifovich in the repository of Windows Internals. I took the following structures from there.

The data starts with a API_SET_NAMESPACE structure.

Count specifies the number of API_SET_NAMESPACE_ENTRY and API_SET_HASH_ENTRY structures. EntryOffset points to the start of the array of API_SET_NAMESPACE_ENTRY structures, which in our case comes exactly after API_SET_NAMESPACE.

Every API_SET_NAMESPACE_ENTRY points to the name of the “api-ms-win-*” DLL via the NameOffset field, while ValueOffset and ValueCount specify the position and count of API_SET_VALUE_ENTRY structures. The API_SET_VALUE_ENTRY structure yields the resolution values (e.g. kernel32.dll, kernelbase.dll) for the given “api-ms-win-*” DLL.

With this information we can already write a small script to map the new names to the actual DLLs.

This code can be executed with Cerbero Profiler from command line as “cerpro.exe -r apisetschema.py”. These are the first lines of the produced output:

Going back to API_SET_NAMESPACE, its field HashOffset points to an array of API_SET_HASH_ENTRY structures. These structures, as we’ll see in a moment, are used by the Windows loader to quickly index a “api-ms-win-*” DLL name. The Hash field is effectively the hash of the name, calculated by taking into consideration both HashFactor and HashedLength, while Index points to the associated API_SET_NAMESPACE_ENTRY entry.

The code which does the hashing is inside the function LdrpPreprocessDllName in ntdll:

Or more simply in C code:

As a practical example, let’s take the DLL name “api-ms-win-core-processthreads-l1-1-2.dll”. Its hash would be 0x445B4DF3. If we find its matching API_SET_HASH_ENTRY entry, we’ll have the Index to the associated API_SET_NAMESPACE_ENTRY structure.

So, 0x5b (or 91) is the index. By going back to the output of mappings, we can see that it matches.

By inspecting the same output, we can also notice that all C runtime DLLs are resolved to ucrtbase.dll.

I was already resigned at having to figure out how to support the C runtime on XP, when I noticed that Microsoft actually supports the deployment of the runtime on it. The following excerpt from MSDN says as much:

If you currently use the VCRedist (our redistributable package files), then things will just work for you as they did before. The Visual Studio 2015 VCRedist package includes the above mentioned Windows Update packages, so simply installing the VCRedist will install both the Visual C++ libraries and the Universal CRT. This is our recommended deployment mechanism. On Windows XP, for which there is no Universal CRT Windows Update MSU, the VCRedist will deploy the Universal CRT itself.

Which means that on Windows editions coming after XP the support is provided via Windows Update, but on XP we have to deploy the files ourselves. We can find the files to deploy inside C:\Program Files (x86)\Windows Kits\10\Redist\ucrt\DLLs. This path contains three sub-directories: x86, x64 and arm. We’re obviously interested in the x86 one. The files contained in it are many (42), apparently the most common “api-ms-win-*” DLLs and ucrtbase.dll. We can deploy those files onto XP to make our application work. We are still missing the vcruntime140.dll, but we can take that DLL from the Visual C++ installation. In fact, that DLL is intended to be deployed, while the Universal CRT (ucrtbase.dll) is intended to be part of the Windows system.

This satisfies our dependencies in terms of DLLs. However, Windows introduced many new APIs over the years which aren’t present on XP. So I wrote a script to test the compatibility of an application by checking the imported APIs against the API exported by the DLLs on XP. The command line for it is “cerpro.exe -r xpcompat.py application_path”. It will check all the PE files in the specified directory.

I had to omit the contents of the apisetschema global variable for the sake of brevity. You can download the full script from here. The system32 directory referenced in the code is the one of Windows XP, which I copied to my desktop.

And here are the relevant excerpts from the output:

We’re missing 5 APIs from kernel32.dll and 2 from ws2_32.dll, but the Winsock APIs are imported just by _socket.pyd, a module which is loaded only when a network operation is performed by Python. So, in theory, we can focus our efforts on the missing kernel32 APIs for now.

My plan was to create a fake kernel32.dll, called xernel32.dll, containing forwarders for most APIs and real implementations only for the missing ones. Here’s a script to create C++ files containing forwarders for all APIs of common DLLs on Windows 10:

It creates files like the following kernel32.cpp:

The comment on the right (“// XP”) indicates whether the forwarded API is present on XP or not. We can provide real implementations exclusively for the APIs we want. The Windows loader doesn’t care whether we forward functions which don’t exist as long as they aren’t imported.

The APIs we need to support are the following:

GetTickCount64: I just called GetTickCount, not really important
GetFinalPathNameByHandleW: took the implementation from Wine, but had to adapt it slightly
InitializeProcThreadAttributeList: took the implementation from Wine
UpdateProcThreadAttribute: same
DeleteProcThreadAttributeList: same

I have to be grateful to the Wine project here, as it provided useful implementations, which saved me the effort.

I called the attempt at a support runtime for older Windows versions “XP Time Machine Runtime” and you can find the repository here. I compiled it with Visual Studio 2013 and cmake.

So that we have now our xernel32.dll, the only thing we have to do is to rename the imported DLL inside python37.dll.

Let’s try to start python.exe.

Awesome.

Of course, we’re still not completely done, as we didn’t implement the missing Winsock APIs, but perhaps this and some more could be the content of a second part to this article.

Creating undetected malware for OS X

This article was originally published on cerbero-blog.com on October the 7th, 2013.

While this PoC is about static analysis, it’s very different than applying a packer to a malware. OS X uses an internal mechanism to load encrypted Apple executables and we’re going to exploit the same mechanism to defeat current anti-malware solutions.

OS X implements two encryption systems for its executables (Mach-O). The first one is implemented through the LC_ENCRYPTION_INFO loader command. Here’s the code which handles this command:

This code calls the set_code_unprotect function which sets up the decryption through text_crypter_create:

The text_crypter_create function is actually a function pointer registered through the text_crypter_create_hook_set kernel API. While this system can allow for external components to register themselves and handle decryption requests, we couldn’t see it in use on current versions of OS X.

The second encryption mechanism which is actually being used internally by Apple doesn’t require a loader command. Instead, it signals encrypted segments through a flag.

Protected flag

The ‘PROTECTED‘ flag is checked while loading a segment in the load_segment function:

The unprotect_segment function sets up the range to be decrypted, the decryption function and method. It then calls vm_map_apple_protected.

Two things about the code above. The first 3 pages (0x3000) of a Mach-O can’t be encrypted/decrypted. And, as can be noticed, the decryption function is dsmos_page_transform.

Just like text_crypter_create even dsmos_page_transform is a function pointer which is set through the dsmos_page_transform_hook kernel API. This API is called by the kernel extension “Dont Steal Mac OS X.kext“, allowing for the decryption logic to be contained outside of the kernel in a private kernel extension by Apple.

Apple uses this technology to encrypt some of its own core components like “Finder.app” or “Dock.app”. On current OS X systems this mechanism doesn’t provide much of a protection against reverse engineering in the sense that attaching a debugger and dumping the memory is sufficient to retrieve the decrypted executable.

However, this mechanism can be abused by encrypting malware which will no longer be detected by the static analysis technologies of current security solutions.

To demonstrate this claim we took a known OS X malware:

Scan before encryption

Since this is our public disclosure, we will say that the detection rate stood at about 20-25.

And encrypted it:

Scan after encryption

After encryption has been applied, the malware is no longer detected by scanners at VirusTotal. The problem is that OS X has no problem in loading and executing the encrypted malware.

The difference compared to a packer is that the decryption code is not present in the executable itself and so the static analysis engine can’t recognize a stub or base itself on other data present in the executable, since all segments can be encrypted. Thus, the scan engine also isn’t able to execute the encrypted code in its own virtual machine for a more dynamic analysis.

Two other important things about the encryption system is that the private key is the same and is shared across different versions of OS X. And it’s not a chained encryption either: but per-page. Which means that changing data in the first encrypted page doesn’t affect the second encrypted page and so on.

Our flagship product, Cerbero Profiler, which is an interactive file analysis infrastructure, is able to decrypt protected executables. To dump an unprotected copy of the Mach-O just perform a “Select all” (Ctrl+A) in the main hex view and then click on “Copy into new file” like in the screen-shot below.

Mach-O decryption

The saved file can be executed on OS X or inspected with other tools.

Decrypted Mach-O

Of course, the decryption can be achieved programmatically through our Python SDK as well. Just load the Mach-O file, initialize it (ProcessLoadCommands) and save to disk the stream returned by the GetStream.

A solution to mitigate this problem could be one of the following:

  • Implement the decryption mechanism like we did.
  • Check the presence of encrypted segments. If they are present, trust only executables with a valid code signature issued by Apple.
  • 3. Check the presence of encrypted segments. If they are present, trust only executables whose cryptographic hash matches a trusted one.

This kind of internal protection system should be avoided in an operating system, because it can be abused.

After we shared our internal report, VirusBarrier Team at Intego sent us the following previous research about Apple Binary Protection:

http://osxbook.com/book/bonus/chapter7/binaryprotection/
http://osxbook.com/book/bonus/chapter7/tpmdrmmyth/
https://github.com/AlanQuatermain/appencryptor

The research talks about the old implementation of the binary protection. The current page transform hook looks like this:

VirusBarrier Team also reported the following code by Steve Nygard in his class-dump utility:

https://bitbucket.org/nygard/class-dump/commits/5908ac605b5dfe9bfe2a50edbc0fbd7ab16fd09c

This is the correct decryption code. In fact, the kernel extension by Apple, just as in the code above provided by Steve Nygard, uses the OpenSSL implementation of Blowfish.

We didn’t know about Nygard’s code, so we did our own research about the topic and applied it to malware. We would like to thank VirusBarrier Team at Intego for its cooperation and quick addressing of the issue. At the time of writing we’re not aware of any security solution for OS X, apart VirusBarrier, which isn’t tricked by this technique. We even tested some of the most important security solutions individually on a local machine.

The current 0.9.9 version of Cerbero Profiler already implements the decryption of Mach-Os, even though it’s not explicitly written in the changelist.

We didn’t implement the old decryption method, because it didn’t make much sense in our case and we’re not aware of a clean way to automatically establish whether the file is old and therefore uses said encryption.

These two claims need a clarification. If we take a look at Nygard’s code, we can see a check to establish the encryption method used:

It checks the first dword in the encrypted segment (after the initial three non-encrypted pages) to decide which decryption algorithm should be used. This logic has a problem, because it assumes that the first encrypted block is full of 0s, so that when encrypted with AES it produces a certain magic and when encrypted with Blowfish another one. This logic fails in the case the first block contains values other than 0. In fact, some samples we encrypted didn’t produce a magic for this exact reason.

Also, current versions of OS X don’t rely on a magic check and don’t support AES encryption. As we can see from the code displayed at the beginning of the article, the kernel doesn’t read the magic dword and just sets the Blowfish magic value as a constant:

So while checking the magic is useful for normal cases, security solutions can’t rely on it or else they can be easily tricked into using the wrong decryption algorithm.