Originally documented on Windows, the “Heaven’s Gate” technique allowed malicious software to evade endpoint security products by executing 64-bit code inside 32-bit processes, effectively bypassing user-mode hooks. This technique has since been mitigated in Windows 10+ through Control Flow Guard (CFG).
Red Canary has successfully reproduced a variation of this technique for Linux, and the result of our research has been incorporated into Chain Reactor, our open source framework for adversary simulation on Linux.
In the sections below, we will break down the Heaven’s Gate technique as a primer, detail what a variation of this technique looks like for Linux, provide proof-of-concept code, and demonstrate how ptrace can be fooled.
Revisiting Heaven’s Gate for Windows
Endpoint security products commonly instrumented Windows applications through user-mode API hooks. For example, to monitor the file activity of a newly launched application, security products would hook the associated 32-bit APIs in ntdll.dll, like NtCreateFile, NtOpenFile, and NtWriteFile.
However, Windows on Windows (WoW64), which allows users to run 32-bit applications on 64-bit systems for compatibility reasons, introduces a small wrinkle. In a WoW64 system, all 32-bit applications load both a 32-bit and 64-bit ntdll.dll. The 32-bit DLL effectively acts as a wrapper, thunking into the 64-bit DLL to perform the actual system call.
As a result, malicious software could evade the user-mode components of many security products by calling the 64-bit API directly, skipping over any hooks placed on the 32-bit API.
This technique was originally documented by George Nicolaou on his blog and has since been mitigated in Windows 10+ through CFG.
Heaven’s Gate on Linux
We were able to recreate a variation of this technique for Linux that confuses common tracing tools. Unlike on Windows, 32-bit Linux executables do not load any native 64-bit libraries. Instead, the 64-bit Linux kernel provides dedicated 32-bit entry points that emulate the existence of a 32-bit kernel, whereas Windows only has 64-bit kernel entry points.
Instrumentation for security products will likely live in one of a few places: a ptrace-based system, the kernel itself, or a subsystem like audit.
We will cover two use cases:
- 64-bit applications calling 32-bit syscall entries
- 32-bit applications calling 64-bit syscall entries
Both are detailed below, with proofs of concept (POCs) and full source code in our GitHub repository.
Evading 64-bit security instrumentation
It turns out that 64-bit applications can directly invoke the legacy 32-bit interrupt handler (int 0x80) to transition into the kernel. There are caveats, though: the 32-bit entry point only sees the lower 32 bits of each register, so every register-based argument must be no larger than its 32-bit counterpart and every pointer must fit in 32 bits. Keeping pointers in the lower 2GB of virtual memory satisfies this.
Fortunately, these constraints can be met by allocating memory via mmap with the MAP_32BIT flag, which places the mapping in the first 2GB of the process’s address space.
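A minimal sketch of that allocation step, assuming an x86-64 Linux build with glibc (MAP_32BIT exists only on x86-64). The helper name alloc_low is our own illustration, not a function from our POC. Besides mapping the memory, it double-checks that the whole range is addressable with 32 bits, which is the property the 32-bit entry point actually depends on.

```c
#define _GNU_SOURCE          /* exposes MAP_32BIT in <sys/mman.h> */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of read/write memory in the
 * first 2GB of the address space (x86-64 only), so every pointer into
 * the mapping survives truncation to 32 bits. Returns NULL on failure. */
void *alloc_low(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Defensive check: the whole range must be addressable with 32 bits. */
    if ((uint64_t)(uintptr_t)p + len > (uint64_t)UINT32_MAX + 1) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

Any buffer, structure, or argument block handed to the 32-bit entry point would then be carved out of memory returned by a helper like this.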
Let’s walk through an example using the 32-bit socketcall interface:
Prior to Linux 4.3, 32-bit applications used a single entry point (socketcall) for all socket operations. The first parameter (call) specifies the action to perform (connect, etc.), and the second parameter points to a contiguous memory block holding that action’s arguments.
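Because a 32-bit process’s unsigned long is 4 bytes wide, the 32-bit socketcall entry point reads its argument block as an array of 32-bit words, so a 64-bit caller cannot pass its own (8-byte) unsigned long array unchanged. The sketch below, built around a hypothetical helper named pack_socketcall_args, packs values into that 32-bit layout and refuses any value that would be truncated. The call numbers for socketcall’s first parameter (SYS_SOCKET, SYS_CONNECT, and so on) are taken from the kernel’s UAPI header <linux/net.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/net.h>   /* SYS_SOCKET, SYS_CONNECT, ...: socketcall call numbers */

/* Hypothetical helper: write `n` socketcall arguments into `dst` as
 * 32-bit words, the layout the 32-bit socketcall entry point reads.
 * Returns -1 instead of silently truncating a value that needs more
 * than 32 bits (this also rejects negative values cast to uint64_t). */
int pack_socketcall_args(uint32_t *dst, const uint64_t *vals, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (vals[i] > UINT32_MAX)
            return -1;
        dst[i] = (uint32_t)vals[i];
    }
    return 0;
}
```

The packed block itself must also live in low memory, since socketcall receives only its 32-bit address.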
In this example, we’ll have socketcall perform a receive operation whose six arguments fall into four categories:
- Three immediates
- One out pointer
- One in/out pointer
- One pointer to a structure
When exercising the Heaven’s Gate technique, immediate values are the least complex, since they are passed by value. Input/output pointers are more complex, because both need to point into the lower 2GB of virtual memory. We also need to ensure that enough memory is allocated via mmap for the socket operation’s arguments (6 in this case), the receive buffer, src_addr, and the in/out value.
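That layout can be sketched as a single MAP_32BIT region holding every object the kernel will dereference. Here we assume the receive operation is recvfrom, since its six arguments (sockfd, buf, len, flags, src_addr, addrlen) match the breakdown above; that choice, the struct recvfrom32_frame type, and the RECV_BUF_LEN size are our own illustrations rather than code from the POC.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/socket.h>

#define RECV_BUF_LEN 256   /* hypothetical receive buffer size */

/* Everything the 32-bit socketcall entry point will read or write,
 * kept in one low mapping so that every pointer fits in 32 bits. */
struct recvfrom32_frame {
    uint32_t args[6];                 /* the contiguous socketcall args block */
    uint32_t addrlen;                 /* in/out: socklen_t is 32-bit on both ABIs */
    struct sockaddr_storage src;      /* pointer to a structure: sender address */
    unsigned char buf[RECV_BUF_LEN];  /* out: received bytes */
};

/* Hypothetical helper: map a frame in the low 2GB and lay out the
 * assumed recvfrom arguments. Returns NULL on failure. */
struct recvfrom32_frame *recvfrom32_frame_new(int sockfd, int flags)
{
    struct recvfrom32_frame *f = mmap(NULL, sizeof(*f), PROT_READ | PROT_WRITE,
                                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    if (f == MAP_FAILED)
        return NULL;

    f->addrlen = sizeof(f->src);                     /* in: capacity of src */
    f->args[0] = (uint32_t)sockfd;                   /* immediate */
    f->args[1] = (uint32_t)(uintptr_t)f->buf;        /* out pointer */
    f->args[2] = RECV_BUF_LEN;                       /* immediate */
    f->args[3] = (uint32_t)flags;                    /* immediate */
    f->args[4] = (uint32_t)(uintptr_t)&f->src;       /* pointer to a structure */
    f->args[5] = (uint32_t)(uintptr_t)&f->addrlen;   /* in/out pointer */
    return f;
}
```

The address of args is what would be passed as socketcall’s second parameter; after a successful call, the received bytes, the sender’s address, and its updated length would be read back from buf, src, and addrlen.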
Once the arguments have been laid out correctly, we can invoke socketcall’s 32-bit syscall entry point through its system call number (102).
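A sketch of the invocation itself, assuming x86-64 Linux with glibc and a kernel built with IA32 emulation. The int 0x80 instruction enters the kernel through the 32-bit entry point, with the syscall number in EAX and socketcall’s two parameters in EBX and ECX. The names socketcall32, socket32, and ia32_entry_available are hypothetical, and socket32 uses the simpler SYS_SOCKET operation so the example stays short; this is not the code in our repository.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/net.h>          /* SYS_SOCKET and the other socketcall call numbers */

#define NR_SOCKETCALL_I386 102  /* 32-bit socketcall number, as given above */
#define NR_GETPID_I386      20  /* 32-bit getpid, used only by the probe */

/* Enter the kernel through the 32-bit int 0x80 gate from 64-bit code.
 * The kernel reads only EAX, EBX and ECX here, so every value must fit
 * in 32 bits. r8-r11 are marked clobbered to be safe, because some older
 * kernels did not preserve them across int 0x80 from 64-bit code. */
static long int80_2(long nr, long a1, long a2)
{
    long ret;
    __asm__ volatile("int $0x80"
                     : "=a"(ret)
                     : "a"(nr), "b"(a1), "c"(a2)
                     : "r8", "r9", "r10", "r11", "memory", "cc");
    return (int)ret;  /* compat syscalls return a 32-bit result or -errno */
}

/* socketcall(call, args) through the 32-bit entry point; `args` must
 * live below 4GB, e.g. in a MAP_32BIT mapping. */
long socketcall32(int call, const uint32_t *args)
{
    return int80_2(NR_SOCKETCALL_I386, call, (long)(uintptr_t)args);
}

/* Example: socket(domain, type, protocol) via socketcall's SYS_SOCKET. */
long socket32(int domain, int type, int protocol)
{
    uint32_t *args = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    if (args == MAP_FAILED)
        return -1;
    args[0] = (uint32_t)domain;
    args[1] = (uint32_t)type;
    args[2] = (uint32_t)protocol;
    long fd = socketcall32(SYS_SOCKET, args);
    munmap(args, 4096);
    return fd;
}

/* int 0x80 typically faults on kernels without IA32 emulation, so try it
 * in a child process first: the 32-bit getpid must match getpid(). */
int ia32_entry_available(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0)
        _exit(int80_2(NR_GETPID_I386, 0, 0) == getpid() ? 0 : 1);
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A receive would follow the same pattern, calling socketcall32(SYS_RECVFROM, args) with an args block laid out in low memory. The probe runs in a child process because, without IA32 emulation, the fault kills the process instead of returning an error.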
The full proof of concept for this case is available in our GitHub repository.
Evading 32-bit security instrumentation
Invoking the 64-bit syscall interface is possible, but it requires a particular code segment selector to be set. In both 32-bit protected mode and 64-bit long mode, the code segment (CS) register holds a selector that indexes into either the Local Descriptor Table (LDT) or the Global Descriptor Table (GDT). The descriptor it selects defines the segment’s limits and privilege level (user mode or kernel mode), as well as the instruction encoding, which is what separates 32-bit code from 64-bit code.
The selector values Linux assigns to CS are defined in the x86 architecture directory of the Linux source tree, in arch/x86/include/asm/segment.h.
Those definitions give the following values:
- 32-bit applications are assigned a CS selector of 35 (0x23)
- 64-bit applications are assigned a CS selector of 51 (0x33)
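To see how these selector values map onto the descriptor tables, the sketch below (hypothetical helper names, x86-64 with GCC-style inline assembly) splits a selector into its three fields: the requested privilege level in bits 0-1, the table indicator in bit 2 (0 for the GDT, 1 for the LDT), and the descriptor index in bits 3-15. Decoded this way, 35 (0x23) is GDT entry 4 and 51 (0x33) is GDT entry 6, both at user privilege level 3.

```c
#include <stdint.h>

/* Fields of an x86 segment selector. */
struct selector {
    unsigned index;  /* bits 3-15: descriptor index */
    unsigned ldt;    /* bit 2: 1 = LDT, 0 = GDT */
    unsigned rpl;    /* bits 0-1: requested privilege level (3 = user) */
};

struct selector decode_selector(uint16_t sel)
{
    struct selector s;
    s.index = sel >> 3;
    s.ldt = (sel >> 2) & 1u;
    s.rpl = sel & 3u;
    return s;
}

/* Read the CS selector the calling thread is currently executing with
 * (0x33 for an ordinary 64-bit process on Linux). */
uint16_t current_cs(void)
{
    uint16_t cs;
    __asm__ volatile("mov %%cs, %0" : "=r"(cs));
    return cs;
}
```

Compiled as a 64-bit program, current_cs() should return 0x33; the same function in a 32-bit build would be expected to return 0x23, which is the value the far jump has to change.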
As a result, we need to transition our application’s CS from 0x23 to 0x33 in order to execute the 64-bit syscall without an illegal instruction exception. We can achieve this with a far jump, which lets us specify both the segment selector and the virtual address of our target function.
The proof of concept for this direction is also available in our GitHub repository.
It’s easy to confuse user-mode instrumentation and tooling. For example, strace relies on ptrace to introspect a process’s syscalls during execution.
ptrace doesn’t distinguish between 32-bit and 64-bit syscall entry points when the traced process is running in 64-bit long mode. As a result, strace incorrectly decodes the syscalls when we use the Heaven’s Gate technique.
In our testing, strace’s output reported that lgetxattr was being called, when in fact the process had invoked a different syscall that shares the same number in the 32-bit table.
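The decoding problem can be illustrated with a toy decoder (hypothetical code, not strace’s implementation) that picks a syscall table from the process’s mode rather than from the entry point that was actually used. Only a handful of entries are included. Number 102 is socketcall in the 32-bit table but getuid in the 64-bit table, so a 64-bit process issuing socketcall through int 0x80 would be misreported by a tracer that decodes this way.

```c
#include <stddef.h>
#include <string.h>

enum abi { ABI_I386, ABI_X86_64 };

struct sysent {
    long nr;
    const char *name;
};

/* A few entries from each real table (the full tables have hundreds). */
static const struct sysent i386_table[] = {
    { 3, "read" }, { 4, "write" }, { 20, "getpid" }, { 102, "socketcall" },
};
static const struct sysent x86_64_table[] = {
    { 0, "read" }, { 1, "write" }, { 39, "getpid" }, { 102, "getuid" },
};

static const struct sysent *table_for(enum abi abi, size_t *n)
{
    if (abi == ABI_I386) {
        *n = sizeof i386_table / sizeof i386_table[0];
        return i386_table;
    }
    *n = sizeof x86_64_table / sizeof x86_64_table[0];
    return x86_64_table;
}

const char *syscall_name(enum abi abi, long nr)
{
    size_t n;
    const struct sysent *t = table_for(abi, &n);
    for (size_t i = 0; i < n; i++)
        if (t[i].nr == nr)
            return t[i].name;
    return "?";
}

long syscall_number(enum abi abi, const char *name)
{
    size_t n;
    const struct sysent *t = table_for(abi, &n);
    for (size_t i = 0; i < n; i++)
        if (strcmp(t[i].name, name) == 0)
            return t[i].nr;
    return -1;
}

/* Naive tracer logic: the table is chosen from the process's mode (the
 * CS it runs in), not from the entry point (syscall vs int 0x80) it used. */
const char *naive_decode(int process_is_64bit, long nr)
{
    return syscall_name(process_is_64bit ? ABI_X86_64 : ABI_I386, nr);
}
```

Feeding naive_decode the 32-bit socketcall number from a 64-bit process yields "getuid", the same class of mistake as the lgetxattr report above.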
It is unknown which Linux security products are impacted by this technique. The only prior art we could find after our research was a GitHub Gist with some POC code by “rqou.”
Impacted security product categories could include:
- Next-gen antivirus (NGAV)
- Endpoint detection and response (EDR)
- Endpoint protection platforms (EPP)
- Cloud workload protection platforms (CWPP)
- Sandbox and deception technologies
We encourage you to explore whether you’re at risk, as a customer or as a vendor. POC code is provided in our GitHub repository and incorporated into Chain Reactor.