Ransomware continues to be a menace for endpoint devices and networks. But as companies increasingly rely on cloud resources, adversaries have also shifted their methodologies to take advantage of the cloud. A recent report by Halcyon detailed a methodology, seen in the wild, where adversaries abuse server-side encryption with customer-provided keys (SSE-C). In short, users are able to supply their own encryption keys when performing certain S3 actions like PutObject or CopyObject. These keys are used by AWS to encrypt the S3 object but are not stored by AWS. Therefore, adversaries can simply copy S3 objects they have access to in place, encrypting them with keys only they hold, and leave the victim with no way to recover those keys.
Luckily, prevention is relatively straightforward and AWS has a detailed walkthrough of good security practices to block access to this technique if it is not being used in your environment.
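If SSE-C isn't legitimately used against a bucket, one option from that guidance is a bucket policy that denies object writes carrying customer-provided keys. Here is a minimal sketch using boto3; the bucket name is a placeholder of mine, and since CopyObject is authorized as a PutObject on the destination object, denying s3:PutObject covers both calls:

import json
import boto3

s3 = boto3.client("s3")

# Placeholder bucket name; replace with your own.
BUCKET = "my-document-bucket"

# Deny any object write that supplies an SSE-C key. The condition key is only
# present on requests that use customer-provided encryption keys, so
# "Null": "false" matches exactly those requests.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenySSECWrites",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {
                "Null": {"s3:x-amz-server-side-encryption-customer-algorithm": "false"}
            },
        }
    ],
}

s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))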
This technique got me thinking: what response time can an organization typically expect between detection and stopping the adversary from continuing to encrypt its data? From the first CopyObject command, how long does it typically take for a detection to fire? And then, how quickly can a security team automatically revoke that user's access and prevent any further CopyObject calls?
Testing environment setup
User
My test user was a simple IAM user with the AmazonS3FullAccess managed policy attached. I configured it with a long-term access key (not best practice, I know) so I could test whether there was any appreciable difference between disabling the access key and applying a deny-all policy.
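For reference, a rough sketch of how a test user like this could be provisioned with boto3; the user name is a placeholder of my choosing, and in practice you would capture the returned secret key for the test client:

import boto3

iam = boto3.client("iam")

# Placeholder name for the throwaway test identity.
user_name = "encryptionTest"

# Create the IAM user and attach the AWS-managed full-access S3 policy.
iam.create_user(UserName=user_name)
iam.attach_user_policy(
    UserName=user_name,
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)

# Create a long-term access key (acceptable only in a disposable lab).
key = iam.create_access_key(UserName=user_name)["AccessKey"]
print(key["AccessKeyId"])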
S3 bucket target
For my S3 bucket I wanted to emulate “typical” document storage, so I uploaded files with sizes ranging from 500 KB to 2 MB. Because the encryption is performed server-side, performance could in principle vary with object size, though with files this small I didn't expect meaningful variability in encryption time. Still, it would be interesting to capture SSE-C performance as a function of file size!
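A quick sketch of how the test objects could be generated and uploaded; the bucket name and object count here are illustrative placeholders:

import os
import random
import boto3

s3 = boto3.client("s3")

# Placeholder bucket and object count for the test corpus.
BUCKET = "encryption-test-bucket"
OBJECT_COUNT = 100

# Upload objects with random sizes between 500 KB and 2 MB to mimic documents.
for i in range(OBJECT_COUNT):
    size = random.randint(500 * 1024, 2 * 1024 * 1024)
    s3.put_object(
        Bucket=BUCKET,
        Key=f"documents/file-{i:04d}.bin",
        Body=os.urandom(size),
    )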
Ransomware execution
To mimic the adversary, I performed a copy in place of all the objects I had previously uploaded to my bucket. This seems to be the most efficient way to bulk-encrypt files in a bucket without having to leverage other services such as batch jobs or replication. In the future, it would be interesting to track performance of this copy-in-place methodology and attempt large-scale encryption with some of those other services.
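To make the copy-in-place step concrete, here is a rough sketch of how it could be reproduced against a test bucket you own; the bucket name is a placeholder, the 32-byte key stands in for the attacker-held key that AWS never stores, and boto3 takes care of base64-encoding the SSE-C key and adding its MD5 header:

import os
import boto3

s3 = boto3.client("s3")

# Placeholder bucket name for a test environment you control.
BUCKET = "encryption-test-bucket"

# The customer-provided key; in the attack, only the adversary holds this.
sse_c_key = os.urandom(32)

# Copy every object onto itself, re-encrypting it with the customer-provided key.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        s3.copy_object(
            Bucket=BUCKET,
            Key=obj["Key"],
            CopySource={"Bucket": BUCKET, "Key": obj["Key"]},
            SSECustomerAlgorithm="AES256",
            SSECustomerKey=sse_c_key,
        )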
Detection pipeline
For my experiment I decided to leverage CloudWatch for my detections. There are various ways to configure alerting, but I built this with CloudWatch to use as many native AWS services as possible, simplifying my deployment and allowing others to mimic the pipeline. The figure below shows my basic data flow.
I collect all data events from my target bucket and ship them with CloudTrail to a CloudWatch log group. I then created a metric filter and alarm that fires on 10 or more CopyObject calls within a 1-minute period (a configuration sketch follows the list below). Once the alarm is triggered, it does two things:
- Sends me an email with an alert for detected ransomware
- Executes a Lambda function
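Here is a sketch of how that metric filter and alarm could be configured with boto3; the log group name, metric namespace, and SNS topic ARN are placeholders, and I assume the alarm action points at an SNS topic that fans out to both the email notification and the Lambda function:

import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

# Placeholder names/ARNs for the log group and the notification topic.
LOG_GROUP = "/aws/cloudtrail/s3-data-events"
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:ransomware-alerts"

# Count CopyObject data events that CloudTrail delivers to the log group.
logs.put_metric_filter(
    logGroupName=LOG_GROUP,
    filterName="CopyObjectCalls",
    filterPattern='{ $.eventName = "CopyObject" }',
    metricTransformations=[{
        "metricName": "CopyObjectCount",
        "metricNamespace": "RansomwareDetection",
        "metricValue": "1",
    }],
)

# Alarm when 10 or more CopyObject calls land within a single 1-minute period.
cloudwatch.put_metric_alarm(
    AlarmName="S3CopyObjectBurst",
    Namespace="RansomwareDetection",
    MetricName="CopyObjectCount",
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=10,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=[SNS_TOPIC_ARN],  # topic fans out to email and the Lambda
)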
The Lambda function is my method of automated remediation. I had two versions of the function: the first disabled the IAM user's access key, and the second applied a deny-all policy to my test user. For production environments, you would need to expand this to handle any potentially compromised user. This could be done by processing CloudTrail events and parsing the userIdentity element to find which user is affected. Or you could apply a blanket resource control policy (RCP) that denies SSE-C across the account. Automated response is nuanced and should be tailored to your environment and business. I have attached my sample Lambda code below that can be used as a launch point.
import json
import boto3
import logging

# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info("Lambda function started")

    # Initialize the boto3 client for IAM
    iam_client = boto3.client('iam')

    # Define the user (and, for the access-key variant, the key) to disable
    user_name = '<testUser>'
    access_key_id = '<accessKeyId>'

    try:
        # Variant 1: disable the user's access key
        # iam_client.update_access_key(
        #     UserName=user_name,
        #     AccessKeyId=access_key_id,
        #     Status='Inactive'
        # )
        # logger.info(f"Disabled access key {access_key_id} for user {user_name}")

        # Variant 2: attach a deny-all policy to the user
        iam_client.attach_user_policy(
            UserName=user_name,
            PolicyArn='arn:aws:iam::123456789012:policy/ransomwareResponsePolicy',
        )
        logger.info(f"Disabled access for user {user_name}")

        return {
            'statusCode': 200,
            'body': json.dumps(f"Access for user {user_name} disabled successfully")
        }
    except Exception as e:
        logger.error(f"Error disabling access for user {user_name}: {str(e)}")
        return {
            'statusCode': 500,
            'body': json.dumps(f"Error disabling access for user {user_name}: {str(e)}")
        }
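One way to generalize beyond a hard-coded user is to have the function query the same log group for recent CopyObject events and parse the userIdentity element of each record. A rough sketch, with an assumed log group name and lookback window:

import json
import time
import boto3

logs_client = boto3.client('logs')

def find_offending_users(log_group='/aws/cloudtrail/s3-data-events', lookback_minutes=10):
    # Look back over recent CloudTrail data events and collect the IAM user
    # names behind CopyObject calls. Log group and window are assumptions.
    now = int(time.time() * 1000)
    resp = logs_client.filter_log_events(
        logGroupName=log_group,
        startTime=now - lookback_minutes * 60 * 1000,
        filterPattern='{ $.eventName = "CopyObject" }',
    )
    users = set()
    for log_event in resp.get('events', []):
        identity = json.loads(log_event['message']).get('userIdentity', {})
        if identity.get('type') == 'IAMUser':
            users.add(identity.get('userName'))
    return users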
There are improvements to be made to increase the speed of response, but for this testing I wanted an environment that was relatively straightforward and that clearly demonstrates the need for automation in preventing ransomware.
Automated response results
I performed three basic scenarios for my testing:
- CloudWatch delivers an email with no response actions
- CloudWatch triggers Lambda, which disables the user's access key
- CloudWatch triggers Lambda, which applies a deny-all policy
I found that there was no appreciable difference between these scenarios, beyond the need for manual intervention when only a notification is sent in the first scenario. In all automated cases, the response actions happened immediately once the alarm was triggered. By far the longest holdup was delivery of CloudTrail logs to my log group, and therefore to the metric filter; on average this took around 6 minutes, which is consistent with AWS documentation stating that logs are delivered approximately every 5 minutes. Once the alarm fired, whether the remediation disabled the access key or applied the deny-all policy, user access was cut off almost immediately, preventing any additional actions in AWS.
Dwell time is of the essence
There is still much testing to be done to better establish timelines for ransomware techniques in the cloud. Understanding how quickly AWS encrypts files can help us understand what this approximately 6-minute dwell time means in terms of the amount of data that can be encrypted. Regardless, every second matters when it comes to effective defense against ransomware.
I hope to help solidify the idea that automated remediation is critical to an effective defense against ransomware. If you rely on signals such as emails followed by manual remediation, you will add several minutes to the response time, which can mean significant encryption before user access is cut off. As a community, developing a robust collection of indicators of pre-encryption behavior will help cut down on dwell times drastically.