Tarek Cheikh
Founder & AWS Cloud Architect
In the previous article, we covered RDS for relational databases. RDS is the right choice when you need SQL, complex joins, and ACID transactions across tables. But some workloads — high-throughput key-value lookups, event stores, session management, gaming leaderboards, IoT telemetry — need a different kind of database. Amazon DynamoDB is a fully managed NoSQL database that delivers single-digit millisecond performance at any scale, from 1 request per second to millions.
This article covers DynamoDB from data modeling to production patterns: primary keys, secondary indexes, capacity modes, consistency models, transactions, streams, global tables, DAX caching, and the access-pattern-driven design that makes NoSQL work.
DynamoDB is a serverless, fully managed key-value and document database. Unlike RDS where you choose an instance size, DynamoDB has no servers to provision. You create a table, define the key schema, and start reading and writing. AWS handles partitioning, replication (3 copies across 3 AZs), patching, and scaling.
Key characteristics:
# DynamoDB structure:
#
# Table = collection of items (like an SQL table)
# Item = a single record (like an SQL row), max 400 KB
# Attribute = a data field (like an SQL column)
#
# Key difference from SQL:
# - No fixed schema beyond the primary key
# - Each item can have completely different attributes
# - No JOINs between tables
# - No foreign keys or referential integrity
The primary key is the most important decision in DynamoDB table design. It determines how data is distributed across partitions and how you can query it.
# Two types of primary keys:
# 1. Simple primary key (partition key only)
# - Single attribute that uniquely identifies each item
# - DynamoDB hashes the partition key to determine the physical partition
# - Good when: each item is accessed individually by a unique ID
#
# Example: Users table with user_id as partition key
# user_id (PK) name email
# -----------------------------------------------
# user-001 Alice Martin alice@example.com
# user-002 Bob Chen bob@example.com
# 2. Composite primary key (partition key + sort key)
# - Two attributes together uniquely identify each item
# - Items with the same partition key are stored together, sorted by sort key
# - Enables range queries within a partition
# - Good when: you query related items together
#
# Example: Orders table with user_id (PK) + order_date (SK)
# user_id (PK) order_date (SK) total status
# -----------------------------------------------
# user-001 2025-01-15 89.99 shipped
# user-001 2025-02-20 45.50 delivered
# user-001 2025-03-10 120.00 pending
# user-002 2025-01-22 67.00 delivered
DynamoDB distributes data across partitions based on the hash of the partition key. A good partition key has high cardinality (many distinct values) to spread data evenly.
# GOOD partition keys (high cardinality, even distribution):
# - user_id, order_id, session_id, device_id
# - UUIDs, email addresses, account numbers
# BAD partition keys (low cardinality, hot partitions):
# - status ("active" / "inactive") -- only 2 values, all data in 2 partitions
# - country -- a few countries get most traffic
# - date -- all today's writes go to one partition
# Each partition supports:
# - Up to 3,000 RCU (read capacity units) and 1,000 WCU (write capacity units)
# - Up to 10 GB of data
# If one partition key receives more traffic than this, you get throttling
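When a naturally hot key (like today's date) cannot be avoided, a common workaround is write sharding: append a calculated suffix so one logical key spreads across N partition keys, and fan reads out across all suffixes. A minimal sketch of the idea — the shard count and key format here are illustrative conventions, not a DynamoDB API:

```python
import hashlib

SHARD_COUNT = 10  # assumption: 10 shards is enough to spread the write load

def sharded_key(base_key: str, item_id: str) -> str:
    """Append a deterministic shard suffix so one logical key maps to N partition keys."""
    shard = int(hashlib.md5(item_id.encode()).hexdigest(), 16) % SHARD_COUNT
    return f"{base_key}#{shard}"

def all_shards(base_key: str) -> list:
    """Reads must fan out: query every shard and merge the results client-side."""
    return [f"{base_key}#{s}" for s in range(SHARD_COUNT)]

# All of today's writes now spread across 10 partition keys instead of 1
print(sharded_key("2025-04-14", "order-12345"))
print(all_shards("2025-04-14"))
```

The trade-off is explicit: writes scale by N, but every read of the logical key becomes N queries.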
# Create a table with composite primary key
aws dynamodb create-table \
--table-name Orders \
--key-schema \
AttributeName=user_id,KeyType=HASH \
AttributeName=order_date,KeyType=RANGE \
--attribute-definitions \
AttributeName=user_id,AttributeType=S \
AttributeName=order_date,AttributeType=S \
--billing-mode PAY_PER_REQUEST \
--tags Key=Environment,Value=production
# Attribute types:
# S = String
# N = Number
# B = Binary
# Note: you only define attributes that are part of keys (primary key + indexes)
# All other attributes are schemaless
import boto3
from boto3.dynamodb.conditions import Key, Attr
from decimal import Decimal
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Orders')
# PUT: Create or replace an item
table.put_item(Item={
'user_id': 'user-001',
'order_date': '2025-04-14',
'total': Decimal('89.99'),
'status': 'pending',
'items': [
{'product': 'keyboard', 'qty': 1, 'price': Decimal('59.99')},
{'product': 'mouse', 'qty': 1, 'price': Decimal('30.00')}
]
})
# GET: Retrieve a single item by primary key
response = table.get_item(
Key={'user_id': 'user-001', 'order_date': '2025-04-14'}
)
item = response.get('Item')
# UPDATE: Modify specific attributes
table.update_item(
Key={'user_id': 'user-001', 'order_date': '2025-04-14'},
UpdateExpression='SET #s = :status, shipped_date = :date',
ExpressionAttributeNames={'#s': 'status'}, # 'status' is a reserved word
ExpressionAttributeValues={
':status': 'shipped',
':date': '2025-04-16'
}
)
# DELETE: Remove an item
table.delete_item(
Key={'user_id': 'user-001', 'order_date': '2025-04-14'}
)
# QUERY: Efficient -- reads items from a single partition
# Must specify the partition key, optionally filter by sort key
# Get all orders for a user
response = table.query(
KeyConditionExpression=Key('user_id').eq('user-001')
)
# Get orders for a user in a date range
response = table.query(
KeyConditionExpression=(
Key('user_id').eq('user-001') &
Key('order_date').between('2025-01-01', '2025-03-31')
)
)
# Get the latest 5 orders for a user
response = table.query(
KeyConditionExpression=Key('user_id').eq('user-001'),
ScanIndexForward=False, # Descending order by sort key
Limit=5
)
# SCAN: Expensive -- reads EVERY item in the table
# Avoid in production for large tables
# Use only for admin tasks, migrations, or small tables
response = table.scan(
FilterExpression=Attr('status').eq('pending')
)
# FilterExpression is applied AFTER reading -- you pay for the full scan
# For large scans, use pagination:
paginator = dynamodb.meta.client.get_paginator('scan')
for page in paginator.paginate(TableName='Orders'):
    for item in page['Items']:
        process(item)
Secondary indexes let you query data by attributes other than the primary key. DynamoDB supports two types.
A GSI has a different partition key and optional sort key from the base table. It is a separate copy of the data (projected attributes), maintained asynchronously.
# Add a GSI to query orders by status
aws dynamodb update-table \
--table-name Orders \
--attribute-definitions AttributeName=status,AttributeType=S \
--global-secondary-index-updates '[{
"Create": {
"IndexName": "status-index",
"KeySchema": [
{"AttributeName": "status", "KeyType": "HASH"},
{"AttributeName": "order_date", "KeyType": "RANGE"}
],
"Projection": {"ProjectionType": "ALL"},
"OnDemandThroughput": {"MaxReadRequestUnits": 100, "MaxWriteRequestUnits": 100}
}
}]'
# GSI characteristics:
# - Different partition key and sort key from base table
# - Eventually consistent reads only (no strongly consistent)
# - Has its own capacity (separate from the base table)
# - Up to 20 GSIs per table
# - Can be added or removed after table creation
# Query the GSI
response = table.query(
IndexName='status-index',
KeyConditionExpression=Key('status').eq('pending')
)
An LSI shares the same partition key as the base table but has a different sort key. It must be created at table creation time.
# LSI: same partition key, different sort key
# Must be defined at table creation
aws dynamodb create-table \
--table-name Orders \
--key-schema \
AttributeName=user_id,KeyType=HASH \
AttributeName=order_date,KeyType=RANGE \
--attribute-definitions \
AttributeName=user_id,AttributeType=S \
AttributeName=order_date,AttributeType=S \
AttributeName=total,AttributeType=N \
--local-secondary-indexes '[{
"IndexName": "user-total-index",
"KeySchema": [
{"AttributeName": "user_id", "KeyType": "HASH"},
{"AttributeName": "total", "KeyType": "RANGE"}
],
"Projection": {"ProjectionType": "ALL"}
}]' \
--billing-mode PAY_PER_REQUEST
# LSI characteristics:
# - Same partition key as the base table
# - Supports strongly consistent reads
# - Up to 5 LSIs per table
# - Must be created at table creation (cannot add later)
# - Shares capacity with the base table
# - 10 GB partition size limit applies to base table + all LSIs
# Query: get a user's orders sorted by total amount
response = table.query(
IndexName='user-total-index',
KeyConditionExpression=Key('user_id').eq('user-001'),
ScanIndexForward=False # Highest total first
)
# DynamoDB offers two consistency levels for reads:
# Eventually Consistent Read (default)
# - May return stale data (usually consistent within 1 second)
# - Costs 0.5 RCU per 4 KB item
# - Use for: dashboards, product catalogs, non-critical reads
# Strongly Consistent Read
# - Always returns the most recent data
# - Costs 1 RCU per 4 KB item (2x the cost)
# - Use for: financial data, inventory counts, anything requiring latest state
# - Not available on GSIs (only on base table and LSIs)
# Eventually consistent read (default)
response = table.get_item(
Key={'user_id': 'user-001', 'order_date': '2025-04-14'}
)
# Strongly consistent read
response = table.get_item(
Key={'user_id': 'user-001', 'order_date': '2025-04-14'},
ConsistentRead=True
)
# Pay per request -- no capacity planning needed
# DynamoDB scales automatically to handle any traffic level
# Pricing (us-east-1):
# Write request unit (WRU): $1.25 per million
# Read request unit (RRU): $0.25 per million
# 1 WRU = 1 write of up to 1 KB
# 1 RRU = 1 strongly consistent read of up to 4 KB
#       = 2 eventually consistent reads of up to 4 KB
# Best for:
# - New tables with unknown traffic patterns
# - Unpredictable or spiky workloads
# - Development and testing environments
# - Applications where simplicity is more important than cost optimization
# Pre-allocate capacity for predictable workloads (cheaper for steady traffic)
# Pricing (us-east-1):
# Write capacity unit (WCU): $0.00065 per hour ($0.4745/month)
# Read capacity unit (RCU): $0.00013 per hour ($0.0949/month)
# 1 WCU = 1 write per second of up to 1 KB
# 1 RCU = 1 strongly consistent read per second of up to 4 KB
# = 2 eventually consistent reads per second of up to 4 KB
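The unit definitions above translate directly into arithmetic: writes round the item size up to the next 1 KB, reads up to the next 4 KB, with eventually consistent reads costing half. A quick sketch of that math:

```python
import math

def wcu_per_write(item_kb: float) -> int:
    """Writes consume 1 WCU per 1 KB of item size, rounded up."""
    return math.ceil(item_kb)

def rcu_per_read(item_kb: float, strongly_consistent: bool = False) -> float:
    """Reads consume 1 RCU per 4 KB (strong) or 0.5 RCU per 4 KB (eventual)."""
    units = math.ceil(item_kb / 4)
    return units if strongly_consistent else units / 2

print(wcu_per_write(3.5))                          # 4 -- a 3.5 KB write rounds up to 4 WCU
print(rcu_per_read(10, strongly_consistent=True))  # 3 -- 10 KB rounds up to three 4 KB units
print(rcu_per_read(10))                            # 1.5 -- half that for eventual consistency
```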
aws dynamodb update-table \
--table-name Orders \
--billing-mode PROVISIONED \
--provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=50
# Enable auto scaling to handle traffic variations
aws application-autoscaling register-scalable-target \
--service-namespace dynamodb \
--resource-id table/Orders \
--scalable-dimension dynamodb:table:ReadCapacityUnits \
--min-capacity 10 \
--max-capacity 1000
aws application-autoscaling put-scaling-policy \
--service-namespace dynamodb \
--resource-id table/Orders \
--scalable-dimension dynamodb:table:ReadCapacityUnits \
--policy-name read-scaling \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "DynamoDBReadCapacityUtilization"
}
}'
# Cost comparison: 100 writes/second sustained for a month
#
# On-demand: 100 * 86400 * 30 = 259,200,000 WRU
# 259.2M * $1.25/M = $324.00/month
#
# Provisioned: 100 WCU * $0.4745/month = $47.45/month
#
# Provisioned is 85% cheaper for steady workloads
# On-demand is cheaper for bursty or low-volume workloads
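The comparison above generalizes. Folding the prices from this section into two small functions shows why: a sustained write costs roughly 7x more on-demand than provisioned, so provisioned wins once average utilization of the provisioned capacity passes about 15%:

```python
# Prices from the figures above (us-east-1)
ON_DEMAND_PER_WRITE = 1.25 / 1_000_000   # $ per on-demand write request unit
WCU_MONTHLY = 0.4745                     # $ per provisioned WCU per month
SECONDS_PER_MONTH = 86_400 * 30

def on_demand_monthly(writes_per_second):
    """Monthly on-demand cost for a sustained write rate."""
    return writes_per_second * SECONDS_PER_MONTH * ON_DEMAND_PER_WRITE

def provisioned_monthly(wcu):
    """Monthly cost of provisioning that many WCUs flat."""
    return wcu * WCU_MONTHLY

print(round(on_demand_monthly(100), 2))    # 324.0
print(round(provisioned_monthly(100), 2))  # 47.45
```

Real workloads are rarely flat, so the practical comparison is provisioned capacity at your peak (or with auto scaling) against on-demand at your average.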
# BatchWriteItem: write up to 25 items in a single call
with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={
            'user_id': f'user-{i:03d}',
            'order_date': '2025-04-14',
            'total': Decimal(str(round(10 + i * 1.5, 2))),
            'status': 'pending'
        })
# batch_writer handles pagination and retries automatically
# BatchGetItem: read up to 100 items in a single call
response = dynamodb.meta.client.batch_get_item(
RequestItems={
'Orders': {
'Keys': [
{'user_id': {'S': 'user-001'}, 'order_date': {'S': '2025-04-14'}},
{'user_id': {'S': 'user-002'}, 'order_date': {'S': '2025-04-14'}},
{'user_id': {'S': 'user-003'}, 'order_date': {'S': '2025-04-14'}}
]
}
}
)
# Check response['UnprocessedKeys'] for items that were not read (throttled)
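BatchGetItem also caps at 100 keys (and 16 MB) per call, so larger key lists must be split before calling it. A small chunking helper, using the same low-level key format as the call above:

```python
def chunk(keys: list, size: int = 100) -> list:
    """Split a key list into BatchGetItem-sized chunks (max 100 keys per call)."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]

keys = [
    {'user_id': {'S': f'user-{i:03d}'}, 'order_date': {'S': '2025-04-14'}}
    for i in range(250)
]
batches = chunk(keys)
print([len(b) for b in batches])  # [100, 100, 50]
```

Each chunk becomes one batch_get_item call, and any UnprocessedKeys from a response get fed back into the next round.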
DynamoDB supports ACID transactions across up to 100 items within and across tables.
# TransactWriteItems: atomic write across multiple items/tables
client = boto3.client('dynamodb')
client.transact_write_items(
TransactItems=[
{
'Update': {
'TableName': 'Accounts',
'Key': {'account_id': {'S': 'acct-001'}},
'UpdateExpression': 'SET balance = balance - :amount',
'ConditionExpression': 'balance >= :amount',
'ExpressionAttributeValues': {':amount': {'N': '100'}}
}
},
{
'Update': {
'TableName': 'Accounts',
'Key': {'account_id': {'S': 'acct-002'}},
'UpdateExpression': 'SET balance = balance + :amount',
'ExpressionAttributeValues': {':amount': {'N': '100'}}
}
},
{
'Put': {
'TableName': 'Transfers',
'Item': {
'transfer_id': {'S': 'txn-12345'},
'from_account': {'S': 'acct-001'},
'to_account': {'S': 'acct-002'},
'amount': {'N': '100'},
'timestamp': {'S': '2025-04-14T10:30:00Z'}
}
}
}
]
)
# All three operations succeed or all fail -- no partial updates
# Transactions cost 2x the capacity of standard operations
# Conditional write: only update if a condition is met
table.update_item(
Key={'user_id': 'user-001', 'order_date': '2025-04-14'},
UpdateExpression='SET #s = :new_status',
ConditionExpression='#s = :expected_status',
ExpressionAttributeNames={'#s': 'status'},
ExpressionAttributeValues={
':new_status': 'shipped',
':expected_status': 'pending'
}
)
# Fails with ConditionalCheckFailedException if status is not 'pending'
# Optimistic locking with version number
table.update_item(
Key={'user_id': 'user-001', 'order_date': '2025-04-14'},
UpdateExpression='SET total = :new_total, version = version + :one',
ConditionExpression='version = :expected_version',
ExpressionAttributeValues={
':new_total': Decimal('99.99'),
':one': 1,
':expected_version': 3
}
)
# If another process updated the item (incrementing version), this fails
# The caller reads the item again and retries with the new version
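That read-check-retry loop can be illustrated with an in-memory stand-in for the table. Against DynamoDB itself you would catch botocore's ConditionalCheckFailedException around the update_item call above; here the version check is modeled directly so the control flow is visible:

```python
class VersionConflict(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""

store = {'total': 89.99, 'version': 3}  # toy item in place of the table

def conditional_update(new_total, expected_version):
    """Apply the update only if the stored version still matches."""
    if store['version'] != expected_version:
        raise VersionConflict
    store['total'] = new_total
    store['version'] += 1

def update_with_retry(new_total, max_attempts=3):
    """Read the current version, attempt the conditional write, retry on conflict."""
    for _ in range(max_attempts):
        current_version = store['version']  # re-read before each attempt
        try:
            conditional_update(new_total, current_version)
            return True
        except VersionConflict:
            continue  # another writer got there first -- read again and retry
    return False

print(update_with_retry(99.99))  # True
print(store['version'])          # 4
```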
TTL automatically deletes expired items at no cost. Useful for session data, temporary records, and data retention policies.
# Enable TTL on a table
aws dynamodb update-time-to-live \
--table-name Sessions \
--time-to-live-specification Enabled=true,AttributeName=expires_at
import time
# Store a session that expires in 1 hour
table.put_item(Item={
'session_id': 'sess-abc123',
'user_id': 'user-001',
'data': {'cart': ['item1', 'item2']},
'expires_at': int(time.time()) + 3600 # Unix epoch + 1 hour
})
# DynamoDB deletes expired items automatically within ~48 hours of expiration
# Expired items still appear in reads until actually deleted
# Deletion does not consume write capacity
Streams capture a time-ordered sequence of item-level changes (inserts, updates, deletes) in a table. They integrate with Lambda for real-time event processing.
# Enable DynamoDB Streams
aws dynamodb update-table \
--table-name Orders \
--stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES
# StreamViewType options:
# KEYS_ONLY -- only the key attributes of the modified item
# NEW_IMAGE -- the entire item as it appears after the modification
# OLD_IMAGE -- the entire item as it appeared before the modification
# NEW_AND_OLD_IMAGES -- both the new and old images of the item
# Lambda function triggered by DynamoDB Streams
def lambda_handler(event, context):
    for record in event['Records']:
        event_name = record['eventName']  # INSERT, MODIFY, REMOVE
        if event_name == 'INSERT':
            new_item = record['dynamodb']['NewImage']
            user_id = new_item['user_id']['S']
            # Send order confirmation email
            send_confirmation(user_id, new_item)
        elif event_name == 'MODIFY':
            old_item = record['dynamodb']['OldImage']
            new_item = record['dynamodb']['NewImage']
            old_status = old_item.get('status', {}).get('S')
            new_status = new_item.get('status', {}).get('S')
            if old_status != new_status:
                # Status changed -- notify customer
                notify_status_change(new_item, old_status, new_status)
        elif event_name == 'REMOVE':
            old_item = record['dynamodb']['OldImage']
            # Archive deleted record
            archive_to_s3(old_item)
Global Tables replicate a DynamoDB table across multiple AWS regions. Each region has a full read/write replica, providing low-latency access for global applications and cross-region disaster recovery.
# Add a replica to an existing table (creates a Global Table)
aws dynamodb update-table \
--table-name Orders \
--replica-updates '[
{"Create": {"RegionName": "eu-west-1"}},
{"Create": {"RegionName": "ap-southeast-1"}}
]'
# Global Table characteristics:
# - Active-active: reads and writes in any region
# - Replication lag: typically under 1 second
# - Conflict resolution: last-writer-wins based on timestamp
# - 99.999% availability SLA (vs 99.99% for single-region)
# - Streams must be enabled (NEW_AND_OLD_IMAGES)
# - Additional cost: replicated write request units in each region
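Last-writer-wins deserves emphasis: when two regions write the same item concurrently, only the write with the later timestamp survives, and the losing write is discarded without any error. A toy illustration of the merge rule (not the actual replication internals; the 'ts' field is an assumption for the sketch):

```python
def lww_merge(item_a: dict, item_b: dict) -> dict:
    """Keep whichever version carries the later write timestamp."""
    return item_a if item_a['ts'] >= item_b['ts'] else item_b

us_write = {'status': 'shipped',   'ts': '2025-04-14T10:30:00.100Z'}
eu_write = {'status': 'cancelled', 'ts': '2025-04-14T10:30:00.250Z'}

# The EU write lands 150 ms later, so it wins in every region --
# the 'shipped' update is silently lost
print(lww_merge(us_write, eu_write)['status'])  # cancelled
```

If losing a concurrent write is unacceptable for a workload, route writes for a given item to a single region and use the other replicas for reads.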
DAX is an in-memory cache for DynamoDB that reduces read latency from single-digit milliseconds to microseconds. It sits between your application and DynamoDB, handling cache management automatically.
# DAX architecture:
#
# Application --> DAX Cluster --> DynamoDB Table
# (cache)
#
# Cache hit: response in microseconds (< 1 ms)
# Cache miss: DAX reads from DynamoDB, caches, and returns
#
# DAX is compatible with the DynamoDB API
# Minimal code changes required (change the client endpoint)
# Create a DAX cluster
aws dax create-cluster \
--cluster-name my-dax-cluster \
--node-type dax.r5.large \
--replication-factor 3 \
--iam-role-arn arn:aws:iam::123456789012:role/dax-role \
--subnet-group my-dax-subnet-group
# Use cases for DAX:
# - Read-heavy workloads (caches GetItem and Query results)
# - Microsecond latency requirements
# - Reducing read costs on hot items
# Not suitable for: write-heavy workloads, strongly consistent reads
In DynamoDB, the recommended pattern for related entities is to store them in a single table using overloaded keys. This allows fetching related data in a single query instead of multiple queries across tables.
# Single-table design: store users, orders, and products together
table = dynamodb.Table('AppData')
# User profile
table.put_item(Item={
'PK': 'USER#user-001',
'SK': 'PROFILE',
'name': 'Alice Martin',
'email': 'alice@example.com',
'created': '2025-01-15'
})
# User's orders
table.put_item(Item={
'PK': 'USER#user-001',
'SK': 'ORDER#2025-04-14#ord-001',
'total': Decimal('89.99'),
'status': 'shipped',
'items': ['keyboard', 'mouse']
})
table.put_item(Item={
'PK': 'USER#user-001',
'SK': 'ORDER#2025-03-10#ord-002',
'total': Decimal('45.50'),
'status': 'delivered'
})
# Get a user profile
response = table.get_item(
Key={'PK': 'USER#user-001', 'SK': 'PROFILE'}
)
# Get all of a user's data (profile + all orders) in one query
response = table.query(
KeyConditionExpression=Key('PK').eq('USER#user-001')
)
# Get only a user's orders
response = table.query(
KeyConditionExpression=(
Key('PK').eq('USER#user-001') &
Key('SK').begins_with('ORDER#')
)
)
# Get a user's orders in a date range
response = table.query(
KeyConditionExpression=(
Key('PK').eq('USER#user-001') &
Key('SK').between('ORDER#2025-01-01', 'ORDER#2025-03-31')
)
)
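Building PK/SK strings by hand throughout a codebase invites typos; a common refinement is to centralize key construction in small helpers. A sketch using this example's prefixes:

```python
def user_pk(user_id: str) -> str:
    return f'USER#{user_id}'

def order_sk(order_date: str, order_id: str) -> str:
    return f'ORDER#{order_date}#{order_id}'

def parse_order_sk(sk: str):
    """Split ORDER#<date>#<id> back into its parts."""
    _, order_date, order_id = sk.split('#', 2)
    return order_date, order_id

print(user_pk('user-001'))                        # USER#user-001
print(order_sk('2025-04-14', 'ord-001'))          # ORDER#2025-04-14#ord-001
print(parse_order_sk('ORDER#2025-04-14#ord-001')) # ('2025-04-14', 'ord-001')
```

Keeping the prefix conventions in one place also makes it easier to evolve the key schema later.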
# IoT sensor readings with automatic expiration
import time
from datetime import datetime

table = dynamodb.Table('SensorData')

def store_reading(sensor_id, reading):
    timestamp = datetime.now()
    table.put_item(Item={
        'sensor_date': f'{sensor_id}#{timestamp.strftime("%Y-%m-%d")}',
        'timestamp': timestamp.isoformat(),
        'temperature': Decimal(str(reading['temp'])),
        'humidity': Decimal(str(reading['humidity'])),
        'ttl': int(time.time()) + 2592000  # Auto-delete after 30 days
    })

def get_daily_readings(sensor_id, date):
    return table.query(
        KeyConditionExpression=Key('sensor_date').eq(f'{sensor_id}#{date}')
    )['Items']
# Increment a counter atomically (no read-modify-write race condition)
table.update_item(
Key={'page_id': 'homepage'},
UpdateExpression='ADD view_count :inc',
ExpressionAttributeValues={':inc': 1}
)
# On-demand backup (retained until explicitly deleted)
aws dynamodb create-backup \
--table-name Orders \
--backup-name orders-backup-2025-04-14
# Restore from backup (creates a NEW table)
aws dynamodb restore-table-from-backup \
--target-table-name Orders-Restored \
--backup-arn arn:aws:dynamodb:us-east-1:123456789012:table/Orders/backup/01234567890
# Enable Point-in-Time Recovery (PITR)
aws dynamodb update-continuous-backups \
--table-name Orders \
--point-in-time-recovery-specification PointInTimeRecoveryEnabled=true
# Restore to any point in the last 35 days
aws dynamodb restore-table-to-point-in-time \
--source-table-name Orders \
--target-table-name Orders-Restored \
--restore-date-time "2025-04-13T10:00:00Z"
# PITR cost: $0.20 per GB/month
# Export table data to S3 for analytics (does not consume table capacity)
aws dynamodb export-table-to-point-in-time \
--table-arn arn:aws:dynamodb:us-east-1:123456789012:table/Orders \
--s3-bucket my-exports-bucket \
--s3-prefix dynamodb-exports/ \
--export-format DYNAMODB_JSON
# Export formats: DYNAMODB_JSON or ION
# Query exported data with Athena for analytics
# On-demand pricing (us-east-1):
# Write request units: $1.25 per million
# Read request units: $0.25 per million
# Provisioned pricing (us-east-1):
# Write capacity unit: $0.00065/hour ($0.4745/month per WCU)
# Read capacity unit: $0.00013/hour ($0.0949/month per RCU)
# Storage: $0.25 per GB/month (first 25 GB free)
# Data transfer: Same as other AWS services
# In from internet: free
# Out to internet: $0.09/GB
# Additional costs:
# Streams: $0.02 per 100,000 read request units
# Global Tables: Replicated WRU charged in each replica region
# PITR: $0.20 per GB/month
# DAX: Instance pricing (dax.r5.large ~$0.269/hr)
# On-demand backup: $0.10 per GB/month
# Restore: $0.15 per GB
# Transactions cost 2x standard operations
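The line items above can be folded into a rough monthly estimate. A sketch for on-demand mode covering request units and storage only (us-east-1 prices from this section; the function name and parameters are illustrative):

```python
def estimate_monthly_cost(writes_m: float, reads_m: float, storage_gb: float,
                          transactional_share: float = 0.0) -> float:
    """Rough on-demand monthly bill in dollars.

    writes_m / reads_m: millions of write/read request units per month.
    transactional_share: fraction of requests that are transactional
    (transactions cost 2x, so that share is billed twice).
    """
    write_cost = writes_m * 1.25 * (1 + transactional_share)
    read_cost = reads_m * 0.25 * (1 + transactional_share)
    storage_cost = max(storage_gb - 25, 0) * 0.25  # first 25 GB free
    return round(write_cost + read_cost + storage_cost, 2)

# 50M writes, 200M reads, 100 GB stored, no transactions
print(estimate_monthly_cost(50, 200, 100))  # 131.25
```

Streams, Global Table replication, PITR, and DAX come on top of this, per the list above.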
# Use DynamoDB when:
# - Access patterns are known and key-based (get by ID, query by partition)
# - You need consistent single-digit millisecond latency at any scale
# - Workload is read-heavy or write-heavy with simple queries
# - You want serverless with zero capacity management (on-demand mode)
# - Data model is hierarchical or document-oriented
# - You need global multi-region active-active replication
# - Examples: user sessions, shopping carts, IoT data, gaming leaderboards,
# event stores, product catalogs, real-time bidding
# Use RDS/Aurora when:
# - You need complex SQL queries (JOINs, aggregations, subqueries)
# - Data has complex relationships requiring referential integrity
# - Access patterns are unpredictable or ad-hoc
# - You need full ACID transactions across many tables
# - Existing application uses SQL and migration effort is not justified
# - Examples: financial systems, ERP, CRM, reporting, content management
Best practices:
- Design hierarchical sort keys (e.g., ORDER#2025-04-14#ord-001) to support range queries and hierarchical data
- Request only the attributes you need (ProjectionExpression) to reduce read costs
- Monitor ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits to right-size provisioned capacity
- Set CloudWatch alarms on ThrottledRequests, SystemErrors, and UserErrors