Tarek Cheikh
Founder & AWS Cloud Architect
In the previous articles, we covered compute (EC2 and Lambda) and storage (S3). Most applications also need a relational database, and managing one in production — backups, replication, failover, patching, scaling — is operationally demanding. Amazon RDS (Relational Database Service) handles all of that, letting you run production databases without managing the underlying infrastructure.
This article covers RDS from first principles to production patterns: supported engines, instance sizing, storage options, Multi-AZ deployments, read replicas, backups, security, monitoring, Aurora, and the operational patterns that make managed databases reliable at scale.
RDS is a managed database service that automates the undifferentiated heavy lifting of running relational databases: hardware provisioning, OS patching, database installation, backups, replication, failover, and scaling. You choose the engine, instance size, and storage. AWS handles the rest.
What RDS manages for you: hardware provisioning, OS and database engine patching, automated backups, synchronous standby replication and failover (Multi-AZ), and storage scaling.
What you still manage: schema design, indexing, query performance, connection handling, parameter tuning, and database-level access control.
# RDS supports six database engines:
Engine Versions Use Case
--------------------------------------------------------------
PostgreSQL 13, 14, 15, 16 General purpose, GIS, JSON
MySQL 8.0 Web applications, CMS
MariaDB 10.6, 10.11 MySQL-compatible, open source
Oracle 19c, 21c Enterprise applications
SQL Server 2019, 2022 Windows/.NET applications
IBM Db2 11.5 Legacy enterprise workloads
# Aurora (AWS-built, covered separately below):
Aurora MySQL MySQL 8.0 compatible
Aurora PostgreSQL PostgreSQL 14, 15, 16 compatible
# Create a PostgreSQL instance with recommended production settings
aws rds create-db-instance \
--db-instance-identifier my-app-db \
--db-instance-class db.r6g.large \
--engine postgres \
--engine-version 16.4 \
--master-username dbadmin \
--manage-master-user-password \
--allocated-storage 100 \
--storage-type gp3 \
--storage-throughput 125 \
--multi-az \
--backup-retention-period 14 \
--storage-encrypted \
--kms-key-id alias/rds-key \
--db-subnet-group-name my-db-subnet-group \
--vpc-security-group-ids sg-0123456789abcdef0 \
--no-publicly-accessible \
--enable-performance-insights \
--monitoring-interval 60 \
--monitoring-role-arn arn:aws:iam::123456789012:role/rds-monitoring-role
# --manage-master-user-password: stores the password in Secrets Manager
# (never pass passwords in CLI arguments -- visible in process list and shell history)
# --no-publicly-accessible: the instance has no public IP
# --multi-az: creates a synchronous standby in a different AZ
# --storage-encrypted: encrypts data at rest with KMS
RDS instances run inside your VPC. A DB subnet group defines which subnets (and therefore which Availability Zones) RDS can use. Always use private subnets.
# Create a DB subnet group spanning two AZs
aws rds create-db-subnet-group \
--db-subnet-group-name my-db-subnet-group \
--db-subnet-group-description "Private subnets for RDS" \
--subnet-ids subnet-abc123 subnet-def456
# These should be private subnets with no internet gateway route
# Database traffic stays within the VPC
# RDS instance families:
# General Purpose (db.m6g, db.m7g) -- balanced compute and memory
db.m6g.large 2 vCPU 8 GB RAM ~$0.178/hr
db.m6g.xlarge 4 vCPU 16 GB RAM ~$0.356/hr
db.m6g.2xlarge 8 vCPU 32 GB RAM ~$0.712/hr
# Memory Optimized (db.r6g, db.r7g) -- for memory-intensive workloads
db.r6g.large 2 vCPU 16 GB RAM ~$0.252/hr
db.r6g.xlarge 4 vCPU 32 GB RAM ~$0.504/hr
db.r6g.2xlarge 8 vCPU 64 GB RAM ~$1.008/hr
# Burstable (db.t4g) -- for dev/test and low-traffic workloads
db.t4g.micro 2 vCPU 1 GB RAM ~$0.016/hr
db.t4g.small 2 vCPU 2 GB RAM ~$0.032/hr
db.t4g.medium 2 vCPU 4 GB RAM ~$0.065/hr
db.t4g.large 2 vCPU 8 GB RAM ~$0.129/hr
# Graviton (g suffix) instances offer ~20% better price-performance
# than equivalent Intel instances. Prefer db.m6g/r6g over db.m6i/r6i
# Pricing shown is Single-AZ, on-demand, us-east-1, PostgreSQL
# Multi-AZ roughly doubles the cost
# RDS storage types:
# gp3 (General Purpose SSD) -- RECOMMENDED for most workloads
# - Baseline: 3,000 IOPS, 125 MB/s throughput (included)
# - Scalable: up to 16,000 IOPS, 1,000 MB/s (independent of size)
# - Price: $0.08/GB/month + IOPS/throughput above baseline
# - Minimum: 20 GB, Maximum: 64 TB
# io1 / io2 (Provisioned IOPS SSD) -- for I/O-intensive workloads
# - Provisioned: up to 256,000 IOPS (io2 Block Express)
# - Price: $0.125/GB/month + $0.10/IOPS/month (io1)
# - Use when you need > 16,000 IOPS or predictable latency
# gp2 (previous generation) -- being replaced by gp3
# - IOPS scales with volume size (3 IOPS/GB, baseline 100)
# - Avoid for new deployments -- gp3 is cheaper and more flexible
# Enable storage autoscaling (recommended)
aws rds modify-db-instance \
--db-instance-identifier my-app-db \
--max-allocated-storage 500
# RDS automatically expands storage when:
# - Free storage falls below 10%
# - Low storage condition lasts at least 5 minutes
# - At least 6 hours since last modification
# Storage can only grow, never shrink
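The three autoscaling conditions above can be encoded as a quick sanity check — a sketch of the documented rules, not how RDS implements them (the function name is mine):

```python
def autoscale_would_trigger(free_pct: float,
                            minutes_low: float,
                            hours_since_last_mod: float) -> bool:
    """True when all three documented conditions hold:
    free storage < 10%, low for >= 5 minutes, >= 6h since last change."""
    return (free_pct < 10.0
            and minutes_low >= 5
            and hours_since_last_mod >= 6)

print(autoscale_would_trigger(8.0, 10, 12))   # all three conditions met
print(autoscale_would_trigger(8.0, 10, 2))    # cooldown not yet elapsed
```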
Multi-AZ creates a synchronous standby replica in a different Availability Zone. If the primary instance fails, RDS automatically fails over to the standby. The failover updates the DNS record for the endpoint — your application reconnects to the same hostname and reaches the new primary.
# Multi-AZ architecture:
Availability Zone A Availability Zone B
+-------------------+ +-------------------+
| Primary Instance | ------> | Standby Instance |
| (reads + writes) | sync | (no connections) |
+-------------------+ repl. +-------------------+
| |
v v
EBS Storage EBS Storage
(encrypted) (encrypted)
# Failover triggers:
# - Primary instance failure
# - AZ outage
# - Instance type change
# - Manual failover (for testing)
# - Software patching (during maintenance window)
# Failover duration: typically 60-120 seconds
# DNS TTL is 5 seconds so applications reconnect quickly
# Enable Multi-AZ on an existing instance
aws rds modify-db-instance \
--db-instance-identifier my-app-db \
--multi-az \
--apply-immediately
# Test failover manually
aws rds reboot-db-instance \
--db-instance-identifier my-app-db \
--force-failover
import time

import psycopg2
from psycopg2 import pool

# Use a connection pool with retry logic
class DatabasePool:
    def __init__(self, host, database, user, password):
        self.config = {
            'host': host,
            'database': database,
            'user': user,
            'password': password,
            'connect_timeout': 5,
            'options': '-c statement_timeout=30000'  # 30s query timeout
        }
        self._create_pool()

    def _create_pool(self):
        self.pool = pool.ThreadedConnectionPool(
            minconn=2,
            maxconn=10,
            **self.config
        )

    def execute_with_retry(self, query, params=None, max_retries=3):
        """Execute a query with automatic retry on connection failure."""
        for attempt in range(max_retries):
            conn = None
            try:
                conn = self.pool.getconn()
                conn.autocommit = True
                with conn.cursor() as cursor:
                    cursor.execute(query, params)
                    result = cursor.fetchall() if cursor.description else None
                self.pool.putconn(conn)
                return result
            except psycopg2.OperationalError:
                # Connection lost -- likely a failover
                if conn:
                    self.pool.putconn(conn, close=True)
                if attempt < max_retries - 1:
                    time.sleep(2 ** attempt)  # Exponential backoff
                    try:
                        self.pool.closeall()  # discard stale connections
                        self._create_pool()
                    except Exception:
                        pass
                else:
                    raise
Read replicas use asynchronous replication to offload read traffic from the primary instance. They are independent instances with their own endpoints.
# Create a read replica
aws rds create-db-instance-read-replica \
--db-instance-identifier my-app-db-read1 \
--source-db-instance-identifier my-app-db \
--db-instance-class db.r6g.large
# Create a cross-region read replica (for DR or latency reduction)
aws rds create-db-instance-read-replica \
--db-instance-identifier my-app-db-eu \
--source-db-instance-identifier my-app-db \
--db-instance-class db.r6g.large \
--region eu-west-1
# Promote a read replica to standalone primary (for DR or migration)
aws rds promote-read-replica \
--db-instance-identifier my-app-db-eu
# Read replica limits:
# MySQL: up to 5 replicas
# PostgreSQL: up to 5 replicas
# MariaDB: up to 5 replicas
# Aurora: up to 15 replicas (with much lower replication lag)
# Replication lag:
# RDS MySQL/PostgreSQL: typically seconds, can grow under heavy write load
# Aurora: typically < 100ms (shared storage architecture)
# Route reads to replicas, writes to primary
# (create_pool is a helper from your own code -- e.g. a thin wrapper
# around psycopg2.pool.ThreadedConnectionPool as shown earlier)
class ReadWriteRouter:
    def __init__(self, write_host, read_hosts):
        self.write_pool = create_pool(write_host)
        self.read_pools = [create_pool(h) for h in read_hosts]
        self._current_read = 0

    def write(self, query, params=None):
        """Send writes to the primary instance."""
        conn = self.write_pool.getconn()
        try:
            cursor = conn.cursor()
            cursor.execute(query, params)
            conn.commit()
        finally:
            self.write_pool.putconn(conn)

    def read(self, query, params=None):
        """Round-robin reads across replicas."""
        pool = self.read_pools[self._current_read % len(self.read_pools)]
        self._current_read += 1
        conn = pool.getconn()
        try:
            cursor = conn.cursor()
            cursor.execute(query, params)
            return cursor.fetchall()
        finally:
            pool.putconn(conn)

# Usage
router = ReadWriteRouter(
    write_host='my-app-db.abc123.us-east-1.rds.amazonaws.com',
    read_hosts=[
        'my-app-db-read1.abc123.us-east-1.rds.amazonaws.com',
        'my-app-db-read2.abc123.us-east-1.rds.amazonaws.com'
    ]
)
RDS Proxy sits between your application and the database, pooling and sharing database connections. It is essential for Lambda functions (which can open hundreds of connections during scale-up) and applications with many short-lived connections.
# Create an RDS Proxy
aws rds create-db-proxy \
--db-proxy-name my-app-proxy \
--engine-family POSTGRESQL \
--auth '[{
"AuthScheme": "SECRETS",
"SecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-creds",
"IAMAuth": "DISABLED"
}]' \
--role-arn arn:aws:iam::123456789012:role/rds-proxy-role \
--vpc-subnet-ids subnet-abc123 subnet-def456 \
--vpc-security-group-ids sg-0123456789abcdef0
# Register the target database
aws rds register-db-proxy-targets \
--db-proxy-name my-app-proxy \
--db-instance-identifiers my-app-db
# RDS Proxy benefits:
# 1. Connection pooling
# Without proxy: 100 Lambda invocations = 100 DB connections
# With proxy: 100 Lambda invocations = 10-20 pooled connections
# 2. Faster failover
# Without proxy: DNS propagation + new connections = 60-120s
# With proxy: Proxy handles failover transparently = ~30s
# 3. IAM authentication
# Applications authenticate to the proxy with IAM tokens
# Proxy authenticates to the database with stored credentials
# Connect to the proxy endpoint instead of the DB endpoint:
# my-app-proxy.proxy-abc123.us-east-1.rds.amazonaws.com
# RDS takes two types of backups automatically:
# 1. Daily snapshots (during the backup window)
# - Full snapshot of the DB instance
# - Stored in S3 (managed by AWS, not visible in your S3 console)
# - Retention: 1-35 days (default: 7)
# 2. Transaction logs (continuous)
# - Backed up every 5 minutes
# - Enable point-in-time recovery (PITR)
# - Allows restore to any second within the retention period
# Set backup retention and window
aws rds modify-db-instance \
--db-instance-identifier my-app-db \
--backup-retention-period 14 \
--preferred-backup-window "03:00-04:00"
# Backup storage: free up to your instance's provisioned storage size
# Beyond that: $0.095/GB/month
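To estimate the billable portion, a small sketch based on the pricing above (the helper name is hypothetical):

```python
def backup_storage_cost(provisioned_gb: float, backup_gb: float,
                        price_per_gb: float = 0.095) -> float:
    """Monthly cost: only backup storage beyond the provisioned size is billed."""
    billable = max(0.0, backup_gb - provisioned_gb)
    return billable * price_per_gb

# 100 GB instance retaining 250 GB of backups -> 150 GB billable
print(round(backup_storage_cost(100, 250), 2))  # 14.25
```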
# Create a manual snapshot (retained until explicitly deleted)
aws rds create-db-snapshot \
--db-instance-identifier my-app-db \
--db-snapshot-identifier my-app-db-before-migration
# List snapshots
aws rds describe-db-snapshots \
--db-instance-identifier my-app-db
# Copy snapshot to another region (for DR)
aws rds copy-db-snapshot \
--source-db-snapshot-identifier arn:aws:rds:us-east-1:123456789012:snapshot:my-app-db-before-migration \
--target-db-snapshot-identifier my-app-db-dr-copy \
--region eu-west-1
# Share snapshot with another AWS account
aws rds modify-db-snapshot-attribute \
--db-snapshot-identifier my-app-db-before-migration \
--attribute-name restore \
--values-to-add 987654321098
# Restore to a specific point in time (creates a NEW instance)
aws rds restore-db-instance-to-point-in-time \
--source-db-instance-identifier my-app-db \
--target-db-instance-identifier my-app-db-restored \
--restore-time "2025-04-07T14:30:00Z" \
--db-instance-class db.r6g.large
# Restore from a snapshot (creates a NEW instance)
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier my-app-db-restored \
--db-snapshot-identifier my-app-db-before-migration \
--db-instance-class db.r6g.large
# IMPORTANT: Restores always create a NEW instance
# You must update your application to point to the new endpoint
# Or rename the old instance, then rename the restored one to the original name
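A sketch of that rename swap, using the identifiers from the restore examples above. RDS regenerates the endpoint hostname from the identifier, so the restored instance ends up answering at the original hostname; expect a short DNS propagation delay during the swap:

```shell
# 1. Move the old instance out of the way
aws rds modify-db-instance \
  --db-instance-identifier my-app-db \
  --new-db-instance-identifier my-app-db-old \
  --apply-immediately

# 2. Give the restored instance the original name
#    so applications keep their existing endpoint
aws rds modify-db-instance \
  --db-instance-identifier my-app-db-restored \
  --new-db-instance-identifier my-app-db \
  --apply-immediately
```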
# RDS instances should be in private subnets with restricted security groups
# Security group: allow only your application servers
aws ec2 authorize-security-group-ingress \
--group-id sg-rds-group \
--protocol tcp \
--port 5432 \
--source-group sg-app-servers
# Never allow 0.0.0.0/0 access to a database security group
# Never set --publicly-accessible on a production database
# Encryption at rest (must be enabled at creation, cannot add later)
# Uses AES-256 encryption with KMS keys
# Encrypts: storage, backups, snapshots, read replicas
# Encryption in transit (SSL/TLS)
# Download the RDS CA certificate bundle
wget https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem
# Connect with SSL/TLS encryption
import psycopg2

conn = psycopg2.connect(
    host='my-app-db.abc123.us-east-1.rds.amazonaws.com',
    database='myapp',
    user='dbadmin',
    password=password,  # retrieved from Secrets Manager, never hard-coded
    sslmode='verify-full',
    sslrootcert='global-bundle.pem'
)
# Force SSL connections at the database level (PostgreSQL)
# In the parameter group:
aws rds modify-db-parameter-group \
--db-parameter-group-name my-params \
--parameters "ParameterName=rds.force_ssl,ParameterValue=1,ApplyMethod=pending-reboot"
# Enable IAM authentication on the instance
aws rds modify-db-instance \
--db-instance-identifier my-app-db \
--enable-iam-database-authentication
# Create a database user that authenticates via IAM (PostgreSQL)
# Connect to the database and run:
# CREATE USER iam_user WITH LOGIN;
# GRANT rds_iam TO iam_user;
# Connect using an IAM authentication token
import boto3
import psycopg2

rds_client = boto3.client('rds')
token = rds_client.generate_db_auth_token(
    DBHostname='my-app-db.abc123.us-east-1.rds.amazonaws.com',
    Port=5432,
    DBUsername='iam_user',
    Region='us-east-1'
)

conn = psycopg2.connect(
    host='my-app-db.abc123.us-east-1.rds.amazonaws.com',
    database='myapp',
    user='iam_user',
    password=token,  # token is valid for 15 minutes
    sslmode='verify-full',
    sslrootcert='global-bundle.pem'
)
# Benefits: no long-lived passwords, authentication via IAM policies
# Works with EC2 instance roles, Lambda execution roles, etc.
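Because tokens expire after 15 minutes, long-running applications must regenerate them. A minimal sketch of a token cache (class and method names are mine, not an AWS API) that accepts any token-generating callable, such as a lambda wrapping generate_db_auth_token:

```python
import time

class TokenCache:
    """Cache an IAM auth token and refresh it shortly before expiry.

    `generate` is any zero-argument callable returning a fresh token.
    """
    TTL = 15 * 60   # token lifetime in seconds
    MARGIN = 60     # refresh this many seconds early

    def __init__(self, generate, clock=time.monotonic):
        self._generate = generate
        self._clock = clock
        self._token = None
        self._issued_at = None

    def get(self):
        now = self._clock()
        if self._token is None or now - self._issued_at > self.TTL - self.MARGIN:
            self._token = self._generate()
            self._issued_at = now
        return self._token
```

Call cache.get() each time you open a new connection and pass the result as the password.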
Parameter groups control database engine configuration. The default parameter group is read-only. Create a custom one to tune settings for your workload.
# Create a custom parameter group
aws rds create-db-parameter-group \
--db-parameter-group-name my-postgres-params \
--db-parameter-group-family postgres16 \
--description "Custom PostgreSQL 16 parameters"
# Key PostgreSQL parameters to tune:
aws rds modify-db-parameter-group \
--db-parameter-group-name my-postgres-params \
--parameters \
"ParameterName=shared_buffers,ParameterValue={DBInstanceClassMemory/4},ApplyMethod=pending-reboot" \
"ParameterName=effective_cache_size,ParameterValue={DBInstanceClassMemory*3/4},ApplyMethod=pending-reboot" \
"ParameterName=work_mem,ParameterValue=65536,ApplyMethod=immediate" \
"ParameterName=maintenance_work_mem,ParameterValue=524288,ApplyMethod=immediate" \
"ParameterName=max_connections,ParameterValue=200,ApplyMethod=pending-reboot" \
"ParameterName=log_min_duration_statement,ParameterValue=1000,ApplyMethod=immediate"
# shared_buffers: ~25% of instance memory (RDS default)
# effective_cache_size: ~75% of instance memory
# work_mem: per-sort/hash memory (be conservative with many connections)
# log_min_duration_statement: log queries taking > 1 second
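One way to reason about work_mem sizing: spread a fraction of instance memory across the connection limit. This is a rough heuristic, not an official formula (the function name is mine) — each query can allocate work_mem several times, once per sort or hash node:

```python
def work_mem_kb(ram_gb: float, max_connections: int,
                budget_fraction: float = 0.25) -> int:
    """Rough per-connection work_mem in KB: a fraction of RAM divided
    across the connection limit. Be conservative -- a single query may
    use multiples of work_mem."""
    ram_kb = ram_gb * 1024 * 1024
    return int(ram_kb * budget_fraction / max_connections)

# db.r6g.large (16 GB RAM) with max_connections = 200
print(work_mem_kb(16, 200))  # ~20 MB per connection
```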
# Apply the parameter group to your instance
aws rds modify-db-instance \
--db-instance-identifier my-app-db \
--db-parameter-group-name my-postgres-params
# Key RDS metrics to monitor:
CPUUtilization # Percentage of CPU used
FreeableMemory # Available RAM in bytes
DatabaseConnections # Number of active connections
ReadIOPS / WriteIOPS # I/O operations per second
ReadLatency / WriteLatency # Average I/O latency
FreeStorageSpace # Available storage in bytes
DiskQueueDepth # Number of outstanding I/O requests
ReplicaLag # Replication delay on read replicas (seconds)
BurstBalance # Remaining burst credits (gp2/t-class only)
# Set alarms for critical thresholds
aws cloudwatch put-metric-alarm \
--alarm-name rds-high-cpu \
--namespace AWS/RDS \
--metric-name CPUUtilization \
--dimensions Name=DBInstanceIdentifier,Value=my-app-db \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 3 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts
# Critical alarms to set:
# CPU > 80% sustained -- consider scaling up
# FreeableMemory < 10% -- instance needs more RAM
# FreeStorageSpace < 20% -- storage running low
# DatabaseConnections > 80% of max_connections
# DiskQueueDepth > 10 sustained -- storage bottleneck
# ReplicaLag > 30 seconds -- replica falling behind
Performance Insights shows exactly which queries are consuming the most database resources, broken down by wait events, SQL statements, and sessions. The free tier includes 7 days of retention; longer retention periods incur additional cost.
# Enable Performance Insights
aws rds modify-db-instance \
--db-instance-identifier my-app-db \
--enable-performance-insights \
--performance-insights-retention-period 7
# Performance Insights reveals:
#
# Top SQL by load:
# SELECT * FROM orders WHERE status = 'pending' -- 45% of DB load
# INSERT INTO audit_log (...) -- 20% of DB load
# UPDATE users SET last_login = now() WHERE ... -- 15% of DB load
#
# Top waits:
# IO:DataFileRead -- reading data from disk (need more RAM or IOPS)
# Lock:tuple -- row-level lock contention
# CPU -- compute-bound queries (need better indexes)
#
# This tells you exactly which query to optimize first
# Enhanced Monitoring provides OS-level metrics at 1-60 second granularity
# (CloudWatch only provides 1-minute intervals)
#
# Additional metrics: per-process CPU, memory, file system usage, I/O stats
#
# Enable with --monitoring-interval (1, 5, 10, 15, 30, or 60 seconds)
aws rds modify-db-instance \
--db-instance-identifier my-app-db \
--monitoring-interval 15 \
--monitoring-role-arn arn:aws:iam::123456789012:role/rds-monitoring-role
Blue/Green deployments let you make major changes (engine upgrades, parameter changes, schema migrations) with minimal downtime by creating a staging environment that stays in sync with production.
# Create a blue/green deployment
aws rds create-blue-green-deployment \
--blue-green-deployment-name my-upgrade \
--source arn:aws:rds:us-east-1:123456789012:db:my-app-db \
--target-engine-version 17.2
# This creates:
# - Blue environment: your current production (unchanged)
# - Green environment: a copy with the new engine version
# - Logical replication keeps green in sync with blue
# After testing the green environment:
aws rds switchover-blue-green-deployment \
--blue-green-deployment-identifier my-upgrade
# Switchover:
# 1. Blocks writes briefly
# 2. Ensures green is caught up
# 3. Renames instances (green gets the blue name)
# 4. Applications reconnect to the same endpoint
# Typical downtime: under 1 minute
Aurora is AWS's cloud-native relational database, compatible with MySQL and PostgreSQL. It uses a different architecture from standard RDS that provides better performance, higher availability, and simpler operations.
# Aurora separates compute from storage:
Writer Instance Reader Instance(s)
| |
v v
+------------------------------------+
| Shared Distributed Storage |
| (6 copies across 3 AZs) |
| |
| AZ-a: copy1, copy2 |
| AZ-b: copy3, copy4 |
| AZ-c: copy5, copy6 |
+------------------------------------+
# Key differences from standard RDS:
# - Storage is shared between writer and readers (no replication lag for reads)
# - 6 copies of data across 3 AZs (tolerates loss of 2 copies for writes, 3 for reads)
# - Storage auto-scales from 10 GB to 128 TB (no pre-provisioning)
# - Replication lag: typically < 100ms (vs seconds for standard RDS)
# - Up to 15 read replicas (vs 5 for standard RDS)
# - Continuous backup to S3 (no backup window, no performance impact)
# - Writer failover to a reader: typically 10-30 seconds
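The quorum behind "tolerates loss of 2 copies for writes, 3 for reads" can be stated directly: with 6 copies, Aurora needs a write quorum of 4 and a read quorum of 3. A sketch (names are mine):

```python
WRITE_QUORUM = 4   # 6 copies: still writable after losing 2
READ_QUORUM = 3    # 6 copies: still readable after losing 3

def can_write(healthy_copies: int) -> bool:
    return healthy_copies >= WRITE_QUORUM

def can_read(healthy_copies: int) -> bool:
    return healthy_copies >= READ_QUORUM

# Losing one full AZ (2 copies) leaves 4: reads and writes both OK
print(can_write(4), can_read(4))
# Losing an AZ plus one more copy leaves 3: reads only
print(can_write(3), can_read(3))
```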
# Aurora provides multiple endpoints:
# Cluster endpoint (writer) -- for all write operations
my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com
# Reader endpoint (load-balanced across readers) -- for read operations
my-cluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com
# Instance endpoints (specific instance) -- for direct access
my-cluster-instance-1.abc123.us-east-1.rds.amazonaws.com
# Custom endpoints -- for routing specific queries to specific instances
# e.g., route analytics queries to larger reader instances
# Create an Aurora PostgreSQL cluster
aws rds create-db-cluster \
--db-cluster-identifier my-aurora-cluster \
--engine aurora-postgresql \
--engine-version 16.4 \
--master-username dbadmin \
--manage-master-user-password \
--storage-encrypted \
--db-subnet-group-name my-db-subnet-group \
--vpc-security-group-ids sg-0123456789abcdef0
# Add the writer instance
aws rds create-db-instance \
--db-instance-identifier my-aurora-writer \
--db-cluster-identifier my-aurora-cluster \
--db-instance-class db.r6g.large \
--engine aurora-postgresql
# Add reader instance(s)
aws rds create-db-instance \
--db-instance-identifier my-aurora-reader-1 \
--db-cluster-identifier my-aurora-cluster \
--db-instance-class db.r6g.large \
--engine aurora-postgresql
Aurora Serverless v2 automatically scales compute capacity based on demand, measured in Aurora Capacity Units (ACUs). Each ACU provides approximately 2 GB of memory. Scaling is continuous and happens in increments of 0.5 ACU, with no interruption to connections.
# Create an Aurora Serverless v2 cluster
aws rds create-db-cluster \
--db-cluster-identifier my-serverless-cluster \
--engine aurora-postgresql \
--engine-version 16.4 \
--serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=32 \
--master-username dbadmin \
--manage-master-user-password \
--storage-encrypted
# Add a Serverless v2 instance
aws rds create-db-instance \
--db-instance-identifier my-serverless-instance \
--db-cluster-identifier my-serverless-cluster \
--db-instance-class db.serverless \
--engine aurora-postgresql
# Scaling range:
# MinCapacity: 0.5 ACU (1 GB RAM) -- scales down to save costs
# MaxCapacity: up to 256 ACU (512 GB RAM)
# Each ACU: ~$0.12/hr (us-east-1)
# Use cases:
# - Variable workloads (high during business hours, low at night)
# - Development/staging environments
# - New applications with unpredictable traffic
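A rough cost comparison for a spiky daily profile, using the ~$0.12/ACU-hour figure above (the numbers and function name are illustrative):

```python
ACU_PRICE = 0.12  # $/ACU-hour, us-east-1, from the figures above

def serverless_monthly_cost(acu_by_hour: list[float], days: int = 30) -> float:
    """acu_by_hour: average ACUs for each of the 24 hours in a typical day."""
    return sum(acu_by_hour) * days * ACU_PRICE

# 8 business hours at 8 ACU, 16 quiet hours at the 0.5 ACU floor
profile = [8.0] * 8 + [0.5] * 16
print(round(serverless_monthly_cost(profile), 2))
# ~$259/month, vs a fixed db.r6g.xlarge at $0.504/hr = $362.88/month
```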
# Aurora Global Database replicates across regions with < 1 second lag
# Use for: disaster recovery, low-latency global reads
# Create a global cluster from an existing Aurora cluster
aws rds create-global-cluster \
--global-cluster-identifier my-global-db \
--source-db-cluster-identifier arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster
# Add a secondary region
aws rds create-db-cluster \
--db-cluster-identifier my-aurora-eu \
--engine aurora-postgresql \
--global-cluster-identifier my-global-db \
--region eu-west-1
# Failover to secondary region (RPO: typically < 1 second)
aws rds failover-global-cluster \
--global-cluster-identifier my-global-db \
--target-db-cluster-identifier arn:aws:rds:eu-west-1:123456789012:cluster:my-aurora-eu
# AWS Database Migration Service (DMS) migrates databases to RDS
# with minimal downtime using change data capture (CDC)
# Migration types:
# full-load -- one-time migration of existing data
# cdc -- ongoing replication of changes only
# full-load-and-cdc -- initial load + continuous sync (recommended)
# Workflow:
# 1. Create a DMS replication instance
# 2. Create source and target endpoints
# 3. Create and start the replication task
# 4. Monitor until source and target are in sync
# 5. Switch application to the new RDS endpoint
# 6. Stop the replication task
# Supported sources: on-premises MySQL, PostgreSQL, Oracle, SQL Server,
# MongoDB, Amazon Aurora, S3, and more
# Supported targets: RDS, Aurora, DynamoDB, S3, Redshift, and more
# RDS pricing components:
# 1. Instance hours (compute)
# 2. Storage (GB/month)
# 3. I/O requests (Aurora only)
# 4. Backup storage (beyond free allocation)
# 5. Data transfer (cross-AZ, cross-region)
# Cost-saving strategies:
# 1. Reserved Instances (1 or 3 year commitment)
# db.r6g.large on-demand: $0.252/hr = $2,207/yr
# db.r6g.large 1-yr RI: $0.159/hr = $1,393/yr (37% savings)
# db.r6g.large 3-yr RI: $0.101/hr = $884/yr (60% savings)
# 2. Use Graviton instances (db.r6g/m6g instead of db.r6i/m6i)
# ~20% cheaper with equivalent or better performance
# 3. Aurora Serverless v2 for variable workloads
#    Scales down to 0.5 ACU during low traffic, so idle periods cost
#    a fraction of a provisioned instance sized for peak load
# 4. Right-size instances using Performance Insights
# Monitor CPU, memory, and I/O to find over-provisioned instances
# 5. Stop dev/test instances when not in use
aws rds stop-db-instance --db-instance-identifier dev-db
# Automatically restarts after 7 days (or manually start it)
# 6. Use gp3 storage instead of io1
# gp3 baseline: 3,000 IOPS free
# io1 at 3,000 IOPS: $0.10 * 3000 = $300/month additional
# RDS applies patches during the maintenance window
# Default: 30-minute window assigned by AWS
# Customize to match your low-traffic period
aws rds modify-db-instance \
--db-instance-identifier my-app-db \
--preferred-maintenance-window "sun:03:00-sun:04:00"
# For Multi-AZ instances:
# Patches are applied to the standby first, then failover, then patch the old primary
# Total downtime during patching: typically 60-120 seconds
# For Aurora:
# Zero-downtime patching (ZDP) applies patches without failover when possible
Key takeaways:
- Use --manage-master-user-password to store credentials in Secrets Manager
- Cap storage autoscaling with --max-allocated-storage
- Test Multi-AZ failover with --force-failover before you need it
- Watch DiskQueueDepth and ReadLatency/WriteLatency for storage bottlenecks