Copying RDS snapshot to another region for cross-region recovery

October 21, 2016 by Paulina Budzoń

For an updated, ready-to-use CloudFormation template of this code, see the newer post: Complete code: cross-region RDS recovery.

Amazon RDS is a great database-as-a-service that takes care of almost all database-related maintenance tasks for you - everything from automated backups and patching to replication and failover to another availability zone.

Unfortunately, none of this helps if the whole region hosting your RDS instance fails. Region-wide failures are very rare, but they do happen! At the time of writing, RDS does not offer cross-region replication for every database engine, so you may not be able to simply create a replica of your database in another region (unless you host the database on an EC2 instance and set up replication yourself). The second-best option for restoring your service quickly in another region is to always keep a copy of your latest database backup in that region. In the case of RDS, that can mean copying automated snapshots. AWS has no built-in option to do this automatically, but it is easy to script with an AWS Lambda function.

RDS can create an automated snapshot of your database every day. All we need to do is copy that snapshot once it’s ready and remove any old snapshots from the “fail-over region” to save on storage costs.
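Both steps rest on the same rule: for each DB instance, pick the newest snapshot whose status is `available`. In the source region that is the snapshot to copy; in the fail-over region it is the copy to keep, and everything older is deleted. Below is a minimal sketch of that rule as a pure function, assuming the input dicts carry the same keys boto3's `describe_db_snapshots` returns (`DBInstanceIdentifier`, `DBSnapshotIdentifier`, `Status`, `SnapshotCreateTime`). The function name `plan_copy_and_cleanup` and the sample data are hypothetical:

```python
from datetime import datetime, timezone


def plan_copy_and_cleanup(snapshots):
    """Return ({instance: newest snapshot ID}, [older snapshot IDs]).

    Only snapshots with Status == 'available' are considered; snapshots still
    being created or copied are ignored entirely.
    """
    grouped = {}
    for snap in snapshots:
        if snap['Status'] != 'available':
            continue  # still in progress, not safe to copy or delete
        grouped.setdefault(snap['DBInstanceIdentifier'], []).append(snap)

    to_keep, to_delete = {}, []
    for instance, snaps in grouped.items():
        # Newest first, so index 0 is the snapshot to copy/keep.
        snaps.sort(key=lambda s: s['SnapshotCreateTime'], reverse=True)
        to_keep[instance] = snaps[0]['DBSnapshotIdentifier']
        to_delete.extend(s['DBSnapshotIdentifier'] for s in snaps[1:])
    return to_keep, to_delete


def _snap(instance, name, day, status='available'):
    """Build a made-up snapshot record for illustration."""
    return {'DBInstanceIdentifier': instance, 'DBSnapshotIdentifier': name,
            'Status': status,
            'SnapshotCreateTime': datetime(2016, 10, day, 3, 0, tzinfo=timezone.utc)}


sample = [_snap('db1', 'db1-a', 19), _snap('db1', 'db1-b', 21),
          _snap('db1', 'db1-c', 20), _snap('db2', 'db2-a', 21, status='creating')]
keep, delete = plan_copy_and_cleanup(sample)
print(keep)    # {'db1': 'db1-b'}
print(delete)  # ['db1-c', 'db1-a']
```

Note that `db2` is skipped completely because its only snapshot is still being created; it will be picked up on a later run.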

The following quick-and-dirty Lambda function (in Python) accomplishes just that:

import boto3

# ID of the AWS account that owns the source snapshots (update before use).
ACCOUNT = 'xxxxx'

SOURCE_REGION = 'eu-west-1'     # Ireland: where the RDS instances run
TARGET_REGION = 'eu-central-1'  # Frankfurt: the fail-over region


def list_snapshots(client, **filters):
    """Return every snapshot matching the filters, following pagination."""
    # describe_db_snapshots returns at most 100 records per call by default,
    # so a single call can silently miss snapshots on larger accounts.
    paginator = client.get_paginator('describe_db_snapshots')
    snapshots = []
    for page in paginator.paginate(**filters):
        snapshots.extend(page['DBSnapshots'])
    return snapshots


def group_available_by_instance(snapshots):
    """Group 'available' snapshots by DB instance, newest first."""
    grouped = {}
    for snapshot in snapshots:
        if snapshot['Status'] != 'available':
            continue  # skip snapshots that are still being created or copied
        grouped.setdefault(snapshot['DBInstanceIdentifier'], []).append(snapshot)

    for instance_snapshots in grouped.values():
        instance_snapshots.sort(key=lambda s: s['SnapshotCreateTime'], reverse=True)
    return grouped


def copy_latest_snapshot():
    client = boto3.client('rds', SOURCE_REGION)
    target_client = boto3.client('rds', TARGET_REGION)

    snapshots = list_snapshots(client, SnapshotType='automated',
                               IncludeShared=False, IncludePublic=False)
    if not snapshots:
        raise Exception("No automated snapshots found")

    for instance, instance_snapshots in group_available_by_instance(snapshots).items():
        latest = instance_snapshots[0]
        # One copy per instance per day, e.g. "mydb-2016-10-21".
        copy_name = instance + "-" + latest['SnapshotCreateTime'].strftime("%Y-%m-%d")
        print("Checking if " + copy_name + " is copied")

        try:
            target_client.describe_db_snapshots(DBSnapshotIdentifier=copy_name)
        except target_client.exceptions.DBSnapshotNotFoundFault:
            pass  # no copy in the target region yet, so make one below
        else:
            print("Already copied")
            continue

        # Cross-region copies must reference the source snapshot by its full ARN.
        # Assumes unencrypted snapshots; encrypted ones need extra parameters
        # such as KmsKeyId for the target region.
        response = target_client.copy_db_snapshot(
            SourceDBSnapshotIdentifier='arn:aws:rds:' + SOURCE_REGION + ':' + ACCOUNT
                                       + ':snapshot:' + latest['DBSnapshotIdentifier'],
            TargetDBSnapshotIdentifier=copy_name,
            CopyTags=True
        )

        if response['DBSnapshot']['Status'] not in ('pending', 'available'):
            raise Exception("Copy operation for " + copy_name + " failed!")
        print("Copied " + copy_name)


def remove_old_snapshots():
    target_client = boto3.client('rds', TARGET_REGION)

    # Copies of automated snapshots are stored as manual snapshots.
    snapshots = list_snapshots(target_client, SnapshotType='manual')
    if not snapshots:
        raise Exception("No manual snapshots in " + TARGET_REGION + " found")

    for instance_snapshots in group_available_by_instance(snapshots).values():
        # Keep the newest snapshot per instance and delete all older ones.
        # This includes manual snapshots that were not created by this function.
        for snapshot in instance_snapshots[1:]:
            print("Removing " + snapshot['DBSnapshotIdentifier'])
            target_client.delete_db_snapshot(
                DBSnapshotIdentifier=snapshot['DBSnapshotIdentifier']
            )


def lambda_handler(event, context):
    copy_latest_snapshot()
    remove_old_snapshots()


if __name__ == '__main__':
    lambda_handler(None, None)

For the given account (update the ACCOUNT variable at the top of the code), the function goes through each of your RDS instances and copies its latest automated snapshot from Ireland (eu-west-1) to Frankfurt (eu-central-1). It then goes through all manual snapshots in Frankfurt and keeps only the latest one for each instance. Note that this also deletes older manual snapshots in Frankfurt that were not created by this function. The region values can be changed within the script to match your requirements.
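The copy step depends on two strings: the source snapshot's ARN (a cross-region copy cannot use the bare snapshot ID) and a per-day name for the copy. A minimal sketch isolating both as helpers; `source_snapshot_arn` and `daily_copy_name` are hypothetical names, and the account ID and snapshot ID below are made-up example values:

```python
from datetime import datetime


def source_snapshot_arn(region, account, snapshot_id):
    """Build the ARN that copy_db_snapshot needs for a snapshot in another region."""
    # 'aws' is the standard partition; China and GovCloud regions use other partitions.
    return 'arn:aws:rds:{}:{}:snapshot:{}'.format(region, account, snapshot_id)


def daily_copy_name(instance_id, created):
    """Name of the copy in the fail-over region: '<instance>-YYYY-MM-DD'."""
    return '{}-{}'.format(instance_id, created.strftime('%Y-%m-%d'))


# Hypothetical values, for illustration only.
arn = source_snapshot_arn('eu-west-1', '123456789012', 'rds:mydb-2016-10-21-03-05')
name = daily_copy_name('mydb', datetime(2016, 10, 21, 3, 5))
print(arn)   # arn:aws:rds:eu-west-1:123456789012:snapshot:rds:mydb-2016-10-21-03-05
print(name)  # mydb-2016-10-21
```

Because the copy name depends only on the instance and the snapshot date, a second run on the same day finds the existing copy and skips it, so running the function more often than daily is harmless. As an alternative to hard-coding ACCOUNT, each entry returned by `describe_db_snapshots` also carries its full ARN in the `DBSnapshotArn` field.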

This Lambda function can be scheduled in two ways:

  • via a CloudWatch Events schedule, to simply run every day,
  • via RDS events (delivered through SNS), to run whenever an RDS backup finishes (the code would benefit from some changes here, such as copying only the instance named in the event).
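For the SNS-triggered option, the handler could first check that a notification really reports a finished backup, and for which instance. A minimal sketch, assuming the SNS `Message` is the JSON document RDS publishes for event subscriptions, with `Source ID` and `Event ID` keys, and that `RDS-EVENT-0002` is the ID of the "finished DB instance backup" event (worth checking against the current RDS event list). `finished_backup_instances` is a hypothetical helper, and the sample event is hand-written, not captured from AWS:

```python
import json


def finished_backup_instances(event):
    """Return DB instance IDs whose 'backup finished' RDS event arrived via SNS."""
    instances = []
    for record in event.get('Records', []):
        # Lambda receives SNS deliveries as Records[].Sns.Message (a JSON string).
        message = json.loads(record['Sns']['Message'])
        # Assumed: RDS-EVENT-0002 marks a finished DB instance backup. The ID may
        # arrive as a bare ID or as a documentation URL ending in it.
        if message.get('Event ID', '').endswith('RDS-EVENT-0002'):
            instances.append(message['Source ID'])
    return instances


# Hand-written sample: one "backup started" and one "backup finished" event.
sample_event = {'Records': [
    {'Sns': {'Message': json.dumps({'Source ID': 'mydb', 'Event ID': 'RDS-EVENT-0001'})}},
    {'Sns': {'Message': json.dumps({'Source ID': 'mydb', 'Event ID': 'RDS-EVENT-0002'})}},
]}
print(finished_backup_instances(sample_event))  # ['mydb']
```

The returned IDs could then restrict copy_latest_snapshot to just those instances, instead of scanning every automated snapshot in the account on each event.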

You can create this function manually (it needs no additional libraries, so it can be copied and pasted into AWS Lambda) or use CloudFormation (please do!). For reference, check out the GitHub repository, where you can find other useful Lambda functions and CloudFormation templates for creating them: https://github.com/pbudzon/aws-maintenance.
