Tuesday, October 26, 2021

[SOLVED] An efficient method to check that thousands of EC2 instances have a Snapshot?

Issue

I'm tasked with developing a way to ensure all our running instances have a current snapshot (usually completed daily). Currently I do the following:

  1. Paginate through every instance
  2. Cross-reference its volume ID with any snapshot with that volume ID
  3. Filter only snapshots created 'today'.

My code works, it's just very slow. This won't be suitable for an organization with tens of thousands of instances. What are some ways I could speed this up without hitting the rate limit for the API call?

Code:

'''
Checks for a current snapshot for every instance, if it has one it is compliant, otherwise non-compliant.
'''

from datetime import datetime, date, timedelta
import boto3
ec2client = boto3.client('ec2', region_name='us-east-1')

# Create a paginator
paginator = ec2client.get_paginator('describe_instances')
instances = paginator.paginate().build_full_result()

for reservation in instances["Reservations"]:
    for instance in reservation["Instances"]:

        # Set base value
        compliant = "Non-compliant"

        # Lists all storage devices attached to instance.
        block_device_mappings = instance["BlockDeviceMappings"]

        for block in block_device_mappings:
            ebs = block.get("Ebs")

            # Skip mappings that carry no EBS volume information.
            if not ebs or "VolumeId" not in ebs:
                continue

            # Volume ID of the instance's storage, used to find snapshots.
            volume_id = ebs["VolumeId"]

            # Wildcard for filtering snapshots that occurred today
            compliant_time_frame = date.isoformat(date.today()) + '*'

            snapshots = ec2client.describe_snapshots(
                Filters=[
                    {'Name': 'volume-id', 'Values': [volume_id]},
                    {'Name': 'start-time', 'Values': [compliant_time_frame]}
                ]
            )
            snapshots = snapshots.get("Snapshots", [])

            # If snapshots exist, instance is compliant
            # We have already drilled down for current snapshots in .describe_snapshots() filter
            if len(snapshots) > 0:
                compliant = "Compliant"

        print(instance.get("InstanceId"), compliant)

Solution

It appears that the code is calling describe_snapshots() once for every volume of every instance, so the number of API calls grows with the size of the fleet.

It would probably be faster to first retrieve a list of ALL snapshots created within the desired timeframe. Store them in a list or dictionary.

Then, while looping through each instance, simply consult the list or dictionary to confirm whether the snapshot exists.
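
A minimal sketch of that approach might look like the following. The helper names (volumes_snapshotted_on, instance_compliance, check_region) are made up for illustration, and several things are assumptions rather than part of the original code: that the snapshots belong to the calling account (OwnerIds=["self"]), that "today" means the current UTC date, and that the wildcard start-time filter behaves as it does in the question. As in the question's code, an instance is treated as compliant if at least one of its volumes has a current snapshot.

```python
from datetime import date, datetime, timezone


def volumes_snapshotted_on(snapshots, day: date):
    """Return the set of volume IDs with a snapshot started on `day` (UTC)."""
    return {
        snap["VolumeId"]
        for snap in snapshots
        if snap["StartTime"].astimezone(timezone.utc).date() == day
    }


def instance_compliance(reservations, covered_volumes):
    """Map each instance ID to 'Compliant' or 'Non-compliant'.

    As in the question's code, an instance counts as compliant if at least
    one of its EBS volumes has a current snapshot.
    """
    result = {}
    for reservation in reservations:
        for instance in reservation["Instances"]:
            volume_ids = [
                mapping["Ebs"]["VolumeId"]
                for mapping in instance.get("BlockDeviceMappings", [])
                if "VolumeId" in mapping.get("Ebs", {})
            ]
            has_snapshot = any(v in covered_volumes for v in volume_ids)
            result[instance["InstanceId"]] = (
                "Compliant" if has_snapshot else "Non-compliant"
            )
    return result


def check_region(region_name="us-east-1"):
    """Fetch today's snapshots once, then check every instance locally."""
    import boto3  # imported here so the helpers above run without AWS access

    ec2 = boto3.client("ec2", region_name=region_name)
    today = datetime.now(timezone.utc).date()

    # One paginated listing of today's snapshots instead of one call per volume.
    # Assumption: snapshots are owned by this account. The wildcard start-time
    # filter is copied from the question's code.
    pages = ec2.get_paginator("describe_snapshots").paginate(
        OwnerIds=["self"],
        Filters=[{"Name": "start-time", "Values": [today.isoformat() + "*"]}],
    )
    snapshots = [snap for page in pages for snap in page["Snapshots"]]
    covered = volumes_snapshotted_on(snapshots, today)

    instances = ec2.get_paginator("describe_instances").paginate().build_full_result()
    return instance_compliance(instances["Reservations"], covered)
```

Calling check_region() and printing the items of the returned dictionary would give the same per-instance report as the original loop. Because the membership test runs against an in-memory set, the number of API calls depends only on how many pages of snapshots and instances there are, not on the number of volumes.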



Answered By - John Rotenstein