Amazon DLM monitoring
Amazon Data Lifecycle Manager (DLM) automates the creation, retention, and deletion of Amazon EBS snapshots and EBS-backed Amazon Machine Images (AMIs). It simplifies backup management by letting you define life cycle policies that run these tasks on a schedule, keeping your backup and cleanup operations consistent and cost-efficient.
Overview
By integrating Amazon DLM with Site24x7, you can monitor and validate your automated backup and AMI creation processes. Site24x7 tracks the execution status of your DLM policies, helping you confirm that snapshots are being created and deleted as expected and that your data retention and cleanup rules are functioning correctly.
When you integrate Amazon DLM with Site24x7, two individual monitors are created:
- DLM-EBS Snapshots: Site24x7 collects information about snapshot life cycle policies, including creation and deletion schedules, recent executions, and any failed actions. It helps you confirm that snapshot automation is working correctly and that old snapshots are being cleaned up according to defined retention rules.
- DLM-EBS Backed AMI: This monitor focuses on AMI life cycle policies. It checks for successful AMI creation and deregistration activities, identifies failed executions, and ensures that AMI retention settings are applied properly. Monitoring this helps maintain a consistent and optimized AMI inventory.
Together, these monitors help validate the overall health of your DLM configurations, ensuring that both EBS volume backups and AMIs are managed as expected across your AWS environment.
Use case
If you manage multiple EBS volumes or AMIs across accounts, you may rely on DLM policies to automate snapshot creation and deletion. However, policy failures or misconfigurations such as missing permissions, invalid schedules, or resource tagging errors can cause backup gaps without immediate notice.
Site24x7’s DLM integration helps detect such issues early by monitoring life cycle policy activity and execution results. You can verify whether each policy is working correctly, track any failures, and ensure data protection tasks are running as intended.
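As a rough illustration of the kind of check described above, the following sketch flags life cycle policies whose state is anything other than `ENABLED`. It assumes a client that behaves like boto3's DLM client (`boto3.client("dlm")`), whose `get_lifecycle_policies` call returns a `Policies` list with `PolicyId` and `State` fields. The helper name `find_unhealthy_policies` is hypothetical, and this is not a description of how Site24x7 collects its data.

```python
def find_unhealthy_policies(dlm_client):
    """Return summaries of DLM policies whose state is not ENABLED.

    `dlm_client` is assumed to behave like boto3.client("dlm"):
    get_lifecycle_policies() returns {"Policies": [{"PolicyId": ..., "State": ...}]}.
    """
    response = dlm_client.get_lifecycle_policies()
    return [
        {"PolicyId": policy.get("PolicyId"), "State": policy.get("State")}
        for policy in response.get("Policies", [])
        if policy.get("State") != "ENABLED"
    ]


# Example usage (requires the third-party boto3 package and AWS credentials
# that allow dlm:GetLifecyclePolicies):
#
#   import boto3
#   for policy in find_unhealthy_policies(boto3.client("dlm")):
#       print(policy["PolicyId"], policy["State"])
```

A policy in the `DISABLED` or `ERROR` state will not create or delete snapshots on schedule, so a non-empty result is worth investigating.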
Benefits of Site24x7's Amazon DLM integration
Integrate your Amazon DLM environment with Site24x7 and leverage the following benefits:
- Operational assurance: Ensure data backup and cleanup automation continues without interruption.
- Proactive alerts: Receive notifications when policy executions fail or deviate from the defined schedule.
- Audit support: Maintain visibility into policy executions for compliance and retention verification.
- Cross-service correlation: Combine DLM insights with EC2, EBS, and other AWS services for complete infrastructure monitoring.
Setup and configuration
- Log in to your Site24x7 account.
- Go to Cloud > AWS > Integrate AWS Account and create a cross-account IAM role to enable Site24x7 to access your AWS resources.
- On the Integrate AWS Account page, select DLM from the Services to be discovered list.
Permissions
Ensure that Site24x7 receives the following permissions to monitor Amazon DLM:
- "dlm:GetLifecyclePolicies"
- "dlm:GetLifecyclePolicy"
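The two actions above could be granted through an IAM policy document along the following lines. This is a minimal sketch: the `Sid` value and the helper name `build_dlm_read_policy` are made up for illustration, and `"Resource": "*"` is an assumption chosen for simplicity rather than a documented requirement.

```python
import json

# Actions taken from the permissions list above.
DLM_ACTIONS = ["dlm:GetLifecyclePolicies", "dlm:GetLifecyclePolicy"]


def build_dlm_read_policy(resource="*"):
    """Build an IAM policy document that allows the DLM read actions.

    Assumption: the wildcard resource is used for simplicity; narrow it
    to specific policy ARNs if your security requirements call for it.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Site24x7DlmReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": DLM_ACTIONS,
                "Resource": resource,
            }
        ],
    }


# Serialize the document so it can be pasted into the IAM console or a template.
policy_json = json.dumps(build_dlm_read_policy(), indent=2)
print(policy_json)
```

The printed JSON can be attached to the cross-account IAM role created during setup.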
Polling frequency
Site24x7 queries AWS service-level APIs according to the set polling frequency (from once a minute to once a day) to collect metrics from DLM monitors.
Supported metrics
The supported metrics for DLM monitors are given below.
DLM-EBS Snapshots
The supported metrics for the DLM-EBS Snapshots monitor are given below.
| Metric name | Description | Statistics | Unit |
|---|---|---|---|
| Resources Targeted | The number of resources targeted by the tags specified in a snapshot or EBS-backed AMI policy. | Sum | Count |
| Snapshots Create Started | The number of snapshot create actions initiated by a snapshot policy. | Sum | Count |
| Snapshots Create Completed | The number of snapshots created by a snapshot policy. This includes successful retries within 60 minutes of the scheduled time. | Sum | Count |
| Snapshots Create Failed | The number of snapshots that could not be created by a snapshot policy. This includes unsuccessful retries within 60 minutes of the scheduled time. | Sum | Count |
| Snapshots Shared Completed | The number of snapshots shared across accounts by a snapshot policy. | Sum | Count |
| Snapshots Delete Completed | The number of snapshots deleted by a snapshot or EBS-backed AMI policy. This metric applies only to snapshots created by the policy. | Sum | Count |
| Snapshots Delete Failed | The number of snapshots that could not be deleted by a snapshot or EBS-backed AMI policy. This metric applies only to snapshots created by the policy. | Sum | Count |
| Snapshots Copied Region Started | The number of cross-region snapshot copy actions initiated by a snapshot policy. | Sum | Count |
| Snapshots Copied Region Completed | The number of cross-region snapshot copies created by a snapshot policy. This includes successful retries within 24 hours of the scheduled time. | Sum | Count |
| Snapshots Copied Region Failed | The number of cross-region snapshot copies that could not be created by a snapshot policy. This includes unsuccessful retries within 24 hours of the scheduled time. | Sum | Count |
| Snapshots Copied Region Delete Completed | The number of cross-region snapshot copies deleted, as designated by the retention rule, by a snapshot policy. | Sum | Count |
| Snapshots Copied Region Delete Failed | The number of cross-region snapshot copies that could not be deleted, as designated by the retention rule, by a snapshot policy. | Sum | Count |
| Snapshots Archive Deletion Failed | The number of archived snapshots that could not be deleted from the archive tier by a snapshot policy. | Sum | Count |
| Snapshots Archive Scheduled | The number of snapshots that were scheduled to be archived by a snapshot policy. | Sum | Count |
| Snapshots Archive Completed | The number of snapshots that were successfully archived by a snapshot policy. | Sum | Count |
| Snapshots Archive Failed | The number of snapshots that could not be archived by a snapshot policy. | Sum | Count |
| Snapshots Archive Deletion Completed | The number of archived snapshots that were successfully deleted from the archive tier by a snapshot policy. | Sum | Count |
| Pre-Script Started | The number of instances for which a pre-script was successfully initiated. If script retries are enabled, this metric can be emitted multiple times per policy run. | Sum | Count |
| Pre-Script Completed | The number of instances for which a pre-script was successfully completed. The metric is emitted even if the pre-script completes outside of the specified timeout period. If script retries are enabled, this metric can be emitted multiple times per policy run. | Sum | Count |
| Pre-Script Failed | The number of instances for which a pre-script failed to complete successfully. The metric is emitted even if the pre-script completes outside of the specified timeout period. If script retries are enabled, this metric can be emitted multiple times per policy run. | Sum | Count |
| Post-Script Started | The number of instances for which a post-script was successfully initiated. If script retries are enabled, this metric can be emitted multiple times per policy run. | Sum | Count |
| Post-Script Completed | The number of instances for which a post-script was successfully completed. The metric is emitted even if the post-script completes outside of the specified timeout period. If script retries are enabled, this metric can be emitted multiple times per policy run. | Sum | Count |
| Post-Script Failed | The number of instances for which a post-script failed to complete successfully. The metric is emitted even if the post-script completes outside of the specified timeout period. If script retries are enabled, this metric can be emitted multiple times per policy run. | Sum | Count |
| VSS Backup Started | The number of instances for which a Volume Shadow Copy Service (VSS) backup was successfully initiated. If script retries are enabled, this metric can be emitted multiple times per policy run. | Sum | Count |
| VSS Backup Completed | The number of instances for which a VSS backup was successfully completed. The metric is emitted even if the VSS backup completes outside of the timeout period. If script retries are enabled, this metric can be emitted multiple times per policy run. | Sum | Count |
| VSS Backup Failed | The number of instances for which a VSS backup failed to complete successfully. The metric is emitted even if the VSS backup completes outside of the timeout period. If script retries are enabled, this metric can be emitted multiple times per policy run. | Sum | Count |
| Snapshots Copied Account Started | The number of cross-account snapshot copy actions initiated by a cross-account copy event policy. | Sum | Count |
| Snapshots Copied Account Completed | The number of snapshots copied from another account by a cross-account copy event policy. This includes successful retries within 24 hours of the scheduled time. | Sum | Count |
| Snapshots Copied Account Failed | The number of snapshots that could not be copied from another account by a cross-account copy event policy. This includes unsuccessful retries within 24 hours of the scheduled time. | Sum | Count |
| Snapshots Copied Account Delete Completed | The number of cross-region snapshot copies deleted, as designated by the retention rule, by a cross-account copy event policy. | Sum | Count |
| Snapshots Copied Account Delete Failed | The number of cross-region snapshot copies that could not be deleted, as designated by the retention rule, by a cross-account copy event policy. | Sum | Count |
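To show how these Sum values can be read together, here is a minimal sketch that derives a snapshot creation success rate from the Completed and Failed counts. The dictionary keys mirror the metric names in the table; the function name and the sample numbers are made up for illustration and do not come from a real policy run.

```python
def snapshot_create_success_rate(metrics):
    """Return completed / (completed + failed) as a percentage.

    `metrics` maps metric names to their Sum for one polling interval.
    Returns None when no create action finished in that interval.
    """
    completed = metrics.get("Snapshots Create Completed", 0)
    failed = metrics.get("Snapshots Create Failed", 0)
    total = completed + failed
    if total == 0:
        return None
    return 100.0 * completed / total


# Made-up sample values for one polling interval.
sample = {
    "Snapshots Create Started": 20,
    "Snapshots Create Completed": 18,
    "Snapshots Create Failed": 2,
}
print(snapshot_create_success_rate(sample))  # 90.0
```

A rate below 100 for a policy that normally succeeds is a reasonable signal to check permissions, tags, and schedule settings.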
DLM-EBS Backed AMI
The supported metrics for the DLM-EBS Backed AMI monitor are given below.
| Metric name | Description | Statistics | Unit |
|---|---|---|---|
| Resources Targeted | The number of resources targeted by the tags specified in a snapshot or EBS-backed AMI policy. | Sum | Count |
| Snapshots Delete Completed | The number of snapshots deleted by a snapshot or EBS-backed AMI policy. This metric applies only to snapshots created by the policy. | Sum | Count |
| Snapshots Delete Failed | The number of snapshots that could not be deleted by a snapshot or EBS-backed AMI policy. This metric applies only to snapshots created by the policy. | Sum | Count |
| Snapshots Copied Region Delete Completed | The number of cross-region snapshot copies deleted, as designated by the retention rule, by a snapshot policy. | Sum | Count |
| Snapshots Copied Region Delete Failed | The number of cross-region snapshot copies that could not be deleted, as designated by the retention rule, by a snapshot policy. | Sum | Count |
| Images Create Started | The number of create image actions initiated by an EBS-backed AMI policy. | Sum | Count |
| Images Create Completed | The number of AMIs created by an EBS-backed AMI policy. | Sum | Count |
| Images Create Failed | The number of AMIs that could not be created by an EBS-backed AMI policy. | Sum | Count |
| Images Deregister Completed | The number of AMIs deregistered by an EBS-backed AMI policy. | Sum | Count |
| Images Deregister Failed | The number of AMIs that could not be deregistered by an EBS-backed AMI policy. | Sum | Count |
| Images Copied Region Started | The number of cross-region copy actions initiated by an EBS-backed AMI policy. | Sum | Count |
| Images Copied Region Completed | The number of cross-region AMI copies created by an EBS-backed AMI policy. | Sum | Count |
| Images Copied Region Failed | The number of cross-region AMI copies that could not be created by an EBS-backed AMI policy. | Sum | Count |
| Images Copied Region Deregister Completed | The number of cross-region AMI copies deregistered, as designated by the retention rule, by an EBS-backed AMI policy. | Sum | Count |
| Images Copied Region Deregistered Failed | The number of cross-region AMI copies that could not be deregistered, as designated by the retention rule, by an EBS-backed AMI policy. | Sum | Count |
| Enable Image Deprecation Completed | The number of AMIs that were marked for deprecation by an EBS-backed AMI policy. | Sum | Count |
| Enable Image Deprecation Failed | The number of AMIs that could not be marked for deprecation by an EBS-backed AMI policy. | Sum | Count |
| Enable Copied Image Deprecation Completed | The number of cross-region AMI copies that were marked for deprecation by an EBS-backed AMI policy. | Sum | Count |
| Enable Copied Image Deprecation Failed | The number of cross-region AMI copies that could not be marked for deprecation by an EBS-backed AMI policy. | Sum | Count |
Threshold configuration
To configure thresholds for DLM monitors:
- Log in to your Site24x7 account and navigate to Admin > Configuration Profiles > Threshold and Availability.
- Click Add Threshold Profile.
- Select the applicable monitor type from the Monitor Type drop-down menu. The available monitor types are DLM-EBS Snapshots and DLM-EBS Backed AMI.
- Provide an appropriate name in the Display Name field.
- The supported metrics are displayed in the Threshold Configuration section. You can set threshold values for all the metrics mentioned above.
- Click Save.
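Conceptually, a threshold on one of these metrics compares the collected value against a limit you choose. The sketch below illustrates that idea with plain dictionaries. It is not Site24x7's alerting engine, and the function name, the "greater than" condition, and the sample values are assumptions made for the example.

```python
def evaluate_thresholds(metrics, thresholds):
    """Return the names of metrics whose value exceeds its configured limit.

    Both arguments are plain dicts keyed by metric name. Metrics without a
    configured threshold, or without a collected value, are ignored.
    """
    breaches = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            breaches.append(name)
    return breaches


# Hypothetical profile: alert on any failed create or delete action.
thresholds = {"Snapshots Create Failed": 0, "Snapshots Delete Failed": 0}
collected = {"Snapshots Create Failed": 3, "Snapshots Delete Failed": 0}
print(evaluate_thresholds(collected, thresholds))  # ['Snapshots Create Failed']
```

Setting the limit to zero on the failure metrics is a common starting point, since any failed backup action usually deserves a look.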
Licensing
- Each DLM-EBS Snapshots monitor utilizes one basic monitor license.
- Each DLM-EBS Backed AMI monitor utilizes one basic monitor license.
Viewing Amazon DLM data
To view the DLM-EBS Snapshots monitor:
- From the Site24x7 console, navigate to Cloud > AWS > DLM-EBS Snapshots.
To view the DLM-EBS Backed AMI monitor:
- From the Site24x7 console, navigate to Cloud > AWS > DLM-EBS Backed AMI.
Monitor data
The monitor data for each Amazon DLM monitor is given below.
DLM-EBS Snapshots
The monitor data for the DLM-EBS Snapshots monitor is given below.
Summary
The Summary tab provides an overview of the events timeline and metrics in the form of charts.
Cross Region Policies
The Cross Region Policies tab shows details of your snapshot copy and deletion activities across AWS regions. You can track metrics such as the completion and failure counts of cross-region snapshot copy and delete operations. The charts and data help confirm if your cross-region backup policies are running successfully and highlight any failures that might need attention.
Configuration Details
The Configuration Details tab displays the key configuration details of the DLM-EBS Snapshots monitor. You can view details such as Policy ID, DLM - EBS Snapshot Policy Name, Resource Type, and Policy Type.
Schedules
The Schedules tab displays the schedule configuration details for the DLM policy. It includes information about when snapshots are created, their retention periods, and how frequently they are taken or deleted.
Zia Forecast
A forecast chart displays future points of a performance metric (measurement of resource usage) based on historical time series data. Thirty days of historical data is used to predict what your metric usage will be in the next seven days.
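The forecasting method behind this chart is not described here. Purely to illustrate the idea of using 30 days of history to project the next seven days, the sketch below fits a straight line by ordinary least squares and extends it forward. The linear model is a stand-in chosen for simplicity, and the function name and sample data are made up.

```python
def linear_forecast(history, horizon=7):
    """Fit y = intercept + slope * x by least squares and project ahead.

    `history` is a list of evenly spaced values (for example, one per day).
    Returns `horizon` projected values that follow the last observation.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n + step) for step in range(horizon)]


# 30 days of made-up daily values with a steady upward trend.
history = [10 + day for day in range(30)]
forecast = linear_forecast(history)
print(len(forecast))  # 7
```

Real usage data is rarely this regular, so a production forecast would typically also account for seasonality and noise.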
Outages
The Outages tab provides details on an outage's start time, end time, duration, and comments, if any.
Inventory
The Inventory tab shows details such as Resource Name, Monitor Licensing Category, and Check Frequency. You can also view and set the Threshold and Availability Profile and the Notification Profile in this tab.
Log Report
This tab provides a consolidated report of the DLM-EBS Snapshots monitor's log status, which can be downloaded as a CSV file.
Alert Logs
This tab displays a chronological list of all triggered alerts related to the DLM-EBS Snapshots monitor. This tab helps you trace alert history and severity to assess issues and validate threshold settings.
DLM-EBS Backed AMI
The monitor data for the DLM-EBS Backed AMI monitor is given below.
Summary
The Summary tab provides an overview of the events timeline and metrics in the form of charts.
Cross Region Policies
The Cross Region Policies tab displays data related to the cross-region operations of your AMI life cycle policies. You can view metrics showing when cross-region AMI copies are initiated, completed, or have failed. It also tracks the deregistration status of copied AMIs and the deletion status of associated snapshots, helping you confirm that your cross-region image copy and cleanup tasks are working as expected.
Configuration Details
The Configuration Details tab displays the key configuration details of the DLM-EBS Backed AMI monitor. You can view details such as Policy ID, policy name, Resource Type, and Policy Type.
Schedules
The Schedules tab lists the schedule configuration details defined in your DLM policy. It shows when AMIs are created, copied, and deleted, along with their retention periods and frequency. This helps verify that your AMI creation and rotation schedules align with your organization’s backup and compliance requirements.
Zia Forecast
A forecast chart displays future points of a performance metric (measurement of resource usage) based on historical time series data. Thirty days of historical data is used to predict what your metric usage will be in the next seven days.
Outages
The Outages tab provides details on an outage's start time, end time, duration, and comments, if any.
Inventory
The Inventory tab shows details such as Resource Name, Region, and Monitor Licensing Category. You can also view and set the Threshold and Availability Profile and the Notification Profile in this tab.
Log Report
This tab provides a consolidated report of the DLM-EBS Backed AMI monitor's log status, which can be downloaded as a CSV file.
Alert Logs
This tab displays a chronological list of all triggered alerts related to the DLM-EBS Backed AMI monitor. This tab helps you trace alert history and severity to assess issues and validate threshold settings.
