SLIC Watch provides a CloudWatch Dashboard and Alarms for:
- AWS Lambda
- API Gateway
- Step Functions
- DynamoDB Tables
- Kinesis Data Streams (available for Lambda consumers, more coming soon)
- SQS Queues (Coming soon)
Currently, SLIC Watch is available as a Serverless Framework plugin.
npm install serverless-slic-watch-plugin --save-dev
The `topic` configuration must be set to the ARN of an SNS Topic.
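For reference, an SNS Topic ARN has the form `arn:<partition>:sns:<region>:<account-id>:<topic-name>`. The Python sketch below is a hypothetical pre-deployment check (`is_sns_topic_arn` is not part of SLIC Watch) that tests whether a value has roughly that shape before it is placed under `topic`. It checks only the format, not whether the topic actually exists.

```python
import re

# Hypothetical helper, not part of SLIC Watch: loosely checks the *shape* of an
# SNS topic ARN (arn:<partition>:sns:<region>:<account-id>:<topic-name>).
SNS_TOPIC_ARN = re.compile(
    r"arn:aws[a-z-]*:sns:[a-z0-9-]+:\d{12}:[A-Za-z0-9_-]{1,256}(\.fifo)?"
)


def is_sns_topic_arn(value: str) -> bool:
    """Return True if value looks like an SNS topic ARN."""
    return SNS_TOPIC_ARN.fullmatch(value) is not None


print(is_sns_topic_arn("arn:aws:sns:eu-west-1:123456789012:alarms"))  # True
print(is_sns_topic_arn("alarms"))  # False
```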
Alarm configuration is cascading. This means that configuration properties are automatically propagated from parent to child nodes, unless an override is present at the given node.
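The cascading behaviour can be illustrated with a small Python sketch. This is an assumption about the semantics, not SLIC Watch's actual code: `cascade` and `alarm_active` are hypothetical helpers. `cascade` walks a nested dict and pushes scalar settings (such as `Statistic: Sum` under `States`) down into each child node unless that child overrides them. `alarm_active` illustrates why the `Invocations` alarm needs both `enabled: true` and a non-null `Threshold`.

```python
from typing import Any, Dict, Optional


def cascade(node: Dict[str, Any], inherited: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical sketch: push scalar settings from a parent node down to its children.

    Scalar values set on a node are inherited by every nested node below it,
    unless the nested node sets the same key itself (an override).
    """
    # Local scalar settings override anything inherited from ancestors.
    local = {k: v for k, v in node.items() if not isinstance(v, dict)}
    effective = {**(inherited or {}), **local}
    resolved: Dict[str, Any] = dict(effective)
    for key, value in node.items():
        if isinstance(value, dict):
            # Recurse so grandchildren see the merged settings of all ancestors.
            resolved[key] = cascade(value, effective)
    return resolved


def alarm_active(cfg: Dict[str, Any]) -> bool:
    """An alarm is created only if enabled and given a concrete threshold (assumed rule)."""
    return bool(cfg.get("enabled")) and cfg.get("Threshold") is not None


# A trimmed-down copy of the default alarm settings listed in this README.
alarms = {
    "enabled": True,
    "Period": 60,
    "States": {"Statistic": "Sum", "ExecutionsFailed": {"Threshold": 0}},
    "Lambda": {"Invocations": {"enabled": False, "Threshold": None, "Statistic": "Sum"}},
}

resolved = cascade(alarms)
print(resolved["States"]["ExecutionsFailed"])
# {'enabled': True, 'Period': 60, 'Statistic': 'Sum', 'Threshold': 0}
print(alarm_active(resolved["Lambda"]["Invocations"]))  # False

# A user override at the Invocations node turns the alarm on.
alarms["Lambda"]["Invocations"].update({"enabled": True, "Threshold": 100})
print(alarm_active(cascade(alarms)["Lambda"]["Invocations"]))  # True
```

In this sketch an override only affects the node where it is set and the nodes beneath it; sibling nodes keep the inherited defaults.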
Supported options along with their defaults are shown below.
# ...
custom:
  slicWatch:
    topic: SNS_TOPIC_ARN
    alarms:
      enabled: true
      Period: 60
      EvaluationPeriods: 1
      TreatMissingData: notBreaching
      ComparisonOperator: GreaterThanThreshold
      Lambda: # Lambda Functions
        Errors:
          Threshold: 0
          Statistic: Sum
        ThrottlesPc: # Throttles are evaluated as a percentage of invocations
          Threshold: 0
        DurationPc: # Duration is evaluated as a percentage of the function timeout
          Threshold: 95
          Statistic: Maximum
        Invocations: # No invocation alarms are created by default. Override threshold to create alarms
          enabled: false # Note: this one requires both `enabled: true` and `Threshold: someValue` to be effectively enabled
          Threshold: null
          Statistic: Sum
        IteratorAge:
          Threshold: 10000
          Statistic: Maximum
      ApiGateway: # API Gateway REST APIs
        5XXError:
          Statistic: Average
          Threshold: 0
        4XXError:
          Statistic: Average
          Threshold: 0.05
        Latency:
          ExtendedStatistic: p99
          Threshold: 5000
      States: # Step Functions
        Statistic: Sum
        ExecutionsThrottled:
          Threshold: 0
        ExecutionsFailed:
          Threshold: 0
        ExecutionsTimedOut:
          Threshold: 0
      DynamoDB:
        # Consumed read/write capacity units are not alarmed. These should either
        # be part of an auto-scaling configuration for provisioned mode or should be automatically
        # avoided for on-demand mode. Instead, we rely on persistent throttling
        # to alert on failures in these scenarios.
        # Throttles can occur in normal operation and are handled with retries. Threshold should
        # therefore be configured to provide meaningful alarms based on higher than average throttling.
        Statistic: Sum
        ReadThrottleEvents:
          Threshold: 10
        WriteThrottleEvents:
          Threshold: 10
        UserErrors:
          Threshold: 0
        SystemErrors:
          Threshold: 0
    dashboard:
      timeRange:
        # For possible 'start' and 'end' values, see
        # https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/CloudWatch-Dashboard-Body-Structure.html
        start: -PT3H
      metricPeriod: 300
      widgets:
        metricPeriod: 300
        width: 8
        height: 6
        Lambda:
          # Metrics per Lambda Function
          Errors:
            Statistic: ['Sum']
          Throttles:
            Statistic: ['Sum']
          Duration:
            Statistic: ['Average', 'p95', 'Maximum']
          Invocations:
            Statistic: ['Sum']
          ConcurrentExecutions:
            Statistic: ['Maximum']
          IteratorAge:
            Statistic: ['Maximum']
        ApiGateway:
          5XXError:
            Statistic: ['Sum']
          4XXError:
            Statistic: ['Sum']
          Latency:
            Statistic: ['Average', 'p95']
          Count:
            Statistic: ['Sum']
        States:
          # Step Functions
          ExecutionsFailed:
            Statistic: ['Sum']
          ExecutionsThrottled:
            Statistic: ['Sum']
          ExecutionsTimedOut:
            Statistic: ['Sum']
        DynamoDB:
          # Tables and GSIs
          ReadThrottleEvents:
            Statistic: ['Sum']
          WriteThrottleEvents:
            Statistic: ['Sum']
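`ThrottlesPc` and `DurationPc` thresholds are percentages rather than raw metric values. The Python sketch below shows one plausible way to compute them from CloudWatch datapoints and compare them using `GreaterThanThreshold` with `TreatMissingData: notBreaching`. It assumes throttles are expressed relative to all invocation attempts (throttles plus invocations, since the Lambda `Invocations` metric does not count throttled requests), and that the function timeout is given in seconds while the `Duration` metric is in milliseconds. The helper names are hypothetical, and the plugin's actual metric math may differ.

```python
from typing import Optional


def throttles_pc(throttles: float, invocations: float) -> float:
    """Throttles as a percentage of invocation attempts (assumed formula).

    Lambda's Invocations metric excludes throttled requests, so the sketch
    divides by throttles + invocations to get a figure between 0 and 100.
    """
    attempts = throttles + invocations
    return 0.0 if attempts == 0 else 100.0 * throttles / attempts


def duration_pc(max_duration_ms: float, timeout_s: float) -> float:
    """Maximum Duration (milliseconds) as a percentage of the timeout (seconds)."""
    return 100.0 * max_duration_ms / (timeout_s * 1000.0)


def breaches(value: Optional[float], threshold: float) -> bool:
    """ComparisonOperator: GreaterThanThreshold with TreatMissingData: notBreaching."""
    if value is None:  # no datapoint in the period is treated as "not breaching"
        return False
    return value > threshold


# A function with a 6 s timeout whose slowest invocation took 5.8 s: about 96.7%.
print(breaches(duration_pc(5800, 6), 95))  # True
# 2 throttles out of 1000 attempts is 0.2%, which exceeds the default threshold of 0.
print(breaches(throttles_pc(2, 998), 0))  # True
# A period with no data does not trigger the alarm.
print(breaches(None, 0))  # False
```

With `EvaluationPeriods: 1` and `Period: 60`, a single breaching one-minute period is enough to move the alarm into the ALARM state.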
An example project is provided for reference: serverless-test-project
To release a new version of the project:
- update the package version in the top-level `package.json`
- run `npm run lerna:sync` to synchronise that version across all the sub-packages
- push these changes
- draft a new release in GitHub (CI then publishes the package to npm)
Apache - LICENSE