Instant alarms and dashboards for Serverless, SAM, CDK and CloudFormation

fourTheorem/slic-watch

slic-watch

SLIC Watch provides a CloudWatch Dashboard and Alarms for:

  1. AWS Lambda
  2. API Gateway
  3. Step Functions
  4. DynamoDB Tables
  5. Kinesis Data Streams (available for Lambda consumers, more coming soon)
  6. SQS Queues (Coming soon)

Currently, SLIC Watch is available as a Serverless Framework plugin.

Installation

npm install serverless-slic-watch-plugin --save-dev
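
Once installed, a Serverless Framework plugin normally has to be listed in the plugins section of serverless.yml before it takes effect. A minimal sketch, assuming the plugin is registered under its npm package name:

```yaml
# serverless.yml
plugins:
  - serverless-slic-watch-plugin # assumed to match the npm package name above
```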

Configuration

The topic option must be set to the ARN of an SNS Topic. Alarm configuration is cascading: properties set on a parent node are automatically inherited by its child nodes, unless a child node overrides them. The supported options and their defaults are shown below.

# ...

custom:
  slicWatch:
    topic: SNS_TOPIC_ARN

    alarms:
      enabled: true
      Period: 60
      EvaluationPeriods: 1
      TreatMissingData: notBreaching
      ComparisonOperator: GreaterThanThreshold
      Lambda: # Lambda Functions
        Errors:
          Threshold: 0
          Statistic: Sum
        ThrottlesPc: # Throttles are evaluated as a percentage of invocations
          Threshold: 0
        DurationPc: # Duration is evaluated as a percentage of the function timeout
          Threshold: 95
          Statistic: Maximum
        Invocations: # No invocation alarms are created by default. Override threshold to create alarms
          enabled: false # Note: this one requires both `enabled: true` and `Threshold: someValue` to be effectively enabled
          Threshold: null
          Statistic: Sum
        IteratorAge:
          Threshold: 10000
          Statistic: Maximum
      ApiGateway: # API Gateway REST APIs
        5XXError:
          Statistic: Average
          Threshold: 0
        4XXError:
          Statistic: Average
          Threshold: 0.05
        Latency:
          ExtendedStatistic: p99
          Threshold: 5000
      States: # Step Functions
        Statistic: Sum
        ExecutionsThrottled:
          Threshold: 0
        ExecutionsFailed:
          Threshold: 0
        ExecutionsTimedOut:
          Threshold: 0
      DynamoDB:
        # Consumed read/write capacity units are not alarmed. These should either
        # be part of an auto-scaling configuration for provisioned mode or should be automatically
        # avoided for on-demand mode. Instead, we rely on persistent throttling
        # to alert failures in these scenarios.
        # Throttles can occur in normal operation and are handled with retries. Threshold should
        # therefore be configured to provide meaningful alarms based on higher than average throttling.
        Statistic: Sum
        ReadThrottleEvents:
          Threshold: 10
        WriteThrottleEvents:
          Threshold: 10
        UserErrors:
          Threshold: 0
        SystemErrors:
          Threshold: 0

    dashboard:
      timeRange:
        # For possible 'start' and 'end' values, see
        # https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/CloudWatch-Dashboard-Body-Structure.html
        start: -PT3H
      metricPeriod: 300
      widgets:
        metricPeriod: 300
        width: 8
        height: 6
        Lambda:
          # Metrics per Lambda Function
          Errors:
            Statistic: ['Sum']
          Throttles:
            Statistic: ['Sum']
          Duration:
            Statistic: ['Average', 'p95', 'Maximum']
          Invocations:
            Statistic: ['Sum']
          ConcurrentExecutions:
            Statistic: ['Maximum']
          IteratorAge:
            Statistic: ['Maximum']
        ApiGateway:
          5XXError:
            Statistic: ['Sum']
          4XXError:
            Statistic: ['Sum']
          Latency:
            Statistic: ['Average', 'p95']
          Count:
            Statistic: ['Sum']
        States:
          # Step Functions
          ExecutionsFailed:
            Statistic: ['Sum']
          ExecutionsThrottled:
            Statistic: ['Sum']
          ExecutionsTimedOut:
            Statistic: ['Sum']
        DynamoDB:
          # Tables and GSIs
          ReadThrottleEvents:
            Statistic: ['Sum']
          WriteThrottleEvents:
            Statistic: ['Sum']
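
To illustrate the cascading behaviour, the sketch below overrides only a handful of the defaults. The topic ARN is a hypothetical placeholder and the threshold values are arbitrary examples rather than recommendations. It also assumes that a cascading property such as Period can be set at an intermediate node like Lambda, which is how the description of cascading reads.

```yaml
custom:
  slicWatch:
    topic: arn:aws:sns:eu-west-1:123456789012:my-alarm-topic # hypothetical ARN

    alarms:
      Period: 120 # inherited by every alarm below unless overridden
      Lambda:
        Period: 60 # overrides the top-level Period for Lambda alarms only
        Errors:
          Threshold: 5 # Statistic is not set here, so the default (Sum) applies
        Invocations:
          enabled: true # needs both `enabled: true` and a Threshold
          Threshold: 1000
      DynamoDB:
        ReadThrottleEvents:
          Threshold: 50 # WriteThrottleEvents keeps its default of 10
```

Under these assumptions, Lambda alarms would use a 60-second period, while API Gateway, Step Functions and DynamoDB alarms would inherit the 120-second value set at the top of the alarms node.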
        

An example project is provided for reference: serverless-test-project

Releasing a new version

To release a new version of the project:

  • update the package version in the top-level package.json
  • run npm run lerna:sync to synchronise that version across all the sub-packages
  • push these changes
  • draft a new release in GitHub (CI then publishes the packages to npm)

References

Other Projects

  1. serverless-plugin-aws-alerts
  2. Real World Serverless Application - Serverless Operations
  3. CDK Patterns - The CloudWatch Dashboard

Reading

  1. AWS Well Architected Serverless Applications Lens
  2. How to Monitor Lambda with CloudWatch Metrics - Yan Cui

LICENSE

Apache - LICENSE