Error when using CW Agent on KernelApp #9

Open
cabral1888 opened this issue Feb 8, 2022 · 2 comments

Comments

cabral1888 commented Feb 8, 2022

Hi all,

I am new to SageMaker Studio and I was wondering if there is a way to monitor Studio usage: how many machines are in use, and how much RAM and CPU the users are consuming. I looked at another examples repo, the notebook instance lifecycle config samples (https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples), and found a very interesting lifecycle configuration there: publish-instance-metrics.

I tried to reproduce this notebook lifecycle configuration as a Studio lifecycle configuration, but without success. Here is my Studio lifecycle configuration:

#!/bin/bash

set -e

# OVERVIEW
# This script publishes the system-level metrics from the Notebook instance to Cloudwatch.
#
# Note that this script will fail if either condition is not met
#   1. Ensure the Notebook Instance has internet connectivity to fetch the example config
#   2. Ensure the Notebook Instance execution role has permission for cloudwatch:PutMetricData to publish the system-level metrics
#
# https://aws.amazon.com/cloudwatch/pricing/
apt-get update
apt-get -y install jq

# PARAMETERS
NOTEBOOK_INSTANCE_NAME=$(jq '.ResourceName' /opt/ml/metadata/resource-metadata.json --raw-output)

echo "Fetching the CloudWatch agent configuration file."
wget https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/master/scripts/publish-instance-metrics/amazon-cloudwatch-agent.json

sed -i -- "s/MyNotebookInstance/$NOTEBOOK_INSTANCE_NAME/g" amazon-cloudwatch-agent.json

echo "Starting the CloudWatch agent on the Notebook Instance."
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file://$(pwd)/amazon-cloudwatch-agent.json -s

To reproduce the problem and understand what happened, I opened a terminal tab inside SageMaker Studio and ran the commands one by one. The last command gave me the following output:

/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl: 469: /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl: systemctl: not found
unknown init system

I don't know if I'm missing something or if this isn't supported by SageMaker Studio yet. Can you please help me with this issue?

P.S.: I'm using a Python 3 kernel with the Data Science Docker image.

@cabral1888
Author

Anyone?

@leezero-carbon

@cabral1888 I struggled with this recently, and it's worth understanding the difference between the SageMaker Notebook instance documentation and the SageMaker Studio documentation.

In the AWS docs it's sometimes hard to tell which of the two a given page is talking about, because they aren't always clear about it.

Anyway, the code you mention above is meant to run on a Notebook instance, not in the Studio KernelGateway instance that runs your notebook. In the KernelGateway there is no init system, which is why amazon-cloudwatch-agent-ctl fails with "systemctl: not found" and "unknown init system". You shouldn't try to add one either, as that goes against the idea of running in Docker.
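
To confirm the missing init system from a Studio terminal, a check along these lines could be used. This is a minimal sketch: `detect_init` is a hypothetical helper name, and the logic is a generic systemd/upstart probe, not the check that amazon-cloudwatch-agent-ctl performs internally.

```shell
#!/bin/bash
# Sketch: report which init system, if any, is running in this environment.
# detect_init is a hypothetical helper name, not part of any AWS tooling.
detect_init() {
    # /run/systemd/system only exists when systemd is the running init.
    if [ -d /run/systemd/system ] && command -v systemctl >/dev/null 2>&1; then
        echo "systemd"
    elif command -v initctl >/dev/null 2>&1 && initctl version 2>/dev/null | grep -q upstart; then
        echo "upstart"
    else
        echo "none"
    fi
}

echo "init system: $(detect_init)"
# In a container, PID 1 is usually the entrypoint process rather than an init daemon.
echo "PID 1 command: $(cat /proc/1/comm 2>/dev/null || echo unknown)"
```

If this reports `none`, any tool that registers itself as a system service can be expected to fail in the same way.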

Instead, have a look at running a metric-gathering process that doesn't rely on an init system and can be started in the background, e.g.:
nohup custom-tool --config /opt/custom-tool/config.conf > /opt/custom-tool/output.log 2>&1 &
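
As a rough illustration of such a process, the sketch below publishes memory usage and load average with the AWS CLI in a plain loop. It assumes the AWS CLI is installed in the image and that the execution role allows cloudwatch:PutMetricData. The namespace, metric names, interval, and function names are all made up for the example. Note also that /proc/meminfo inside a container may reflect the host rather than the container's own limits.

```shell
#!/bin/bash
# Sketch: publish memory and load metrics to CloudWatch without an init system.
# NAMESPACE, INTERVAL_SECONDS, and the function names are illustrative assumptions.
NAMESPACE="${NAMESPACE:-SageMakerStudio/Custom}"
INTERVAL_SECONDS="${INTERVAL_SECONDS:-60}"
METADATA_FILE="/opt/ml/metadata/resource-metadata.json"

# Same lookup as the script in the issue; falls back to the hostname if it fails.
RESOURCE_NAME=$(jq -r '.ResourceName' "$METADATA_FILE" 2>/dev/null || echo "${HOSTNAME:-unknown}")

# Percentage of memory in use, computed from MemTotal and MemAvailable.
mem_used_percent() {
    awk '/^MemTotal:/ {total=$2} /^MemAvailable:/ {avail=$2}
         END { if (total > 0) printf "%.1f\n", (total - avail) * 100 / total }' /proc/meminfo
}

# One-minute load average.
load_average_1m() {
    cut -d ' ' -f 1 /proc/loadavg
}

# $1 = metric name, $2 = value, $3 = CloudWatch unit
publish_metric() {
    aws cloudwatch put-metric-data \
        --namespace "$NAMESPACE" \
        --metric-name "$1" \
        --value "$2" \
        --unit "$3" \
        --dimensions "Name=ResourceName,Value=$RESOURCE_NAME"
}

publish_loop() {
    while true; do
        publish_metric "MemoryUsedPercent" "$(mem_used_percent)" "Percent"
        publish_metric "LoadAverage1m" "$(load_average_1m)" "None"
        sleep "$INTERVAL_SECONDS"
    done
}

# Only start publishing when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    publish_loop
fi
```

Saved under a name such as publish-metrics.sh (hypothetical), it could be started in the same way as the command above: `nohup bash publish-metrics.sh --run > /tmp/publish-metrics.log 2>&1 &`.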

In my case, I used Telegraf with its CloudWatch output plugin.
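
A Telegraf setup of this kind might look roughly like the sketch below. The actual configuration used is not shown in this thread, so the paths, region, namespace, and choice of input plugins are assumptions for illustration only. It also assumes Telegraf has already been installed in the image.

```shell
#!/bin/bash
# Sketch: write a minimal Telegraf config and start Telegraf in the background.
# Paths, region, and namespace are placeholder assumptions; adjust to your setup.
TELEGRAF_DIR="${TELEGRAF_DIR:-$HOME/telegraf}"
TELEGRAF_CONF="$TELEGRAF_DIR/telegraf.conf"
AWS_REGION="${AWS_REGION:-us-east-1}"

mkdir -p "$TELEGRAF_DIR"

# Collect CPU and memory metrics every 60 seconds and send them to CloudWatch.
cat > "$TELEGRAF_CONF" <<EOF
[agent]
  interval = "60s"

[[inputs.cpu]]
  percpu = false
  totalcpu = true

[[inputs.mem]]

[[outputs.cloudwatch]]
  region = "$AWS_REGION"
  namespace = "SageMakerStudio/Telegraf"
EOF

if command -v telegraf >/dev/null 2>&1; then
    # No init system needed: run detached and log to a file.
    nohup telegraf --config "$TELEGRAF_CONF" > "$TELEGRAF_DIR/telegraf.log" 2>&1 &
    echo "Telegraf started with PID $!"
else
    echo "telegraf not found; wrote config to $TELEGRAF_CONF only"
fi
```

As with the CloudWatch agent script in the issue, the execution role still needs cloudwatch:PutMetricData for the metrics to arrive.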
