Note: This project is NOT officially supported by AppDynamics
In a typical microservices architecture where services are designed to be transient, AppDynamics customers may notice that although a service instance or container has been destroyed, the node that represents the service in question is still visible in the AppDynamics Controller - often with a critical health status.
The process of removing torn down nodes from AppDynamics is called 'marking nodes as historical'. By default, the Controller considers a node historical after about 20 days of inactivity and deletes the node after 30 days. When a node is marked as historical, the Controller suspends certain types of processing activities for the node, such as rule evaluation.
AppDynamics has a built-in way to mark nodes as historical immediately: add `-Dappdynamics.jvm.shutdown.mark.node.as.historical=true` to the JVM startup arguments.
The above solution does not always work, for the following reasons:
- It ONLY works if the instrumented application shuts down gracefully, which is seldom the case for containerized applications.
- The .NET agent has not implemented a similar solution.
- The minimum time the Controller can be configured to wait before marking nodes as historical is 1 hour. This is too long in most cases, as it results in false-positive alerts. The setting is `node.retention.period`.
- There is no granularity to selectively apply the `node.retention.period` setting to a set of applications or tiers.
We created this project to resolve the aforementioned limitations. The script runs at a predefined interval and marks as historical any node that has not reported to the Controller for longer than a predefined 'node availability threshold'. The process runs only on a predefined set of applications.
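The threshold check described above can be sketched as follows. This is a minimal illustration, not the code in `NodeReaper.ps1`: the function name `Select-StaleNode` and the `LastReported` property are hypothetical, and the sample nodes are made up.

```powershell
# Hypothetical helper: returns the nodes whose last report is older than
# the availability threshold. LastReported is an assumed property name.
function Select-StaleNode {
    param(
        [Parameter(Mandatory)] [object[]] $Nodes,
        [Parameter(Mandatory)] [int] $ThresholdInMinutes,
        [datetime] $Now = (Get-Date)
    )
    # Any node that last reported before this instant is considered stale.
    $cutoff = $Now.AddMinutes(-$ThresholdInMinutes)
    $Nodes | Where-Object { $_.LastReported -lt $cutoff }
}

# Made-up sample data with a fixed "now" so the result is reproducible.
$now = [datetime]'2020-01-07T01:00:00'
$nodes = @(
    [pscustomobject]@{ Name = 'node-a'; Id = 1; LastReported = $now.AddMinutes(-5) }
    [pscustomobject]@{ Name = 'node-b'; Id = 2; LastReported = $now.AddMinutes(-45) }
)

# With a 30-minute threshold, only node-b qualifies.
$stale = @(Select-StaleNode -Nodes $nodes -ThresholdInMinutes 30 -Now $now)
$stale.Name
```

Passing the current time as a parameter keeps the comparison deterministic, which makes the threshold logic easy to check in isolation.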
Historical nodes are not visible in the Controller. As a result, it is important to keep an audit trail of all nodes that were marked as historical by this script. The audit log is located in the `logs` folder of the project.
Furthermore, although AppDynamics does not display a historical node, the Controller continues to retain it. If the agent starts reporting again within the time set in `node.retention.period`, the node reappears in the UI and the counter resets. The default value for `node.retention.period` is 500 hours, and the minimum is 1 hour. In addition, if a node has not reported within the time set in `node.permanent.deletion.period`, it is permanently deleted from the Controller. The default is 720 hours and the minimum is 6 hours.
The script was written and tested in PowerShell Core 6.0, which means it can run on Linux, Windows, macOS, and in containers, and it can be bundled into a Lambda function.
- How to install PowerShell Core on Linux Documentation
- How to install PowerShell Core on macOS Documentation
- How to upgrade Windows PowerShell to 5.1 Documentation
- How to install PowerShell Core on Windows (if you decide to use PowerShell Core on Windows instead of upgrading to Windows PowerShell 5.1) Documentation
It can also be bundled into a Docker container. Please refer to the Docker container section below.
- Modify the `config.json` file properties as described in the table below, or set the corresponding environment variables if you intend to run the script from Docker or Kubernetes.
Config Property Name | Environment Variable | Description |
---|---|---|
NodeAvailabilityThresholdInMinutes | APPDYNAMICS_NODE_AVAILABILITY_THRESHOLD | This threshold is used to determine nodes that are due to be marked as historical on the basis of how long a node has lost contact with the Controller |
ExecutionFrequencyInMinutes | APPDYNAMICS_EXECUTION_FREQUENCY | This config property controls how long the script sleeps after each execution. Use this to control the execution frequency |
ControllerURL | APPDYNAMICS_CONTROLLER_URL | Your AppDynamics Controller URL, including the http:// or https:// scheme |
OAuthToken | APPDYNAMICS_OAUTH_TOKEN | Create an API Client that has admin privileges on the target application(s). READ MORE |
ApplicationList | APPDYNAMICS_TARGET_APPLICATIONS | Define the list of target applications, comma separated: app1,app2,app3 |
ExecuteOnceORContinuous | APPDYNAMICS_EXECUTE_ONCE_OR_CONTINUOUS | Defaults to once. Set this to continuous if you need the script to run continuously in the background. When set to once, it overrides the ExecutionFrequencyInMinutes setting |
- Run the `NodeReaper.ps1` script.
- Check the `logs/Aduit.log` file. The audit log should look like this:
2020/01/07 00:43:17 INFO Marked con-06HT8BPDCM6(845513) in appd-fix-sleeving-uat application as a historical node.
2020/01/07 00:43:17 INFO Marked WIN-5OI56V7AVUB(845524) in appd-fix-sleeving-uat application as a historical node.
2020/01/07 00:52:46 INFO Marked LIN-06HT8BPDCM6(845512) in appd-fix-sleeving-uat application as a historical node.
2020/01/07 00:52:46 INFO Marked JET-5OI56V7AVUB(845524) in appd-fix-sleeving-uat application as a historical node.
2020/01/07 00:56:37 INFO Marked io2-IL1R5UC26B0(842997) in appd-fix-sleeving-dev application as a historical node.
2020/01/07 00:56:38 INFO Marked NAO-JMJ3IPQ4E1F(843018) in appd-fix-sleeving-dev application as a historical node.
2020/01/07 00:56:38 INFO Marked RO1-2KSN12PNIC2(843099) in appd-fix-sleeving-dev application as a historical node.
2020/01/07 01:08:06 INFO Marked MYN-JMJ3IPQ4E1F(843011) in appd-sion-fix-sleeving-dev application as a historical node.
2020/01/07 01:08:06 INFO Marked ZZ1-2KSN12PNIC2(843097) in appd-fix-sleeving-dev application as a historical node.
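Because historical nodes disappear from the UI, it can help to summarize the audit trail per application. The sketch below parses lines in the format shown above; the function name `ConvertFrom-AuditLine` is hypothetical and the regular expression is inferred from the sample log, so adjust it if your log format differs.

```powershell
# Hypothetical helper: turns one audit log line into an object.
# Lines that do not match the sample format produce no output.
function ConvertFrom-AuditLine {
    param([string] $Line)
    $pattern = '^(?<Timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO Marked ' +
               '(?<Node>.+)\((?<NodeId>\d+)\) in (?<Application>\S+) ' +
               'application as a historical node\.$'
    if ($Line -match $pattern) {
        [pscustomobject]@{
            Timestamp   = [datetime]::ParseExact(
                $Matches.Timestamp, 'yyyy/MM/dd HH:mm:ss',
                [cultureinfo]::InvariantCulture)
            Node        = $Matches.Node
            NodeId      = [int]$Matches.NodeId
            Application = $Matches.Application
        }
    }
}

# Two lines taken from the sample log above.
$lines = @(
    '2020/01/07 00:43:17 INFO Marked con-06HT8BPDCM6(845513) in appd-fix-sleeving-uat application as a historical node.'
    '2020/01/07 00:56:37 INFO Marked io2-IL1R5UC26B0(842997) in appd-fix-sleeving-dev application as a historical node.'
)
$entries = @($lines | ForEach-Object { ConvertFrom-AuditLine -Line $_ })

# Count how many nodes were marked as historical per application.
$entries | Group-Object Application | Select-Object Name, Count
```

To run this against the real log, replace `$lines` with the output of `Get-Content` on the audit log file.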
You may pull the official image with `docker pull appdynamicscx/mark-nodes-historical`, or build your own using the `build-docker-image.ps1` script.
- To run this script in a Docker container, enter the details in the `config.json` file as described above and run `docker-compose up`.
Alternatively, use environment variables as shown in the `env.list` file. For example:
docker run -d --env-file env.list appdynamicscx/mark-nodes-historical
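For illustration, an env file for the command above might look like the fragment below. The variable names come from the configuration table; every value is a placeholder, and the `env.list` file shipped with the project may differ. Docker reads one `KEY=value` pair per line and takes values literally, so do not wrap them in quotes.

```text
APPDYNAMICS_CONTROLLER_URL=https://example.saas.appdynamics.com
APPDYNAMICS_OAUTH_TOKEN=<api-client-token>
APPDYNAMICS_TARGET_APPLICATIONS=app1,app2,app3
APPDYNAMICS_NODE_AVAILABILITY_THRESHOLD=30
APPDYNAMICS_EXECUTION_FREQUENCY=10
APPDYNAMICS_EXECUTE_ONCE_OR_CONTINUOUS=continuous
```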
The first time you run `docker-compose up`, you will see a lot of console output as the Docker image is built, followed by output similar to this:
mark-nodes-historical $ docker-compose up --build
Building mark-nodes-historical
Step 1/7 : FROM mcr.microsoft.com/powershell
---> 10749ad42dfb
Step 2/7 : RUN apt-get update && apt-get upgrade -y && apt-get clean
---> Using cache
---> 5a69f02c768b
Step 3/7 : ENV SCRIPT_HOME /opt/appdynamics/mark-node-historical
---> Using cache
---> 8aeee3ab41a3
Step 4/7 : RUN mkdir -p ${SCRIPT_HOME}
---> Using cache
---> cab2c82e4082
Step 5/7 : COPY * ${SCRIPT_HOME}/
---> bc9d437ae1e6
Step 6/7 : WORKDIR ${SCRIPT_HOME}
---> Running in 422f52bc2df9
Removing intermediate container 422f52bc2df9
---> d03b2a7b7353
Step 7/7 : CMD ls -ltr & pwsh ./NodeReaper.ps1
---> Running in e0466b469f3f
Removing intermediate container e0466b469f3f
---> da31e68f5ede
Successfully built da31e68f5ede
Successfully tagged appdynamics/node-reaper:latest
Recreating node-reaper ... done
Attaching to node-reaper
- To stop the container, run `docker-compose stop`.
- To rebuild the container, run `docker-compose up --build`.
Please refer to the `kubernetes` folder for the manifest examples.
- Add 'All application' flag.
1 - https://docs.appdynamics.com/display/latest/Historical+and+Disconnected+Nodes