The built-in troubleshooting functionality in the iotedge CLI, "iotedge check", performs configuration and connectivity checks for commonly encountered issues.
Running "iotedge help check" displays detailed usage information.
The troubleshooting tool is focused on:
- Surfacing potential problems that prevent the edge device from connecting to the cloud.
- Surfacing potential configuration deviations from recommended production best practices.
By design, it does not check for errors in the edge workload deployment. For example, it does not check whether the device can access private container registries, or whether module create options contain errors. Deployment validation is best performed in the facility where it is authored.
Checks that would involve parsing IoT Edge module logs or metrics are also out of scope.
Results from checks are characterized as either errors or warnings.
Errors have a high likelihood of preventing the IoT Edge runtime or the modules from connecting to the cloud.
Warnings might not affect immediate connectivity, but they indicate potential deviations from best practices and may affect the long-term stability, offline operation or supportability of the edge device.
If there are warnings but no errors, the tool will exit successfully with code 0. Use --warnings-as-errors to treat warnings as errors.
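The exit-code rule above can be summarized in a few lines. This is only an illustrative sketch, not the tool's implementation: the function name is hypothetical, and the non-zero value 1 is an assumption (the text only states that success is code 0).

```python
def exit_code(errors: int, warnings: int, warnings_as_errors: bool = False) -> int:
    """Map check results to a process exit code.

    0 means success. The failure value 1 is an assumption made for this
    sketch; the documentation only specifies the success code.
    """
    if errors > 0:
        return 1
    # Warnings alone fail the run only when --warnings-as-errors is given.
    if warnings > 0 and warnings_as_errors:
        return 1
    return 0
```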
This check validates that IoT Edge's config.yaml is valid and free of any syntax (e.g. whitespace) errors.
If the check fails with an error, the line number and position reported in the error may not be the exact location of the problem.
If the config.yaml uses manual provisioning with a connection string, this check validates that the connection string is well-formed and contains the required HostName, DeviceId and SharedAccessKey parameters.
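As a rough illustration of what "well-formed" means here, the sketch below splits a connection string into key=value segments and confirms that the three required parameters are present. The function name is hypothetical, and matching the key names case-insensitively is a simplification of this sketch, not documented behavior of the tool.

```python
REQUIRED_KEYS = ("hostname", "deviceid", "sharedaccesskey")


def parse_connection_string(connection_string: str) -> dict:
    """Split 'Key=Value;Key=Value' into a dict with lower-cased keys.

    Raises ValueError if a segment is malformed or a required key is missing.
    """
    fields = {}
    for part in connection_string.split(";"):
        if not part:
            continue  # tolerate a trailing ';'
        # Split on the first '=' only: base64 keys may end in '=' padding.
        key, sep, value = part.partition("=")
        if not sep or not key or not value:
            # Report only the key so the shared access key is never echoed.
            raise ValueError(f"malformed segment near {key!r}")
        fields[key.strip().lower()] = value
    missing = [k for k in REQUIRED_KEYS if k not in fields]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    return fields
```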
This check validates that a container engine is installed and running, and is accessible at the endpoint specified in the moby_runtime.uri field.
If the device is running Windows and set to use Windows containers, this check validates that the Windows version is supported.
While the Windows installer script prevents installing on an unsupported OS version, it is possible to install on a supported OS version that then gets updated to a newer version that isn't supported.
This check validates that the value of the hostname field in the config.yaml is the same as the device's actual hostname, or that it's a fully-qualified domain name with the device hostname as the first component.
It also validates that the value complies with RFC 1035, since some modules and downstream devices have difficulty connecting to a domain name that doesn't comply with that RFC.
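The two hostname rules can be sketched as follows. This is an approximation for illustration, not the tool's code: both function names are hypothetical, the RFC 1035 rules applied are the usual label rules (letters, digits and hyphens; starts with a letter; ends with a letter or digit; at most 63 characters per label and 255 overall), and comparing names case-insensitively is an assumption.

```python
import re

# One RFC 1035 label: starts with a letter, ends with a letter or digit,
# and contains only letters, digits and hyphens in between.
_LABEL = re.compile(r"[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc1035_valid(name: str) -> bool:
    """Return True if every dot-separated label follows the RFC 1035 rules."""
    if not name or len(name) > 255:
        return False
    return all(
        len(label) <= 63 and _LABEL.fullmatch(label) is not None
        for label in name.split(".")
    )


def hostname_matches(config_hostname: str, device_hostname: str) -> bool:
    """True if the configured value equals the device hostname, or is an
    FQDN whose first component is the device hostname."""
    config = config_hostname.lower()
    device = device_hostname.lower()
    return config == device or config.startswith(device + ".")
```

For example, a configured value of edge1.contoso.com matches a device named edge1, while edge10 does not.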
This check validates that the value of the connect.management_uri field in the config.yaml is valid, and that the IoT Edge daemon's management endpoint can be queried through it.
This check validates that the version of the IoT Edge daemon is the same as the value specified in https://aka.ms/latest-iotedge-stable. You can override the expected version using the --expected-iotedged-version switch, in which case the tool will not query that URL.
Note that the tool does not validate the versions of the Edge Agent and Edge Hub modules.
This check validates that the device's local time is close to the time reported by an NTP server. pool.ntp.org:123 is used by default, and can be overridden with the --ntp-server parameter.
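To make the comparison concrete, here is a minimal SNTP sketch that asks a server for its time and compares it with a local timestamp. It is not the tool's implementation: the function names are hypothetical, and the 10-second tolerance is an assumption, since the text does not say how close "close" is.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def parse_transmit_time(packet: bytes) -> float:
    """Extract the server's transmit timestamp as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP response is shorter than 48 bytes")
    # Bytes 40..47 hold the transmit timestamp: 32-bit seconds + 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_ntp(server: str = "pool.ntp.org", port: int = 123, timeout: float = 5.0) -> float:
    """Send one SNTP client request and return the server time (Unix seconds)."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version=3, mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server, port))
        packet, _ = sock.recvfrom(512)
    return parse_transmit_time(packet)


def clock_is_close(local: float, remote: float, tolerance_seconds: float = 10.0) -> bool:
    """Compare two Unix timestamps; the default tolerance is an assumption."""
    return abs(local - remote) <= tolerance_seconds
```

A caller would compare time.time() against query_ntp() with clock_is_close. The sketch ignores network round-trip delay, which is small relative to a tolerance of several seconds.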
This check validates that a container sees a local time that is close to the host device's local time.
This check validates that a DNS server has been specified in the container engine's daemon.json file. DNS best practices are documented at https://aka.ms/iotedge-prod-checklist-dns.
It is possible to specify a DNS server in the Edge device's deployment instead of in the container engine's daemon.json, and the tool does not detect this. If you have done so, you should ignore this warning.
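For reference, the container engine reads DNS servers from the "dns" array in daemon.json. The sketch below shows one way such a setting could be detected; the function name is hypothetical, and the tool's own logic may differ.

```python
import json


def has_dns_server(daemon_json_text: str) -> bool:
    """Return True if daemon.json contains a non-empty "dns" list."""
    config = json.loads(daemon_json_text)
    if not isinstance(config, dict):
        return False
    dns = config.get("dns")
    return isinstance(dns, list) and len(dns) > 0
```

A daemon.json containing {"dns": ["1.1.1.1"]} satisfies this test, while an empty object or an empty list does not.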
This check validates that if IPv6 container network configuration is enabled in config.yaml (by setting the value of the moby_runtime.network.ipv6 field to true), the container engine's daemon.json file also has IPv6 support enabled. To enable IPv6 support for the container runtime, please refer to this guide: https://aka.ms/iotedge-docker-ipv6.
IPv6 container runtime network configuration is currently not supported for the Windows operating system, and this check fails if IPv6 support is enabled in the container engine's daemon.json file.
This check validates that device CA and trusted CA certificates have been defined in the certificates section of the config.yaml. If these certificates are not specified, the device operates in quickstart mode and is not supported in production. Certificate management best practices are documented at https://aka.ms/iotedge-prod-checklist-certs.
This check validates that the device CA certificate is valid for at least seven more days.
If the certificate has already expired, it is reported as an error. If the certificate will expire in less than seven days, it is reported as a warning.
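The error/warning rule above amounts to a comparison against a seven-day window. A minimal sketch, with a hypothetical function name and string results chosen only for illustration:

```python
from datetime import datetime, timedelta


def classify_cert_expiry(
    not_after: datetime,
    now: datetime,
    warn_window: timedelta = timedelta(days=7),
) -> str:
    """Classify a certificate by its expiry time.

    Returns "error" if already expired, "warning" if it expires in less
    than the warning window, otherwise "ok".
    """
    if not_after <= now:
        return "error"
    if not_after - now < warn_window:
        return "warning"
    return "ok"
```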
This check validates that the container engine is the Moby container engine. Any other container engine, such as Docker CE, is not supported in production. See https://aka.ms/iotedge-prod-checklist-moby for details.
This check validates that the container engine is configured to rotate module logs, by specifying log options and limits in the container engine's daemon.json. Log management best practices are documented at https://aka.ms/iotedge-prod-checklist-logs.
By setting these properties in daemon.json, the settings are automatically propagated to all module containers. It is also possible to specify this in the Edge device's deployment instead, and the tool does not detect this. If you have done so, you should ignore this warning.
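As an illustration, the container engine's log rotation is typically configured in daemon.json through the "log-driver" key and a "log-opts" object with "max-size" and "max-file" limits. The sketch below tests for those keys; the function name is hypothetical, and exactly which keys the tool inspects is an assumption.

```python
import json


def log_rotation_configured(daemon_json_text: str) -> bool:
    """Return True if daemon.json sets a log driver plus size and file limits.

    Checking for both "max-size" and "max-file" is an assumption of this
    sketch, not documented behavior of the tool.
    """
    config = json.loads(daemon_json_text)
    if not isinstance(config, dict):
        return False
    log_opts = config.get("log-opts")
    return (
        bool(config.get("log-driver"))
        and isinstance(log_opts, dict)
        and "max-size" in log_opts
        and "max-file" in log_opts
    )
```

For example, {"log-driver": "json-file", "log-opts": {"max-size": "10m", "max-file": "3"}} passes this test.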
production readiness: Edge Agent's / Edge Hub's storage directory is persisted on the host filesystem
The tool checks the Edge Agent and Edge Hub containers to validate that their respective storage directories are mounted from the host. If this is not done, it is possible that some state is lost if the containers are deleted or updated, such as Edge Agent's cache of module state or Edge Hub's unsent messages.
These checks require the Edge Agent and Edge Hub containers to have been created.
If the device is set up to use DPS provisioning, the tool connects to the DPS endpoint and completes a TLS handshake with it.
The tool connects to the IoT Hub's AMQP port (5671), HTTPS port (443) and MQTT port (8883), and completes a TLS handshake for each. This verifies that the IoT Hub is reachable from the device, and that the device is configured to accept its TLS certificate.
When using manual provisioning, the FQDN of the IoT Hub is taken from the connection string. For DPS provisioning, you must specify the FQDN of the IoT Hub using the --iothub-hostname parameter.
The IoT Edge daemon only uses the HTTPS protocol to connect to the IoT Hub, but connectivity from the host for the AMQP and MQTT protocols can be useful when investigating issues.
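The host-level check can be reproduced by hand when investigating a failure. The sketch below opens a TCP connection and completes a TLS handshake using the system's trusted certificates; it is an approximation for manual troubleshooting, not the tool's code, and the function and constant names are hypothetical.

```python
import socket
import ssl

# Ports named in the text above.
IOTHUB_PORTS = {"amqp": 5671, "https": 443, "mqtt": 8883}


def tls_handshake_ok(hostname: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TLS handshake with hostname:port completes and the
    server certificate is accepted by the default trust store."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as raw:
            # wrap_socket performs the handshake on an already connected socket.
            with context.wrap_socket(raw, server_hostname=hostname):
                return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts, and ssl.SSLError
        # (including certificate verification failures).
        return False
```

Calling tls_handshake_ok for each value in IOTHUB_PORTS against the IoT Hub's FQDN mirrors the three handshakes described above.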
The tool launches a diagnostics container on the default (bridge) container network. This container connects to the IoT Hub's AMQP port (5671), HTTPS port (443) and MQTT port (8883). This verifies that the IoT Hub is reachable from containers on the default container network.
When using manual provisioning, the FQDN of the IoT Hub is taken from the connection string. For DPS provisioning, you must specify the FQDN of the IoT Hub using the --iothub-hostname parameter.
Note that these checks do not perform a TLS handshake with the IoT Hub. They only test that a TCP connection can be established to the respective port.
Note that these checks do not run for Windows containers since they are redundant with the following checks.
The tool launches a diagnostics container on the IoT Edge container network specified by the moby_runtime.network field (defaults to azure-iot-edge on Linux and nat on Windows). This container connects to the IoT Hub's AMQP port (5671), HTTPS port (443) and MQTT port (8883). This verifies that the IoT Hub is reachable from containers on the IoT Edge container network.
When using manual provisioning, the FQDN of the IoT Hub is taken from the connection string. For DPS provisioning, you must specify the FQDN of the IoT Hub using the --iothub-hostname parameter.
Note that these checks do not perform a TLS handshake with the IoT Hub. They only test that a TCP connection can be established to the respective port.
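A TCP-only reachability test of the kind the container checks perform can be sketched as follows. This illustrates the technique rather than the diagnostics container's actual code, and the function name is hypothetical.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    No TLS handshake is attempted; the socket is closed as soon as the
    connection succeeds.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Running such a function from inside a container attached to the network in question, once per port (5671, 443 and 8883), gives a result comparable to these checks.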
Edge Hub can bind to ports on the host so that it can be used as a gateway for leaf devices. For example, the default createOptions for Edge Hub set it to bind to ports 443, 5671 and 8883. If any of these ports are already in use on the host device by other services, the Edge Hub container will be unable to start up. The tool validates that Edge Hub is already running (in which case it has successfully bound to any ports it wanted to bind to), or that the ports are available for it to bind to when it does start.
On a new device, the IoT Edge daemon doesn't try to start the Edge Hub container until a deployment is applied to that device. Until then, this check will return an error because the tool can only detect which ports to test for if the IoT Edge daemon has tried to start the Edge Hub container at least once.
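One common way to test whether a host port is free is to try binding to it. The sketch below takes that approach; how the tool itself determines availability is not described here, so this technique and the function name are assumptions.

```python
import socket


def port_available(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now.

    On Linux, binding to ports below 1024 normally requires elevated
    privileges, so an unprivileged caller gets False for 443 even when
    the port is free.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True
```

Checking 443, 5671 and 8883 with elevated privileges before applying a deployment shows whether another service would block Edge Hub from starting.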