Observability Metrics Tutorial
This tutorial shows you how to set up a complete monitoring stack for ZeroTier networks using Prometheus and Grafana. You can use this type of monitoring pipeline to measure the health and activity of your networks, whether you're using our managed environment or hosting your own controller.
Overview
ZeroTier Inc doesn't have access to your traffic. We don't currently supply a monitoring "dashboard" for your networks and nodes, but you can build your own powerful observability system!
This tutorial includes an example docker-compose configuration and setup scripts to run a ZeroTier network along with an observability stack based on Prometheus and Grafana.
Prerequisites
To run this locally, you'll need the following tools installed:
- Docker + docker-compose
- jq (JSON command-line processor)
We've tested this demo on macOS and Linux. On Windows, you'll need to use WSL2 for a local Linux environment. (Note: pull requests to add PowerShell support for the setup script are welcome!)
For this example, we're running a standalone controller, which allows us to avoid capturing and injecting a valid ZeroTier Central API token into the demo environment. However, the same observability tools and configuration methods will work for managed networks as well.
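As a quick sanity check before starting, the prerequisites above can be verified from a script. This is a minimal sketch: the tool names come from the list above, and `missing_tools` is a hypothetical helper name, not something shipped with the tutorial repository.

```python
import shutil

# Tool names taken from the prerequisites list. If your system ships Compose
# as the `docker compose` plugin rather than a separate binary, adjust this.
REQUIRED_TOOLS = ["docker", "docker-compose", "jq"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` returns an empty list when everything is installed, and otherwise the names you still need to install.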
Setup
1. Get the Tutorial Files
Clone or download the tutorial repository:
git clone https://github.com/zerotier/metrics-tutorial.git
cd metrics-tutorial
2. Start the Stack
Run the docker-compose setup and initialization script:
docker-compose up -d
./setup.sh
This will start:
- ZeroTier controller
- Prometheus metrics collection
- Grafana dashboard service
- Example network configuration
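The containers can take a few seconds to become reachable after `docker-compose up -d` returns. The sketch below polls until a service answers. `http_ok` and `wait_for` are hypothetical helper names, and the health URLs in the usage note assume the demo publishes Grafana on port 3000 and Prometheus on port 9090, as the later steps of this tutorial do.

```python
import http.client
import time
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if `url` answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are subclasses of OSError.
        return False


def wait_for(check, timeout=60.0, interval=2.0):
    """Call `check()` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for(lambda: http_ok("http://localhost:9090/-/ready"))` waits for Prometheus, and `wait_for(lambda: http_ok("http://localhost:3000/api/health"))` waits for Grafana.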
Grafana Configuration
1. Access Grafana
Open http://localhost:3000/ in your browser.
2. Initial Login
- Username: admin
- Password: admin
- Set a new password or click 'Skip'
3. Configure Prometheus Data Source
- Navigate to Connections → Data Sources → Add New Data Source
- Select Prometheus (first option)
- Set the Prometheus server URL: http://metrics-tutorial-prometheus-1:9090
- Click Save and test
- Click Explore view
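If you prefer to script this step, the same data source can likely be created through Grafana's HTTP API (`POST /api/datasources`) instead of the UI. The sketch below only builds the request. It assumes the default `admin`/`admin` credentials are still in place, and `build_datasource_request` is a hypothetical helper name.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, user, password, prometheus_url):
    """Build (but do not send) a POST to Grafana's /api/datasources."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend performs the queries
        "isDefault": True,
    }
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to send it. If you set a new password at first login, use that instead of `admin`.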
4. Create Your First Query
- Select zt_packet in the Metric dropdown
- Choose Last 15 minutes in the range picker (upper-right)
- Click Run query
The graph will populate in the bottom half of the screen showing ZeroTier packet metrics.
5. Monitor Prometheus Jobs
You can also check the status of Prometheus jobs directly at: http://localhost:9090
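The same status information is available from the Prometheus HTTP API at `/api/v1/targets`, which is handy for scripted checks. This is a minimal sketch under the assumption that Prometheus is reachable at the URL above. `summarize_targets` and `fetch_targets` are hypothetical helper names.

```python
import json
import urllib.request


def summarize_targets(api_response):
    """Map each active target's scrape URL to its health string."""
    targets = api_response.get("data", {}).get("activeTargets", [])
    return {t["scrapeUrl"]: t["health"] for t in targets}


def fetch_targets(prometheus_url="http://localhost:9090"):
    """Fetch /api/v1/targets from a running Prometheus and parse the JSON."""
    url = f"{prometheus_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

`summarize_targets(fetch_targets())` gives one entry per scrape target. A value other than `"up"` usually means the scrape is failing, for example because of a wrong or missing metrics token.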
Metrics Endpoint
URL: http://localhost:9993/metrics
Authentication: Requires token from the metricstoken.secret file in the ZeroTier home directory. This file is separate from the authtoken.secret file used for the main service API.
Disk output: Metrics are also written to metrics.prom in the ZeroTier home directory, enabling collection via the Prometheus Node Exporter textfile collector.
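To look at the raw endpoint output outside of Prometheus, a short script can read the token and request the URL above. This sketch assumes the token is sent as an HTTP bearer token, which matches the `bearer_token_file` setting in the example Prometheus configuration later in this tutorial. The function names are hypothetical.

```python
import urllib.request
from pathlib import Path


def read_token(home_dir):
    """Read metricstoken.secret from the ZeroTier home directory."""
    return Path(home_dir, "metricstoken.secret").read_text().strip()


def build_metrics_request(token, url="http://localhost:9993/metrics"):
    """Build a GET request carrying the token (assumed bearer scheme)."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {token}")
    return req


def fetch_metrics(home_dir, url="http://localhost:9993/metrics"):
    """Return the metrics endpoint's response body as text."""
    req = build_metrics_request(read_token(home_dir), url)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Reading the secret file typically requires the same privileges as the ZeroTier service, so you may need to run the script as root or an administrator.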
Available Metrics
ZeroTierOne exports ten metrics organized into three categories: general packet metrics, network-level metrics, and peer-level metrics. All metrics use the zt_ prefix.
General Packet Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| zt_packet | Counter | packet_type, direction | Total packet counts by type and direction |
| zt_packet_error | Counter | error_type, direction | Packet errors by type and direction |
| zt_data | Counter | protocol, direction | Bytes transmitted/received by protocol |
Network-Level Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| zt_num_networks | Gauge | - | Number of networks currently joined |
| zt_network_multicast_groups_subscribed | Gauge | network_id | Multicast group subscriptions per network |
| zt_network_packets | Counter | network_id, direction | Packets sent/received per network |
Peer-Level Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| zt_peer_latency | Histogram | node_id | Peer-to-peer latency distribution (ms) |
| zt_peer_path_count | Gauge | node_id, status | Number of paths to each peer by status |
| zt_peer_packets | Counter | node_id, direction | Packets sent/received per peer |
| zt_peer_packet_errors | Counter | node_id | Packet errors per peer |
Label Values
| Label | Values | Description |
|---|---|---|
| direction | tx, rx | Transmit or receive |
| node_id | 10-char hex | ZeroTier node identifier |
| network_id | 16-char hex | ZeroTier network identifier |
| status | varies | Path status (active/inactive) |
| packet_type | varies | ZeroTier protocol packet type |
| error_type | varies | Error classification |
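To see how the metric names and labels above appear on the wire, the sketch below parses Prometheus text exposition output and sums one metric by one label. It is a deliberately small parser, not a full implementation of the format: escape sequences in label values are left undecoded, and `parse_samples` and `total_by_label` are hypothetical helper names.

```python
import re
from collections import defaultdict

# name{labels} value [timestamp]
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def total_by_label(text, metric, label):
    """Sum the values of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric and label in labels:
            totals[labels[label]] += value
    return dict(totals)
```

For example, `total_by_label(text, "zt_packet", "direction")` returns a dictionary keyed by `tx` and `rx`. The same function should also work on the contents of metrics.prom, since the textfile collector reads the same exposition format.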
Configuration
Enabling Metrics by Version
| Version | Metrics Enabled By Default | Configuration Required |
|---|---|---|
| 1.14.0 – 1.16.0 | Yes | None |
| 1.16.1+ | No | Local config toggle |
For version 1.16.1 and later, create or edit local.conf in the ZeroTier home directory:
Linux: /var/lib/zerotier-one/local.conf
macOS: /Library/Application Support/ZeroTier/One/local.conf
Windows: C:\ProgramData\ZeroTier\One\local.conf
{
  "settings": {
    "allowMetrics": true
  }
}
Restart the ZeroTier service after modifying this file.
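If local.conf already contains other settings, overwriting it by hand risks losing them. The following sketch merges the `allowMetrics` toggle into whatever is already there. `enable_metrics` is a hypothetical helper name, and the script needs write access to the ZeroTier home directory.

```python
import json
from pathlib import Path


def enable_metrics(conf_path):
    """Set settings.allowMetrics to true, keeping any other keys intact."""
    path = Path(conf_path)
    text = path.read_text() if path.exists() else ""
    # Treat a missing or empty file as an empty configuration.
    config = json.loads(text) if text.strip() else {}
    config.setdefault("settings", {})["allowMetrics"] = True
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

On Linux this would be called as `enable_metrics("/var/lib/zerotier-one/local.conf")`, followed by the service restart described above.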
Compile-Time Options
| Flag | Default | Effect |
|---|---|---|
| ZT_ENABLE_METRICS | Enabled | Enables metrics compilation |
| ZT_NO_PEER_METRICS | Not set | Disables per-peer metrics to reduce cardinality |
The ZT_NO_PEER_METRICS flag is useful for deployments with high peer counts where per-peer metric cardinality would create excessive load on the metrics collection system.
Implementation Files
The metrics implementation resides in:
| File | Purpose |
|---|---|
| osdep/Metrics.hpp | Metric declarations and types |
| osdep/Metrics.cpp | Metric initialization and serialization |
| service/OneService.cpp | HTTP endpoint handler |
Example Prometheus Configuration
scrape_configs:
  - job_name: 'zerotier'
    static_configs:
      - targets: ['localhost:9993']
    bearer_token_file: '/var/lib/zerotier-one/metricstoken.secret'
Example Queries
Total packets by direction:
sum by (direction) (rate(zt_packet[5m]))
95th-percentile peer latency (per peer):
histogram_quantile(0.95, rate(zt_peer_latency_bucket[5m]))
Network traffic by network:
sum by (network_id, direction) (rate(zt_network_packets[5m]))
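These queries can also be run outside Grafana through the Prometheus HTTP API (`/api/v1/query`). The sketch below builds the request URL and flattens an instant-vector result. It assumes Prometheus is reachable at http://localhost:9090, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(expr, prometheus_url="http://localhost:9090"):
    """Return the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": expr})
    return f"{prometheus_url}/api/v1/query?{query}"


def vector_to_rows(api_response):
    """Flatten an instant-vector result into (labels, value) tuples."""
    if api_response.get("status") != "success":
        raise ValueError(api_response.get("error", "query failed"))
    return [
        (item["metric"], float(item["value"][1]))
        for item in api_response["data"]["result"]
    ]


def run_query(expr, prometheus_url="http://localhost:9090"):
    """Execute `expr` against a running Prometheus and return the rows."""
    url = build_query_url(expr, prometheus_url)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return vector_to_rows(json.load(resp))
```

For example, `run_query("sum by (direction) (rate(zt_packet[5m]))")` returns one row per direction once Prometheus has collected at least two scrapes within the 5-minute window.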
Creating Custom Dashboards
Now that you have the basic pipeline running, you can create custom dashboards reporting ZeroTier metrics. Consider visualizing:
- Network traffic patterns over time
- Peer connectivity status
- Error rates and network health
- Geographic distribution of connections
- Bandwidth utilization
Cleanup
When you're done with the tutorial, clean up the Docker containers:
./cleanup.sh
Alternative Monitoring Tools
Beyond Prometheus and Grafana, you can use your preferred monitoring tools over your ZeroTier networks:
- Prometheus Blackbox exporter - Network probing and monitoring
- SmokePing - Network latency monitoring
- UptimeKuma - Simple uptime monitoring
- Custom solutions using Traffic Observation and Interception
Additional Resources
- ZeroTier Metrics Tutorial Repository - Complete source code and configuration files
- ZeroTier Controller API - Programmatic network management
Community
If you've found a cool use for this set of observability hooks, we'd love to hear about it! Please reach out via:
- Reddit (r/zerotier)
- Email: [email protected]
- GitHub Issues - For technical feedback and improvements