Alertmanager handles alerts sent by client applications such as the Prometheus server and routes them to the correct receiver through an integration such as email. See Alertmanager in the Prometheus documentation.
Perform the following steps on an OES server where Prometheus is installed:
Download and extract the Alertmanager archive from the Prometheus website.
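For example, assuming the 0.25.0 release used in the commands below, the archive can be downloaded directly from the Alertmanager releases page (the URL is an example; adjust it to the version you need):
wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz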
tar xvfz alertmanager-0.25.0.linux-amd64.tar.gz
Copy the alert_manager.sh script to the extracted directory and run it. See alert_manager.sh.
sh ./alert_manager.sh
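As an optional check, you can confirm that Alertmanager is running before continuing. This is a sketch that assumes the alertmanager.service unit name used in the restart commands later in this section and the default Alertmanager port 9093:
# Check the service status and the built-in health endpoint
systemctl status alertmanager.service
curl http://localhost:9093/-/healthy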
After installing Alertmanager on a target, update the static configuration of the Prometheus server and restart the Prometheus service.
On the Prometheus (Monitoring) server, edit the Prometheus configuration file /etc/prometheus/exporter-config.yml.
Update the hostname or IP address of Alertmanager in the targets section of the alert-manager job, as shown in the example below.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: Docker Servers
    static_configs:
      - targets: ['localhost:8080']
  - job_name: OES Servers
    static_configs:
      - targets: ['localhost:9100', 'oesnode01:9100', 'oesnode02:9100', 'oesnode03:9100', 'oesnode04.com']
  - job_name: 'alert-manager'
    static_configs:
      - targets: ['localhost:9093']
Restart the service after the configuration file is updated.
systemctl daemon-reload
systemctl restart prometheus.service
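Optionally, the edited file can be validated with promtool, which ships with the Prometheus distribution. This assumes promtool is available on the monitoring server and that the configuration file path above is used:
promtool check config /etc/prometheus/exporter-config.yml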
Perform the following steps to configure the Alertmanager notification system:
Enter the SMTP server information in the /etc/alertmanager/alertmanager.yml file.
Change the example information according to your requirements.
route:
  group_by: [alertname]
  group_interval: 30s
  repeat_interval: 30s
  # Send all notifications to me.
  receiver: email-me
receivers:
  - name: email-me
    email_configs:
      - send_resolved: true
        to: admin@email.com
        from: demo@email.com
        smarthost: smtp.email.com:587
        auth_username: demo@email.com
        auth_identity: demo@email.com
        auth_password: <enter_the_password>
Create a rule file named prometheus_rules.yml in the /etc/prometheus directory. The example that follows alerts you if any node is unavailable for more than a minute or if a node has less than 10% of its disk space left.
groups:
  - name: custom_rules
    rules:
      - record: node_memory_MemFree_percent
        expr: 100 - (100 * node_memory_MemFree_bytes / node_memory_MemTotal_bytes)
      - record: node_filesystem_free_percent
        expr: 100 * node_filesystem_free_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance [{{ $labels.instance }}] down"
          description: "[{{ $labels.instance }}] of job [{{ $labels.job }}] has been down for more than 1 minute."
      - alert: DiskSpaceFree10Percent
        expr: node_filesystem_free_percent <= 10
        labels:
          severity: warning
        annotations:
          summary: "Instance [{{ $labels.instance }}] has 10% or less free disk space"
          description: "[{{ $labels.instance }}] has only {{ $value }}% or less free."
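Before loading the rule file, it can optionally be checked for syntax errors with promtool (again assuming promtool from the Prometheus distribution is available):
promtool check rules /etc/prometheus/prometheus_rules.yml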
For more information about Alertmanager, see Configuration and Alerting Rules on the Prometheus documentation site.
Edit the configuration file (/etc/prometheus/exporter-config.yml) to include the rule file and the alerting configuration for the notification system, as shown in the example below.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: Docker Servers
    static_configs:
      - targets: ['localhost:8080']
  - job_name: OES Servers
    static_configs:
      - targets: ['localhost:9100', 'oesnode01:9100', 'oesnode02:9100', 'oesnode03:9100', 'oesnode04.com']
  - job_name: 'alert-manager'
    static_configs:
      - targets: ['localhost:9093']
rule_files:
  - "prometheus_rules.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # alertmanager:9093
            - localhost:9093
Restart the services after the configuration files are updated.
systemctl daemon-reload
systemctl restart prometheus.service
systemctl restart alertmanager.service
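As a final, optional verification, you can validate the Alertmanager configuration with amtool (shipped in the Alertmanager archive extracted earlier) and query the status endpoint on the default port 9093; both commands are sketches that assume those defaults:
amtool check-config /etc/alertmanager/alertmanager.yml
curl http://localhost:9093/api/v2/status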