Prometheus Overview and Setup

Overview

Prometheus is an opensource monitoring solution that gathers time series based numerical data. It is a project which was started by Google’s ex-employees at SoundCloud. 

To monitor your services and infra with Prometheus your service needs to expose an endpoint in the form of port or URL. For example:- {{localhost:9090}}. The endpoint is an HTTP interface that exposes the metrics.

For some platforms such as Kubernetes and skyDNS Prometheus act as directly instrumented software that means you don’t have to install any kind of exporters to monitor these platforms. It can directly monitor by Prometheus.

One of the best thing about Prometheus is that it uses a Time Series Database(TSDB) because of that you can use mathematical operations, queries to analyze them. Prometheus uses SQLite as a database but it keeps the monitoring data in volumes.

Pre-requisites

  • A CentOS 7 or Ubuntu VM
  • A non-root sudo user, preferably one named prometheus

Installing Prometheus Server

First, create a new directory to store all the files you download in this tutorial and move to it.

mkdir /opt/prometheus-setup
cd /opt/prometheus-setup
Create a user named “prometheus”

useradd prometheus

Use wget to download the latest build of the Prometheus server and time-series database from GitHub.


wget https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz
The Prometheus monitoring system consists of several components, each of which needs to be installed separately.

Use tar to extract prometheus-2.0.0.linux-amd64.tar.gz:

tar -xvzf ~/opt/prometheus-setup/prometheus-2.0.0.linux-amd64.tar.gz .
 Place your executable file somewhere in your PATH variable, or add them into a path for easy access.

mv prometheus-2.0.0.linux-amd64  prometheus
sudo mv  prometheus/prometheus  /usr/bin/
sudo chown prometheus:prometheus /usr/bin/prometheus
sudo chown -R prometheus:prometheus /opt/prometheus-setup/
mkdir /etc/prometheus
mv prometheus/prometheus.yml /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus/
prometheus --version
  

You should see the following message on your screen:

  prometheus,       version 2.0.0 (branch: HEAD, revision: 0a74f98628a0463dddc90528220c94de5032d1a0)
  build user:       root@615b82cb36b6
  build date:       20171108-07:11:59
  go version:       go1.9.2
Create a service for Prometheus 

sudo vi /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus

[Service]
User=prometheus
ExecStart=/usr/bin/prometheus --config.file /etc/prometheus/prometheus.yml --storage.tsdb.path /opt/prometheus-setup/

[Install]
WantedBy=multi-user.target
systemctl daemon-reload

systemctl start prometheus

systemctl enable prometheus

Installing Node Exporter


Prometheus was developed for the purpose of monitoring web services. In order to monitor the metrics of your server, you should install a tool called Node Exporter. Node Exporter, as its name suggests, exports lots of metrics (such as disk I/O statistics, CPU load, memory usage, network statistics, and more) in a format Prometheus understands. Enter the Downloads directory and use wget to download the latest build of Node Exporter which is available on GitHub.

Node exporter is a binary which is written in go which monitors the resources such as cpu, ram and filesystem. 

wget https://github.com/prometheus/node_exporter/releases/download/v0.15.1/node_exporter-0.15.1.linux-amd64.tar.gz

You can now use the tar command to extract : node_exporter-0.15.1.linux-amd64.tar.gz

tar -xvzf node_exporter-0.15.1.linux-amd64.tar.gz .

mv node_exporter-0.15.1.linux-amd64 node-exporter

Perform this action:-

mv node-exporter/node_exporter /usr/bin/

Running Node Exporter as a Service

Create a user named “prometheus” on the machine on which you are going to create node exporter service.

useradd prometheus

To make it easy to start and stop the Node Exporter, let us now convert it into a service. Use vi or any other text editor to create a unit configuration file called node_exporter.service.


sudo vi /etc/systemd/system/node_exporter.service
This file should contain the path of the node_exporter executable, and also specify which user should run the executable. Accordingly, add the following code:

[Unit]
Description=Node Exporter

[Service]
User=prometheus
ExecStart=/usr/bin/node_exporter

[Install]
WantedBy=default.target

Save the file and exit the text editor. Reload systemd so that it reads the configuration file you just created.


sudo systemctl daemon-reload
At this point, Node Exporter is available as a service which can be managed using the systemctl command. Enable it so that it starts automatically at boot time.

sudo systemctl enable node_exporter.service
You can now either reboot your server or use the following command to start the service manually:
sudo systemctl start node_exporter.service
Once it starts, use a browser to view Node Exporter’s web interface, which is available at http://your_server_ip:9100/metrics. You should see a page with a lot of text:

Starting Prometheus Server with a new node

Before you start Prometheus, you must first edit a configuration file for it called prometheus.yml.

vim /etc/prometheus/prometheus.yml
Copy the following code into the file.

# my global configuration which means it will applicable for all jobs in file
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. scrape_interval should be provided for scraping data from exporters 
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute. Evaluation interval checks at particular time is there any update on alerting rules or not.

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'. Here we will define our rules file path 
#rule_files:
#  - "node_rules.yml"
#  - "db_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape: In the scrape config we can define our job definitions
scrape_configs:
  # The job name is added as a label `job=` to any timeseries scraped from this config.
  - job_name: 'node-exporter'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'. 
    # target are the machine on which exporter are running and exposing data at particular port.
    static_configs:
      - targets: ['localhost:9100']
After adding configuration in prometheus.yml. We should restart the service by

systemctl restart prometheus
This creates a scrape_configs section and defines a job called a node. It includes the URL of your Node Exporter’s web interface in its array of targets. The scrape_interval is set to 15 seconds so that Prometheus scrapes the metrics once every fifteen seconds. You could name your job anything you want, but calling it “node” allows you to use the default console templates of Node Exporter.
Use a browser to visit Prometheus’s homepage available at http://your_server_ip:9090. You’ll see the following homepage. Visit http://your_server_ip:9090/consoles/node.html to access the Node Console and click on your server, localhost:9100, to view its metrics.

System Monitoring

One of the main task of a system administrator is system monitoring, system monitoring usually involves monitoring the ram & disk space usage of the system …. In this blog I’ll be talking about my experience as a system admin & how I do it.

Usually system monitoring is divided into 2 parts Continuous system monitoring and troubleshooting system issues when system crosses a threshold value & you have to figure out the issue & try to resolve it.

In continuous system monitoring a system is put under continuous monitoring i.e the system ram usage is within defined limit or not, the disk space occupancy should not cross a predefined threshold …. To achieve continuous monitoring you can use couple of tools available in market such as nagios, omd we are primarily using these tools their would be other tools available also for this purpose.

Continuous system monitoring serves one purpose where they notify about any deviation from the expected state of the system, the next step is to troubleshoot this issue & resolve it accordingly. As a first step I usually execute top command, top is a very powerful command apart from just viewing the processes activity in real you can do a lot of things i.e

  • If you want to add/remove fields : press f & then you can choose the fields to add/remove
  • If you want to change ordering of  fields : press O & then you can move fields
  • If you want to change the sort order : press F or O
there are lot of other options available as well, if you want to explore them pressing h will provide you a list of all the options.

You can also read about htop, htop is an advanced form of top where you can view some graphs as well though I’ven’t used htop so much but I’m planning to 🙂

One thing to note sometimes you are not able ot run top command due to high resource utilization, in that case you have to use cat /proc/loadavg to view the load on the system & cat /proc/meminfo to view current memory state of the system.

One of the useful command if top doesn’t work
ps -eo pmem,pcpu,vsize,pid,cmd | sort -k 1 -nr | head -5
This command will give the top 5 processes by memory usage.

Also there are couple of other commands that you can use
free : To view the memory usage of system
df : To view the file system information
du : To view the disk usage

One tip : To increase the memory of system you can create a swap memory & it is always recommended to create a swap on a partition only. Another best practice for swap area is if your system RAM is below 8 gb your swap area should be double of your ram otherwise it should be half of your RAM size