Nginx monitoring using Telegraf/Prometheus/Grafana

Nginx is one of the most popular and widely used web servers, mostly because of its speed and reliability. Even so, it is important to keep track of its performance and availability so that you can proactively prepare for worst-case scenarios such as sudden, unexpected spikes in traffic. Monitoring also keeps you informed about the current state and health of your application.

This article will guide you on how to get Nginx Web Server metrics and visualize them. The main goal is a quick deployment and configuration using well-known open-source projects like Grafana, Prometheus, and Telegraf.

Prometheus: An open-source time-series database for event monitoring and alerting, hosted by the Cloud Native Computing Foundation (CNCF).

Grafana: An open-source analytics and visualization platform that can chart metrics from many different databases.

Telegraf: An open-source, plugin-driven agent for collecting and sending metrics.

What is NGINX?

NGINX is open-source software for web serving, reverse proxying, caching, load balancing, media streaming, and more. It started out as a web server designed for maximum performance and stability. In addition to its HTTP server capabilities, NGINX can also function as a proxy server for email (IMAP, POP3, and SMTP) and a reverse proxy and load balancer for HTTP, TCP, and UDP servers.

NGINX can also be used as a mail proxy and a generic TCP proxy. However, this article does not directly address NGINX monitoring for these use cases.

NGINX Key metrics

You can catch two categories of issues by monitoring NGINX: (a) resource issues within NGINX itself and (b) problems that develop elsewhere in your web infrastructure.

For most NGINX users, monitoring will include requests per second, which provides a high-level view of combined end-user activity, together with:

Server error rate, which indicates how often your servers fail to process seemingly valid requests, and request processing time, which describes how long your servers take to process client requests.

We generally monitor these three key categories of metrics:

  1. Basic activity metrics:

Whatever your NGINX use case, you will no doubt want to monitor how many client requests your servers receive and how those requests are being processed.

Open-source NGINX reports these basic activity metrics via its stub status module, while NGINX Plus ships its own status/API module that reports similar metrics in a slightly different way.

Nginx Status Module Metrics:

Now let’s take a look at the metrics available via ngx_http_stub_status_module:

Active connections: The current number of active (accepted) client connections. This includes all connections in the Waiting (idle), Reading, and Writing states.

Accepts: The total number of accepted connections from clients since the Nginx master process started. Note that reloading configurations or restarting worker processes will not reset this metric. If you terminate and restart the master process, you will reset the metric.

Handled: The total number of handled connections from clients since the Nginx master process started. It will be lower than accepted only in cases where a connection is dropped before it is handled.

Requests: The total number of client requests since the Nginx master process started. A request is an application-level (HTTP, etc.) event. It occurs when a client requests a resource via the application protocol. A single connection can (and often does) make many requests. So most of the time, there are more requests than accepted/handled connections.

Reading: The current number of (accepted) connections from clients where Nginx is reading the request. Measured at the time the status module was queried.

Writing: The current number of connections from clients where Nginx is writing a response back to the client.

Waiting: The current number of connections from clients that are in the Idle/Waiting state.

  2. Error metrics

NGINX error metrics tell you how often your servers return errors instead of doing useful work. Client errors are represented by 4xx status codes and server errors by 5xx status codes. (A quick way to check these from the access log is sketched just after this list.)

  3. Performance metrics

Request time: The time taken to process each request in seconds.
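As mentioned under error metrics above, the server error rate has to come from the access log rather than the stub status module. A quick, hedged sanity check (assuming the default combined log format, where the status code is the ninth space-separated field) is to count responses by status class directly from the log:

    # count responses per status class (2xx/3xx/4xx/5xx) from the access log;
    # assumes the default "combined" log format, where $status is field 9
    awk '{ print substr($9, 1, 1) "xx" }' /var/log/nginx/access.log | sort | uniq -c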

How to collect NGINX metrics

How you go about capturing metrics depends on what version of NGINX you’re using, as well as what metrics you want to access. NGINX has status modules that report metrics, and NGINX can also be configured to report certain metrics in its logs:

Metrics:

  • Accepts/Accepted
  • Active
  • Handled
  • Requests
  • Reading
  • Writing
  • Waiting
  • HTTP status codes (2xx/4xx/5xx, from the access log)
  • Request time

Metrics collection: NGINX metrics

Step 1: NGINX exposes several basic metrics about server activity on a simple status page, provided you have the HTTP stub status module enabled. To check if the module is already enabled, run:

    nginx -V 2>&1 | grep -o with-http_stub_status_module

The status module is enabled if you see with-http_stub_status_module as output in the terminal.
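If the command prints nothing, your NGINX build was compiled without the module (the stock packages on most distributions include it). To double-check, you can list all compile-time options like this:

    # list every --with module your nginx binary was built with
    nginx -V 2>&1 | tr ' ' '\n' | grep '^--with'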

Step 2: After verifying the module is enabled or enabling it yourself, you will also need to modify your NGINX configuration to set up a locally accessible URL (e.g., /nginx_status) for the status page:

Configure Nginx Stats in Virtualhost 

vim /etc/nginx/conf.d/stub_status_nginx.conf

server {
    # serve the status page on a separate port (81) so it does not clash with existing sites
    listen 81 default_server;
    listen [::]:81 default_server;

    root /var/www/html;
    index index.html index.htm index.nginx-debian.html;

    server_name _;

    location / {
        try_files $uri $uri/ =404;
    }

    location /nginx_status {
        stub_status;
        allow 127.0.0.1;   # only allow requests from localhost
        deny all;
    }
}

Step 3: Verify the Nginx Configuration

    sudo nginx -t
    sudo systemctl reload nginx

Step 4: Now you can view the status page to see your metrics (note that the server block above listens on port 81):

    curl http://127.0.0.1:81/nginx_status

Output:

    Active connections: 2
    server accepts handled requests
    3 3 44
    Reading: 0 Writing: 1 Waiting: 1

Metrics collection: NGINX logs

Change the permissions of these two files so that the access and error logs can be read by the collection agent (Telegraf, configured below):

cd /var/log/nginx/
sudo chmod 755 access.log error.log
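One caveat, noted here as an assumption about typical Debian/Ubuntu packaging: log rotation recreates these files, so the permission change above is lost on the next rotation. If your dashboards go empty after a day, check the create directive in /etc/logrotate.d/nginx and relax it so the new log files stay readable, then restart Telegraf so it reopens the files:

    # /etc/logrotate.d/nginx (excerpt) -- the Debian/Ubuntu default is usually "create 0640 www-data adm"
    create 0644 www-data adm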

How to monitor NGINX with Telegraf

We use Telegraf to collect Nginx metrics and logs and expose them for Prometheus to scrape for monitoring and analysis.

To collect metrics from NGINX, you first need to ensure that NGINX has an enabled status module and a URL for reporting its status metrics.

Integrating Telegraf and NGINX

Telegraf: An agent written in Go for collecting metrics and logs from local or remote sources.

First, install Telegraf on the Nginx server and configure it.

Link to download Telegraf
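The exact installation steps depend on your distribution. As a rough sketch for Debian/Ubuntu, assuming the InfluxData package repository has already been added:

    # minimal sketch, assuming the InfluxData apt repository is already configured
    sudo apt-get update
    sudo apt-get install telegraf
    sudo systemctl enable --now telegraf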

Configure the Telegraf agent

sudo vim /etc/telegraf/telegraf.conf

# nginx metrics and logs
[[inputs.nginx]]
  ## URL of the stub_status endpoint configured above (port 81)
  urls = ["http://localhost:81/nginx_status"]
  response_timeout = "5s"

[[inputs.tail]]
  ## parse the access log using the standard combined log format
  name_override = "nginxlog"
  files = ["/var/log/nginx/access.log"]
  from_beginning = true
  pipe = false
  data_format = "grok"
  grok_patterns = ["%{COMBINED_LOG_FORMAT}"]

sudo systemctl restart telegraf
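Before relying on the pipeline end to end, you can ask Telegraf to run only the nginx input once and print what it collects. The --test flag gathers metrics a single time and writes them to stdout instead of the configured outputs:

    # run only the nginx input once and print the gathered metrics to stdout
    telegraf --config /etc/telegraf/telegraf.conf --input-filter nginx --test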

Download Prometheus and configure it with Telegraf:

Link to download Prometheus

Telegraf configuration with Prometheus:

Open the Telegraf config file in a text editor, find the [[outputs.influxdb]] section (around line 105 in the default config), and comment it out to disable the InfluxDB output.
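For reference, the section to comment out looks roughly like this (the exact line number and defaults vary between Telegraf versions):

    # [[outputs.influxdb]]
    #   urls = ["http://127.0.0.1:8086"]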

After this, add the prometheus_client output configuration with port 9125, like this:

sudo vim /etc/telegraf/telegraf.conf

# Output plugin
[[outputs.prometheus_client]]
  ## expose collected metrics for Prometheus to scrape on port 9125
  listen = "0.0.0.0:9125"

Save and exit the vim text editor and restart the Telegraf service using the following command.

sudo systemctl restart telegraf

Now verify that the Telegraf service is running by checking its logs with the following command.

tail -f /var/log/telegraf/telegraf.log 
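You can also confirm that the prometheus_client output is exposing metrics by querying it directly (assuming Telegraf runs on this host with the port 9125 configured above):

    # the nginx_* series should appear once Telegraf has collected at least one interval
    curl -s http://localhost:9125/metrics | grep nginx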

Adding Telegraf in Prometheus:

To add Telegraf as a scrape target, you need to add the following job to the Prometheus config file /etc/prometheus/prometheus.yml and restart the Prometheus service.

sudo vim /etc/prometheus/prometheus.yml

and add the Telegraf job under scrape_configs like this:

  - job_name: Telegraf
    # If telegraf is installed, grab stats about the local
    # machine by default.
    static_configs:
      - targets: ['localhost:9125']

Save and exit the Prometheus configuration file and restart the Prometheus service to apply the changes.

sudo systemctl restart prometheus
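If Prometheus refuses to start, or as a sanity check before restarting, you can validate the configuration with promtool, which ships with the Prometheus release, and then confirm the new target shows as UP on the targets page:

    # validate prometheus.yml syntax
    promtool check config /etc/prometheus/prometheus.yml

    # the Telegraf target should then appear as UP at http://<prometheus-host>:9090/targets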

Installation and configuration of Grafana:

Link to download Grafana

Link to download Nginx dashboard on Grafana:
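As a minimal sketch of the Grafana side, assuming Grafana is running on localhost:3000 with the default admin credentials: add Prometheus as a data source via the HTTP API (or through Configuration → Data Sources in the UI), then import the NGINX dashboard linked above via Dashboards → Import. A query such as rate(nginx_requests[5m]) should then plot requests per second (metric names may differ slightly between Telegraf versions).

    # add Prometheus as a Grafana data source via the HTTP API
    # (assumes Grafana on localhost:3000 with default admin:admin credentials -- adjust to your setup)
    curl -s -u admin:admin -H "Content-Type: application/json" \
      -X POST http://localhost:3000/api/datasources \
      -d '{"name":"Prometheus","type":"prometheus","url":"http://localhost:9090","access":"proxy","isDefault":true}'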

Conclusion

This blog has covered some of the most useful metrics that you can monitor on your NGINX server.

If you follow the above steps, you will be able to monitor Nginx using Prometheus and Grafana. I hope we have explained each step clearly. If you have any queries, doubts, or suggestions, feel free to comment below; we would be happy to hear from you.

Blog Pundit: Bhupender Rawat and Adeel Ahmad

