MySQL Monitoring

Recently, I have invested a good amount of time in learning and working on monitoring, especially database monitoring. This medium felt like the best way to share my journey, my findings and, obviously, some spectacular dashboards.

This blog will help you understand why we need MySQL monitoring and how we can do it.

Let’s start with why we need MySQL monitoring at all. There are multiple areas we can monitor; here I am highlighting some of the important ones.

1. Resource Utilization
Without monitoring you simply have no idea what is going on with MySQL, and no way to tell when it goes haywire.
A huge number of queries run through the server. Some are lightweight and some are very heavy, driving CPU utilization up or overloading the server entirely. In production that means requests can get dropped, which translates directly into business loss.
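If you want a quick, ad-hoc look at what the server is busy with before any dashboards exist, a couple of built-in statements help; this is only a minimal sketch, and which counters matter most depends on your workload:

mysql> SHOW FULL PROCESSLIST;                          -- everything currently executing, including long-running statements
mysql> SHOW GLOBAL STATUS LIKE 'Threads_running';      -- how many threads are actively doing work right now
mysql> SHOW GLOBAL STATUS LIKE 'Questions';            -- total statements served, useful for a rough queries-per-second estimate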

2. Database Connections
Sometimes the connections run out and no further connections are left for the application to talk to the database. Without monitoring, it’s really hard to figure out the root cause.
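A quick manual check of how close you are to the connection limit looks like this; it is only a sketch, and what counts as "too close" is something you have to decide for your own application:

mysql> SHOW VARIABLES LIKE 'max_connections';          -- the configured ceiling
mysql> SHOW GLOBAL STATUS LIKE 'Threads_connected';    -- connections open right now
mysql> SHOW GLOBAL STATUS LIKE 'Max_used_connections'; -- high-water mark since the server started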

3. Replication Lag
When we run MySQL as a master-slave cluster, real-time replication of data from the master to the slave is a key factor to monitor. Ideally, the lag between master and slave should be zero.

In my scenario, the slave replicates data from the master and also serves read queries to avoid overburdening the master. Now, if replication lag is high and a read query hits the slave at the same time, what happens? Data that already exists on the master has not yet been replicated to the slave because of the lag,
so that read query returns an unexpected or stale result.
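The simplest manual check for lag is Seconds_Behind_Master on the slave; monitoring just graphs this same number over time so you can catch it before a user does:

mysql> SHOW SLAVE STATUS\G
-- look at Seconds_Behind_Master: anything consistently above zero deserves attention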

4. Query Analytics
Monitoring the database also helps identify which queries are taking a long time, so slow queries can be found and optimized. In the end it’s all about being fast.
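Even before a full monitoring setup, the slow query log is a cheap way to start with query analytics; a minimal sketch, where the 1-second threshold is only an example you should tune to your own latency budget:

mysql> SET GLOBAL slow_query_log = 'ON';
mysql> SET GLOBAL long_query_time = 1;                 -- log anything slower than 1 second
mysql> SHOW VARIABLES LIKE 'slow_query_log_file';      -- where the log is being written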

OK, so how do we monitor MySQL? There are multiple enterprise solutions that monitor a database with a single click, but I didn’t have the luxury of going for a paid one. So I started exploring open-source solutions that would cover all my requirements, and finally I found one.

It’s the Percona Monitoring and Management (PMM) tool.

PMM is an open-source platform to monitor and manage MySQL databases, and we can run it in our own environment. It is backed by a time-series database, which ensures reliable, real-time data.

Installing PMM Server

curl -fsSL https://raw.githubusercontent.com/percona/pmm/master/get-pmm.sh -o get-pmm.sh

Change the permissions to make it executable:

chmod +x get-pmm.sh

Now run the PMM script to install it:

./get-pmm.sh

This will run a Docker container. Once the container is up and running, we will install the PMM client and point it at the server’s address and port.
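To confirm the server actually came up before moving on, check the container (the container name below is the usual default created by the script, but it may differ in your setup):

docker ps --filter "name=pmm-server"

The PMM web UI should then be reachable on the server’s address in a browser.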

Installing PMM Client

Download the Percona repository package:

wget https://repo.percona.com/apt/percona-release_0.1-6.$(lsb_release -sc)_all.deb

Install the downloaded package, which sets up the Percona repository:

dpkg -i percona-release_0.1-6.$(lsb_release -sc)_all.deb

Update the apt cache, install the PMM client, and register it with the PMM server:

apt-get update
apt-get install pmm-client
pmm-admin config --server <server_ip>:<port>
pmm-admin add mysql
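To verify that the client registered and the MySQL service is actually being collected, list what the agent is monitoring (the exact output format differs between PMM versions):

pmm-admin list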

It’s not only MySQL you can monitor; PMM also integrates with other databases such as Amazon RDS, PostgreSQL, and MongoDB.

There are many alternatives for MySQL monitoring on the market, like Nagios, VividCortex Analyser, SolarWinds Server and Application Monitor, LogicMonitor, MySQL OpsPack, etc. Exploring an open-source tool has its pros and cons, but the level of learning you get from it makes it worth using. So to anyone out there reading this blog, I would suggest giving it a try.

Happy monitoring!!


Stay Away, Replication Lag!

Recently, I got a requirement to provide a backup of our data and a way to analyze it without touching the main database. MySQL replication is a process that lets you easily maintain multiple copies of MySQL data by having them copied automatically from a master to a slave database.

Panic Starts

Everything ran smoothly on the night I configured it. But the joy didn’t last long: as traffic hit in the morning, the slave started falling behind the master by a few seconds, and the lag kept increasing with activity on the application. At peak time it was running into thousands of seconds.

What now? I had to dig deep into MySQL replication:

  • How it works 
  • What can probably cause the lag
  • An approach that minimizes or eliminates it

How MySQL Replication Works

On the master

First of all, the master writes replication events to a special log called the binary log. This is usually a very lightweight activity because the writes are buffered and sequential. The binary log file stores the data that the replication slave will be reading later.

On the replica

When you start replication, two threads are started on the slave:
1. IO thread
This thread connects to the master, reads binary log events from the master as they come in and simply copies them over to a local log file called the relay log.
Even though there is only one thread reading the binary log from the master and one writing the relay log on the slave, copying replication events is very rarely the slow part of replication. There can be a network delay, causing a steady lag of a few hundred milliseconds.
If you want to see where the IO thread currently is, check the following fields in “show slave status\G”:
Master_Log_File – the last file copied from the master (most of the time it is the same as the last binary log written by the master)
Read_Master_Log_Pos – the master’s binary log has been copied over to the relay log on the slave up to this position.
You can then compare it to the output of “show master status\G” on the master.

mysql> show master status\G;
*************************** 1. row ***************************
             File: db01-binary-log.000032
         Position: 1008761891
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 
1 row in set (0.00 sec)

2. SQL thread
The second process – the SQL thread – reads events from the relay log stored locally on the replication slave (the file written by the IO thread) and applies them as fast as possible.
Going back to “show slave status\G”, you can get the current status of the SQL thread from the following variables:
Relay_Master_Log_File – the master binary log that the SQL thread is “working on” (in reality it is working on the relay log, so this is just a convenient way to display the information)
Exec_Master_Log_Pos – the position in the master binary log currently being executed by the SQL thread.

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: <master_ip>
                  Master_User: <replication_user>
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db01-binary-log.000032
          Read_Master_Log_Pos: 1008768810
               Relay_Log_File: relay-bin.000093
                Relay_Log_Pos: 1008769035
        Relay_Master_Log_File: db01-binary-log.000032
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
.
.
          Exec_Master_Log_Pos: 1008768810
              Relay_Log_Space: 1008769305
.
.
        Seconds_Behind_Master: 0
.
.
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
.
.
1 row in set (0.00 sec)

Why Replication Lag Occurred

Replication lag occurs when the slaves cannot keep up with the updates occurring on the master. Unapplied changes accumulate in the slave’s relay logs and the version of the database on the slaves becomes increasingly different from that of the master.

Caught The Culprit

Let me take you through my journey of how I crossed this river.
First, I referred to multiple blogs and filled my head with the possible causes they suggested:

  • Hardware Faults (getting RAID in degraded mode)
  • MySQL Config Updates 
    • setting sync_binlog=1
    • enabling log_slave_updates
    • setting innodb_flush_log_at_trx_commit=1
    • updating slave_parallel_workers to a higher value
    • changing slave_parallel_type to support more parallel workers
  • Restarting Replication 

But unfortunately, or perhaps due to my own ignorance of database administration, I was still searching for the twig that would save me from drowning.
And finally I found one: a DBA friend who suggested I look at the binary log format I was using. Let’s see what that is.

Binary Logging Formats

The server uses several logging formats to record information in the binary log. The exact format employed depends on the version of MySQL being used. There are three logging formats:
STATEMENT: With statement-based replication, every SQL statement that could modify data is logged on the master. Those SQL statements are then replayed on the slaves against the same dataset and in the same context. Less data has to be transferred between the master and the slave, but data inconsistencies between the master and the slave can creep in because of the way this kind of replication works (for example, non-deterministic statements may produce different results on each server).
ROW: With row-based replication, every “row modification” is logged on the master and is then applied to the slave. Each and every change can be replicated, so this is the safest form of replication. However, on a system that frequently updates a large number of rows, it produces very large binary logs and generates a lot of network traffic between the master and the slave.
MIXED: A third option is also available: mixed logging. With mixed logging, statement-based logging is used by default, but the logging mode switches automatically to row-based in certain cases.

Changing Binary Log Format

The binary log format is changed on the master MySQL server. It can be set at startup or changed at runtime, globally or per session:

  • at server startup with --binlog-format=format
  • at runtime for all sessions by setting the global binlog_format server variable (requires the SUPER privilege)
  • at runtime for the current session by setting the session value of binlog_format
mysql> SET GLOBAL binlog_format=MIXED;

mysql> SET SESSION binlog_format=ROW;

mysql> SET binlog_format=STATEMENT; 
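SET GLOBAL only lasts until the next restart, so to make the change permanent it also has to go into the MySQL config file on the master; a minimal sketch, assuming the usual my.cnf location for your distribution:

[mysqld]
binlog_format = MIXED

Then restart MySQL, or rely on the SET GLOBAL above until the next planned restart.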

So, earlier I was using the STATEMENT binlog format, which was the default in my setup. Since switching to the MIXED binlog format, I am delighted to share the stats below.
Current status: the difference between the master read position and the slave execute position, and the slave lag (in seconds), are both ZERO.

Replication Lag (in Seconds) graph for a month, powered by Prometheus-Grafana.

Now, what’s next?

PERCONA STANDALONE SERVER

As a DevOps activist, I am exploring Percona XtraDB. In a series of blogs I will share my learnings. This blog captures the step-by-step details of installing Percona Server (with the XtraDB storage engine) in standalone mode.

 

Introduction:


Percona Server is an enhanced drop-in replacement for MySQL. It offers breakthrough performance, scalability, features, and instrumentation.
Percona focuses on providing a solution for the most demanding applications, empowering users to get the best performance and lowest downtime possible.
 

The Percona XtraDB Storage Engine:

  • Percona XtraDB is an enhanced version of the InnoDB storage engine, designed to better scale on modern hardware, and including a variety of other features useful in high performance environments. It is fully backwards compatible, and so can be used as a drop-in replacement for standard InnoDB. 
  • Percona XtraDB includes all of InnoDB’s robust, reliable ACID-compliant design and advanced MVCC architecture, and builds on that solid foundation with more features, more tunability, more metrics, and more scalability.
  • It is designed to scale better on many cores, to use memory more efficiently, and to be more convenient and useful.

Installation on Ubuntu:

STEP 1: Add Percona Software Repositories
$ apt-key adv --keyserver keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
STEP 2: Add this to /etc/apt/sources.list:
deb http://repo.percona.com/apt precise main
deb-src http://repo.percona.com/apt precise main
STEP 3: Update the local cache
$ apt-get update
STEP 4: Install the server and client packages
$ apt-get install percona-server-server-5.6 percona-server-client-5.6

STEP 5: Start Percona Server

$ service mysql start
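A quick sanity check that it is really the Percona build answering on the socket (a minimal check; the exact version strings will differ on your system):

$ mysql -e "SELECT VERSION();"
$ mysql -e "SHOW VARIABLES LIKE 'version_comment';"   # should mention Percona Server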

Let me know if you have any suggestions. 

Understanding Percona XtraDB cluster

As a DevOps activist, I am exploring Percona XtraDB. In a series of blogs I will share my learnings. This blog captures the theoretical background of Percona XtraDB Cluster.

Prerequisites

  1. You should have basic knowledge of MySQL. 
  2. OS – Ubuntu

What is Percona?

Percona XtraDB Cluster is free, open-source, high-availability and scalability software for MySQL.
It provides:
  1. Synchronous replication: a transaction is either committed on all nodes or on none.
  2. Multi-master replication: you can write to any node.
  3. Parallel applying of events on the slaves: real “parallel replication”.
  4. Automatic node provisioning.
  5. Data consistency: no more unsynchronized slaves.


Introduction

  1. The cluster consists of nodes. The recommended configuration is 3 nodes; however, 2 nodes can be used as well.
  2. Every node is a regular MySQL / Percona Server setup. You can convert your existing MySQL / Percona Server into a node and build a cluster using it as a base, or you can detach a node from the cluster and use it as a regular server.
  3. Each node contains a full copy of the data.


Benefits of this approach:

  • Whenever you execute a query, it is executed locally. All data is available locally, so no remote access is required.
  • No central management. You can lose any node at any time, and the cluster will continue functioning.
  • It is a good solution for scaling a read workload. You can send read queries to any of the nodes.


Drawbacks:

  • Overhead of joining a new node: the new node has to copy all the data from an existing node. If the dataset is 100 GB, it will copy 100 GB.
  • Not an effective write-scaling solution: all writes have to be applied on all nodes.
  • Duplication of data: if you have 3 nodes, there are 3 copies of the data.

Difference between Percona XtraDB Cluster and MySQL Replication

For this, we have to look at the well-known CAP theorem for distributed systems. According to this theorem, the characteristics of a distributed system are:
C – Consistency (all your data is consistent on all nodes),
A – Availability (your system is AVAILABLE to handle requests in case of failure of one or several nodes),
P – Partitioning tolerance (in case of inter-node connection failure, each node is still available to handle requests).
The CAP theorem says that any distributed system can have at most two of these three.
  • MySQL replication has: Availability and Partitioning tolerance.
  • Percona XtraDB Cluster has: Consistency and Availability.
So MySQL replication does not guarantee consistency of data, while Percona XtraDB Cluster provides consistency but gives up partitioning tolerance.

Components 

Percona XtraDB Cluster is based on:
  • Percona Server with XtraDB and includes Write Set Replication patches.
It uses:
  • Galera Library: A generic synchronous Multi-Master replication plugin for transactional applications.
  • Galera supports:
    • Incremental State Transfer (IST), useful in WAN deployments.
    • RSU (Rolling Schema Upgrade): a schema change does not block operations against the table.
 

Percona XtraDB cluster limitations

  • Currently, replication works only with the InnoDB storage engine.
That means writes to tables of other types, including the (mysql.*) system tables, are not replicated.
DDL statements are replicated at the statement level, and changes to mysql.* tables are replicated that way.
So you can issue CREATE USER …. , and this will be replicated,
but issuing INSERT INTO mysql.user …. will not be replicated.
You can also enable experimental MyISAM replication support with wsrep_replicate_myisam.
  • Unsupported queries:
    • LOCK/UNLOCK tables
    • lock function (GET_LOCK(), RELEASE_LOCK()….)
  • Due to cluster-level concurrency control, a transaction issuing COMMIT may still be aborted at that stage.
There can be two transactions writing to the same rows and committing on separate Percona XtraDB Cluster nodes, and only one of them can successfully commit. The failing one will be aborted. For cluster-level aborts, Percona returns a deadlock error code.
  • The write throughput of the whole cluster is limited by the weakest node. If one node becomes slow, the whole cluster becomes slow.
 

FEATURES

High Availability

In a basic setup with 3 nodes, Percona XtraDB Cluster will continue to function if you take any one of the nodes down. Even if a node crashes or becomes unreachable over the network, the cluster continues to work and queries can be issued on the working nodes.
If the data changed while a node was down, there are two methods the node may use when it rejoins the cluster:
  1. State Snapshot Transfer (SST): performs a full copy of the data from one node to another. It is used when a new node joins the cluster; one of the existing nodes transfers the data to it.
     There are three available methods of SST:
    • mysqldump
    • rsync
    • xtrabackup
The downside of "mysqldump" and "rsync" is that your cluster becomes READ-ONLY while data is copied from one node to the other, whereas xtrabackup SST does not require a read lock for the entire syncing process.
  2. Incremental State Transfer (IST): if a node was down for a short period of time and then starts up, it is able to fetch only the changes made during the period it was down.
This is done using a caching mechanism on the nodes. Each node keeps a cache, a ring buffer of the last N changes, and can transfer part of this cache. IST is possible only if the amount of changes to transfer is less than N; if it exceeds N, the joining node has to perform a full SST.
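The size of that ring buffer is controlled through the Galera provider options, so if nodes may be down for more than a few minutes you may want to grow it; a sketch for my.cnf, where the 1G value is only an example:

[mysqld]
wsrep_provider_options = "gcache.size=1G"    # larger gcache means longer outages can still recover via IST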

Multi-Master Replication

  • Multi-master replication stands for the ability to write to any node in the cluster without worrying about nodes getting out of sync, as regularly happens with regular MySQL replication if you imprudently write to the wrong server.
  • With Percona XtraDB Cluster you can write to any node, and the cluster guarantees consistency of writes. That is, a write is either committed on all the nodes or not committed at all.
All queries are executed locally on the node, and there is special handling only at COMMIT. When COMMIT is issued, the transaction has to pass certification on all the nodes. If it does not pass, you receive an ERROR as the response to that query; if it passes, the transaction is applied on the local node.
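How often writes actually collide is visible in a Galera status counter, which is worth watching if your application writes to multiple nodes (a minimal check; the counter is cumulative since the node started):

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_cert_failures';  -- transactions that failed certification on this node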

Getting Started with Percona XtraDB Cluster

Percona XtraDB Cluster

As a DevOps activist, I am exploring Percona XtraDB. In a series of blogs I will share my learnings. This blog captures the step-by-step details of installing Percona XtraDB in cluster mode.
 

Why Cluster Mode?

Percona XtraDB Cluster is a high-availability and scalability solution for MySQL users, which provides:
  • Synchronous replication: a transaction is either committed on all nodes or on none.
  • Multi-master replication: you can write to any node.
  • Parallel applying of events on the slaves: parallel event application on all slave nodes.
  • Automatic node provisioning.
  • Data consistency.
Straight into the Act: Installing Percona XtraDB Cluster

Pre-requisites/Assumptions
  1. OS – Ubuntu
  2. 3 Ubuntu nodes are available
For the sake of this discussion, let's name the nodes as follows:
node 1
hostname: percona_xtradb_cluster1
IP: 192.168.1.2

node 2
hostname: percona_xtradb_cluster2
IP: 192.168.1.3

node 3
hostname: percona_xtradb_cluster3
IP: 192.168.1.4

Repeat the below steps on all nodes

STEP 1 : Add the Percona repository

$ echo "deb http://repo.percona.com/apt precise main" >> /etc/apt/sources.list.d/percona.list
$ echo "deb-src http://repo.percona.com/apt precise main" >> /etc/apt/sources.list.d/percona.list
$ apt-key adv --keyserver keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
STEP 2 : After adding the Percona repository, update the apt cache so that the new packages are included:
$ apt-get update

STEP 3 : Install Percona XtraDB Cluster :

$ apt-get install -y percona-xtradb-cluster-56 qpress xtrabackup

STEP 4 : Install additional packages for editing files, downloading, etc.:

$ apt-get install -y python-software-properties vim wget curl netcat

With the above steps, we have installed Percona XtraDB Cluster on every node. Now we'll configure each node so that a cluster of three nodes can be formed.

Node Configuration:

Add/modify the file /etc/mysql/my.cnf on the first node :

[MYSQLD] #This section is for mysql configuration
user = mysql
default_storage_engine = InnoDB
basedir = /usr
datadir = /var/lib/mysql
socket = /var/run/mysqld/mysqld.sock
port = 3306
innodb_autoinc_lock_mode = 2
log_queries_not_using_indexes = 1
max_allowed_packet = 128M
binlog_format = ROW
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_node_address = 192.168.1.2
wsrep_cluster_name="newcluster"
wsrep_cluster_address = gcomm://192.168.1.2,192.168.1.3,192.168.1.4
wsrep_node_name = cluster1
wsrep_slave_threads = 4
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = sst:secret

[sst] # This section is for SST (state snapshot transfer) configuration
streamfmt = xbstream

[xtrabackup] # This section defines tuning configuration for xtrabackup
compress
compact
parallel = 2
compress_threads = 2
rebuild_threads = 2

Note :
         wsrep_node_address = {IP of the current node}
         wsrep_cluster_name = {name of the cluster}
         wsrep_cluster_address = gcomm://{comma-separated IP addresses of the cluster nodes}
         wsrep_node_name = {name of the current node, used to identify it in the cluster}

Now that the node configuration is done, start the services on the first node.
Start the node :

$ service mysql bootstrap-pxc

Create the sst user for authentication between cluster nodes :

$ mysql -e "GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sst'@'localhost' IDENTIFIED BY 'secret';"

Check cluster status :

$ mysql -e "show global status like 'wsrep%';"

Configuration file for second node:

[MYSQLD]
user = mysql
default_storage_engine = InnoDB
basedir = /usr
datadir = /var/lib/mysql
socket = /var/run/mysqld/mysqld.sock
port = 3306
innodb_autoinc_lock_mode = 2
log_queries_not_using_indexes = 1
max_allowed_packet = 128M
binlog_format = ROW
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_node_address = 192.168.1.3
wsrep_cluster_name="newcluster"
wsrep_cluster_address = gcomm://192.168.1.2,192.168.1.3,192.168.1.4
wsrep_node_name = cluster2
wsrep_slave_threads = 4
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = sst:secret

[sst]
streamfmt = xbstream

[xtrabackup]
compress
compact
parallel = 2

After doing the configuration, start the services on node 2.
Start node 2 :

$ service mysql start

Check cluster status :

$ mysql -e "show global status like 'wsrep%';"

Now configure node 3 in the same way. The changes are listed below.

Changes in configuration for node 3 :
wsrep_node_address = 192.168.1.4

wsrep_node_name = cluster3

Start node 3 :

$ service mysql start

Test the Percona XtraDB Cluster:

Log in with the mysql client on any node:

mysql>create database opstree;
mysql>use opstree;
mysql>create table nu113r(name varchar(50));
mysql>insert into nu113r values("zukin");
mysql>select * from nu113r;

Check the database on another node with the mysql client:

mysql>show databases;

Note : There should be a database named “opstree”.

mysql>use opstree;
mysql>select * from nu113r; 

Note : The data will be the same as on the previous node.