{"id":123,"date":"2019-03-26T06:23:00","date_gmt":"2019-03-26T06:23:00","guid":{"rendered":"https:\/\/opstree.com\/blog\/\/2019\/03\/26\/stay-away-replication-lag\/"},"modified":"2019-10-09T16:07:37","modified_gmt":"2019-10-09T10:37:37","slug":"stay-away-replication-lag","status":"publish","type":"post","link":"https:\/\/opstree.com\/blog\/2019\/03\/26\/stay-away-replication-lag\/","title":{"rendered":"Stay Away Replication Lag !"},"content":{"rendered":"\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/opstree.com\/blog\/\/wp-content\/uploads\/2019\/03\/7852b-mysql-master-slave-replication-1.jpg\" alt=\"\" \/><\/figure>\n\n\n\n<p>\n\nRecently, I got a requirement to facilitate backup for the data and a way to analyze it without using the main database.&nbsp;<strong>MySQL replication<\/strong>&nbsp;is a process that allows you to easily maintain multiple copies of MySQL data by having them copied automatically from a master to a slave database.&nbsp;<br><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Panic Starts<\/h3>\n\n\n\n<p>Everything was running smoothly in the night I configured it. But the joy didn&#8217;t last for long as the traffic hits in the morning, slave starts getting behind the master with few seconds which increases with the activity on the application. At the peak time, it was playing in thousands of second.<br><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"478\" height=\"632\" src=\"https:\/\/opstree.com\/blog\/\/wp-content\/uploads\/2019\/07\/image.png\" alt=\"\" class=\"wp-image-562\" \/><\/figure>\n\n\n\n<p>&nbsp;What now, I had to dig deep into MySQL Replication<\/p>\n\n\n\n<ul><li>How it works&nbsp;<\/li><li>What can probably cause the lag<\/li><li>An approach that minimizes or eliminates it<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How MySQL Replication Works<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">On the master<\/h4>\n\n\n\n<p>First of all, master writes replication events to a special log called&nbsp;<strong>binary log<\/strong>. This is usually a very lightweight activity because writes are buffered and they are sequential. The binary log file stores data that replication slave will be reading later.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">On the replica<\/h4>\n\n\n\n<p>When you start replication, two threads are started on the slave:<br><strong>1. IO thread<\/strong><br>This process connects to a master, reads binary log events from the master as they come in and just copies them over to a local log file called&nbsp;<strong>relay log<\/strong>.<br>Even though there\u2019s only one thread reading the binary log from the master and one writing relay log on the slave, very rarely copying of replication events is a slower element of the replication. There could be a network delay, causing a steady delay of a few hundred milliseconds.<br>If you want to see where IO thread currently is, check the following in&nbsp;<em>\u201cshow slave status \\G\u201d<\/em><br><strong>Master_Log_File<\/strong>&nbsp;\u2013 last file copied from the master (most of the time it would be the same as last binary log written by a master)<br><strong>Read_Master_Log_Pos<\/strong>&nbsp;\u2013 binary log from the master is copied over to the relay log on the slave up until this position.<br>And then you can compare it to the output of&nbsp;<em>\u201cshow master status\/G\u201d<\/em>&nbsp;from the master.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">mysql&gt; show master status\\G;\n*************************** 1. row ***************************\n             File: <strong>db01-binary-log.000032<\/strong>\n         Position: <strong>1008761891<\/strong>\n     Binlog_Do_DB: \n Binlog_Ignore_DB: \nExecuted_Gtid_Set: \n1 row in set (0.00 sec)<\/pre>\n\n\n\n<p><strong>2. SQL thread<\/strong><br>The second process \u2013 SQL thread \u2013 reads events from a relay log stored locally on the replication slave (the file that was written by IO thread) and then applies them as fast as possible.<br>Going back to&nbsp;<em>\u201cshow slave status \/G\u201d<\/em>, you can get the current status of SQL thread from the following variables:<br><strong>Relay_Master_Log_File<\/strong>&nbsp;\u2013 binary log from the master, that SQL thread is \u201cworking on\u201d (in reality it is working on relay log, so it\u2019s just a convenient way to display information)<br><strong>Exec_Master_Log_Pos<\/strong>&nbsp;\u2013 which position from the master binary log is being executed by SQL thread.<br><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">mysql&gt; show slave status\\G;\n*************************** 1. row ***************************\n               Slave_IO_State: Waiting for master to send event\n                  Master_Host: &lt;master_ip&gt;\n                  Master_User: &lt;replication_user&gt;\n                  Master_Port: 3306\n                Connect_Retry: 60\n<strong>              Master_Log_File: db01-binary-log.000032<\/strong>\n<strong>          Read_Master_Log_Pos: 1008768810<\/strong>\n               Relay_Log_File: relay-bin.000093\n                Relay_Log_Pos: 1008769035\n<strong>        Relay_Master_Log_File: db01-binary-log.000032<\/strong>\n             Slave_IO_Running: Yes\n            Slave_SQL_Running: Yes\n.\n.\n          Exec_Master_Log_Pos: <strong>1008768810<\/strong>\n              Relay_Log_Space: 1008769305\n.\n.\n<strong>        Seconds_Behind_Master: 0<\/strong>\n.\n.\n      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates\n.\n.\n1 row in set (0.00 sec)\n<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Why Replication Lag Occurred<\/h3>\n\n\n\n<h3 class=\"wp-block-heading\"><\/h3>\n\n\n\n<p><strong>Replication lag&nbsp;<\/strong>occurs when the slaves cannot keep up with the updates occurring on the master. Unapplied changes accumulate in the slave&#8217;s relay logs and the version of the database on the slaves becomes increasingly different from that of the master.<br><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Caught The Culprit<\/h3>\n\n\n\n<p>Let me take you through my journey how I crossed the river.&nbsp;<br>First, I took the reference to multiple blogs and started gobbling my mind with possible reasons suggesting&nbsp;<br><\/p>\n\n\n\n<ul><li>Hardware Faults (getting RAID in degraded mode)<\/li><li>MySQL Config Updates&nbsp;<ul><li>setting&nbsp;<em>sync_binlog=1<\/em><\/li><li>enabling&nbsp;<em>log_slave_updates<\/em><\/li><li>setting&nbsp;<em>innodb_flush_log_at_trx_commit=1<\/em><\/li><li>updating&nbsp;<em>slave_parallel_workers<\/em>&nbsp;to a higher value<\/li><li>changing&nbsp;<em>slave_parallel_type<\/em>&nbsp;to support more parallel workers<\/li><\/ul><\/li><li>Restarting Replication&nbsp;<\/li><\/ul>\n\n\n\n<p>But unfortunately, or say, it was my benightedness towards Database Administration that I was still searching for that twig which can help me from drowning.<br>And finally, I found one, my DBA friend who suggested me to look for the Binary Log Format that I am using. Let&#8217;s see what it is<br><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Binary Logging Formats<\/h4>\n\n\n\n<p>The server uses several logging formats to record information in the binary log. The exact format employed depends on the version of MySQL being used. There are three logging formats:<br><strong>STATEMENT:&nbsp;<\/strong>With statement-based replication, every SQL statement that could modify data is logged on the master. Then those SQL statements are replayed on the slaves against the same dataset and in the same context. There is always less data that is to be transferred between the master and the slave. But, the data inconsistency issue between the master and the slave that creeps up due to the way this kind of replication works.<br><strong>ROW:&nbsp;<\/strong>With row-based replication, every \u201crow modification\u201d is logged on the master and is then applied to the slave. With row-based replication, each and every change can be replicated and hence this is the safest form of replication. On a system that frequently UPDATE a large number of rows, it produces very large update logs and generates a lot of network traffic between the master and the slave.<br><strong>MIXED:&nbsp;<\/strong>A third option is also available: mixed logging. With mixed logging, statement-based logging is used by default, but the logging mode switches automatically to row-based in certain cases.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Changing Binary Log Format<\/h4>\n\n\n\n<p>The Binary Log Format is updated on Master MySQL server and requires MySQL service restart to reflect. It can be done for Global, Runtime or Session.<br><\/p>\n\n\n\n<ul><li>set at runtime with &#8211;binlog-format=format<\/li><li>setting the global (with the SUPER privilege)<\/li><li>session value of the binlog_format server variable<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted\">mysql&gt; SET GLOBAL binlog_format=MIXED;\n\nmysql&gt; SET SESSION binlog_format=ROW;\n\nmysql&gt; SET binlog_format=STATEMENT; <\/pre>\n\n\n\n<p>So, earlier I was using&nbsp;<strong>STATEMENT<\/strong>&nbsp;BinLog Format, which is default one. Since I switched to&nbsp;<strong>MIXED<\/strong>&nbsp;BinLog Format, I am very delighted to share the below stats.<br>Current status of&nbsp; Master Read and Slave Execute position difference and Slave Lag (in sec), both are&nbsp;<strong>ZERO.<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image is-resized\"><a href=\"https:\/\/opstree.com\/blog\/\/wp-content\/uploads\/2019\/03\/3db81-image1.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/opstree.com\/blog\/\/wp-content\/uploads\/2019\/03\/3db81-image1.png?w=300\" alt=\"\" width=\"521\" height=\"50\" \/><\/a><\/figure>\n\n\n\n<p>Replication Lag (in Seconds) graph for a month, powered by Prometheus-Grafana.<br><\/p>\n\n\n\n<figure class=\"wp-block-image is-resized\"><a href=\"https:\/\/opstree.com\/blog\/\/wp-content\/uploads\/2019\/03\/7a3cb-image2.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/opstree.com\/blog\/\/wp-content\/uploads\/2019\/03\/7a3cb-image2.png?w=300\" alt=\"\" width=\"524\" height=\"119\" \/><\/a><\/figure>\n\n\n\n<p>Now, What&#8217;s next ??<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"525\" height=\"467\" src=\"https:\/\/opstree.com\/blog\/\/wp-content\/uploads\/2019\/03\/2.png?w=525\" alt=\"\" class=\"wp-image-983\" \/><\/figure>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently, I got a requirement to facilitate backup for the data and a way to analyze it without using the main database.&nbsp;MySQL replication&nbsp;is a process that allows you to easily maintain multiple copies of MySQL data by having them copied automatically from a master to a slave database.&nbsp; Panic Starts Everything was running smoothly in &hellip; <a href=\"https:\/\/opstree.com\/blog\/2019\/03\/26\/stay-away-replication-lag\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Stay Away Replication Lag !&#8221;<\/span><\/a><\/p>\n","protected":false},"author":159458173,"featured_media":29900,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_coblocks_attr":"","_coblocks_dimensions":"","_coblocks_responsive_height":"","_coblocks_accordion_ie_support":"","jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[161546,4419,11987323,481477,1],"tags":[14893108,768739289,768739300,14923288],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/11\/DevSecOps-1.jpg","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pfDBOm-1Z","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/123"}],"collection":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/users\/159458173"}],"replies":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/comments?post=123"}],"version-history":[{"count":6,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/123\/revisions"}],"predecessor-version":[{"id":1648,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/123\/revisions\/1648"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/media\/29900"}],"wp:attachment":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/media?parent=123"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/categories?post=123"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/tags?post=123"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}