DEVOPS DONE RIGHT. A blog site on our real-life experiences with various phases of DevOps, starting from VCS, Build & Release, CI/CD, Cloud, Monitoring, Containerization.

Redis Best Practices and Performance Tuning for High-Speed Systems

In modern high-traffic systems, Redis is one of the fastest in-memory data stores, but without proper tuning even Redis can start showing performance bottlenecks.

The solution? Performance tuning and configuration optimization.

This guide covers the most important Redis performance tuning practices every DevOps engineer, SRE, or backend developer must follow.

One of the things that I love about my organization is that you don’t have to do the same repetitive work; you always get the chance to explore new technologies. The same chance came to me a few days back when one of our clients was facing an issue with Redis.
They were using Redis Cluster with Sentinel and were facing performance issues: whenever the number of connection requests was high, the Redis Cluster was not able to bear the load.
They were using a decent server configuration in terms of CPU and memory, but the result was the same. So now what?
The answer was to tune the performance.

How to resolve “Segmentation fault (core dumped)”

A segmentation fault, often called a segfault, is a notorious error that occurs when a program attempts to access memory beyond its permitted limits. For Ubuntu users, dealing with these errors can be particularly frustrating and complicated. In this detailed guide, we’ll delve into the nuances of segmentation faults, explore their causes, and discover practical ways to fix them. Whether you’re an experienced developer or just a Linux enthusiast, this article will provide you with the information you need to effectively handle segmentation faults.

The phrase “Core dumped” indicates that when the crash occurred, the operating system saved a full snapshot of the program’s memory (known as a “core dump”) to a file on the disk. This file is crucial for debugging because it contains the exact state of the program at the moment it failed, including details like the call stack, variable values, and memory mappings.
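
Whether a core file is actually written depends on the shell’s core size limit and on the kernel’s core pattern. Here is a minimal sketch for checking both, assuming a Linux system; the function name core_dump_status is only illustrative.

```shell
#!/bin/bash
# Report whether the current shell allows core files to be written.
core_dump_status() {
    local CORE_LIMIT
    CORE_LIMIT=$(ulimit -c)            # "0" means core files are disabled
    if [ "${CORE_LIMIT}" = "0" ]; then
        printf 'core dumps disabled (ulimit -c is 0)\n'
    else
        printf 'core dumps enabled (limit: %s)\n' "${CORE_LIMIT}"
    fi
    # The kernel decides where the dump goes; on Ubuntu it is often piped to apport.
    if [ -r /proc/sys/kernel/core_pattern ]; then
        printf 'core pattern: %s\n' "$(cat /proc/sys/kernel/core_pattern)"
    fi
}
core_dump_status
```

Running ulimit -c unlimited in the same shell before starting the program lifts the limit for that session.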

Why Does “Segmentation fault (core dumped)” Happen?

Here are some frequent culprits:

Crashing binaries when upgrading your system
Programs attempting to access invalid memory locations
Outdated or broken software packages
Cache corruption that can occur during installation or updates
Problems tied to specific software dependencies

[ Also Read: How To Debug a Bash Shell Script? ]

How to Fix Segmentation Fault in Ubuntu

A segmentation fault occurs when a program tries to access a memory location it is not allowed to access, for example by reading or writing a read-only or already-freed location. “Core dumped” means the system saved the program’s memory to a file, usually named core. On Ubuntu, segfaults in system tools often show up during upgrades.

While running some commands in this situation you may encounter “Unable to open lock file”. This typically means that an earlier package operation crashed partway through, leaving lock files behind and the binaries of some specific programs in a broken state.

You may do backtracking or debugging to resolve it, but here the solution is to repair the broken packages, which we can do with the steps below:

Method 1: Fix Using the Command Line

Step 1: Remove the lock files present at different locations.

sudo rm -rf /var/lib/apt/lists/lock /var/cache/apt/archives/lock /var/lib/dpkg/lock

Then restart your system.

Step 2: Remove repository cache.

sudo apt-get clean

Step 3: Update and upgrade your repository cache.

sudo apt-get update && sudo apt-get upgrade

Step 4: Now upgrade your distribution, it will update your packages.

sudo apt-get dist-upgrade

Step 5: Find the broken packages and delete them forcefully.

sudo apt-get purge $(dpkg -l | awk '/^..R/ {print $2}')
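
Before purging anything, it helps to see which packages the filter in Step 5 would select. The sketch below shows that filter on its own: it reads dpkg -l style text and prints the packages whose third status character is R (reinstall required). The function name and the sample package names are made up for illustration.

```shell
#!/bin/bash
# Print the names of packages whose dpkg error flag is R (reinstall required).
# Reads "dpkg -l" style output on stdin so the filter can be tried on sample text.
list_broken_packages() {
    awk '/^..R/ { print $2 }'
}

# Sample input in the layout that "dpkg -l" prints (package names are made up).
SAMPLE_OUTPUT='ii  vim          2:8.2  amd64  Vi IMproved
iHR demo-broken  1.0    amd64  hypothetical half-installed package
rc  old-config   0.9    amd64  removed, config files remain'

printf '%s\n' "${SAMPLE_OUTPUT}" | list_broken_packages
```

On a real system the input would come from dpkg -l | list_broken_packages, and only the listed names would be passed to apt-get purge.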

Method 2: Fix Using Recovery Mode (GUI)

Step 1: Open the GRUB boot menu by pressing the Esc key (or holding Shift on BIOS systems) right after the restart.

Step 2: Select Advanced options for Ubuntu

Step 3: Run Ubuntu in recovery mode and you will be shown a list of options.

Step 4: First select “Repair broken packages”

Step 5: Then select “Resume normal boot”

So, we have two methods of resolving a segmentation fault: the CLI and the GUI. Sometimes the “apt” command itself stops working because of the segfault, so the CLI method will not work. In that case don’t worry, as the recovery mode method should still work.

Prevention Tips

To prevent segmentation faults moving forward, keep these tips in mind:

Make it a habit to regularly update and upgrade your system packages.
Try not to interrupt installations or upgrades once they’ve started.
Periodically clean out the repository cache to keep things tidy.
Always use stable versions of your applications to minimize issues.

Final Thoughts

Facing a segmentation fault (core dumped) can be quite frustrating, but it is not impossible to fix. Using the CLI method or recovery mode, you can quickly fix the problem and get your system working properly again.

To minimize the chances of encountering this error in the future, make sure your system is always updated and take precautions when upgrading.

The closer you think you are, the less you’ll actually see

I hope you have seen the movie Now You See Me; it has a famous quote: “The closer you think you are, the less you’ll actually see.” Well, this blog is not about the movie but about how I got stuck on an issue because I was not paying attention and not looking at things closely, and hence was not able to resolve it.

There is a lot happening in today’s DevOps world, and HashiCorp has emerged as a big player in this game. Terraform is one of the open source tools to manage infrastructure as code. It plays well with most cloud providers. But with all these continuous improvements and enhancements there comes a possibility of issues as well. This article is about such a scenario. If you have found yourself in the same trouble, you are lucky to have reached the right page.

I was learning Terraform and performing a simple task: launching an Ubuntu EC2 instance in the us-east-1 region. For this I required the AMI ID, which I copied from the AWS console as shown in the screenshot below.

Once I got the AMI ID, I tried to create the instance using Terraform. Below is the code:

provider "aws" {
  region     = "us-east-1"
  access_key = "XXXXXXXXXXXXXXXXXX"
  secret_key = "XXXXXXXXXXXXXXXXXXX"
}

resource "aws_instance" "sandy" {
  ami           = "ami-036ede09922dadc9b"
  instance_type = "t2.micro"
  subnet_id     = "subnet-0bf4261d26b8dc3fc"
}

I was expecting to see the magic of Terraform, but what I got was the ugly error below.

Terraform was not allowing me to spin up the instance. I tried a couple of things which didn’t work. As you can see, the error message didn’t give much information. Finally, I thought of giving it a try by doing the same task via the AWS web console. I searched for the same Ubuntu AMI and selected the image as shown below. I kept the rest of the settings at their defaults. And well, this time it launched.

And that confused me more. Through the console it was working fine, but Terraform said it was not allowed. After a lot of hair pulling I finally found the culprit, which is a perfect example of how overlooking small things can lead to a blunder.

Culprit

While copying the AMI ID from the AWS console, I had copied the 64-bit (ARM) AMI ID. Please look carefully at the screenshot below.

But while creating the instance through the console I was selecting the default configuration, which is 64-bit (x86). Look at the screenshot below.

To explain it further, I tried to launch the VM with 64-bit (ARM) manually. And while selecting the AMI, I selected 64-bit (ARM).

And here is the culprit: a 64-bit (ARM) AMI can only be launched on ARM-based instance types (a1 at the time), so it cannot run on the x86-based t2.micro.

Conclusion

While launching the instance with Terraform, I mistakenly used the 64-bit (ARM) AMI ID, primarily because the same image has two AMI IDs (one per architecture) and the difference is easy to miss unless you pay special attention.

So folks, next time you choose an AMI ID, keep in mind which type of AMI you are selecting. It will save you a lot of time.
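
One way to catch this kind of mismatch before running Terraform is to compare the AMI’s architecture with the architectures the instance type supports. The sketch below is only an illustration: the function name is hypothetical, the values are typed in by hand, and the AWS CLI calls that could supply them are left as comments because they need configured credentials.

```shell
#!/bin/bash
# Return 0 if the AMI architecture appears in the list of architectures
# that the instance type supports, 1 otherwise.
arch_is_supported() {
    local AMI_ARCH="$1"
    local SUPPORTED_ARCHS="$2"      # whitespace-separated, e.g. "i386 x86_64"
    local ARCH
    for ARCH in ${SUPPORTED_ARCHS}; do
        if [ "${ARCH}" = "${AMI_ARCH}" ]; then
            return 0
        fi
    done
    return 1
}

# With the AWS CLI configured, the two inputs could be fetched with:
#   aws ec2 describe-images --image-ids "$AMI_ID" \
#       --query 'Images[0].Architecture' --output text
#   aws ec2 describe-instance-types --instance-types "$INSTANCE_TYPE" \
#       --query 'InstanceTypes[0].ProcessorInfo.SupportedArchitectures' --output text

# Values below are typed in by hand for illustration.
if arch_is_supported "arm64" "i386 x86_64"; then
    printf 'AMI and instance type are compatible\n'
else
    printf 'Mismatch: arm64 AMI on an instance type that supports: i386 x86_64\n'
fi
```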

Stay Away Replication Lag !

Recently, I got a requirement to facilitate backup for the data and a way to analyze it without using the main database. MySQL replication is a process that allows you to easily maintain multiple copies of MySQL data by having them copied automatically from a master to a slave database.

Panic Starts

Everything ran smoothly on the night I configured it. But the joy didn’t last long: as traffic hit in the morning, the slave started falling behind the master by a few seconds, and the lag increased with activity on the application. At peak time, it was in the thousands of seconds.

What now? I had to dig deep into MySQL replication:

How it works
What can probably cause the lag
An approach that minimizes or eliminates it

How MySQL Replication Works

On the master

First of all, the master writes replication events to a special log called the binary log. This is usually a very lightweight activity because writes are buffered and sequential. The binary log file stores the data that the replication slave will read later.

On the replica

When you start replication, two threads are started on the slave:
1. IO thread
This thread connects to the master, reads binary log events from the master as they come in, and copies them over to a local log file called the relay log.
Even though there’s only one thread reading the binary log from the master and one writing the relay log on the slave, copying replication events is very rarely the slow part of replication. There could be a network delay, causing a steady delay of a few hundred milliseconds.
If you want to see where the IO thread currently is, check the following in “show slave status\G”:
Master_Log_File – the last file copied from the master (most of the time it is the same as the last binary log written by the master)
Read_Master_Log_Pos – the binary log from the master has been copied over to the relay log on the slave up to this position.
You can then compare it to the output of “show master status\G” from the master.

mysql> show master status\G
*************************** 1. row ***************************
             File: db01-binary-log.000032
         Position: 1008761891
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 
1 row in set (0.00 sec)

2. SQL thread
The second thread, the SQL thread, reads events from the relay log stored locally on the replication slave (the file that was written by the IO thread) and then applies them as fast as possible.
Going back to “show slave status\G”, you can get the current status of the SQL thread from the following variables:
Relay_Master_Log_File – the binary log from the master that the SQL thread is “working on” (in reality it is working on the relay log, so this is just a convenient way to display the information)
Exec_Master_Log_Pos – the position in the master binary log that is being executed by the SQL thread.

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: <master_ip>
                  Master_User: <replication_user>
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db01-binary-log.000032
          Read_Master_Log_Pos: 1008768810
               Relay_Log_File: relay-bin.000093
                Relay_Log_Pos: 1008769035
        Relay_Master_Log_File: db01-binary-log.000032
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
.
.
          Exec_Master_Log_Pos: 1008768810
              Relay_Log_Space: 1008769305
.
.
        Seconds_Behind_Master: 0
.
.
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
.
.
1 row in set (0.00 sec)
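
The two positions can also be compared in a script. The sketch below parses “show slave status\G” style text and prints how many bytes of the master binary log have been copied but not yet applied. The function names are illustrative, and the subtraction is only meaningful while Master_Log_File and Relay_Master_Log_File name the same file.

```shell
#!/bin/bash
# Extract one field from "show slave status\G" style text on stdin.
status_field() {
    local FIELD_NAME="$1"
    awk -F': ' -v name="${FIELD_NAME}" '
        { key = $1; gsub(/^[ \t]+/, "", key) }   # strip the left padding
        key == name { print $2 }'
}

# Bytes of the master binary log that were copied but not yet applied.
sql_thread_gap() {
    local STATUS_TEXT="$1"
    local READ_POS EXEC_POS
    READ_POS=$(printf '%s\n' "${STATUS_TEXT}" | status_field "Read_Master_Log_Pos")
    EXEC_POS=$(printf '%s\n' "${STATUS_TEXT}" | status_field "Exec_Master_Log_Pos")
    printf '%s\n' "$((READ_POS - EXEC_POS))"
}

# Sample values taken from the slave status output shown above.
SAMPLE_STATUS='          Read_Master_Log_Pos: 1008768810
          Exec_Master_Log_Pos: 1008768810'
sql_thread_gap "${SAMPLE_STATUS}"
```

In practice the status text would come from something like mysql -e 'show slave status\G', run with whatever credentials the environment uses.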

Why Replication Lag Occurred

Replication lag occurs when the slaves cannot keep up with the updates occurring on the master. Unapplied changes accumulate in the slave’s relay logs and the version of the database on the slaves becomes increasingly different from that of the master.
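
A simple way to keep an eye on this is to classify the Seconds_Behind_Master value against a threshold. The sketch below assumes the value has already been read from the slave status; the function name and the 30-second threshold are arbitrary choices for illustration.

```shell
#!/bin/bash
# Classify a Seconds_Behind_Master value against a threshold in seconds.
# The value is NULL when the SQL thread is not running.
classify_lag() {
    local SECONDS_BEHIND="$1"
    local THRESHOLD="$2"
    if [ "${SECONDS_BEHIND}" = "NULL" ]; then
        printf 'replication stopped\n'
    elif [ "${SECONDS_BEHIND}" -gt "${THRESHOLD}" ]; then
        printf 'lagging by %s seconds\n' "${SECONDS_BEHIND}"
    else
        printf 'ok\n'
    fi
}
classify_lag 0 30
classify_lag 4200 30
classify_lag NULL 30
```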

Caught The Culprit

Let me take you through how I crossed the river.
First, I went through multiple blogs and filled my head with the possible reasons they suggested:

Hardware Faults (getting RAID in degraded mode)
MySQL Config Updates
- setting sync_binlog=1
- enabling log_slave_updates
- setting innodb_flush_log_at_trx_commit=1
- updating slave_parallel_workers to a higher value
- changing slave_parallel_type to support more parallel workers
Restarting Replication

But unfortunately, or rather because of my ignorance of database administration, I was still searching for that twig which could keep me from drowning.
And finally I found one: my DBA friend suggested that I look at the binary log format I was using. Let’s see what it is.

Binary Logging Formats

The server uses several logging formats to record information in the binary log. The exact format employed depends on the version of MySQL being used. There are three logging formats:
STATEMENT: With statement-based replication, every SQL statement that could modify data is logged on the master. Those SQL statements are then replayed on the slaves against the same dataset and in the same context. Less data usually has to be transferred between the master and the slave. However, because of the way this kind of replication works, data inconsistencies can creep in between the master and the slave, for example with non-deterministic statements.
ROW: With row-based replication, every “row modification” is logged on the master and is then applied to the slave. Each and every change can be replicated, and hence this is the safest form of replication. On a system that frequently updates a large number of rows, it produces very large binary logs and generates a lot of network traffic between the master and the slave.
MIXED: A third option is also available: mixed logging. With mixed logging, statement-based logging is used by default, but the logging mode switches automatically to row-based in certain cases.

Changing Binary Log Format

The binary log format is changed on the master MySQL server. It can be set in three ways; setting it at server startup requires a MySQL service restart, while the global and session values can be changed at runtime:

at server startup with --binlog-format=format
the global value of the binlog_format server variable (requires the SUPER privilege)
the session value of the binlog_format server variable

mysql> SET GLOBAL binlog_format=MIXED;

mysql> SET SESSION binlog_format=ROW;

mysql> SET binlog_format=STATEMENT;
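
A value set with SET GLOBAL is lost when the server restarts, so a permanent change usually also goes into an option file. The sketch below writes such a file; the function name is hypothetical and the file goes to a temporary location here, since the real path depends on the installation.

```shell
#!/bin/bash
# Write a small option file that sets the binary log format at server startup.
write_binlog_config() {
    local CONFIG_FILE="$1"
    local FORMAT="$2"
    case "${FORMAT}" in
        STATEMENT|ROW|MIXED) ;;
        *) printf 'Unknown format: %s\n' "${FORMAT}" >&2; return 1 ;;
    esac
    printf '[mysqld]\nbinlog_format = %s\n' "${FORMAT}" > "${CONFIG_FILE}"
}

# Written to a temporary file here; the real location depends on the installation
# (for example a file under /etc/mysql/conf.d/ on Debian-style packages).
TMP_CONFIG=$(mktemp)
write_binlog_config "${TMP_CONFIG}" "MIXED"
cat "${TMP_CONFIG}"
```

Rejecting unknown values before writing keeps a typo from ending up in a file that the server reads at startup.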

So, earlier I was using the STATEMENT binlog format, which was the default on my MySQL version (ROW has been the default since MySQL 5.7.7). Since I switched to the MIXED binlog format, I am delighted to share the stats below.
Current status of the difference between the master read position and the slave execute position, and of the slave lag (in seconds): both are ZERO.

Replication Lag (in Seconds) graph for a month, powered by Prometheus-Grafana.

Now, What’s next ??

Best Practices for Writing a Shell Script

I am a lazy DevOps Engineer, so whenever I come across the same task more than twice, I automate it. Although we now have many automation tools, the first thing that comes to mind for automation is still a bash or shell script.
After making a lot of mistakes and writing messy scripts :), I am sharing my experience of writing a good shell script, one which not only looks good but also reduces the chances of error.

The things that every piece of code should have:-
– Minimum effort to modify.
– The program should speak for itself, so you don’t have to explain it.
– Reusability. Of course, I can’t write the same kind of script or program again and again.

I am a firm believer in learning by doing. So let’s create a problem statement for ourselves and then try to solve it via shell scripting with best practices :). I would like to have solutions in the comment section of this blog.

Problem Statement:- Write a shell script to install and uninstall a package (vim) depending on the arguments. The script should tell if the package is already installed. If no argument is passed, it should print the help page.

So without wasting time, let’s start writing an awesome shell script. Here is the list of things that should always be taken care of while writing a shell script.

Lifespan of Script

If your script is procedural (each subsequent step relies on the previous step to complete), do me a favor and add set -e at the start of the script so that the script exits on the first error. For example:-

#!/bin/bash
set -e # Script exits on the first failure
set -x # For debugging purpose
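
To see the difference set -e makes, the sketch below runs the same three steps in a child shell twice, once without the option and once with it. The helper name run_steps is made up, and sh -c is used only to keep the demonstration isolated from the calling script; set -e behaves the same way in bash for this case.

```shell
#!/bin/bash
# Run three steps in a child shell; the second step always fails.
# SHELL_OPTION is either "set -e" or ":" (a no-op).
run_steps() {
    local SHELL_OPTION="$1"
    sh -c "${SHELL_OPTION}
        echo 'step 1: prepare'
        false
        echo 'step 3: deploy'" || true
}

printf 'Without set -e:\n'
run_steps ":"
printf 'With set -e:\n'
run_steps "set -e"
```

Without the option, step 3 still runs after the failure; with it, the child shell stops right after the failing step.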

Functions

Ahha, Functions are my most favorite part of programming. There is a saying

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

To achieve this, always try to use functions and name them properly so that anyone can understand a function just by reading its name. Functions also make code reusable and remove duplication. How? Let’s see:

#!/bin/bash 
install_package() {
   local PACKAGE_NAME="$1"
   yum install "${PACKAGE_NAME}" -y
}
install_package "vim"

Command Sanity

Usually, scripts call other scripts or binaries. When we are dealing with commands, there is a chance that a command will not be available on every system. So my suggestion is to check for them before proceeding.

#!/bin/bash  
check_package() {
    local PACKAGE_NAME="$1"
    if ! command -v "${PACKAGE_NAME}" > /dev/null 2>&1
    then
           printf '%s is not installed.\n' "${PACKAGE_NAME}"
    else
           printf '%s is already installed.\n' "${PACKAGE_NAME}"
    fi
}
check_package "vim"
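
The same check can be extended to cover everything a script depends on before it does any work. This is a minimal sketch; the function name require_commands is hypothetical, and the commands checked here are only examples.

```shell
#!/bin/bash
# Check several commands at once and report every missing one.
require_commands() {
    local MISSING_COUNT=0
    local COMMAND_NAME
    for COMMAND_NAME in "$@"; do
        if ! command -v "${COMMAND_NAME}" > /dev/null 2>&1; then
            printf 'Required command not found: %s\n' "${COMMAND_NAME}" >&2
            MISSING_COUNT=$((MISSING_COUNT + 1))
        fi
    done
    [ "${MISSING_COUNT}" -eq 0 ]
}

if require_commands ls cat; then
    printf 'All required commands are available.\n'
fi
```

Reporting every missing command in one pass saves the user from fixing them one run at a time.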

Help Page

If you are familiar with Linux, you have certainly noticed that almost every Linux command has its own help page. The same can be true for a script as well. It is really helpful to include a --help flag.

#!/bin/bash  
INITIAL_PARAMS="$*"
help_function() {
    printf "Usage:- ./script <option>\n"
    printf "Options:\n"
    printf " -a ==> Install all base software\n"
    printf " -r ==> Remove base software\n"
}
arg_checker() {
     if [ "${INITIAL_PARAMS}" == "--help" ]; then
            help_function
     fi
}
arg_checker

Logging

Logging is critical for everyone, whether they are a developer, sysadmin or DevOps engineer. Debugging seems to be impossible without logs. Most applications generate logs so that we can understand what is happening inside them, and the same practice can be implemented for shell scripts as well. For generating logs we have a command-line utility called logger, which writes messages to the system log.

#!/bin/bash 
declare DATE
DATE=$(date)
check_file() {
     local FILENAME="$1"
     if [ ! -e "${FILENAME}" ]
     then
            logger -s "${DATE}: ${FILENAME} doesn't exist"
     else
            logger -s "${DATE}: ${FILENAME} found successfully"
     fi
}
check_file "/etc/passwd"

Variables

I like to name my variables in capital letters with underscores. This way, I don’t confuse function names with variable names. Never use a, b, c etc. as variable names; give each variable a proper name, just like functions.

#!/bin/bash 
# Use declare for declaring global variables
declare GLOBAL_MESSAGE="Hey, I am a global message"
# Use local for declaring local variables inside the function
message_print() {
    local LOCAL_MESSAGE="Hey, I am a local message"
    printf 'Global Message:- %s\n' "${GLOBAL_MESSAGE}"
    printf 'Local Message:- %s\n' "${LOCAL_MESSAGE}"
}
message_print
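
To make the effect of local visible, the sketch below sets one variable with local and one without inside a function, then inspects both after the call. All names are made up for the demonstration.

```shell
#!/bin/bash
set_values() {
    # Declared with local: disappears when the function returns.
    local SCOPED_VALUE="visible only inside set_values"
    # Assigned without local: becomes a global variable.
    LEAKED_VALUE="visible everywhere after the call"
}
set_values
printf 'SCOPED_VALUE: %s\n' "${SCOPED_VALUE:-<unset>}"
printf 'LEAKED_VALUE: %s\n' "${LEAKED_VALUE:-<unset>}"
```

The first line prints the placeholder because SCOPED_VALUE no longer exists outside the function, while LEAKED_VALUE keeps its value.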

Cases

Cases are also a fascinating part of shell scripting. But the question is when to use them. In my opinion, if your shell program provides more than one functionality based on its arguments, then you should go for cases. For example:- if your shell utility provides the capability of installing and uninstalling software.

#!/bin/bash  
print_message() {
    local MESSAGE="$1"
    echo "${MESSAGE}"
}
case "$1" in
   -i|--input)
      print_message "Input Message"
      ;;
   -o|--output)
        print_message "Output Message"
        ;;
   --debug)
       print_message "Debug Message"
       ;;
    *)
      print_message "Wrong Input"
      ;;
esac
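
Putting the pieces together, here is one possible sketch of a solution to the problem statement. It is not the only way to do it. The option names, the DRY_RUN switch and the function names are assumptions made for illustration, and like check_package above it assumes the package name matches the command name.

```shell
#!/bin/bash
# Sketch of a solution to the problem statement above.
# DRY_RUN is a hypothetical switch: when set to 1 the package manager
# command is printed instead of executed.
DRY_RUN="${DRY_RUN:-0}"

print_help() {
    printf 'Usage: ./script <option> [package]\n'
    printf 'Options:\n'
    printf '  -i|--install    Install the package (default: vim)\n'
    printf '  -u|--uninstall  Remove the package (default: vim)\n'
    printf '  -h|--help       Show this help page\n'
}

run_package_manager() {
    if [ "${DRY_RUN}" = "1" ]; then
        printf 'Would run: yum %s\n' "$*"
    else
        yum "$@"
    fi
}

is_installed() {
    command -v "$1" > /dev/null 2>&1
}

manage_package() {
    local ACTION="$1"
    local PACKAGE_NAME="${2:-vim}"
    case "${ACTION}" in
        -i|--install)
            if is_installed "${PACKAGE_NAME}"; then
                printf '%s is already installed.\n' "${PACKAGE_NAME}"
            else
                run_package_manager install "${PACKAGE_NAME}" -y
            fi
            ;;
        -u|--uninstall)
            if is_installed "${PACKAGE_NAME}"; then
                run_package_manager remove "${PACKAGE_NAME}" -y
            else
                printf '%s is not installed.\n' "${PACKAGE_NAME}"
            fi
            ;;
        *)
            # No argument, --help, or anything unknown: show the help page.
            print_help
            ;;
    esac
}

manage_package "$@"
```

Running it as DRY_RUN=1 ./script --install shows the yum command that would be executed without touching the system.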

In this blog, we have covered functions, variables, the lifespan of a script, logging, help page, command sanity. I hope these topics help you in your daily life while using the shell script. If you have any feedback please let me know through comments.
Cheers Till the next Time!!!!