Speeding up Ansible Execution Part 1

The knowledge of one of the SCM tools is a must for any DevOps engineer, ANSIBLE is one of the popular tools in this category, we all are aware of the ease that Ansible provides whether it is infra provisioning, orchestration or application deployment.
The reason for the vast popularity of Ansible is the long list of modules it provides to support any level of automation, moreover it also gives users the flexibility to create their own modules as per their requirement.
But The purpose of this blog is not to mention the features that ansible provides, but to show how we can speed up our playbook execution in Ansible, as a beginner executing ansible, is very easy and it also feels like saving a lot of time with it, but as you dive deep into it, you will come to know that running ansible playbooks will engage you for a considerable amount of time.
There are a lot of articles available on the internet on how we can speed up our ansible execution, so I have decided to sum up those articles into my blog, with the following methods, we can reduce our execution time without compromising with the overall performance of Ansible.
Before starting, I request you guys to make a small change in your ansible configuration file (ansible.cfg), this small change will help you in tracking the time it will take for the playbook execution, and it also lists out the time is taken by each task.
Just add these lines to your ansible.cfg file under default section,

[default]

callback_whitelist = profile_tasks

Forks

When you are running your playbooks on various hosts, then you may have noticed that the number of servers where the playbook executes simultaneously is 5. You can increase this number inside the ansible.cfg file:
# ansible.cfg

forks = 10

or with a command line argument to ansible-playbook with the -f or –forks options. We can increase or decrease this value as per our requirement.
while using forks we should use “local_action” or “delegated” steps limited in number, as with higher fork value it will affect the ansible-server’s performance.

Async

In ansible, each task blocks the playbook, meaning the connections stay open until the task is done on each node, which is some cases takes a lot of time, here we can use “async” for those particular tasks, with the help of this ansible will automatically move to another task without waiting for the task execution on each node.
To launch a task asynchronously, we need to specify its maximum runtime and how frequently we would like to poll for status, it’s default value in 10 sec.
tasks:

– name: “name of the task”

command: “command we want to execute”

async: 40

poll: 15
The only condition is that the subsequent tasks must not have a dependency on this task.

Free Strategy

When running Ansible playbooks, you might have noticed that the Ansible runs every task on each node one by one, it will not move to another task until a particular task is completed on each node, which will take a lot of time, in some cases.
By default, the strategy is set to “linear”, we can set it to free.
—

– hosts: “hosts/groups”

name: “name of the playbook”

strategy: free

It will run the playbook on each host independently, without waiting for each node to complete.
Facts gathering is the default feature while executing playbook, sometimes we don’t need it.
In those cases, we can disable facts gathering, This has advantages in scaling Ansible in push mode with very large numbers of systems.
—

– hosts: “hosts/groups”

name: “name of the playbook”

gather_facts: no

Pipelining

For each task in Ansible, there are lots of ssh connection created, which results in increasing the total execution time. Pipelining reduces the number of ssh operations required to execute a module by executing many Ansible modules without an actual file transfer. We just have to make these changes in the ansible.cfg file,
# ansible.cfg Pipelining = True
Although this can result in a very significant performance improvement when enabled, Pipelining is disabled by default because requiretty is enabled by default for many distros.

Poll Interval

When we run any the task in Ansible, it starts polling to check if the task is completed on the host or not, we can decrease this polling interval time in ansible.cfg to increase its performance, but it will increase the CPU usage, so we need to adjust its value accordingly We just have to adjust this the parameter in the ansible.cfg file,
internal_poll_interval=0.001

so, these are the various ways to decrease our playbook execution time in Ansible, generally we don’t use all these methods in a single setup, we use these features as per the requirement,
The main motive of writing this blog is to determine the factors which will help in fine-tuning the Ansible performance, and there are many more factors which serves the same purpose but here I am mentioning the most important parameters among them.
I hope I have covered all the important aspects of the blog, feel free to provide your valuable feedback.
Thanks !!!

Source:

https://mitogen.networkgenomics.com/ansible_detailed.html

Redis Zero Downtime Cluster Migration

A few days back I came across a problem of migrating a Redis Master-Slave setup to Redis Cluster. Initially, I thought it to be a piece of cake since I have been already working on Redis, but there was a hitch, “Zero Downtime Migration”. Also, the redis was getting used as a database, not as Caching Server. So I started to think of different ways for migrating Redis Master-Slave setup to Redis Cluster and finally, I came up with an idea of migration.

Before we jump to migration, I want to give an overview regarding when we can use Redis as a database, and how to choose which setup we should go with Master-Slave or Cluster mode.

Redis as a Database

Sometimes getting data from disks can be time-consuming. In order to increase the performance, we can put the requests those either need to be served first or rapidly in Redis memory and then the Redis service there will keep rest of the data in the main database. So the whole architecture will look like this:-

Redis Master-Slave Replication

Beginning with the explanation about Redis Master-Slave. In this phenomenon, Redis can replicate data to any number of nodes. ie. it lets the slave have the exact copy of their master. This helps in performance optimizations.

I bet now you can use Redis as a Database.

Redis Cluster

A Redis cluster is simply a data sharding strategy. It automatically partitions data across multiple Redis nodes. It is an advanced feature of Redis which achieves distributed storage and prevents a single point of failure.

Replication vs Sharding

Replication is also known as mirroring of data. In replication, all the data get copied from the master node to the slave node.

Sharding is also known as partitioning. It splits up the data by the key to multiple nodes.

As shown in the above figure, all keys 1, 2, 3, 4 are getting stored on both machine A and B.

In sharding, the keys are getting distributed across both machine A and B. That is, the machine A will hold the 1, 3 key and machine B will hold 2, 4 key.

I guess now everyone has a good idea about Redis working mechanism. So let’s start discussing the migration of Redis.

Migration

Unfortunately, redis doesn’t have a direct way of migrating data from Redis-Master Slave to Redis Cluster. Let me explain it to you why?

We can start Redis service in either cluster mode or standalone mode. Now your solution would be that we can change the Redis Configuration value on-fly(means without restarting the Redis Service) with redis-cli. Yes, you are absolutely correct we can change the Redis configuration on-fly but unfortunately, Redis Mode(cluster or standalone) can’t be decided on-fly, for that we have to restart the service.

I guess now you guys will understand my situation :).

For migration, there are multiple ways of doing it. However, we needed to migrate the data without downtime or any interruptions to the service.

We decided the best course of action was a steps process:-

Firstly we needed to create a different Redis Cluster environment. The architecture of the cluster environment was something like

The next step was to update all the services (application) to send all the write operations to both servers(cluster and master-slave). The read commands (GET) will still go to the old setup.
But still, we don’t have the guarantee that all non-expirable data would make it over. So we can run a step to iterate through all of the keys and DUMP/RESTORE them into the new setup.
Once the new Redis Server looks good we could make the appropriate changes to the application to point solely to the new Redis Server.

I know the all steps are easy except the second step. Fortunately, redis provides a method of key scanning through which we can scan all the key and take a dump of it and then restore it in the new Redis Server.
To achieve this I have created a python utility in which you have to define the connection details of your old Redis Server and new Redis Server.

You can find the utility here.

https://github.com/opstree/redis-migration

I have provided the detail information on using this utility in the README file itself. I guess my experience will help you guys while redis migration.

Replication or Clustering?

I know most people have a query that when should we use replication and when clustering :).

If you have more data than RAM in a single machine, use Redis Cluster to shard the data across multiple databases.

If you have less data than RAM in a machine, set up a master-slave replication with sentinel in front to handle the fai-lover.

The main idea of writing this blog was to spread information about Replication and Sharding mechanism and how to choose the right one and if mistakenly you have chosen the wrong one, how to migrate it from :).

There are multiple factors yet to be explored to enhance the flow of migration if you find that before I do, please let me know to improve this blog.

I hope I explained everything and clear enough to understand.

Thanks for reading. I’d really appreciate any and all feedback, please leave your comment below if you guys have some feedbacks.

Happy Coding!!!!

Initially the script reference from taken here
https://gist.github.com/aniketsupertramp/1ede2071aea3257f9bb6be1d766a88f

ANSIBLE DYNAMIC INVENTORY IS IT SO HARD?

Thinking what the above diagram is all about. Once you are done with this blog, you will know exactly what it is. Till one month ago, I was of the opinion that Dynamic Inventory is a cool way of managing your AWS infrastructure as you don’t have to track your servers you just have to apply proper tags and Ansible Dynamic Inventory magically manages the inventory for you. Having said that I was not really comfortable using dynamic inventory as it was a black box I tried going through the Python script which was very cryptic & difficult to understand. If you are of the same opinion, then this blog is worth reading as I will try to demystify how things work in Dynamic Inventory and how you can implement your own Dynamic inventory using a very simple python script.

You can refer below article if you want to implement Dynamic inventory for your AWS infrastructure.

https://aws.amazon.com/blogs/apn/getting-started-with-ansible-and-dynamic-amazon-ec2-inventory-management/

Now coming to what is dynamic inventory and how you can create one. You have to understand what Ansible accepts as an inventory file. Ansible expects a JSON in the below format. Below is the screenshot showing the bare minimum content which is required by Ansible. Ansible expects a dictionary of groups (each group having a list of group>hosts, and group variables in the group>vars dictionary), and a _meta dictionary that stores host variables for all hosts individually (inside a hostvars dictionary).

So as long as you can write a script which generates output in the above JSON format. Ansible won’t give you any trouble. So let’s start creating our own custom inventory.

I have created a python script customdynamicinventory.py which reads the data from input.csv and generates the JSON as mentioned above. For simplicity, I have kept my input.csv as simple as possible. You can find the code here:-

https://github.com/SUNIL23891YADAV/dynamicinventory.git

If you want to test it just clone the code and replace the IP, user and key details as per your environment in the input.csv file. To make sure that our python script is generating the output in standard JSON format as expected by Ansible. You can run ./customdynamicinventory.py –list
And it will generate the output in standard JSON format as shown in below screenshot.

If you want to check how the static inventory file would have looked for the above scenario. You can refer to the below screenshot. It would have served the same purpose as the above dynamic inventory

Now to make sure your custom inventory is working fine. You can run

ansible all -i customdynamicinventory.py -m ping

It will try to ping all the hosts mentioned in the CSV. Let’s check it

See it is working, that’s how easy it is.

Instead of a static CSV file, we can have a database where all the hosts and related details are getting updated dynamically. Then Ansible dynamic inventory script can use this database as an inventory source as long as it returns a JSON structure, mentioned in the first screenshot.

Kafka Manager On Kubernetes

kafka-manager-on-kubernetes

We likely know Kafka as a durable, scalable and fault-tolerant publish-subscribe messaging system. Recently I got a requirement to efficiently monitor and manage our Kafka cluster, and I started looking for different solutions. Kafka-manager is an open source tool introduced by Yahoo to manage and monitor the Apache Kafka cluster via UI.

Before I share my experience of configuring Kafka manager on Kubernetes, let’s go through its considerable features

As per their documentation on github below are the major features:

Clusters:

Manage multiple clusters.

Easy inspection of the cluster state.

Brokers:

Run preferred replica election.
Generate partition assignments with the option to select brokers to use
Run reassignment of a partition (based on generated assignments)

Topics:

Create a topic with optional topic configs (0.8.1.1 has different configs than 0.8.2+)
Delete topic (only supported on 0.8.2+ and remember set delete.topic.enable=true in broker config)
The topic list now indicates topics marked for deletion (only supported on 0.8.2+)
Batch generate partition assignments for multiple topics with the option to select brokers to use
Batch run reassignment of partition for multiple topics
Add partitions to an existing topic
Update config for an existing topic

Metrics:

Optionally filter out consumers that do not have ids/ owners/ & offsets/ directories in zookeeper.
Optionally enable JMX polling for broker level and topic level metrics.

Prerequisites of Kafka Manager:

We should have a running Apache Kafka with Apache Zookeeper.

Apache Zookeeper
Apache Kafka

Deployment on Kubernetes:

To deploy Kafka Manager on Kubernetes, we need to create deployment and service file as given below.

You can find these sample file at https://github.com/vishant07/kafka-manager

After deployment, we should able to access Kafka manager service via http://:8080

We have two files to and to achieve above-mentioned setup. Let’s have a brief description of the different attributes used in these files.

Deployment configuration file:

namespace: provide a namespace to isolate application within Kubernetes.
replicas: number of containers to spun up.
image: provide the path of docker image to be used.
containerPorts: on which port you want to run your application.
environment: “ZK_HOSTS” provide the address of already running zookeeper.

Service configuration file:
This file contains the details to create Kafka manager service ok Kubernetes. For demo purpose, I have used the node port method to expose my service.

As we are using Kubernetes for our underlying platform of deployment it is recommended not to use external IP to access any service. Either we should go with LoadBalancer or use ingress (recommended method) rather than exposing all microservices.

To configure ingress, please take a note from Kubernetes Ingress.

Once we are able to access Kafka manager we can see similar screens.

Cluster Management

Topic List

Major Issues

To get broker level and topic level metrics we have to enable JMX polling.

So what we will generally do is to set the environment variable in the kubernetes manifest but somehow it is not working most of the times.

To resolve this you need to update JMX settings while creating your docker image as given as below.

vim /opt/kafka/bin/kafka-run-class.sh


if [ -z "$KAFKA_JMX_OPTS" ]; then

#KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false  -Dcom.sun.management.jmxremote.ssl=false "

KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=$HOSTNAME -Djava.net.preferIPv4Stack=true"

fi

Conclusion

Deploying Kafka manager on Kubernetes encourages the easy setup, provides efficient manageability and all-time availability. Managing Kafka cluster over CLI becomes a tedious task and here Kafka manager helps to focus more on the use of Kafka rather than investing our time to configure and manage it. It becomes useful at Enterprise Level, where system engineers can manage multiple Kafka clusters easily via UI.

Reference links:

Image: google image search

Documentation: https://github.com/yahoo/kafka-manager

Redis Best Practices and Performance Tuning for High-Speed Systems

In modern high traffic systems, Redis is one of the fastest in-memory data stores, but without proper tuning, even Redis can start showing performance bottlenecks.

The solution? Performance tuning and configuration optimization.

This guide covers the most important Redis performance tuning practices every DevOps engineer, SRE, or backend developer must follow.

One of the thing that I love about my organization is that you don’t have to do the same repetitive work, you will always get the chance to explore some new technologies. The same chance came across to me a few days back when one of our clients was facing issue with Redis.
They were using the Redis Cluster with Sentinel for which they were facing issue regarding performance, whenever the connection request was high the Redis Cluster was not able to bear the load.
Since they were using a decent configuration of the server in terms of CPU and Memory but the result was the same. So now what????
The Answer was to tune the performance. Continue reading “Redis Best Practices and Performance Tuning for High-Speed Systems”