Intro to Packer

Packer is an opensource tool developed by HashiCorp to create machine images for multiple cloud platforms like AWS, GCP, Azure or even VMWare. As the name suggests it packs all your software, packages, configurations while baking your machine images. Perhaps Packer is the only tool right now in the market which solely focuses on creating machine images and giving us the ability to automate the machine image creation process.

In this blog post, we will learn What Packer does and how it does things. Sounds Interesting!!!!

 
What is Packer and Machine Images
 
“Packer can be used to creating identical machine images for multiple platforms from a single source configuration. Packer is lightweight, runs on every major operating system, and is highly performant, creating machine images for multiple platforms in parallel.” 
https://www.packer.io/intro/ 
It does installs and configures the software by using different SCM tools such as Ansible, Chef or Puppet, shell scripts within your Packer-made images. You can either include your scripts in json template itself or you can source it from a file.
 
“A machine image is a single static unit that contains a pre-configured operating system and installed software which is used to quickly create new running machines. Machine image formats change for each platform. Some examples include AMIs for EC2, VMDK/VMX files for VMware, OVF exports for VirtualBox, etc”
https://www.packer.io/intro/ 
 
Why the Heck We Should Consider to Learn Packer!!!!!!!!
 
Consider a couple of scenarios mentioned below:-
 
Scenario1
 
If you want to have an immutable infrastructure in place. The key guideline behind an immutable infrastructure is that you never modify a running server. If a change is required, you instead completely replace the server with a new instance that contains the update or change.
The new server instance is created with an origin image that is built upon or a restored image from a previously defined server state. Version control and tag your images for easy rollback and distribution. Image contains all the application code, runtime dependencies, and configuration–in essence, the state needed for the software to run as expected. You will want to minimize the time required to bake all your required stuff into your image which can be achieved if you have proper tag maintained over your previous release images which can be used as an origin-golden image to bake the new image. The entire process of baking and using images become outstandingly easy by using Packer.
 
Scenario2
 
If you have autoscaling in place, there must be a requirement to scale up a new serviceable VMs as soon as possible but there are some concerns which spoil your expectations of serviceable VMs in the least time:
  • OS boot 
  • OS configuration 
  • SCM with Ansible or Chef 
  • Setting up your application
With having a pre-baked image in place, your time to scale up your VMs will drastically decrease.
 
So How Does Packer Works!!!!!!!!
 
Packer uses the JSON file as a template, it takes template as in input rolls up a temporary VMs based on the details provided, does the required configuration and stops it. After stopping the VMs it starts creating the image and save it as the name/tag provided in the template.
 
json file packer engine EC2 AMI
              
                                                       
Basic Concepts of Packer
 
There are two things which you will need to know to get started with packer:
  • Templates 
  • Commands
 
Templates
 
There are four sections in the Packer template: 
  • Variables(optional)-is an object of one or more key/value strings that define user variables contained in the template. If it is not specified, then no variables are defined 
  • Builders(required)- is an array of one or more objects that defines the builders that will be used to create machine images for this template, and configures each of those builders. 
  • Provisioners(optional)- is an array of one or more objects that defines the provisioners that will be used to install and configure the software for the machines created by each of the builders 
  • post-processors(optional)-  is an array of one or more objects that defines the various post-processing steps to take with the built images. If not specified, then no post-processing will be done
Sub-Commands
 
Likewise, Unix packer also takes subcommand and options. There are three sub-commands:
  • build-The packer build command takes a template and runs all the builds within it in order to generate a set of artefacts. The various builds specified within a template are executed in parallel unless otherwise specified. And the artefacts that are created will be outputted at the end of the build. 
  • validate- The packer validate command is used to validate the syntax and configuration of a template. The command will return a zero exit status on success, and a non-zero exit status on failure. Additionally, if a template doesn’t validate, any error messages will be outputted. 
  • inspect -The packer inspect takes a template and outputs the various components a template defines. This can help you quickly learn about a template without having to dive into the JSON itself. The command will tell you things like what variables a template accepts, the builders it defines, the provisioners it defines and the order they’ll run, and more.

Hope this blog helps you understand the basics of Packer. Having covered all the basics understanding, we can now “Get Started With Packer”.

ANSIBLE DYNAMIC INVENTORY IS IT SO HARD?

Thinking what the above diagram is all about. Once you are done with this blog, you will know exactly what it is. Till one month ago, I was of the opinion that Dynamic Inventory is a cool way of managing your AWS infrastructure as you don’t have to track your servers you just have to apply proper tags and Ansible Dynamic Inventory magically manages the inventory for you. Having said that I was not really comfortable using dynamic inventory as it was a black box I tried going through the Python script which was very cryptic & difficult to understand. If you are of the same opinion, then this blog is worth reading as I will try to demystify how things work in Dynamic Inventory and how you can implement your own Dynamic inventory using a very simple python script.

You can refer below article if you want to implement Dynamic inventory for your AWS infrastructure.

https://aws.amazon.com/blogs/apn/getting-started-with-ansible-and-dynamic-amazon-ec2-inventory-management/

Now coming to what is dynamic inventory and how you can create one. You have to understand what Ansible accepts as an inventory file. Ansible expects a JSON in the below format. Below is the screenshot showing the bare minimum content which is required by Ansible. Ansible expects a dictionary of groups (each group having a list of group>hosts, and group variables in the group>vars dictionary), and a _meta dictionary that stores host variables for all hosts individually (inside a hostvars dictionary).

So as long as you can write a script which generates output in the above JSON format. Ansible won’t give you any trouble. So let’s start creating our own custom inventory.

I have created a python script customdynamicinventory.py which reads the data from input.csv and generates the JSON as mentioned above. For simplicity, I have kept my input.csv as simple as possible. You can find the code here:-

https://github.com/SUNIL23891YADAV/dynamicinventory.git

If you want to test it just clone the code and replace the IP, user and key details as per your environment in the input.csv file. To make sure that our python script is generating the output in standard JSON format as expected by Ansible. You can run ./customdynamicinventory.py –list
And it will generate the output in standard JSON format as shown in below screenshot.




If you want to check how the static inventory file would have looked for the above scenario. You can refer to the below screenshot. It would have served the same purpose as the above dynamic inventory

Now to make sure your custom inventory is working fine. You can run

ansible all -i  customdynamicinventory.py -m ping

It will try to ping all the hosts mentioned in the CSV. Let’s check it

See it is working, that’s how easy it is.

Instead of a static CSV file, we can have a database where all the hosts and related details are getting updated dynamically. Then Ansible dynamic inventory script can use this database as an inventory source as long as it returns a JSON structure, mentioned in the first screenshot.

Redis Load Testing

As I mentioned in my previous blog on Redis Best Practices that in my upcoming blog I will discuss about load testing on Redis so here I am ready with my blog in which I will explain how can we measure our Redis Performance. Although there are plenty of articles out there on this similar topic, but I want to share my experience as a DevOps. Also, I want to share the methods which we are implementing at our organization.

So, Load testing What, Why?

The first thought which comes to our mind is why do we need load testing, as our environment is already working fine. Or it is working fine since the first launch.

But Hey!!! let me tell you something it’s not simple as that, as we know everything has its own limits and knowing your limits is always helpful. When we are not ready to face the problem of the increasing load, our environment can easily collapse. There is saying as well

Prevention is better than cure

In simple words, it means it’s easier to stop something bad happening in the first place than to repair the damage after it happened.

So when I decided to test the redis performance, it was not quite easy to start, there are plenty of load testing frameworks were present but which to choose? So I did the comparison between various load testing framework. Although there is already Jmeter was available which is a popular load testing tool but I found it a bit complex to learn easily and rapidly and also it requires a hell lot of resources so I have chosen an awesome Python load testing framework, Locust, which is very lightweight and easy to setup load testing framework.

The golden rule before starting the load testing is that you should have metrics or a number in your mind which you want to achieve.

So I will be using my own organization load testing utility which we have created for Redis Load or Performance Testing.

You can find the code at- https://github.com/opstree/redis-load-test

So you can simply clone the git repo like this:-

git clone https://github.com/opstree/redis-load-test.git

As Locust is a python based project so it doesn’t have long dependencies list but surely it does have some dependency. So after repo cloning, we have to install these dependencies

cd Scripts
pip3 install -r requirments.txt

Once the dependency hurdle is crossed we can move to the step in which we will connect our utility to Redis. To achieve this we have a file called redis.json in the Scripts folder of the repo, you just have to update the redis details in that file. For example:-

{
    "redis_host": "10.1.1.100",
    "redis_port": "6379",
    "redis_password": ""
}

Once you are done with connection details, kaboom you are all set to use the performance testing utility. Just go on the terminal and run this command.

locust -f redis_get_set.py

The output will be something like this

Now go and open up the URL on the browser by http://your_ip:8089

The UI page will look like this

You will have two empty blocks there:-

Number of users to simulate:- Total number of user connection request which you want to make.

Hatch Rate:- How quickly you want to spawn users.

After filling these details you can simply start swarming and you can wait until it completes its execution. Once the execution will be completed you will have this kind of details.

This page is loading data in the form of statistics but you can also see the data in a beautiful graph format on the same UI. For example-

I have also provided detailed information in the README file as well of the repo.

One of the benefits which I feel important in doing load testing is that you can set up a performance baseline according to your environment. And yes, if you are not getting the desired output of your Redis Performance you can check out our blog on Redis Best Practices and Performance Tuning.

The main idea of writing this blog was to encourage people to know the limitations of their environment and to make it ready for any kind of challenge.

I hope I explained everything clearly enough to understand. If you do have any questions or suggestions, please feel free to ask.

Cheers Till Next Time!!!

Linux Namespaces – Part 2

Before talking about the types of namespaces we are assuming that you have gone through our First Part of Linux Namespaces, if not you can check it here.

 

Types of Namespaces

So Basically we have seven types of Linux Namespaces:-
  1. CGroups:- Basically cgroups virtualize the view of process’s cgroups in /proc/[pid]/cgroups. Whenever a process creates a new cgroup it enters in a new namespace in which all current directories become cgroup root directories of the new namespace. So we can say that it isolates cgroup root directory.
  2. IPC(Interpolation Communication):- This namespace isolates interpolation communication. For example, In Linux, we have System V IPC(A communication mechanism) and Posfix (for message queues) which allows processes to exchange data in form of communication. So in simple words, we can say that IPC namespace isolates communication.
  3. Network:- This namespace isolates systems related to the network. For example:- network devices, IP protocols, Firewall Rules (That’s why we can use the single port with single service )
  4. Mount:- This namespace isolates mount points that can be seen by processes in each namespace. In simple words, you can take an example of filesystem mounting in which we can mount only one device or partition on a mount-point.
  5. PID:- This namespace isolates the PID. (In this child processes cannot see or trace the parent process but parent process can see or trace the child processes of the namespace. Processes in different PID namespace can have same PID.)
  6. User:- This namespace isolates security related identifier like group id and user id. In simple words, we can say that the process’s group and user id has full privilege inside the namespace but not outside the namespace.
  7. UTS:- This namespace provides the isolation on hostname and domain name. It means processes has a separate copy of domain name or hostname so while changing hostname or domain name it will not affect the rest of the system.

Namespace Management

This is the most advanced topic of Linux namespaces which should be done on kernel level. For the namespace management, you have to write a C program.
For management of namespace, we have these functions available in Linux:-
  • clone():-  If we use standalone clone() it will create a new process only, but if we pass one or more flags like CLONE_NEW*, then the new namespace will be created and child process will become the member of it.
  • setns():- This allows joining existing namespace. The namespace is specified by the file descriptor referenced to process.
  • unshare():- This allows calling process to disassociate from parts of current namespace. Basically, this function works on the processes that are being shared by other’s namespace as well for ex:- mount namespace.

Kafka Manager On Kubernetes

 

                                kafka-manager-on-kubernetes

 

We likely know Kafka as a durable, scalable and fault-tolerant publish-subscribe messaging system. Recently I got a requirement to efficiently monitor and manage our Kafka cluster, and I started looking for different solutions. Kafka-manager is an open source tool introduced by Yahoo to manage and monitor the Apache Kafka cluster via UI.

Before I share my experience of configuring Kafka manager on Kubernetes, let’s go through its considerable features
 

As per their documentation on github below are the major features: 

Clusters:
 
  • Manage multiple clusters.
  • Easy inspection of the cluster state.

Brokers:

  • Run preferred replica election.
  • Generate partition assignments with the option to select brokers to use
  • Run reassignment of a partition (based on generated assignments)

Topics:

  • Create a topic with optional topic configs (0.8.1.1 has different configs than 0.8.2+)
  • Delete topic (only supported on 0.8.2+ and remember set delete.topic.enable=true in broker config)
  • The topic list now indicates topics marked for deletion (only supported on 0.8.2+)
  • Batch generate partition assignments for multiple topics with the option to select brokers to use
  • Batch run reassignment of partition for multiple topics
  • Add partitions to an existing topic
  • Update config for an existing topic

Metrics:

  • Optionally filter out consumers that do not have ids/ owners/ & offsets/ directories in zookeeper.
  • Optionally enable JMX polling for broker level and topic level metrics.

Prerequisites of Kafka Manager:

We should have a running Apache Kafka with Apache Zookeeper.

 
  • Apache Zookeeper
  • Apache Kafka

Deployment on Kubernetes: 

To deploy Kafka Manager on Kubernetes, we need to create deployment and service file as given below.
 
You can find these sample file at https://github.com/vishant07/kafka-manager

After deployment, we should able to access Kafka manager service via http://:8080

We have two files to Kafka-manager-service.yaml and kafka-manager.yaml to achieve above-mentioned setup. Let’s have a brief description of the different attributes used in these files. 

Deployment configuration file: 


namespace: provide a namespace to isolate application within Kubernetes.

replicas: number of containers to spun up.
image: provide the path of docker image to be used.
containerPorts: on which port you want to run your application.
environment: “ZK_HOSTS” provide the address of already running zookeeper.

Service configuration file:

This file contains the details to create Kafka manager service ok Kubernetes. For demo purpose, I have used the node port method to expose my service. 

As we are using Kubernetes for our underlying platform of deployment it is recommended not to use external IP to access any service. Either we should go with LoadBalancer or use ingress (recommended method) rather than exposing all microservices.  


To configure ingress, please take a note from Kubernetes Ingress.


Once we are able to access Kafka manager we can see similar screens. 
 

Cluster Management

Topic List

 

 
 

Major Issues

 
To get broker level and topic level metrics we have to enable JMX polling.
 
So what we will generally do is to set the environment variable in the kubernetes manifest but somehow it is not working most of the times.

 

To resolve this you need to update JMX settings while creating your docker image as given as below.

vim /opt/kafka/bin/kafka-run-class.sh


if [ -z "$KAFKA_JMX_OPTS" ]; then
#KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false  -Dcom.sun.management.jmxremote.ssl=false "

KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=$HOSTNAME -Djava.net.preferIPv4Stack=true"

fi
 

Conclusion

 
Deploying Kafka manager on Kubernetes encourages the easy setup, provides efficient manageability and all-time availability. Managing Kafka cluster over CLI becomes a tedious task and here Kafka manager helps to focus more on the use of Kafka rather than investing our time to configure and manage it.  It becomes useful at Enterprise Level, where system engineers can manage multiple Kafka clusters easily via UI.
Reference links:
Image: google image search
Documentation: https://github.com/yahoo/kafka-manager