Getting Started with Percona XtraDB Cluster

Percona XtraDB Cluster

As a DevOps activist I am exploring Percona XtraDB. In a series of blogs I will share my learnings. This blog intends to capture step by step details of installation of Percona XtraDB in Cluster mode. 
 

Why Cluster Mode Introduction:

Percona XtraDB cluster is High Availability and Scalability solution for MySQL users which provides
          Synchronous replication : Transaction either committed on all nodes                 or none.
          Multi-master replication : You can write to any node.
          Parallel applying events on slave : parallel event application on all slave                 nodes
          Automatic node provisioning
          Data consistency
Straight into the Act:Installing Percona XtraDB Cluster

Pre-requisites/Assumptions
  1. OS – Ububtu
  2. 3 Ubuntu nodes are available
For the sake of this discussion lets name the nodes as
node 1
hostname:
percona_xtradb_cluster1
IP: 192.168.1.2

node 2
hostname: percona_xtradb_cluster2
IP: 192.168.1.3

node 3
hostname: percona_xtradb_cluster3
IP: 192.168.1.4

Repeat the below steps on all nodes

STEP 1 : Add the Percona repository

$ echo "deb http://repo.percona.com/apt precise main" >> /etc/apt/sources.list.d/percona.list
$ echo "deb-src http://repo.percona.com/apt precise main" >> /etc/apt/sources.list.d/percona.list
$ apt-key adv --keyserver keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
STEP 2 : After adding percona repository, Update apt cache so that new packages can be included in our apt-cache.
$ apt-get update

STEP 3 : Install Percona XtraDB Cluster :

$ apt-get install -y percona-xtradb-cluster-56 qpress xtrabackup

STEP 4 : Install additional package for editing files, downloading etc :

$ apt-get install -y python-software-properties vim wget curl netcat

With the above steps we have, installed Percona XtraDB Cluster on every node. Now we’ll configure each node, so that a cluster of three nodes can be formed.

Node Configuration:

Add/Modify file /etc/mysql/my.cnf on first node :

[MYSQLD] #This section is for mysql configuration
user = mysql
default_storage_engine = InnoDB
basedir = /usr
datadir = /var/lib/mysql
socket = /var/run/mysqld/mysqld.sock
port = 3306
innodb_autoinc_lock_mode = 2
log_queries_not_using_indexes = 1
max_allowed_packet = 128M
binlog_format = ROW
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_node_address = 192.168.1.2
wsrep_cluster_name="newcluster"
wsrep_cluster_address = gcomm://192.168.1.2,192.168.1.3,192.168.1.4
wsrep_node_name = cluster1
wsrep_slave_threads = 4
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = sst:secret

[sst] #This section is for sst(state snapshot transfer) configuration
streamfmt = xbstream

[xtrabackup] #This section is defines tuning configuration for xtrabackup
compress
compact
parallel = 2
compress_threads = 2
rebuild_threads = 2

Note :
         wsrep_node_address = {IP of current node}
         wsrep_cluster_name= {Name of cluster}
         wsrep_cluster_address = gcomm://{Comma separated IP address’s which are in cluster}
         wsrep_node_name = {This is name of current node which is used to identify it in cluster}

Now as we have done node configuration. Now start first node services:
Start the node :

$ service mysql bootstrap-pxc

Make sst user for authentication of cluster nodes :

$ mysql -e "GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sst'@'localhost' IDENTIFIED BY 'secret';"

Check cluster status :

$ mysql -e "show global status like 'wsrep%';"

Configuration file for second node:

[MYSQLD]
user = mysql
default_storage_engine = InnoDB
basedir = /usr
datadir = /var/lib/mysql
socket = /var/run/mysqld/mysqld.sock
port = 3306
innodb_autoinc_lock_mode = 2
log_queries_not_using_indexes = 1
max_allowed_packet = 128M
binlog_format = ROW
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_node_address = 192.168.1.3
wsrep_cluster_name="newcluster"
wsrep_cluster_address = gcomm://192.168.1.2,192.168.1.3,192.168.1.4
wsrep_node_name = cluster2
wsrep_slave_threads = 4
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = sst:secret

[sst]
streamfmt = xbstream

[xtrabackup]
compress
compact
parallel = 2

After doing configuration, start services of node 2.
Start node 2 :

$ service mysql start

Check cluster status :

$ mysql -e "show global status like 'wsrep%';"

Now similarly you have to configure node 3. Changes are listed below.

Changes in configuration for node 3 :
wsrep_node_address = 192.168.1.4

wsrep_node_name = cluster3

Start node 3 :

$ service mysql start

Test percona XtraDb cluster:

Log-in by mysql client in any node:

<code prettyprint="" style="color: black; word-wrap: no
mysql>create database opstree;
mysql>use opstree;
mysql>create table nu113r(name varchar(50));
mysql>insert into nu113r values("zukin");
mysql>select * from nu113r;

Check the database on other node by mysql client:

mysql>show databases;

Note : There should be a database named “opstree”.

mysql>use opstree;
mysql>select * from nu113r; 

Note : Data will be same as in the previous node.

Kernel-based Virtualization Machine

What is virtualizatin?
“Virtualization is a technology that combines or divides computing resources to present one or many operating environments using methodologies like hardware and software partitioning or aggregation, partial or complete machine simulation, emulation, time-sharing, and many others.”

This means, that virtualization uses technology to abstract from the real hardware and provides isolated environments, so called Virtual Machines. They are capable to run various applications or even a whole operating system. A goal not mentioned in the definition is to have nearly to native performance for running VMS. This is a very important point, because the users always want to get the most out of their hardware. Most of them are not willing to introduce virtualiztion technology, if a huge amount of CPU power is wasted by managing VMs.

Kernel-based Virtual Machine :
KVM is the first virtualization solution that has been integration into the vanilla Linux kernel. KVM has been initially developed by Qumranet, a small company located in Isreal. Redhat acquired Qumranet in September 2008, when KVM become more production ready. They see KVM as next generation of virtualiztion technology.

KVM Architecture:
Linux has all the mechanism a VMM needs to operate several VMs. KVM is implemented as a kernel module that can be loaded to extend Linux by these capabilities.
In a normal Linux environment each process runs either in user-mode or in kernel-mode. KVM introduces a third, the guest-mode. Therefore it relies on a

virtualization capable CPU either Intel VT or AMD-SVM extensions. A process in guest-mode has its own kernel-mode and user-mode. Thus, it is able to run an opeating system. Such processes are representing the VMs running on a KVM host. What the modes are used for from a hosts point of view:

  • user-mode: I/O when guest needs to access devices.
  • kernel-node: switch into guest-mode and handle exits due to I/O operations
  • guest-mode: execute guest code, which is the guest OS except I/O

Resource Management:
To increase the reuse code as mush as possible they mainly modified the Linux memory management, to allow mapping physical memory into the VMs address space. Therefore they added shadow page tables, that were needed in the early days of x8 virtualization.
The scheduler of an operating system computers an order in that each process is assigned to one of the variable CPUs. In this way, all running process is assigned to one of the available CPUs. In this way, all running processes are share the computing time. Since the KVM developers wanted to reuse most of the mechanisms of Linux. They simply implemented each VM as a process, relying on its scheduler to assign computing power to the VMs.

The KVM control interface: 
Once the KVM kernel module has been loaded, the /dev/kvm device node appears in the file-system that represents the interface of KVM. It allows to control the hypervisor through a set of ioctls, commonly used in certain operating systems as an interface for processes running in user-mode to communicate with a driver. The ioctl() system call allows to execute several operations to create new virtual machines, assign memory to a virtual machine, assign and start virtual CPUs.

Emulation of Hardware:
To provide hardware like hard disks, cd drivers or networks  cards to the VMs, KVM uses a highly modified QEMU, This is a so called platform virtulization tool, which allows to emulate  an entire PC platform including graphics, net working, disk drives and many more.

Execution-Model:
Figure describe the execution model of KVM. This is the loop of the actions used to operate the VMs. These actions are separated by the three modes.

let see the which tasks are done in which mode:

  • user-mode: The KVM module is called using ioctl() to execute guest code until I/O operations initiated by the guest or an external event occurs. Such an event may be the arrival of a network package, which could be the reply of a network package sent by the host earlier. Such events are expressed as signals that leads to an interruption of guest code execution.
  • Kernel-mode: The kernel causes the hardware to execute guest code natively. IF the processor exits the guest due to pending memory or I/O operations, the kernel performs the necessary tasks and resumes the flow of execution. If external events such as signals or I/O operations initiated by the guest exists, it exits to the user-mode.
  • guest-mode: This is on the hardware level, where the extended instruction set of  virtualization capable CPU is used to execute the native code, until an instruction called that needs assistance by KVM, a fault or an external interrupt.

While a VM runs, There are plenty of switches between these modes.

Chef Journey

I’m starting a blog series on chef where I would be taking you to a journey of managing my current infrastructure using Chef. To start with these are the high level tasks lists that I’ve in mind:

  • User Management : User’s creation or deletion on an environment(Dev/QA/Staging/Production) should be managed by chef, along with kind of access on the environment i.e read-only access, root access, or adding a user to some groups.
  • VPN Setup : Currently we are using openvpnas for managing secured access to our environment, it is manual right now so the vpn set-up will also be done by chef.
  • Apache Setup : We are using apache as web server that sits in front of our app server and also provides SSL.
  • Jar App : We have a SOA based set-up in which we have multiple micro java services, so we would be using chef to manage those jar app i.e deploying those jar app’s, starting/stopping those jar app’s.
  • Tomcat : Another major component type in our application are web apps that are hosted on tomcat server, the tomcat server is not managed as a service instead we create tomcat as an app user along with tomcat management scripts.
  • Mongo : We use replicated mongo as No SQL database in our application.
  • Logstash : For managing logs we are using log stash in a clustered set-up where all the log agents publish the logs to a central server and then served by Kibana, so this complete setup should also be managed by chef
  • ActiveMQ : We are using ActiveMQ for our queuing purpose

This list is not complete surely, I’ll be adding many more tasks in this list as I proceed in setting up my environment using chef as this is the first time I’ll be doing a set-up using Chef, but this list will be a good starting point.

Before jumping into creating the Chef cookbooks, runlists or data bags I’ve to setup the base infrastructure of Chef that is Chef Server to which all chef agents talk to, a chef workstation which would be updating the server with the configurations and a git repo to keep track of all my configuration as shown in the image given below.

In the next blog I’ll talk about how I’ll set-up a chef server. Let me know if you have any inputs for me or suggestion that how I should proceed with the chef set-up.

VPC per envrionvment versus Single VPC for all environments

This blog talks about the two possible ways of hosting your infrastructure in Cloud, though it will be more close to hosting on AWS as it is a real life example but this problem can be applied to any cloud infrastructure set-up. I’m just sharing my thoughts and pros & cons of both approaches but I would love to hear from the people reading this blog about their take as well what do they think.

Before jumping right away into the real talk I would like to give a bit of background on how I come up with this blog, I was working with a client in managing his cloud infrastructure where we had 4 environments dev, QA, Pre Production and Production and each environment had close to 20 instances, apart from applications instances there were some admin instances as well such as Icinga for monitoring, logstash for consolidating logs, Graphite Server to view the logs, VPN server to manage access of people.

At this point we got into a discussion that whether the current infrastructure set-up is the right one where we are having a separate VPC per environment or the ideal setup would have been a single VPC and the environments could have been separated by subnet’s i.e a pair of subnet(public private) for each environment

Both approaches had some pros & cons associated with them

Single VPC set-up

Pros:

  1. You only have a single VPC to manage
  2. You can consolidate your admin app’s such as Icinga, VPN server.

Cons:

  1. As you are separating your environments through subnets you need granular access control at your subnet level i.e instances in staging environment should not be allowed to talk to dev environment instances. Similarly you have to control access of people at granular level as well
  2. Scope of human error is high as all the instances will be on same VPC.

VPC per environment setup

Pros:

  1. You have a clear separation between your environments due to separate VPC’s.
  2. You will have finer access control on your environment as the access rules for VPC will effectively be access rules for your environments.
  3. As an admin it gives you a clear picture of your environments and you have an option to clone you complete environment very easily.

Cons:

  1. As mentioned in pros of Single VPC setup you are at some financial loss as you would be duplicating admin application’s across environments

In my opinion the decision of choosing a specific set-up largely depends on the scale of your environment if you have a small or even medium sized environment then you can have your infrastructure set-up as “All environments in single VPC”, in case of large set-up I strongly believe that VPC per environment set-up is the way to go.

Let me know your thoughts and also the points in favour or against of both of these approaches.

Chef Solo an Introduction

Introduction

Chef Solo is simple way to begin working with Chef. It is an open source version of the chef-client that allows using cookbooks with nodes without requiring access to a server. Chef Solo runs locally and requires that a cookbook (and any of its dependencies) be on the same physical disk as the node. It is a limited-functionality version of the chef-client and does not support the following:

  • Node data storage
  • Search indexes
  • Centralized distribution of cookbooks
  • A centralized API that interacts with and integrates infrastructure components
  • Authentication or authorization
  • Persistent attributes  

Installing chef-client  (Pre-requisite : curl )
Login to your box and run the following command to install the chef. Make sure that curl program is available on your box.

 curl -L https://www.opscode.com/chef/install.sh | bash  
cropinstall.jpg
To check if the installation was successful check the version of the installed chef-solo by:
 chef-solo -v  

version.jpg                                  

Making Chef Repository
Next step is to setup a file structure that will help organize various Chef files. Opscode, the makers of Chef provide one sample structure. They call it simply the Chef Repository.

 wget http://github.com/opscode/chef-repo/tarball/master  
structure.jpg 
 tar zxf master 
 mv opscode-chef-repo-**** chef-repo/ 
structure1.jpg
Assign cookbook’s path to the newly created cookbook directory inside the Chef Repository which will hold the cookbook

  mkdir .chef  
  echo "cookbook_path ['/root/chef-repo/cookbooks' ]" > .chef/knife.rb   
  knife cookbook site download apt  
.Chef folder

For Chef Solo this directory generally contains only knife.rb file. A knife.rb file is used to specify the chef-repo-specific configuration details for Knife. This file is the default configuration file and is loaded every time this executable is run. The configuration file is located at: ~/.chef/knife.rb. If a
knife.rb file is present in the . chef/knife.rb directory in the chef-repo, the settings contained within that file will override the default configuration settings. Sample content of knife.rb file can be:
 cookbook_path [ '/root/chef-repo/cookbooks' ]  
 role_path [ '/root/chef-repo/roles' ]  
 environment_path [ ' /root/chef-repo/environments ' ]  
 data_bag_path [ ' /root/chef-repo/data_bags ' ]  
Getting Started with Chef Solo
Before we’re able to run Chef Solo on our servers, we will need to add two files to our local Chef repository: solo.rb and node.json.
The solo.rb file tells Chef Solo where to find the cookbooks, roles, and data bags.

The node.json file sets the run list (and any other node-specific attributes if required).

     Create a solo.rb file inside our Chef repository with the following contents:
       current_dir = File.expand_path(File.dirname(__FILE__))  
       file_cache_path "#{current_dir}"  
       cookbook_path "#{current_dir}/cookbooks"  
       role_path "#{current_dir}/roles"  
       data_bag_path "#{current_dir}/data_bags"  
      
      Create a file called node.json inside your Chef repository with the following contents:
       {  
            "run_list": [ "recipe[]" ]  
       }  
      
                  

      Example:- 
      In this example i am going to install apt cookbook and the recipe which i am going to use is apt and here is my solo.rb and node.json files looks like
      solo1.jpg
      Our first Chef run  Goto chef-repo folder and execute following command

       chef-solo -c solo.rb -j node.json  
      run1.jpg
       
      run_last1.jpg

      How it works:

      1. solo.rb configures Chef Solo to look for its cookbooks, roles, and data bags   inside the current directory: the Chef repository.
           2. Chef Solo takes its node configuration from a JSON file, in our example we simply        called it node.json. If we’re going to manage multiple servers, we’ll need a separate    &nbsp &nbsp &nbsp file.
      1. Then, Chef Solo just executes a Chef run based on the configuration data found in
                 solo.rb and node.json