Apache Cassandra Migration: 3.x to 4.x Episode: 1 Basics

Well, I am a big fan of Apache's tools. After Kafka and ZooKeeper, Cassandra is my third Apache tool and my first database. My colleague and I have previously posted a blog on Kafka too; please read that as well, you will find it useful.

So, I was working casually like any other day when I got a call from my manager about a Cassandra migration, and that too in 14 days. Frankly speaking, I was afraid because I had zero knowledge of the Cassandra database, and I needed to upgrade a running cluster.

I accepted the challenge and completed it with no downtime. So let’s see how.

I will start my journey with learning Cassandra in this blog, cover the DC/DR setup of Cassandra in the next, and the migration in the last blog.

What is Cassandra?

Cassandra is a NoSQL distributed database that allows you to store large amounts of data. You create keyspaces, and inside each keyspace you define the tables (your schema) that store your data.

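It is the best choice if you always want uptime for your applications, because every node you add to the cluster is identical in role: there is no master node, so there is no single point of failure. Each row is copied to several nodes (as many as the replication factor you set), so the cluster can still respond to your application when a node is down. That may not be what you expect, but it is true. Writes meant for the down node are kept as hints by another node and replayed when it comes back; this concept is called hinted handoff. Well, by the end of the blog, you will fall in love with Cassandra just like me.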

Cassandra VS Other Databases

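CAP is the main reason. Every distributed database has to make the trade-off described by the CAP theorem, which says a distributed system can fully guarantee only two of these three properties at the same time: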

  1. Consistency: Every read sees the most recent write, so all the servers return the same data
  2. Availability: There is no single point of failure. Every query gets a response even when some nodes are down
  3. Partition Tolerance: The cluster keeps working even when the network between some servers breaks and they cannot communicate with each other

Cassandra chooses Availability and Partition Tolerance (an AP system) and makes the following guarantees.

  1. High Scalability
  2. High Availability
  3. Durability
  4. Eventual Consistency of writes to a single table

How does Cassandra store its Data?

Source: https://cassandra.apache.org/_/cassandra-basics.html

Cassandra distributes its data across the nodes using three concepts: the ring, partitioning, and tokens.

Let’s learn this with a small example:
Say you are three friends; two of you own a car of the same brand, VW, and the third owns a Maruti. Here the car brand is the partition key. A hash function turns each partition key into a token, so both VW rows get the same token and are stored together on the same nodes, while the Maruti row gets a different token.

VW (Partition Key) --> (Hash Function) --> (Token) 234

Maruti (Partition Key) --> (Hash Function) --> (Token) 214

VW (Partition Key) --> (Hash Function) --> (Token) 234

So the partition key VW always hashes to the same token, 234, which is unique to VW. Every VW row is stored on the node that owns token 234 and on that node's replicas.
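To make the hashing step a bit more concrete, here is a minimal sketch in Python. It is only an illustration: Cassandra's default partitioner uses Murmur3, while this sketch swaps in MD5 from the standard library as a stand-in, and the function name `token_for` and the friend names are made up. The tokens it prints will not be 234 and 214; the point is only that the same partition key always gives the same token.

```python
import hashlib


def token_for(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token.

    Assumption: MD5 is a stand-in here; a real cluster would use Murmur3.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Use the first 8 bytes as a signed 64-bit integer (-2**63 .. 2**63 - 1).
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


# Hypothetical rows: (owner, car brand); the brand is the partition key.
cars = [("friend_1", "VW"), ("friend_2", "Maruti"), ("friend_3", "VW")]

for owner, brand in cars:
    print(owner, brand, token_for(brand))
```

Both VW rows print the same token, and the Maruti row prints a different one.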

What is TOKEN?

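A token is a number on the ring. When you create a cluster, each node is assigned one or more tokens, and a node is responsible for the range of tokens that ends at each of its tokens. The number of tokens per node is set with num_tokens in cassandra.yaml; the default is 256 in Cassandra 3.x and 16 in Cassandra 4.0.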

Tokens and partitioning together form the ring that distributes data across every node.
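The ring idea can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how Cassandra is implemented: the class name `Ring`, the node names, and the way node tokens are derived (by hashing the node name with MD5) are all hypothetical. It shows how a key's token is matched to the first node token at or after it, and how extra replicas are picked by walking further around the ring.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Toy hash: MD5 as a stand-in for Cassandra's Murmur3 partitioner."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


class Ring:
    """A toy token ring where every node owns several tokens."""

    def __init__(self, nodes, tokens_per_node=4):
        # Assumption: tokens are derived from the node name for repeatability.
        self._entries = sorted(
            (token_for(f"{node}-{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self._tokens = [token for token, _ in self._entries]
        self._node_count = len(set(nodes))

    def replicas(self, partition_key, replication_factor=2):
        """Walk clockwise from the key's token and collect distinct nodes."""
        wanted = min(replication_factor, self._node_count)
        index = bisect.bisect_left(self._tokens, token_for(partition_key))
        found = []
        while len(found) < wanted:
            node = self._entries[index % len(self._entries)][1]  # wrap around
            if node not in found:
                found.append(node)
            index += 1
        return found


ring = Ring(["node1", "node2", "node3"])
for brand in ("VW", "Maruti"):
    print(brand, "->", ring.replicas(brand, replication_factor=2))
```

Asking for the same partition key twice always returns the same nodes, which is why every VW row ends up in the same place.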

Internal Communication between Nodes

Cassandra uses a gossip protocol for this. Nodes keep poking each other, exchanging information about themselves and about the other nodes they know, so that every node has an up-to-date view of the cluster.
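Here is a rough simulation of the gossip idea in Python. It is a simplification under my own assumptions: each node's view is just a dictionary of `(version, status)` pairs, every node talks to one random peer per round, and the names `merge` and `gossip_round` are hypothetical. Real gossip messages carry more state than this.

```python
import random


def merge(local: dict, remote: dict) -> None:
    """For every known node, keep the entry with the highest version."""
    for name, (version, status) in remote.items():
        if name not in local or local[name][0] < version:
            local[name] = (version, status)


def gossip_round(views: dict, rng: random.Random) -> None:
    """Every node picks one random peer and the two swap what they know."""
    names = list(views)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        merge(views[peer], views[name])
        merge(views[name], views[peer])


# Each node starts out knowing only about itself.
views = {n: {n: (1, "NORMAL")} for n in ["node1", "node2", "node3", "node4"]}
rng = random.Random(42)

rounds = 0
# Stop once every node knows about every other node (capped for safety).
while rounds < 100 and any(len(view) < len(views) for view in views.values()):
    gossip_round(views, rng)
    rounds += 1

print(f"every node knows the whole cluster after {rounds} round(s)")
```

Even though no node ever talks to everybody, the information spreads to the whole cluster within a few rounds.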

How to Communicate with Cassandra?

The nodetool utility is the command-line option for managing and monitoring Cassandra.

These are the basic commands you need to learn before the DC/DR setup and migration; you will also use them most often in daily operations.

  1. nodetool status: Displays every node in the cluster with its state (up/down), load, and ownership
  2. nodetool describecluster: Displays information about your cluster, such as its name, snitch, partitioner, and schema versions
  3. nodetool repair: Repairs inconsistencies between replicas and maintains consistency across the cluster
  4. nodetool compactionstats: Shows the compactions currently running or pending on your SSTables
  5. nodetool netstats: Gives you network information about the node, including its mode: JOINING, LEAVING, NORMAL, DECOMMISSIONED, CLIENT.
  6. nodetool tablehistograms <KEYSPACE> <TABLE>: Shows read/write latency and other statistics for a particular table in a keyspace
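If you want to run these commands from a script, for example as a daily health check, a small Python wrapper could look like the sketch below. The helper names `nodetool_argv` and `run_nodetool`, and the keyspace `demo` and table `cars`, are made up for illustration. The sketch assumes `nodetool` is on the PATH of the machine running it and simply returns `None` when it is not.

```python
import shutil
import subprocess


def nodetool_argv(command, *args, host=None):
    """Build the argument list for a nodetool call without running it."""
    argv = ["nodetool"]
    if host:
        argv += ["-h", host]  # connect to a specific node
    argv.append(command)
    argv.extend(args)
    return argv


def run_nodetool(command, *args, host=None):
    """Run nodetool and return its output, or None if it is not installed."""
    if shutil.which("nodetool") is None:
        return None
    result = subprocess.run(
        nodetool_argv(command, *args, host=host),
        capture_output=True,
        text=True,
        check=True,  # raise if nodetool exits with an error
    )
    return result.stdout


# Print the commands from the list above as they would be executed.
# "demo" and "cars" are hypothetical keyspace and table names.
for argv in (
    nodetool_argv("status"),
    nodetool_argv("describecluster"),
    nodetool_argv("tablehistograms", "demo", "cars"),
):
    print(" ".join(argv))
```

Building the argument list separately from running it keeps the wrapper easy to check on a laptop that has no Cassandra installed.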

Which Query Language Does It Follow?

CQL (Cassandra Query Language)

Creating a database:

Creation of Schema in databases
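As a rough illustration of these two steps, the CQL statements might look like the ones below. Everything named here is an assumption made up for the car example in this blog: the keyspace `demo`, the table `cars`, its columns, and the replication factor of 3. The statements are held as Python strings so they can be printed and pasted into cqlsh.

```python
# In Cassandra the "database" is a keyspace; replication is set when creating it.
CREATE_KEYSPACE = """
CREATE KEYSPACE IF NOT EXISTS demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
""".strip()

# The schema: brand is the partition key, owner is the clustering column.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS demo.cars (
    brand text,      -- partition key: decides which nodes store the row
    owner text,      -- clustering column: orders rows inside the partition
    model text,
    PRIMARY KEY (brand, owner)
);
""".strip()

INSERT_ROW = (
    "INSERT INTO demo.cars (brand, owner, model) "
    "VALUES ('VW', 'friend_1', 'Polo');"
)
SELECT_ROWS = "SELECT owner, model FROM demo.cars WHERE brand = 'VW';"

for statement in (CREATE_KEYSPACE, CREATE_TABLE, INSERT_ROW, SELECT_ROWS):
    print(statement)
    print()
```

The SELECT filters on the partition key, so Cassandra knows from the token exactly which nodes to ask.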

Wrapping Up…

So these were some basics and commands you should know before going for the migration.

In our next blog, we will talk about the DC/DR setup of Cassandra, in which we will discuss terms like SimpleStrategy and cassandra.yaml. So stay tuned for my next blog.

Blog Pundits: Bhupender rawat and Sandeep Rawat
