Apache Cassandra Migration: 3.x to 4.x Ep: 2 DC and DR Setup

In my previous blog, we covered the basics of Cassandra, which will be useful in your daily operations on the database. If you have not read it yet, I recommend going through it first.

So now we will deep-dive into Cassandra’s DC/DR Setup.

A DC/DR setup is necessary in a production environment, where you never know when an issue will occur. If your primary cluster goes down, you need another cluster ready to respond immediately.

Cassandra is a database, and we want a database to stay up in every situation to avoid downtime for our applications. A disaster recovery setup for your database is just as necessary as the one you build for your applications. So let's get started with this super easy approach that takes only a few minutes to get your DR setup ready.

While designing DR in Cassandra, you just have to keep two files in mind: cassandra.yaml and cassandra-rackdc.properties.

cassandra.yaml is the heart of your Cassandra cluster. You can use the file below as-is while setting up the cluster; the values that need to differ are listed after the configuration.

cluster_name: Opstree-Datacentre2
num_tokens: 256
auto_bootstrap: false
hinted_handoff_enabled: true
max_hint_window: 3h
hinted_handoff_throttle: 1024KiB
max_hints_delivery_threads: 4
hints_directory: /var/lib/cassandra/hints
hints_flush_period: 10000ms
max_hints_file_size: 128MiB
auto_hints_cleanup_enabled: false
batchlog_replay_throttle: 1024KiB
authenticator: PasswordAuthenticator
#authenticator: AllowAllAuthenticator
authorizer: CassandraAuthorizer
#authorizer: AllowAllAuthorizer
role_manager: CassandraRoleManager
#network_authorizer: AllowAllNetworkAuthorizer
roles_validity: 2000ms
permissions_validity: 2000ms
credentials_validity: 2000ms
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
  - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
cdc_enabled: false
disk_failure_policy: stop
commit_failure_policy: stop
prepared_statements_cache_size: null
key_cache_size: null
key_cache_save_period: 4h
row_cache_size: 0MiB
row_cache_save_period: 0s
counter_cache_size: null
counter_cache_save_period: 7200s
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period: 10000ms
commitlog_segment_size: 32MiB
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.21.2.1,172.16.0.2"
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32
concurrent_materialized_view_writes: 32
memtable_allocation_type: heap_buffers
index_summary_capacity: null
index_summary_resize_interval: 60m
trickle_fsync: false
trickle_fsync_interval: 10240KiB
storage_port: 7000
ssl_storage_port: 7001
listen_address: 10.21.2.1
start_native_transport: true
native_transport_port: 9042
native_transport_allow_older_protocols: true
rpc_address: 10.21.2.1
rpc_keepalive: true
incremental_backups: true
snapshot_before_compaction: false
auto_snapshot: true
snapshot_links_per_second: 0
column_index_size: 64KiB
column_index_cache_size: 2KiB
concurrent_materialized_view_builders: 1
compaction_throughput: 64MiB/s
sstable_preemptive_open_interval: 50MiB
uuid_sstable_identifiers_enabled: false
read_request_timeout: 800ms
range_request_timeout: 10000ms
write_request_timeout: 1000ms
counter_write_request_timeout: 5000ms
cas_contention_timeout: 1000ms
truncate_request_timeout: 60000ms
request_timeout: 1000ms
##slow_query_log_timeout: 500ms
endpoint_snitch: GossipingPropertyFileSnitch
dynamic_snitch_update_interval: 100ms
dynamic_snitch_reset_interval: 600000ms
dynamic_snitch_badness_threshold: 1
server_encryption_options:
  internode_encryption: none
  legacy_ssl_storage_port_enabled: false
  keystore: conf/.keystore
  keystore_password: cassandra
  require_client_auth: false
  truststore: conf/.truststore
  truststore_password: cassandra
  require_endpoint_verification: false
client_encryption_options:
  enabled: false
  keystore: conf/.keystore
  keystore_password: cassandra
  require_client_auth: false
internode_compression: all
inter_dc_tcp_nodelay: false
trace_type_query_ttl: 1d
trace_type_repair_ttl: 7d
user_defined_functions_enabled: false
scripted_user_defined_functions_enabled: false
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
replica_filtering_protection:
  cached_rows_warn_threshold: 2000
  cached_rows_fail_threshold: 32000
batch_size_warn_threshold: 5KiB
batch_size_fail_threshold: 50KiB
unlogged_batch_across_partitions_warn_threshold: 10
compaction_large_partition_warning_threshold: 100MiB
compaction_tombstone_warning_threshold: 100000
audit_logging_options:
  enabled: false
  logger:
    - class_name: BinAuditLogger
diagnostic_events_enabled: false
repaired_data_tracking_for_range_reads_enabled: false
repaired_data_tracking_for_partition_reads_enabled: false
report_unconfirmed_repaired_data_mismatches: false
materialized_views_enabled: false
sasi_indexes_enabled: false
transient_replication_enabled: false
drop_compact_storage_enabled: false

The properties stay the same in DC and DR; the differences you need to make are listed below:

  1. cluster_name: The name should be the same in both clusters so you don’t need to change your application properties. If one cluster is down, you already have a second cluster up with the same name; it’s like an active-active cluster setup.
  2. endpoint_snitch: Use GossipingPropertyFileSnitch when you have two datacentres; for a single-datacentre cluster you can use SimpleSnitch. But I would recommend GossipingPropertyFileSnitch in either case, because if you later want to move away from SimpleSnitch it is difficult and you end up adding an extra step to your migration. I was lucky, because my current cluster was already on GossipingPropertyFileSnitch (GPFS).
  3. seed_provider: These are the captains of both your clusters, the DC and the DR. You need to choose at least one IP from each cluster, as I have done above.
  4. listen_address: This should be unique per node. It is the IP address or hostname that Cassandra binds to for connecting this node to other nodes.
  5. rpc_address: This should be unique per node. It is the IP address or hostname that Cassandra binds to for listening for client connections (a minimal per-node sketch of these differing values follows this list).
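As a minimal sketch, here is how those per-node values could differ between the two datacentres. The 10.21.2.1 and 172.16.0.2 addresses are the ones already used in the file above (I am assuming 172.16.0.2 is a node in the second datacentre); everything else in cassandra.yaml stays identical on every node.

# Node in Opstree-Datacentre1 (values taken from the file above)
listen_address: 10.21.2.1
rpc_address: 10.21.2.1
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.21.2.1,172.16.0.2"   # one seed from each datacentre, same list on every node

# Node in Opstree-Datacentre2 (assuming 172.16.0.2, the second seed above, lives there)
listen_address: 172.16.0.2
rpc_address: 172.16.0.2
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.21.2.1,172.16.0.2"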

cassandra-rackdc.properties:

cassandra-rackdc.properties is where the whole game of the DC/DR distribution is played. It only has two properties to look at (example files for the DC1 and DC2 clusters follow the list below):

  1. Datacentre: This is the name of your datacentre, and it must be different in the two clusters. For example:
    • Cluster 1: Opstree-Datacentre1
    • Cluster 2: Opstree-Datacentre2
  2. Rack: Think of AWS, where you have regions and, inside them, availability zones. The regions map to your datacentres and the AZs map to your racks. For example:
    • Opstree-Datacentre1 ⇒ Rack: us-west-1a
    • Opstree-Datacentre2 ⇒ Rack: us-west-2b
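Putting the two properties together, here is roughly what the DC1 and DC2 files look like (the dc and rack values are taken from the examples above; adjust the rack names to your own topology):

# cassandra-rackdc.properties on every node of the DC1 cluster
dc=Opstree-Datacentre1
rack=us-west-1a

# cassandra-rackdc.properties on every node of the DC2 cluster
dc=Opstree-Datacentre2
rack=us-west-2b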

Now your configuration is done.

Now it's time to restart the service, and your cluster will be up in both DC and DR.
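On a typical package install this is just a service restart followed by a quick check that both datacentres show up. A sketch, assuming a systemd-based setup (the service name may differ in your environment):

sudo systemctl restart cassandra
# Wait for the node to join, then confirm both datacentres are listed:
nodetool status
# The output should contain both "Datacenter: Opstree-Datacentre1"
# and "Datacenter: Opstree-Datacentre2" with all nodes in UN (Up/Normal) state.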

NOOOOOOOOOO!!!!!!!!!!!!!!! Where is my data in the new cluster?


We have just made the DC/DR setup, but the data has not been copied yet. To copy it, follow these easy steps:

  1. You need to alter your own keyspaces, as well as a few of the keyspaces that Cassandra creates by default.
  2. By default, the replication strategy of your keyspaces is SimpleStrategy; you need to change it to NetworkTopologyStrategy.

Log in to your cqlsh:
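Since the cassandra.yaml above enables PasswordAuthenticator, cqlsh needs credentials. For example (the host, user, and password here are placeholders for your own):

cqlsh 10.21.2.1 -u cassandra -p 'your-password'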

ALTER the below keyspaces, which are created by default:

ALTER KEYSPACE system_distributed WITH replication = {'class': 'NetworkTopologyStrategy', 'Opstree-Datacentre1': '3', 'Opstree-Datacentre2': '3'};

ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'Opstree-Datacentre1': '3', 'Opstree-Datacentre2': '3'} AND durable_writes = true;

ALTER KEYSPACE system_traces WITH replication = {'class': 'NetworkTopologyStrategy', 'Opstree-Datacentre1': '3', 'Opstree-Datacentre2': '3'} AND durable_writes = true;

system_distributed, system_auth, and system_traces should be changed from SimpleStrategy to NetworkTopologyStrategy.
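You can verify what each keyspace currently uses straight from cqlsh, for example with a quick query against the system schema:

SELECT keyspace_name, replication FROM system_schema.keyspaces;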

ALTER the keyspace that you want to replicate to the DR setup, giving the new datacentre a replication factor greater than zero so it actually gets replicas of the data:

ALTER KEYSPACE <<opstree>> WITH replication = {'class': 'NetworkTopologyStrategy', 'Opstree-Datacentre1': '3', 'Opstree-Datacentre2': '3'} AND durable_writes = true;
  1. The command you need to run is nodetool repair against the other datacentre. Run it on each new node in the cluster; it will sync the node and make it identical to your other cluster.
  2. nodetool repair -dc Opstree-Datacentre1 (on each new node). Running this command brings your cluster in sync and repairs each node from faulty data.
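As a quick sketch of this last step (<<opstree>> is the keyspace-name placeholder used above):

# Run on every node of the new datacentre (Opstree-Datacentre2)
nodetool repair -dc Opstree-Datacentre1
# Once the repairs finish, data ownership per datacentre can be checked with:
nodetool status <<opstree>>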

Wrapping Up

This is how we did our DC/DR setup in Cassandra. In the next blog, we will discuss how to migrate the cluster from 3.x to 4.x. That will be quite interesting too, so stay tuned…

Meanwhile, if you have any questions do let us know through comments.

Blog Pundits: Prakash Jha and Sandeep Rawat

OpsTree is an End-to-End DevOps Solution Provider.

Connect with Us
