Nifi Cluster Setup with External Zookeeper

Apache NiFi is an open-source data integration tool that automates the flow of data between systems. It provides a web-based interface to design, control, and monitor data flows between various sources and destinations. The tool is particularly useful for ingesting data from different sources, applying transformations, and routing it to other systems in real time.

Why use a NiFi cluster over a standalone instance?

  • Performance: Clusters can handle higher throughput and provide better performance than standalone instances due to load distribution.
  • Fault Tolerance: Clusters provide high availability and fault tolerance: if one node fails, the remaining nodes keep processing data.
  • Scalability: Clusters allow for scalability by adding more nodes, whereas standalone instances have limitations in scaling.


Recently, while setting up Apache NiFi in cluster mode, I struggled to find documentation or an article that described the process end to end, and getting the cluster configuration right was not easy either. In this article, we will set up a three-node ZooKeeper ensemble and a three-node NiFi cluster on Ubuntu. The steps should be the same on other Linux distributions.

Enough said! Now let me show you how we can do it end to end.

Setting up Apache Nifi Cluster with external Zookeeper

ZooKeeper is a centralized service for maintaining configuration information, naming, distributed synchronization, and group services in distributed systems. NiFi uses it to elect the cluster coordinator and the primary node and to store cluster-wide state. Running ZooKeeper as a three-node ensemble keeps the service available when a single node is lost, because the remaining two nodes still form a majority.

Three Node Zookeeper Cluster Setup

Prerequisites

Before you begin this installation, you will need the following:

  • Three Ubuntu 20.04 servers with a non-root user having sudo privileges.
  • OpenJDK 8 or 17 installed on all servers as Zookeeper requires Java to run.

Step 1 – Downloading and Extracting Zookeeper Binaries.

# tar needs -C to extract into a target directory
wget http://archive.apache.org/dist/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz && tar -xvzf apache-zookeeper-3.8.0-bin.tar.gz -C /opt/
cd /opt && mv apache-zookeeper-3.8.0-bin zookeeper

Step 2 – Create the zoo.cfg file in /opt/zookeeper/conf with the configuration below. Use the same file on all three servers; each server.N line names one member of the ensemble.

cat <<EOT > /opt/zookeeper/conf/zoo.cfg
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=5
server.1=<hostname-of-server-1>:2888:3888
server.2=<hostname-of-server-2>:2888:3888
server.3=<hostname-of-server-3>:2888:3888
EOT
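The server.N lines follow a fixed pattern, so they can be generated from a list of hostnames instead of being typed by hand. The sketch below is only an illustration: the function name zk_server_lines and the example hostnames are made up, and it assumes the peer and election ports 2888 and 3888 used above.

```shell
#!/bin/sh
# Hypothetical helper: print one server.N line per hostname, numbering from 1.
# Assumes the quorum port 2888 and election port 3888 used in this article.
zk_server_lines() {
    n=1
    for host in "$@"; do
        echo "server.${n}=${host}:2888:3888"
        n=$((n + 1))
    done
}

# Example (hostnames are placeholders):
#   zk_server_lines zk1.example.com zk2.example.com zk3.example.com >> /opt/zookeeper/conf/zoo.cfg
```

The order of the hostnames decides the numbering, so pass them in the same order on every server.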

Step 3 – Create the ZooKeeper data directory (the dataDir set in zoo.cfg) and a myid file inside it. Each server then writes its own ID to myid, and that ID must match the server's server.N entry in zoo.cfg.

sudo mkdir /var/lib/zookeeper
sudo touch /var/lib/zookeeper/myid

Server 1

sudo sh -c "echo '1' > /var/lib/zookeeper/myid"

Server 2

sudo sh -c "echo '2' > /var/lib/zookeeper/myid"

Server 3

sudo sh -c "echo '3' > /var/lib/zookeeper/myid"
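Since the ID in myid has to match the server.N line for the same host, it can also be looked up from zoo.cfg rather than typed per server. This is a minimal sketch under assumptions: the function name zk_myid_for is hypothetical, and it only works if the hostname you pass is spelled exactly as it appears in zoo.cfg.

```shell
#!/bin/sh
# Hypothetical helper: print N for the zoo.cfg line "server.N=<host>:2888:3888".
# Prints nothing when no server line names the given host.
zk_myid_for() {
    cfg="$1"
    host="$2"
    awk -F= -v h="$host" '
        $1 ~ /^server\.[0-9]+$/ {
            split($2, parts, ":")          # parts[1] is the hostname
            if (parts[1] == h) {
                sub(/^server\./, "", $1)   # keep only the numeric id
                print $1
            }
        }' "$cfg"
}

# Example (assumes `hostname -f` matches the name used in zoo.cfg):
#   zk_myid_for /opt/zookeeper/conf/zoo.cfg "$(hostname -f)" | sudo tee /var/lib/zookeeper/myid
```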

Step 4 – Restart the zookeeper on all nodes.

sudo /opt/zookeeper/bin/zkServer.sh  restart

After following the steps above, check the status of ZooKeeper on each node with zkServer.sh status. You should see one server elected as leader and the other two as followers.
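The status command prints a line starting with "Mode:" that names the role of the local server. The sketch below pulls that role out so it can be compared across nodes; the function name zk_mode is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `zkServer.sh status` output on stdin and
# print only the role from the "Mode:" line (for example leader or follower).
zk_mode() {
    sed -n 's/^Mode: *//p'
}

# Example, run on each node:
#   sudo /opt/zookeeper/bin/zkServer.sh status 2>&1 | zk_mode
```

Across the three nodes you would expect one leader and two followers; if a node prints nothing, it is not serving requests yet.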

Three Node Nifi Cluster Setup

Prerequisites

Before you begin this installation, you will need the following:

  • Three Ubuntu 20.04 servers with a non-root user having sudo privileges.
  • OpenJDK 8 or 17 installed on all servers as NiFi requires Java to run.

In the next steps, we are going to set up a three-node NiFi 1.19.1 cluster. With this version, however, we cannot form a cluster directly, because a clustered node requires a value for nifi.sensitive.props.key. To get a value for this key, we first start NiFi as a standalone instance on each node.

Step 1 – Downloading and Extracting Nifi Binaries

# unzip needs -d to extract into a target directory
wget https://archive.apache.org/dist/nifi/1.19.1/nifi-1.19.1-bin.zip && unzip nifi-1.19.1-bin.zip -d /opt/
cd /opt && mv nifi-1.19.1 nifi

Step 2 – Downloading and Extracting Nifi toolkit Binaries

# unzip needs -d to extract into a target directory
wget https://archive.apache.org/dist/nifi/1.19.1/nifi-toolkit-1.19.1-bin.zip && unzip nifi-toolkit-1.19.1-bin.zip -d /opt/
cd /opt && mv nifi-toolkit-1.19.1 nifi-toolkit

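Step 3 – Update /opt/nifi/conf/nifi.properties on each node so that NiFi can start over HTTP. The properties below should be updated, using each node's own hostname. If nifi.web.https.host and nifi.web.https.port are set in your file (the default in this release, as far as I can tell), clear them as well, since NiFi cannot be configured for HTTP and HTTPS at the same time.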


nifi.remote.input.host=<hostname>
nifi.remote.input.socket.port=9997
nifi.web.http.host=<hostname>
nifi.web.http.port=8443
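Editing the same properties by hand on three nodes is error-prone, so a small helper can apply them. This is only a sketch: the function name set_prop is hypothetical, and it assumes values that do not contain the characters | or &, which have a special meaning in the sed replacement.

```shell
#!/bin/sh
# Hypothetical helper: set key=value in a Java properties file.
# Replaces the existing line for the key, or appends a line if the key is absent.
# Assumption: the value contains no "|" or "&" characters.
set_prop() {
    file="$1"; key="$2"; value="$3"
    # Escape dots so the key is matched literally by grep and sed.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^${key_re}=" "$file"; then
        sed "s|^${key_re}=.*|${key}=${value}|" "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example for one node (the hostname is a placeholder):
#   set_prop /opt/nifi/conf/nifi.properties nifi.web.http.host node1.example.com
#   set_prop /opt/nifi/conf/nifi.properties nifi.web.http.port 8443
```

Take a backup copy of nifi.properties before running it against a real installation.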

Step 4 – Restart Nifi on all nodes.

/opt/nifi/bin/nifi.sh restart
/opt/nifi/bin/nifi.sh status

Nifi should be running on each node and we can access it on

http://<hostname>:8443/nifi

Note: Starting NiFi as a standalone instance generates a different value for nifi.sensitive.props.key in the nifi.properties file on each node. If the keys differ across nodes, sensitive values encrypted on one node cannot be decrypted on another, so the nodes cannot share a flow that contains sensitive properties. Hence it is important to copy the value of nifi.sensitive.props.key from one node to the other two nodes.
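To copy the key, you first need to read it from the node you pick as the source. The sketch below prints the value of a property from a properties file; the function name get_prop is made up for illustration. It keeps everything after the first "=", because a generated key can itself contain "=" characters.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key from a Java properties file.
# Everything after the first "=" is treated as the value.
get_prop() {
    file="$1"; key="$2"
    awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$file"
}

# Example on the source node:
#   get_prop /opt/nifi/conf/nifi.properties nifi.sensitive.props.key
```

On the other two nodes, the value can then be pasted into nifi.properties while NiFi is stopped. NiFi also ships a set-sensitive-properties-key command in bin/nifi.sh that should do the same job; check its usage in the Admin Guide before relying on it.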

Step 5 – Now that NiFi is running as a standalone instance on each node, it’s time to form a three-node NiFi cluster. Update the configuration below in /opt/nifi/conf/nifi.properties on each node.


nifi.cluster.is.node=true
nifi.cluster.node.address=<hostname>
nifi.cluster.node.protocol.port=9998
nifi.zookeeper.connect.string=<zookeeperhostname1>:2181,<zookeeperhostname2>:2181,<zookeeperhostname3>:2181
nifi.state.management.embedded.zookeeper.start=false
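The connect string is a comma-separated list of host:port pairs, so all three ZooKeeper servers can be listed and NiFi does not depend on a single one. A minimal sketch for building it, assuming the client port 2181 from zoo.cfg; the function name zk_connect_string and the ZK_CLIENT_PORT variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: join hostnames into a ZooKeeper connect string.
# Uses client port 2181 unless ZK_CLIENT_PORT is set in the environment.
zk_connect_string() {
    port="${ZK_CLIENT_PORT:-2181}"
    out=""
    for host in "$@"; do
        # Add a comma before every entry except the first.
        out="${out:+$out,}${host}:${port}"
    done
    echo "$out"
}

# Example (hostnames are placeholders):
#   zk_connect_string zk1.example.com zk2.example.com zk3.example.com
```

The same string goes into both nifi.properties and state-management.xml.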

Step 6 – Apart from updating the ZooKeeper connect string in nifi.properties, we need to set the same connect string in the cluster-provider section of /opt/nifi/conf/state-management.xml.

<property name="Connect String">zookeeperhostname1:2181,zookeeperhostname2:2181,zookeeperhostname3:2181</property>

Step 7 – After updating files in the above steps, we need to restart nifi on each node again to form a cluster of nifi nodes.

/opt/nifi/bin/nifi.sh restart
/opt/nifi/bin/nifi.sh status

Now we should see a cluster being formed if we access nifi on

http://<hostname>:8443/nifi
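Besides the UI, the cluster can be checked from the command line through the NiFi REST API. The sketch below counts the nodes reported as CONNECTED. Treat it as an assumption-laden illustration: the function name count_connected_nodes is made up, and it assumes the /nifi-api/controller/cluster endpoint returns JSON with a "status" field per node, which you should confirm against the REST API documentation for your version.

```shell
#!/bin/sh
# Hypothetical helper: count nodes reported as CONNECTED in the JSON read on stdin.
# Assumes each node object carries a field of the form "status":"CONNECTED".
count_connected_nodes() {
    grep -o '"status" *: *"CONNECTED"' | wc -l | tr -d ' '
}

# Example against a running node (replace <hostname>):
#   curl -s "http://<hostname>:8443/nifi-api/controller/cluster" | count_connected_nodes
```

For the setup in this article the expected count is 3; a lower number means at least one node has not joined yet.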

Conclusion

In this blog, I have covered the requirements and steps to set up a NiFi cluster with an external ZooKeeper. In the next blog, I will explain how to secure the NiFi cluster with self-signed certificates. If you have any ideas or suggestions about my approach, please share them in the comment section. I would really appreciate your feedback. Thanks for reading.

Blog Pundits: Pankaj Kumar and Sandeep Rawat

OpsTree is an End-to-End DevOps Solution Provider.
