An Overview of the Latest Components of Azure HDInsight – Apache Tez, YARN (MapReduce 2.0), Apache Storm & Kafka with HDP 2.1


Azure HDInsight 3.1, built on Hortonworks HDP 2.1, ships several important Hadoop 2.x components: Apache Tez (a DAG-based execution framework for fast data processing), YARN (the next-generation MapReduce 2.0 resource manager) running on top of HDFS, the real-time stream processing engine Apache Storm & the distributed messaging framework Apache Kafka. In this demo, we'll look at a bit of configuration info for each of these components running on an Azure HDInsight 3.1 cluster.

First, provision an HBase-type HDInsight cluster through Azure PowerShell.
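A minimal Azure PowerShell sketch of this step is shown below; the cluster name, storage account, container & region are placeholders, so adjust them for your own subscription.

$storageKey = Get-AzureStorageKey -StorageAccountName "mystorageaccount" | %{ $_.Primary }
$cred = Get-Credential    # admin user name & password for the new cluster

New-AzureHDInsightCluster -Name "my-hbase-cluster" `
    -Location "East US" `
    -ClusterType HBase `
    -Version "3.1" `
    -ClusterSizeInNodes 4 `
    -DefaultStorageAccountName "mystorageaccount.blob.core.windows.net" `
    -DefaultStorageAccountKey $storageKey `
    -DefaultStorageContainerName "my-hbase-container" `
    -Credential $cred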

HBase

Next, you can check the provisioned HBase HDInsight cluster on the Azure portal & enable RDP on it.

RDP

Next, on the HDInsight cluster, check the Hadoop components by browsing the directory ‘C:\apps\dist‘, where you should see that all components of HDP 2.1 are present except Apache Storm.

Tez

Now, Tez-0.4.0.2.1.5.0-2057 comes pre-configured with the HDInsight 3.1 HBase cluster, so you can check the Hadoop config page to run Hive queries with Tez. For that, on the cluster desktop, open the YARN config page, which shows the YARN node status.

tez-hive
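As a quick check that Hive can actually run on Tez, you can fire a test query from the Hadoop command line with the execution engine switched to tez. The query below uses hivesampletable, the sample table that ships with HDInsight clusters, and is only meant as an illustration.

hive -hiveconf hive.execution.engine=tez -e "SELECT country, COUNT(*) FROM hivesampletable GROUP BY country;"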

Similarly, check tez-site.xml for the configuration settings & the DAG node status.

tez-site
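Two of the properties worth looking at in tez-site.xml are the Tez library location and the Application Master memory; the values below are only indicative and will differ on your cluster.

<property>
  <name>tez.lib.uris</name>
  <value>hdfs:///apps/tez/,hdfs:///apps/tez/lib/</value>
</property>
<property>
  <name>tez.am.resource.memory.mb</name>
  <value>1536</value>
</property>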

Next, jump back to the parent directory ‘C:\apps\‘ & type ‘Storm‘ in the Windows Explorer search pane. Copy ‘storm-0.9.1.2.1.5.0-2057.zip‘, paste it into ‘C:\apps\dist\‘ & unzip it. Under the .\bin directory you will find the Storm.cmd file, which is needed to run the Storm ZooKeeper, Nimbus, Supervisor & UI daemons.

First, configure storm.yaml with the IPv4 address of the HDInsight cluster, then start the Storm ZooKeeper node & the Nimbus (master) daemon.
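A rough sketch of the relevant storm.yaml entries and the start-up commands, assuming the cluster's IPv4 address is 10.0.0.4 (a placeholder) and that each daemon runs in its own Hadoop command prompt:

storm.zookeeper.servers:
    - "10.0.0.4"
nimbus.host: "10.0.0.4"

.\bin\storm.cmd dev-zookeeper
.\bin\storm.cmd nimbus

Here dev-zookeeper starts the single-node ZooKeeper bundled with Storm; on a multi-node setup you would point storm.zookeeper.servers at your existing ZooKeeper quorum instead.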

zookeeper

nimbus

Next, start the Supervisor (worker) daemon.
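With the same Storm.cmd wrapper, that is typically:

.\bin\storm.cmd supervisor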

supervisor

Finally, start the UI daemon.
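Again via the Storm.cmd wrapper:

.\bin\storm.cmd ui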

storm

The Storm UI can then be viewed in a web browser on port 8080.

Storm UI

Next, to configure Apache Kafka for distributed message processing, we first need to download a stable version of Kafka; I used Kafka 0.8 here. You can download it as a .zip from the GitHub repository: https://github.com/apache/kafka
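If you prefer git over the .zip download, the 0.8 code line can also be cloned directly; the branch name below is assumed from the Apache Kafka repository layout at the time.

git clone https://github.com/apache/kafka.git
cd kafka
git checkout 0.8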

Now, after unzipping it, paste it into the same directory ‘C:\apps\dist\‘ alongside the other components & start installing Apache Kafka 0.8 on Azure HDInsight.

Before doing that, replace the Windows .bat files under ‘C:\apps\dist\kafka-0.8\bin\windows\‘ with the latest Kafka batch files for Windows, which can be downloaded from here.

Set the Java path on the Hadoop command line or PowerShell as ‘set PATH=C:\apps\dist\java\bin‘

Next, update Scala & the packages with the following commands.

.\sbt.bat update

kafka-sbt

Then run the following commands:

.\sbt.bat package
.\sbt.bat assembly-package-dependency

After that, start the ZooKeeper server before starting the Kafka server.

.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties

kafka-zookeeper-start

Now, start the Kafka server by running the following command.

.\bin\windows\kafka-server-start.bat .\config\server.properties

kafka-server-start

Next, create a topic to post messages to, using the following command.

.\bin\windows\kafka-create-topic.bat --zookeeper localhost:2181 --replica 1 --partition 1 --topic test

kafka-topic
You can check the list of topics by using the following command.
.\bin\windows\kafka-list-topic.bat --zookeeper localhost:2181
List-topics

Once you get a success message, you can start posting messages to the Kafka cluster. Before that, start the console producer by using the following command.

.\bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic test
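Once the producer console is up, every line you type is published to the ‘test‘ topic, for example:

Hello Kafka on HDInsight
This is a test message from the console producer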


SendMessage
Next, start the console-consumer by executing the following command.

.\bin\windows\kafka-console-consumer.bat --zookeeper localhost:2181 --topic test --from-beginning

The following screenshot shows the demo of Apache Kafka 0.8 producers & consumers running on the Azure HBase HDInsight 3.1 cluster.

Kafka-HDI
 

