What’s new in Azure SDK 2.5 & Visual Studio 2013 Update 4


Recently, after playing enough with Azure Stream Analytics, it's time to move on with Azure .NET development, as a new version of the Azure SDK has been published. Let's have a quick overview of the latest Azure SDK.

First of all, let's download the SDK from the Web Platform Installer (WebPI) console, listed as 'Microsoft Azure SDK 2.5 for .NET (VS 2013)'.

[Screenshot: WebPI listing for Microsoft Azure SDK 2.5 for .NET]

In this edition, a few new components have been added, such as:

i) EnvironmentTools.VS.msi

ii) HiveODBC32.msi

iii) HiveODBC64.msi

iv) Microsoft.Azure.HDInsightTools-x64.msi

v) Microsoft.Azure.HDInsightTools-x86.msi

and so on…

[Screenshot: new SDK 2.5 components]

Now, after installing SDK 2.5, let's start with Visual Studio 2013.

[Screenshot: VS 2013 after installing SDK 2.5]

Expand 'QuickStart' under 'Cloud' & start exploring the options to create AppService, Compute & DataService projects directly from VS 2013/2012 itself.

[Screenshot: QuickStart templates in VS]

 

The default 'DataBlobStorage1' sample is created in VS to demonstrate creating a blob container, creating a block blob/page blob, uploading a new blob & deleting a blob (all the basic CRUD operations on blobs using REST); a scripted equivalent is sketched below the screenshot.

[Screenshot: DataBlobStorage1 sample in VS]
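For comparison, the same basic blob operations can be scripted with the Azure PowerShell storage cmdlets. A minimal sketch, assuming the Azure PowerShell module is installed; the storage account name, key, container name & file path below are placeholders for your own values:

# Build a storage context from the account name & key (placeholders)
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey "<storage-key>"

# Create a container
New-AzureStorageContainer -Name "democontainer" -Context $ctx

# Upload a local file as a block blob
Set-AzureStorageBlobContent -File "C:\temp\sample.txt" -Container "democontainer" -Blob "sample.txt" -BlobType Block -Context $ctx

# List the blobs in the container
Get-AzureStorageBlob -Container "democontainer" -Context $ctx

# Delete the blob
Remove-AzureStorageBlob -Container "democontainer" -Blob "sample.txt" -Context $ctx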

Next, a major improvement is the Azure HDInsight integration into Visual Studio, with which you can now run your custom Hive queries against the HDFS of HDInsight clusters. Let's create a sample Hive query file in VS 2013.

Let's move to the 'HDInsight' tab on the left side of the VS New Project menu, select 'HDInsight' & choose 'Hive Application' to start with a new Hive (HQL) script. For this demo, I am selecting the 'Hive Sample' from VS.

[Screenshot: HDInsight project templates in VS]

 

On selecting the Hive sample, I am able to open the sample Hive queries 'weblogAnalysis.hql' & 'sensordataAnalysis.hql' for the Azure HDInsight cluster.

Here goes a sample weblogAnalysis.hql:

DROP TABLE IF EXISTS weblogs;
-- create table weblogs on space-delimited website log data.
-- In this sample we will use the default container. You could also use 'wasb://[container]@[storage account].blob.core.windows.net/Path/To/Data/' to access the data in other containers.
CREATE EXTERNAL TABLE IF NOT EXISTS weblogs(s_date date, s_time string, s_sitename string, cs_method string, cs_uristem string,
cs_uriquery string, s_port int, cs_username string, c_ip string, cs_useragent string,
cs_cookie string, cs_referer string, cs_host string, sc_status int, sc_substatus int,
sc_win32status int, sc_bytes int, cs_bytes int, s_timetaken int )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE LOCATION '/HdiSamples/WebsiteLogSampleData/SampleLog/'
TBLPROPERTIES ('skip.header.line.count'='2');
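Once the external table is defined, queries can be run against it directly from VS. As a quick illustrative example (my own, not part of the shipped sample), this aggregates the log records by HTTP status code:

-- count log records per HTTP status code
SELECT sc_status, COUNT(*) AS hits
FROM weblogs
GROUP BY sc_status;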

 

Before proceeding with the realtime Hive queries, we need to make sure that the Azure HDI cluster is already provisioned; it might be a simple Hadoop HDI cluster, an HBase HDI cluster or a Storm HDI cluster, & we build the Hive tables on top of it.

[Screenshot: sensordataAnalysis.hql in VS]

A new option has also come out for Azure HDI clusters: adding custom PowerShell scripts (Script Actions) while provisioning an HDI cluster using the Azure portal; a sketch of the equivalent PowerShell flow follows below. Other new additions to HDI clusters are the exploration of R (official CRAN packages) & Apache Spark on the HDInsight HDFS cluster, which will be covered with a demo next.
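Script Actions can be attached from Azure PowerShell as well, while building the cluster configuration. A minimal sketch, assuming the classic HDInsight cmdlets; the cluster name, storage account, key, container & script URI are placeholders:

# Build a 4-node cluster config with default storage (placeholder account/key/container)
$config = New-AzureHDInsightClusterConfig -ClusterSizeInNodes 4 |
    Set-AzureHDInsightDefaultStorage -StorageAccountName "mystorage.blob.core.windows.net" `
        -StorageAccountKey "<storage-key>" -StorageContainerName "democontainer"

# Attach a custom script to run on the head & data nodes during provisioning
$config = Add-AzureHDInsightScriptAction -Config $config -Name "InstallCustomPackages" `
    -ClusterRoleCollection HeadNode,DataNode `
    -Uri "https://mystorage.blob.core.windows.net/scripts/install-custom.ps1"

# Provision the cluster with the script action applied
$config | New-AzureHDInsightCluster -Name "demo-cluster" -Location "East US" -Credential (Get-Credential)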


An Overview of the Latest Components of Azure HDInsight – Apache Tez, YARN (MapReduce 2.0), Apache Storm & Kafka with HDP 2.1


Azure HDInsight 3.1, built on Hortonworks HDP 2.1, consists of several important Hadoop 2.x components: the DAG-based data-processing framework 'Apache Tez', the next-generation MapReduce 2.0 resource manager 'YARN' running on top of HDFS, the realtime stream-processing engine 'Apache Storm' & the distributed messaging framework 'Apache Kafka'. In this demo, we'll check a little configuration info on each component running on an Azure HDI cluster (3.1).

First, provision an HBase-type HDI cluster through Azure PowerShell; a sketch follows below the screenshot.

[Screenshot: provisioning an HBase HDI cluster in Azure PowerShell]
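A minimal provisioning sketch with the classic Azure PowerShell HDInsight cmdlets; the cluster name, location, node count, storage account, key & container below are placeholders:

# Admin credentials for the new cluster
$creds = Get-Credential

# Provision a 4-node HBase-type HDInsight 3.1 cluster (all names are placeholders)
New-AzureHDInsightCluster -Name "demo-hbase-cluster" -ClusterType HBase -Version 3.1 `
    -Location "East US" -ClusterSizeInNodes 4 `
    -DefaultStorageAccountName "mystorage.blob.core.windows.net" `
    -DefaultStorageAccountKey "<storage-key>" `
    -DefaultStorageContainerName "demo-hbase-cluster" `
    -Credential $creds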

Next, you can check the provisioned HBase HDI cluster on the Azure portal & enable RDP on it.

[Screenshot: enabling RDP on the cluster in the Azure portal]

Next, on the HDI cluster, first check the Hadoop components by browsing the directory 'C:\apps\dist', where you should see that all components of HDP 2.1 are present except Apache Storm.

[Screenshot: HDP 2.1 components under C:\apps\dist]

Now, Tez 0.4.0.2.1.5.0-2057 comes already configured with the HDI 3.1 HBase cluster, so you can check the Hadoop configuration page & run Hive queries with Tez (see the snippet after the screenshot). For that, on the cluster desktop, check the YARN configuration page, which clarifies the YARN node status.

[Screenshot: running Hive queries with Tez]
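As a quick illustration (assuming the stock Hive configuration shipped with HDI 3.1), a Hive session can be switched from classic MapReduce to Tez before submitting a query:

-- route subsequent queries in this session through the Tez DAG engine
set hive.execution.engine=tez;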

Now, similarly, check tez-site.xml for the configuration level & the DAG node status.

[Screenshot: tez-site.xml]

Next, jump back to the directory 'C:\apps\' & type 'Storm' into the search pane of Windows Explorer. Copy 'storm-0.9.1.2.1.5.0-2057.zip', paste it into 'C:\apps\dist\' & then unzip it. Under the .\bin directory, find the Storm.cmd file, which is needed for running the Storm-Zookeeper, Storm-Nimbus, Storm-Supervisor & UI daemons.

First, configure storm.yaml with the IPv4 address of the HDI cluster (see the sample entries below), then start the Storm-Zookeeper nodes first, followed by the master & slave daemons.
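For reference, the relevant storm.yaml entries look roughly like this; the address shown is a placeholder for the cluster's actual IPv4 address:

# storm.yaml (illustrative values)
storm.zookeeper.servers:
  - "10.0.0.4"
nimbus.host: "10.0.0.4"
ui.port: 8080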

[Screenshot: Storm-Zookeeper daemon]

[Screenshot: Storm-Nimbus daemon]

Start the Supervisor (Worker) daemon job.

[Screenshot: Storm-Supervisor daemon]

And, at last, start the UI job.

[Screenshot: Storm UI daemon]

The Storm UI can be viewed via its web interface in a browser on port 8080.

[Screenshot: Storm UI in the browser]

Next, to configure Apache Kafka for distributed messaging, we first need to download a stable version of Kafka; here I used Kafka 0.8. You can download it as a .zip from the GitHub repository: https://github.com/apache/kafka

Now, after unzipping it, paste it into the same directory 'C:\apps\dist\' as the other components & start the installation of Apache Kafka 0.8 on Azure HDI.

Before doing that, replace the Windows .bat files under 'C:\apps\dist\kafka-0.8\bin\windows\' with the latest Kafka batch files for Windows, which can be downloaded from here.

Set the Java path on the Hadoop command line or PowerShell as 'set Path=C:\apps\dist\java\bin'.

Next, update Scala & the packages through the following commands.

.\sbt.bat update

[Screenshot: sbt update]

Then run the following commands:

.\sbt.bat package
.\sbt.bat assembly-package-dependency

After that, start the ZooKeeper server before starting the Kafka server.

.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties

[Screenshot: starting the ZooKeeper server]

Now, start the Kafka server by running the following command.

.\bin\windows\kafka-server-start.bat .\config\server.properties

[Screenshot: starting the Kafka server]

Next, create a topic to post messages to, using the following command:

.\bin\windows\kafka-create-topic.bat --zookeeper localhost:2181 --replica 1 --partition 1 --topic test

[Screenshot: creating a Kafka topic]

You can check the list of topics by using the following command:

.\bin\windows\kafka-list-topic.bat --zookeeper localhost:2181

[Screenshot: listing Kafka topics]

On getting the success message, we can start posting messages to the Kafka cluster. Before that, start the console producer by using the following command:

.\bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic test
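Each line typed at the producer prompt is published as a message to the 'test' topic; for example (illustrative input):

This is a message
This is another message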


[Screenshot: sending messages from the console producer]
Next, start the console-consumer by executing the following command.

.\bin\windows\kafka-console-consumer.bat --zookeeper localhost:2181 --topic test --from-beginning

The following screenshot displays the demo of running Apache Kafka 0.8 (producers & consumers) on the Azure HBase HDI 3.1 cluster.

[Screenshot: Kafka 0.8 producer & consumer on the HDI cluster]
 