Worldwide Earthquake Data Analysis (Nepal Earthquake) - Microsoft PowerBI

Last weekend, we were all horrified by the terrible earthquake that struck Nepal, India & greater Asia, and its death toll has continued to climb. There are several parameters to consider, such as the depth of the earthquake in km, the magnitude of the quake, the severity of deaths, the number of people killed by the quake itself and the number killed by shaking effects.

In this PowerBI demo, we use data on the world's most dreadful earthquake incidents since 1900.



The Excel Power View dashboard depicts some of the latest death toll analysis for the 2015 Nepal earthquake.




The latest PowerBI Designer preview shows a worldwide, country-wise analysis of earthquake magnitude data, sorted by total deaths & quake depth in km.






Deployment of Apache Hadoop 2.7.0 on Ubuntu Vivid 15.04 on Azure Linux VM

Recently, on April 21st, the first Apache Hadoop release of 2015 came out: version 2.7.0, a dev edition with lots of new updates.

  • This release drops support for JDK6 runtime & works with JDK 7+ only.
  • This release is not yet ready for production use; production users should wait for the 2.7.1/2.7.2 release.

In Hadoop Common, this is the first release with support for Azure Blob storage, exposing blobs as a file system for Azure.

Other than that, Hadoop HDFS has gained support for file truncate, quotas per storage type & files with variable-length blocks. For YARN & MapReduce, some new features have been added, such as pluggable YARN authorization, global caching of YARN localized resources & the ability to limit the running map/reduce tasks of a job.

Here is a step-by-step guide to installing Apache Hadoop 2.7.0 on an Azure Linux virtual machine (Ubuntu 15.04).




Hortonworks Data Platform Administration: Deployment of Hortonworks HDP 2.2 using Ambari 1.7, Ambari Views, add-ons & Configuration of Yarn Capacity & Fair Scheduler.

For the HDP administration certification, one of the most important tasks is deploying an HDP cluster using Apache Ambari, either on-premise or on a cloud vendor's platform (e.g. Amazon EC2, Microsoft Azure or Google Cloud Platform). HDP deployment & administration using Apache Ambari is available on both AWS EC2 & Azure Linux VMs.

In this video, we provide step-by-step guidance on installing HDP 2.2 using Ambari on AWS Elastic Compute Cloud (EC2) large instances (chosen because the version being deployed is HDP 2.2), along with the basic steps of VM image creation, password-less SSH authentication, setting up the secure key (.pem) file, installing Ambari on RHEL 6.5, and configuring & starting the service before the HDP installation.

The master & slave nodes are deployed into separate clusters & a total of seven (7) VMs are used for the demo.


In the next video, we show the latest updates in Ambari 1.7 on Hortonworks HDP 2.2 clusters: Ambari Views & several new add-ons that make it easy to configure the YARN Capacity Scheduler & YARN Fair Scheduler, job versioning concepts, easy addition of new hosts to the production cluster, and downloading the XML configuration files of additional components such as Hive (e.g. hive-site.xml) to the local system.


Detailed sessions are available for candidates looking for Hortonworks HDP administration training (certifications). You can contact us if you are looking for an online, instructor-led training course taught by real-world industry experts.


Hadoop Administration: Installation scripts of Apache Hadoop (2.6.0) on Ubuntu Unicorn as Multi-Node cluster

Recently, I published a quick step-by-step guide to deploying an Apache Hadoop (2.6.0) single-node cluster on an Ubuntu Utopic Unicorn (14.10) image; you can get the full installation video here.


Here, the full details of the Apache Hadoop (2.6.0) multi-node cluster setup are provided. The primary requirements to run the setup are:

1. VMware Player/Workstation(if Windows/Linux) or VMware Fusion(if OSX)

2. More than 4 GB of RAM for primary OS

3. More than 60 GB of Disk space

4. Intel VT-X capable processor.

5. Ubuntu/CentOS/Red Hat/SUSE OS image (as guest OS)

Now, the step-by-step scripts for the multi-node Hadoop cluster are provided.


Check the IP address of the master & each slave node:


Namenode > hadoopmaster >

Datanodes > hadoopslave1 >
hadoopslave2 >
hadoopslave3 >

Clone Hadoop Single node cluster as hadoopmaster

Hadoopmaster Node

$ sudo gedit /etc/hosts
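The entries to put into /etc/hosts are not reproduced here. As a minimal sketch, assuming hypothetical private addresses in the 192.168.1.x range (substitute the IP addresses you noted for your own nodes), the mapping could be written to a scratch file first and reviewed before it is appended to the real file on every node:

```shell
# Hypothetical addresses; replace them with the real IP of each node.
hosts_fragment="$(mktemp)"
cat > "$hosts_fragment" <<'EOF'
192.168.1.100 hadoopmaster
192.168.1.101 hadoopslave1
192.168.1.102 hadoopslave2
192.168.1.103 hadoopslave3
EOF

# Show the fragment for review.
cat "$hosts_fragment"

# After review, append it to the real file on every node:
#   sudo tee -a /etc/hosts < "$hosts_fragment"
```

Keeping the same four lines on the master and on all slaves lets every node resolve the others by hostname.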


$ sudo gedit /etc/hostname


$ cd /usr/local/hadoop/etc/hadoop

$ sudo gedit core-site.xml

replace localhost with hadoopmaster
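For orientation, here is a rough sketch of what core-site.xml might look like after that change. The property name fs.defaultFS is the standard Hadoop 2.x setting for the default file system; port 9000 is an assumption, so keep whatever port your single-node file already uses. The file is written to a scratch directory rather than over the live configuration:

```shell
# Write a candidate core-site.xml to a scratch directory for review.
# Port 9000 is an assumption, not taken from the original setup.
conf_dir="$(mktemp -d)"
cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoopmaster:9000</value>
  </property>
</configuration>
EOF

cat "$conf_dir/core-site.xml"
```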

$ sudo gedit hdfs-site.xml

replace the replication value with 3 (the number of copies kept of each block, matching the three datanodes)

$ sudo gedit yarn-site.xml

add the following configuration
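The configuration itself is not reproduced here. One common way to point the node managers at the master, shown only as an assumption and not necessarily what this setup used, is the standard yarn.resourcemanager.hostname property together with the mapreduce_shuffle auxiliary service. A sketch that writes such a file to a scratch directory:

```shell
# Candidate yarn-site.xml for a multi-node cluster (an assumed layout,
# written to a scratch directory so the live config is untouched).
conf_dir="$(mktemp -d)"
cat > "$conf_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Host that runs the ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoopmaster</value>
  </property>
  <!-- Shuffle service needed by MapReduce on YARN -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

cat "$conf_dir/yarn-site.xml"
```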


$ sudo gedit mapred-site.xml.template
replace the property name with mapred.job.tracker

replace the value yarn with hadoopmaster:54311

$ sudo rm -rf /usr/local/hadoop/hadoop_data

Shut down the hadoopmaster node

Clone Hadoopmaster Node as hadoopslave1, hadoopslave2, hadoopslave3

Hadoopslave Node (this configuration should be done on each slave node)

$ sudo gedit /etc/hostname


$ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode

$ sudo chown -R trainer:trainer /usr/local/hadoop

$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

remove dfs.namenode.dir property section

reboot all nodes

Hadoopmaster Node

$ sudo gedit /usr/local/hadoop/etc/hadoop/masters


$ sudo gedit /usr/local/hadoop/etc/hadoop/slaves

remove localhost and add
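Given the hostnames used in this guide, the slaves file would list one datanode per line. A small sketch that builds the file in a scratch location first, so it can be checked before replacing the real one:

```shell
# Build the slaves list (one hostname per line) in a scratch file.
slaves_file="$(mktemp)"
printf '%s\n' hadoopslave1 hadoopslave2 hadoopslave3 > "$slaves_file"

cat "$slaves_file"

# After review, copy it over the real file on hadoopmaster:
#   sudo cp "$slaves_file" /usr/local/hadoop/etc/hadoop/slaves
```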


$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

remove dfs.datanode.dir property section

$ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode

$ sudo chown -R trainer:trainer /usr/local/hadoop

$ sudo ssh-copy-id -i ~/.ssh/ trainer@hadoopmaster

$ sudo ssh-copy-id -i ~/.ssh/ trainer@hadoopslave1

$ sudo ssh-copy-id -i ~/.ssh/ trainer@hadoopslave2

$ sudo ssh-copy-id -i ~/.ssh/ trainer@hadoopslave3

$ ssh hadoopmaster

$ exit

$ ssh hadoopslave1

$ exit

$ ssh hadoopslave2

$ exit

$ ssh hadoopslave3

$ exit

$ hadoop namenode -format


$ jps (check in all 3 datanodes)

To check the Hadoop web consoles:

http://hadoopmasteripaddress:8088/
http://hadoopmasteripaddress:50070/
http://hadoopmasteripaddress:50090/

http://hadoopmasteripaddress:50075/


Installation Commands of Apache Hadoop 2.6.0 as Single Node Pseudo-Distributed mode on Ubuntu 14.10 (Step by Step)

$ sudo apt-get update

$ sudo apt-get install default-jdk

$ java -version

$ sudo apt-get install ssh

$ sudo apt-get install rsync

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

$ cat ~/.ssh/ >> ~/.ssh/authorized_keys

$ wget -c

$ sudo tar -zxvf hadoop-2.6.0.tar.gz

$ sudo mv hadoop-2.6.0 /usr/local/hadoop

$ update-alternatives --config java

$ sudo gedit ~/.bashrc

#Hadoop Variables
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Now apply the variables.

$ source ~/.bashrc

There are a number of xml files within the Hadoop folder that require editing which are:

  • mapred-site.xml
  • yarn-site.xml
  • core-site.xml
  • hdfs-site.xml

The files can be found in /usr/local/hadoop/etc/hadoop/. First copy the mapred-site template file over and then edit it.



Next, go to the following path.

$ cd /usr/local/hadoop/etc/hadoop

Add the following text between the configuration tags.




Add the following text between the configuration tags.



Add the following text between the configuration tags.


Add the following text between the configuration tags.
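The per-file snippets are not reproduced here. As a rough sketch of what a pseudo-distributed Hadoop 2.x setup commonly uses, the four files could look like the ones generated below. The storage paths are the ones used elsewhere in this guide; the fs.defaultFS port (9000) and the replication factor of 1 are assumptions. Everything is written to a scratch directory so nothing in /usr/local/hadoop is overwritten:

```shell
conf_dir="$(mktemp -d)"

# core-site.xml: default file system (port 9000 is an assumption)
cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: single copy of each block, plus local storage directories
cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoopuser/hadoopspace/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoopuser/hadoopspace/hdfs/datanode</value>
  </property>
</configuration>
EOF

# yarn-site.xml: shuffle service needed by MapReduce on YARN
cat > "$conf_dir/yarn-site.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

# mapred-site.xml: run MapReduce jobs on YARN
cat > "$conf_dir/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

ls "$conf_dir"
```

Compare the generated files with your own before copying any of them into /usr/local/hadoop/etc/hadoop/.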




Note that other locations can be used in HDFS by separating the values with a comma, e.g.

file:/home/hadoopuser/hadoopspace/hdfs/datanode, .disk2/Hadoop/datanode, . .

Add an entry for JAVA_HOME

export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64/

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

$ mkdir -p /home/hadoopuser/hadoopspace/hdfs/namenode

$ mkdir -p /home/hadoopuser/hadoopspace/hdfs/datanode

$ sudo chown hadoopuser:hadoopuser -R /usr/local/hadoop

Next format the namenode.


Issue the following commands.
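The commands themselves are not shown here. On a stock Hadoop 2.x install the usual sequence is to format the namenode once and then start HDFS and YARN with the scripts shipped under sbin. The sketch below assumes HADOOP_HOME is /usr/local/hadoop as set earlier, and uses a hypothetical DRY_RUN switch (not part of Hadoop) that only prints the commands unless it is set to 0, because formatting the namenode wipes existing HDFS metadata:

```shell
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: set DRY_RUN=0 to really run

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

first_start() {
  # Format only once, on a fresh install.
  run "$HADOOP_HOME/bin/hdfs" namenode -format
  # Start the HDFS daemons, then the YARN daemons.
  run "$HADOOP_HOME/sbin/start-dfs.sh"
  run "$HADOOP_HOME/sbin/start-yarn.sh"
}

first_start
```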



Issue the jps command and verify that the following processes are running:
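The expected list is not reproduced here; on a pseudo-distributed Hadoop 2.x node the daemons are typically NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. As a small sketch, a helper (the function name and the sample PIDs are made up for illustration) can read jps output and report which of those are missing:

```shell
# Daemons usually expected on a pseudo-distributed Hadoop 2.x node.
expected="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# Reads `jps` output on stdin and prints every expected daemon
# that does not appear in it.
missing_daemons() {
  jps_output="$(cat)"
  for daemon in $expected; do
    # jps prints "<pid> <name>"; compare the whole second field so that
    # SecondaryNameNode is not mistaken for NameNode.
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}

# Sample output with made-up PIDs, standing in for a real `jps` run.
sample='2401 NameNode
2533 DataNode
2712 SecondaryNameNode
2869 ResourceManager
3170 Jps'

printf '%s\n' "$sample" | missing_daemons
```

On a live node, run it as `jps | missing_daemons`; no output means all five daemons are up.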


At this point Hadoop has been installed and configured.

Type in a terminal:

firefox http://localhost:50070  # namenode

firefox http://localhost:50075  # datanode

firefox http://localhost:50090  # checkpoint namenode

firefox http://localhost:8088  # Yarn Cluster



A lap around the latest PowerBI announcements, Socrata OData Feed & RealTime Fast Streaming Data Analytics

Last month, on 27th February 2015, some awesome new features were announced for Microsoft PowerBI, so let's have a quick look. First of all, in this release PowerBI steps out from behind Office 365 & Microsoft Office, and you can now connect to data not only from Excel workbooks/Azure but also from PowerBI Designer files, SendGrid, Salesforce CRM, Microsoft SQL Server Analysis Services & Azure Stream Analytics (private preview).

In the first demo, I collected real-time data from the White House Visitors Records directory through the OData feed provided by the Socrata API, using this link from Excel -> PowerQuery -> OData Feed or Excel -> Data -> OData Feed.




Next, import the data into a PowerPivot table & build out the linked tables to produce the Power View dashboard.




Also, you can sign up for the PowerBI public preview dashboard here, but note that the preview is currently available only to users in the United States.

The PowerMap tour was compiled using the latest features, such as Custom Maps in PowerMap & a rich set of effects. The PowerMap tour of the White House Visitors Records analysis is available on YouTube.

Upload the Excel Power View dashboard workbook to the PowerBI public preview portal & you can enjoy the full experience, including Power Q&A, without an Office 365 environment.



The new PowerBI public preview portal offers lots of options for importing data, such as SQL Server Analysis Services, Excel workbooks, PowerBI Designer files, SendGrid, Salesforce CRM, Microsoft Dynamics, Marketo, GitHub, Zendesk, etc.


The new PowerBI Designer is available as a free download from this link & some spectacular views have been introduced in the designer preview, such as Tree charts, Gauge, Combo, Tabular, etc.




In the next demo, I extracted real-time 9-1-1 call records data & analysed the 911 call records over 2 days: possible report locations & types of reports all over the US and, of course, the greater Seattle area.


What’s new in Azure SDK 2.5 & Visual Studio 2013 Update 4

Recently, after playing enough with Azure Stream Analytics, it was time to move on to Azure .NET development, & a new version of the Azure SDK has been published. Let's have a quick overview of the latest Azure SDK.

First of all, let's download the SDK from the WebPI console, listed as ‘Microsoft Azure SDK 2.5 for .NET (VS 2013)’.


In this edition, a few new components have been added, such as:

i) EnvironmentTools.VS.msi

ii) HiveODBC32.msi


iv) Microsoft.Azure.HDInsightTools-x64.msi

v) Microsoft.Azure.HDInsightTools-x86.msi

and so on…


Now, after installing SDK 2.5, let's start with Visual Studio 2013.


Expand ‘QuickStart’ under ‘Cloud’ & start exploring the options to create AppService, Compute & DataService samples directly from VS 2013/2012 itself.



The default ‘DataBlobStorage1’ sample is created in VS to create a blob container, create a block blob/page blob, upload a new blob & delete a blob (all the basic CRUD operations on blobs using REST).


Next, a major improvement is the Azure HDInsight integration into Visual Studio, from which you can now run your custom Hive table queries on HDInsight clusters. Let's create a sample Hive query file in VS 2013.

Let's move to the HDInsight tab on the left side of the VS installed menu, select ‘HDInsight’ & then ‘HiveApplication’ to start a new Hive-QL script. For this demo, I am selecting the Hive Sample from VS.



On selecting the Hive sample, I can open the sample Hive queries ‘weblogAnalysis.hql’ & ‘sensordataAnalysis.hql’ from the Azure HDInsight cluster.

Here is a sample weblogAnalysis.hql:

-- create table weblogs on space-delimited website log data.
-- In this sample we will use the default container. You could also use 'wasb://[container]@[storage account]' to access the data in other containers.
CREATE EXTERNAL TABLE IF NOT EXISTS weblogs(s_date date, s_time string, s_sitename string, cs_method string, cs_uristem string,
cs_uriquery string, s_port int, cs_username string, c_ip string, cs_useragent string,
cs_cookie string, cs_referer string, cs_host string, sc_status int, sc_substatus int,
sc_win32status int, sc_bytes int, cs_bytes int, s_timetaken int )
STORED AS TEXTFILE LOCATION '/HdiSamples/WebsiteLogSampleData/SampleLog/'
TBLPROPERTIES ('skip.header.line.count'='2');


Before proceeding with the real-time Hive queries, we need to make sure that the Azure HDI cluster is already provisioned; it can be a simple Hadoop HDI cluster, an HBase HDI cluster or a Storm HDI cluster on which to build the Hive tables.


There is also a new option for Azure HDI clusters to add custom PowerShell scripts while provisioning an HDI cluster from the Azure portal. Other new additions to HDI clusters are R (official CRAN packages) & Apache Spark on HDInsight clusters, which will be covered with a demo next.

