Hortonworks Data Platform Administration: Deployment of Hortonworks HDP 2.2 using Ambari 1.7, Ambari Views, add-ons & Configuration of Yarn Capacity & Fair Scheduler.


For the HDP administration certification, one of the most important tasks is deploying an HDP cluster using Apache Ambari, either on-premises or on a cloud vendor's infrastructure (e.g. Amazon EC2, Microsoft Azure or Google Cloud Platform). HDP deployment & administration through Apache Ambari is available on both AWS EC2 & Azure VM Linux platforms.

In this video, we provide step-by-step guidance on installing HDP 2.2 using Ambari on large instances of the AWS EC2 platform, along with the basic steps of VM image creation, password-less SSH authentication, setting up the secure key (.pem) file, installing Ambari on RHEL 6.5, and configuring & starting the Ambari service before the HDP installation.

The master & slave nodes are deployed on separate instances, & a total of seven (7) VMs are utilized for the demo.

 

In the next video, we show the latest updates of Ambari 1.7 on Hortonworks HDP 2.2 clusters: Ambari Views & several new add-ons that make it easy to configure the YARN Capacity Scheduler & Fair Scheduler, job versioning concepts, easy addition of new hosts to a production cluster, and downloading component configuration files such as the Hive settings XML (e.g. hive-site.xml) to the local system.
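For context, the YARN Capacity Scheduler that the video configures through Ambari is ultimately driven by capacity-scheduler.xml; a minimal illustrative sketch (the queue names & percentages here are hypothetical, not taken from the video) looks like:

<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,analytics</value>
</property>

<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>70</value>
</property>

<property>
<name>yarn.scheduler.capacity.root.analytics.capacity</name>
<value>30</value>
</property>

In Ambari 1.7 these values can be edited from the YARN / Capacity Scheduler configuration pages rather than by hand.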

 

Detailed sessions are available for candidates looking for Hortonworks HDP administration training (certifications). You can contact us if you are looking for an online, instructor-led training course delivered by real-world industry experts.

 

Hadoop Administration: Installation scripts of Apache Hadoop (2.6.0) on Ubuntu (Utopic Unicorn) as a Multi-Node cluster


Recently, I published a quick step-by-step guide on deploying an Apache Hadoop (2.6.0) single-node cluster on an Ubuntu (14.10) image; you can get the full installation video here.

 

Here, the full deployment details for an Apache Hadoop (2.6.0) multi-node cluster setup are provided. The primary hardware requirements for running the setup are:

1. VMware Player/Workstation (if Windows/Linux) or VMware Fusion (if OS X)

2. More than 4 GB of RAM for the primary OS

3. More than 60 GB of disk space

4. Intel VT-x capable processor

5. Ubuntu/CentOS/Red Hat/SUSE OS image (as the guest OS)

Now, the step-by-step multi-node Hadoop clustering scripts are provided.

 

Check the IP address of each master & slave node:

$ ifconfig

Namenode > hadoopmaster > 192.168.23.132

Datanodes > hadoopslave1 > 192.168.23.133
hadoopslave2 > 192.168.23.134
hadoopslave3 > 192.168.23.135

Clone Hadoop Single node cluster as hadoopmaster

Hadoopmaster Node

$ sudo gedit /etc/hosts

hadoopmaster   192.168.23.132
hadoopslave1   192.168.23.133
hadoopslave2   192.168.23.134
hadoopslave3   192.168.23.135

$ sudo gedit /etc/hostname

hadoopmaster

$ cd /usr/local/hadoop/etc/hadoop

$ sudo gedit core-site.xml

replace localhost with hadoopmaster
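For reference, assuming the fs.default.name value from the single-node setup (hdfs://localhost:9000), the edited core-site.xml entry would look like:

<property>
<name>fs.default.name</name>
<value>hdfs://hadoopmaster:9000</value>
</property>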

$ sudo gedit hdfs-site.xml

replace the dfs.replication value with 3 (the number of datanodes)
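For reference, the edited replication property (a sketch, based on the dfs.replication entry from the single-node guide) would be:

<property>
<name>dfs.replication</name>
<value>3</value>
</property>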

$ sudo gedit yarn-site.xml

add the following configuration

<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoopmaster:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoopmaster:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoopmaster:8050</value>
</property>
</configuration>

$ sudo gedit mapred-site.xml.template

replace the property name mapreduce.framework.name with mapred.job.tracker

replace the value yarn with hadoopmaster:54311
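After both replacements, the resulting property (a sketch of the edited file) should read:

<property>
<name>mapred.job.tracker</name>
<value>hadoopmaster:54311</value>
</property>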

$ sudo rm -rf /usr/local/hadoop/hadoop_data

Shut down the hadoopmaster node

Clone Hadoopmaster Node as hadoopslave1, hadoopslave2, hadoopslave3

Hadoopslave Node (this configuration should be done on each slave node)

$ sudo gedit /etc/hostname

hadoopslave<nodenumberhere>

$ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode

$ sudo chown -R trainer:trainer /usr/local/hadoop

$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

remove the namenode directory property (dfs.name.dir) section

reboot all nodes

Hadoopmaster Node

$ sudo gedit /usr/local/hadoop/etc/hadoop/masters

hadoopmaster

$ sudo gedit /usr/local/hadoop/etc/hadoop/slaves

remove localhost and add

hadoopslave1
hadoopslave2
hadoopslave3

$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

remove the datanode directory property (dfs.data.dir) section

$ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode

$ sudo chown -R trainer:trainer /usr/local/hadoop

$ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub trainer@hadoopmaster

$ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub trainer@hadoopslave1

$ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub trainer@hadoopslave2

$ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub trainer@hadoopslave3

$ ssh hadoopmaster

$ exit

$ ssh hadoopslave1

$ exit

$  ssh hadoopslave2

$ exit

$ ssh hadoopslave3

$ exit

$ hadoop namenode -format

$ start-all.sh

$ jps (check on all 3 datanodes as well; each datanode should show DataNode & NodeManager processes)


For checking the Hadoop web consoles:

http://hadoopmasteripaddress:8088/ (YARN cluster)
http://hadoopmasteripaddress:50070/ (namenode)
http://hadoopmasteripaddress:50090/ (checkpoint namenode)
http://hadoopmasteripaddress:50075/ (datanode)

 

Installation Commands of Apache Hadoop 2.6.0 as Single Node Pseudo-Distributed mode on Ubuntu 14.10 (Step by Step)


$ sudo apt-get update

$ sudo apt-get install default-jdk

$ java -version

$ sudo apt-get install ssh

$ sudo apt-get install rsync

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

$ wget -c http://mirror.olnevhost.net/pub/apache/hadoop/common/current/hadoop-2.6.0.tar.gz

$ sudo tar -zxvf hadoop-2.6.0.tar.gz

$ sudo mv hadoop-2.6.0 /usr/local/hadoop

$ update-alternatives --config java

$ sudo gedit ~/.bashrc

#Hadoop Variables
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Now apply the variables.

$ source ~/.bashrc
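As a quick optional sanity check, the hadoop binary should now resolve from the PATH set above:

$ hadoop version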

There are a number of configuration files within the Hadoop folder that require editing, which are:

  • mapred-site.xml
  • yarn-site.xml
  • core-site.xml
  • hdfs-site.xml
  • hadoop-env.sh

The files can be found in /usr/local/hadoop/etc/hadoop/. First copy the mapred-site template file over and then edit it, as shown below.

mapred-site.xml

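The copy mentioned above can be done from the terminal, assuming the default install path used in this guide:

$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml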

Next, go to the following path.

$ cd /usr/local/hadoop/etc/hadoop

Add the following text between the configuration tags in mapred-site.xml.

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

yarn-site.xml

Add the following text between the configuration tags.

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

core-site.xml

Add the following text between the configuration tags.
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

hdfs-site.xml

Add the following text between the configuration tags.

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoopuser/hadoopspace/hdfs/namenode</value>
</property>

<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoopuser/hadoopspace/hdfs/datanode</value>
</property>

Note: other locations can be used in HDFS by separating the values with a comma, e.g.

file:///home/hadoopuser/hadoopspace/hdfs/datanode,file:///disk2/hadoop/datanode

hadoop-env.sh

Add an entry for JAVA_HOME, pointing to your JVM install (on Ubuntu 14.10 with the default OpenJDK 7 package):

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

$ mkdir -p /home/hadoopuser/hadoopspace/hdfs/namenode

$ mkdir -p /home/hadoopuser/hadoopspace/hdfs/datanode

$ sudo chown hadoopuser:hadoopuser -R /usr/local/hadoop

Next format the namenode.

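A typical command for this step (the original screenshot is not reproduced here) is:

$ hdfs namenode -format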

Issue the following commands.

$ start-dfs.sh
$ start-yarn.sh


Issue the jps command and verify that the expected daemons are running:

$ jps

The output should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (plus Jps itself).

At this point, Hadoop has been installed and configured.

Type in the terminal:

firefox http://localhost:50070 (namenode)

firefox http://localhost:50075 (datanode)

firefox http://localhost:50090 (checkpoint namenode)

firefox http://localhost:8088 (YARN cluster)


A lap around the latest PowerBI announcements, Socrata OData Feed & Real-Time Fast Streaming Data Analytics


Last month, on 27th February 2015, some awesome new features were announced for Microsoft PowerBI; let's have a quick look. First of all, in this release PowerBI comes out from behind the Office 365 & Microsoft Office veils: you can now connect your data not only from Excel workbooks/Azure but also from PowerBI Designer files, SendGrid, Salesforce CRM, Microsoft SQL Server Analysis Services and Azure Stream Analytics (private preview).

In the first demo, I collected real-time data from the White House visitor records directory through the Socrata OData API, using the feed http://open.whitehouse.gov/OData.svc/p86s-ychb via Excel -> PowerQuery -> OData Feed (or Excel -> Data -> OData Feed).
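If you just want to peek at the raw feed outside Excel, the same endpoint can be fetched from a terminal (assuming curl is available):

$ curl "http://open.whitehouse.gov/OData.svc/p86s-ychb"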


 

 

Next, import the data into a PowerPivot table & build out the linked tables to put together the PowerView dashboard.

 


 

Also, you can sign up for the PowerBI public preview dashboard here, but note that the preview is currently available for users in the United States only.

The PowerMap tour is compiled along with the latest features introduced in PowerMap, such as Custom Maps & a rich set of effects. The PowerMap tour of the White House visitor records index analysis is available on YouTube.

Upload the Excel PowerView dashboard workbook to the PowerBI public preview portal & you can enjoy the full experience, including Power Q&A, without an Office 365 environment.


 

In the new PowerBI public preview portal there are lots of options for importing data, such as SQL Server Analysis Services, Excel workbooks, PowerBI Designer files, SendGrid, Salesforce CRM, Microsoft Dynamics, Marketo, GitHub, ZenDesk etc.


The new PowerBI Designer file is available for free download from this link, & some spectacular visuals have been introduced in the Designer preview, like tree charts, gauge, combo & tabular views.


 

 

In the next demo, I extracted real-time 9-1-1 call record index data from http://data.seattle.gov/ & analysed the 911 call records index over 2 days: possible report locations & types of reports all over the US & of course over the greater Seattle area.

 

What’s new in Azure SDK 2.5 & Visual Studio 2013 Update 4


Recently, after playing enough with Azure Stream Analytics, it's time to move on to Azure .NET development, & a new version of the Azure SDK has been published. Let's have a quick overview of the latest Azure SDK.

First of all, let's download the SDK from the WebPI console, listed as 'Microsoft Azure SDK 2.5 for .NET (VS 2013)'.


In this edition, a few new components have been added, such as:

i) EnvironmentTools.VS.msi

ii) HiveODBC32.msi

iii) HiveODBC64.msi

iv) Microsoft.Azure.HDInsightTools-x64.msi

v) Microsoft.Azure.HDInsightTools-x86.msi

and so on.


Now, after installing SDK 2.5, let's start with Visual Studio 2013.


Expand 'QuickStart' under 'Cloud' & start exploring the options to create App Service, Compute & Data Service projects directly from VS 2013/2012 itself.


 

The default 'DataBlobStorage1' sample can be created in VS to create a blob container, create a block blob/page blob, upload a new blob & delete a blob (all basic CRUD operations on blobs using REST).


Next, a major improvement is the Azure HDInsight integration in Visual Studio, with which you can now run your custom Hive queries against the HDFS of HDInsight clusters. Let's create a sample Hive query file in VS 2013.

Let's move to the HDInsight tab on the left side of the installed VS menu, select 'HDInsight' & then 'Hive Application' to start with a new Hive-QL file. For this demo, I am selecting the Hive sample from VS.


 

On selecting the Hive sample, I was able to open the sample Hive queries 'weblogAnalysis.hql' & 'sensordataAnalysis.hql' from the Azure HDInsight cluster.

Here goes a sample weblogAnalysis.hql:

DROP TABLE IF EXISTS weblogs;
-- create table weblogs on space-delimited website log data.
-- In this sample we will use the default container. You could also use 'wasb://[container]@[storage account].blob.core.windows.net/Path/To/Data/' to access the data in other containers.
CREATE EXTERNAL TABLE IF NOT EXISTS weblogs(s_date date, s_time string, s_sitename string, cs_method string, cs_uristem string,
cs_uriquery string, s_port int, cs_username string, c_ip string, cs_useragent string,
cs_cookie string, cs_referer string, cs_host string, sc_status int, sc_substatus int,
sc_win32status int, sc_bytes int, cs_bytes int, s_timetaken int )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE LOCATION '/HdiSamples/WebsiteLogSampleData/SampleLog/'
TBLPROPERTIES ('skip.header.line.count'='2');

 

Before proceeding with real-time Hive queries, we need to make sure that the Azure HDI cluster is already provisioned; it can be a plain Hadoop HDI cluster, an HBase HDI cluster or a Storm HDI cluster on top of which to build the Hive tables.


A new option has come out for Azure HDI clusters to add custom PowerShell scripts while provisioning a cluster through the Azure portal. Other new additions to HDI clusters are support for R (official CRAN packages) & Apache Spark on the HDInsight HDFS cluster, which will be covered with a demo next.

A brief tour on Windows 10 Preview


Today, Microsoft officially announced the preview of Windows 10, which is going to be available through the Windows Insider program. Everyone is excited to download & install the new preview, the successor to Windows 8.1, Windows Phone 8.1 & Windows Server 2012 R2. Well, there are a lot of new enhancements & magnificent charms, like the comeback of the 'Start' menu button on the desktop, snap enhancements, the new task view button & lots more.

I shared an exciting presentation regarding the upcoming preview edition of Windows 10 & a few more of its features.

 

 

Let's wait for the first chance to download the next omni-Windows.

Predictive Analytics of UK Electoral Decisions using PowerBI for Office 365


There were significant breaking updates over the last few days regarding the Scotland voting referendum 2014, while social media filled up with millions of tweets, likes & shares & plenty of sentiment & prediction details about Scotland's future. In this demo, we roll out quite a similar social ramp-up: a predictive analysis of UK voting results over 2014 & 2009 using Microsoft PowerBI & Office 365.

Throughout the demo, I used the PowerBI components PowerPivot, PowerQuery, PowerView & PowerMap, along with Power Q&A integrated with Office 365. Let's start by consuming the dataset from the 'online search' feature of PowerQuery. Here I searched for the term 'UK parliament elections prediction' & selected the related OData feed URL.


Using the PowerQuery editor, analyse & transform the data for processing & feeding into the data model.


Next, after building the data model & defining appropriate keys between the datasets, first build up the sample PowerPivot dashboard.


To build the PowerView reports, simply click on the PowerView tab & start building the prediction analysis of UK electoral decisions over 2014 & 2009.


 

The predictive analytics of UK electoral decisions in 2014 & 2009 is depicted with respect to the representation data & the key values of data differentiation, displayed through stacked bars & a data key across the entire set of electoral regions.

Next, click on the 'Map' icon & select 'Launch Power Map' to build a PowerMap 3D visualization of the predicted & analysed result set over the regions of the United Kingdom.

 


First create a new 'Tour' & add a layer to start moving over the 3D visualization with realistic dashboard views. For this demo, I used the 'electoral regions' column as the 'country' field to locate the geography on the map.


I created a video presentation of the PowerMap 3D visualization tour of the predictive analytics results for the UK over 2014 & 2009.

Next, to check out PowerBI on Office 365, you need to have either an E3/E4 subscription for your Office 365 tenant, or else provision a trial account from here.

After provisioning PowerBI for Office 365, you need to add permissions for SharePoint users. Add the 'PowerBI for Office 365' tenant under your subscription, move to the 'sites' category & click on the 'team site' app.

Next, inside the 'team site' portal you will see the 'site content' option; clicking on it takes you to the 'PowerBI' section for the Office 365 site.

 


Next, after entering the PowerBI tab, add/drag your Excel 2013 workbook containing the PowerView & PowerMap dashboards into the Office 365 portal.


Now, add some natural-language-enhanced Power Q&A to your analytics dashboard: click on the 'Add to Power Q&A' option & start framing relevant questions to build up a real-time analytics dashboard on Office 365.

For example, in this demo I used the sample query 'show representations on 2014 by representation in 2009' in the Power Q&A query bar.


'Show Representations by Electoral Regions on 2014' was used as a search term & portrayed the predicted result like this.

 


Also, visualizing the PowerBI site on Office 365 is impressive in terms of real-time analysis across the entire dataset & collaboration with the team.


 

Lastly, the real-time predictive analytics report on PowerBI is also accessible through the PowerBI app in the Windows Store, which lets you share & collaborate on your analytics results on any device & view them anywhere, anytime.

