Pushing real-time sensor data into ASA & visualizing it in a Near Real-Time (NRT) PowerBI dashboard – frontier of IoT


As in the last demo on IoT foundation topics, we saw how to leverage real-time data insights from social media datasets such as Twitter, filtered on a few keywords. In this demo, we push real-time sensor data from a Windows Phone device to Azure Stream Analytics (through Service Bus Event Hub channels) and, after processing in the ASA hub, publish the output either to a real-time PowerBI dashboard or to near real-time (NRT) analytics in Power View for Excel, by pushing the ASA events to an Azure SQL database consumed through Excel Power Query (a sketch of a possible SQL sink table follows the architecture diagram below).

An overview of the n-tier architecture of ASA on the IoT foundation looks like this:

ASA-blog
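For the Azure SQL database path mentioned above, here is a minimal sketch of what such a sink table could look like; the table and column names are hypothetical, mirroring the accelerometer fields used later in the demo, and are not taken from the actual solution.

-- Hypothetical Azure SQL sink table for processed accelerometer events (names are assumptions)
CREATE TABLE dbo.AccelerometerReadings
(
    ID           NVARCHAR(50),
    Coordinate_X FLOAT,
    Coordinate_Y FLOAT,
    Coordinate_Z FLOAT,
    EventTime    DATETIME2
);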

 

IoT enables customers to connect their own devices to the Azure cloud platform and derive real business value from them, whether they produce #BigData or #SmallData.

Another important topic is getting insights from web logs or telemetry data, which can yield sentiment and clickstream analytics value with machine learning.

Here is a good high-level discussion from the IoT team.

Coming back to the demo: first, I implemented a sample app that generates accelerometer 3D events (X, Y, Z) on Windows Phone & Windows Store devices (Universal app) and pushes the generated events as a block blob to Azure Storage, from where they are fed into the Service Bus Event Hub.

Here is the sample code snippet.

private async void ReadingChanged(object sender, AccelerometerReadingChangedEventArgs e)
{
    await Dispatcher.RunAsync(CoreDispatcherPriority.Normal, () =>
    {
        AccelerometerReading reading = e.Reading;
        ScenarioOutput_X.Text = String.Format("{0,5:0.00}", reading.AccelerationX);
        ScenarioOutput_Y.Text = String.Format("{0,5:0.00}", reading.AccelerationY);
        ScenarioOutput_Z.Text = String.Format("{0,5:0.00}", reading.AccelerationZ);
        i++;

        // Append the current reading as a CSV row.
        dataDetails = i + "," + reading.AccelerationX + "," + reading.AccelerationY + "," + reading.AccelerationZ;
        NewDataFile += Environment.NewLine + dataDetails;
    });

    // Connect to the Azure storage account (replace with your own account name & key).
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
        "DefaultEndpointsProtocol=https;AccountName=yourazurestorageaccountname;AccountKey=yourazurestorageaccountkey");

    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    // Create the 'accelerometer' container if it does not exist yet.
    CloudBlobContainer container = blobClient.GetContainerReference("accelerometer");
    await container.CreateIfNotExistsAsync();

    // Upload the accumulated readings (with a header row) as a block blob.
    CloudBlockBlob blockBlob = container.GetBlockBlobReference(newFileName);
    await blockBlob.UploadTextAsync(Headers + Environment.NewLine + NewDataFile);
}

 

You can download the whole Visual Studio solution from GitHub.

BUILD-Kevin-thumbnail-IoT

The next challenge, as usual, is to send real sensor events to Event Hubs with the correct consumer key and publish millions of events to the event hub at a time.

Here is a sample code snippet.

class Program
{
    static string eventHubName = "youreventhubname";
    static string connectionString = GetServiceBusConnectionString();

    static void Main(string[] args)
    {
        string csv_file_path = string.Empty;

        // Download the sample sensor data CSV file from blob storage.
        install();

        // Pick up the last CSV file found in the sensor data directory.
        string[] filePath = Directory.GetFiles(@"Your CSV Sensor Data file directory", "*.csv");
        int size = filePath.Length;
        for (int i = 0; i < size; i++)
        {
            Console.WriteLine(filePath[i]);
            csv_file_path = filePath[i];
        }

        DataTable csvData = GetDataTableFromCSVFile(csv_file_path);
        Console.WriteLine("Rows count:" + csvData.Rows.Count);

        Console.WriteLine("Press Ctrl-C to stop the sender process");
        Console.WriteLine("Press Enter to start now");
        Console.ReadLine();

        // Create the Event Hub client once and reuse it for every event.
        var eventHubClient = EventHubClient.CreateFromConnectionString(connectionString, eventHubName);

        foreach (DataRow row in csvData.Rows)
        {
            try
            {
                // Map the CSV columns to an Accelerometer object and serialize it as JSON.
                var info = new Accelerometer
                {
                    ID = row.ItemArray[0].ToString(),
                    Coordinate_X = row.ItemArray[1].ToString(),
                    Coordinate_Y = row.ItemArray[2].ToString(),
                    Coordinate_Z = row.ItemArray[3].ToString()
                };
                var serializedString = JsonConvert.SerializeObject(info);

                Console.WriteLine("{0}> Sending events: {1}", DateTime.Now.ToString(), serializedString);
                eventHubClient.Send(new EventData(Encoding.UTF8.GetBytes(serializedString)));
            }
            catch (Exception ex)
            {
                Console.ForegroundColor = ConsoleColor.Red;
                Console.WriteLine("{0} > Exception: {1}", DateTime.Now.ToString(), ex.Message);
                Console.ResetColor();
            }

            // Throttle the sender a little between events.
            Thread.Sleep(200);
        }
    }

    public static void install()
    {
        // Source blob URL of the sample accelerometer CSV data (account name trimmed here).
        string url = @"https://…………blob.core.windows.net/accelerometer/AccelerometerSensorData.csv";
        WebClient wc = new WebClient();
        wc.DownloadFileCompleted += new AsyncCompletedEventHandler(Completed);
        wc.DownloadProgressChanged += new DownloadProgressChangedEventHandler(ProgressChanged);

        // ConsoleHelper is a small progress-bar helper class included in the full solution.
        ConsoleHelper.ProgressTitle = "Downloading";
        ConsoleHelper.ProgressTotal = 10;
        for (int i = 0; i <= 10; i++)
        {
            ConsoleHelper.ProgressValue = i;
            Thread.Sleep(500);
            if (i >= 5)
            {
                ConsoleHelper.ProgressHasWarning = true;
            }
            if (i >= 8)
            {
                ConsoleHelper.ProgressHasError = true;
            }
        }
        ConsoleHelper.ProgressTotal = 0;
        try
        {
            wc.DownloadFile(new Uri(url), @"\ASA\Sensors\Accelerometer\AccelerometerSensorData.csv");
        }
        catch (Exception ex)
        {
            // Print the full exception chain if the download fails.
            while (ex != null)
            {
                Console.WriteLine(ex.Message);
                ex = ex.InnerException;
            }
        }
    }
    public static void Completed(object sender, AsyncCompletedEventArgs e)
    {
        Console.WriteLine("Download Completed!");
    }

    public static void ProgressChanged(object sender, DownloadProgressChangedEventArgs e)
    {
        Console.WriteLine("{0} Downloaded {1} of {2} bytes,{3} % Complete....",
            (string)e.UserState,
            e.BytesReceived,
            e.TotalBytesToReceive,
            e.ProgressPercentage);
        DrawProgressBar(0, 100, Console.WindowWidth, '1');
    }

    private static void DrawProgressBar(int complete, int maxVal, int barSize, char ProgressCharacter)
    {
        Console.CursorVisible = false;
        int left = Console.CursorLeft;
        decimal perc = (decimal)complete / (decimal)maxVal;
        int chars = (int)Math.Floor(perc / ((decimal)1 / (decimal)barSize));
        string p1 = String.Empty, p2 = String.Empty;

        for (int i = 0; i < chars; i++) p1 += ProgressCharacter;
        for (int i = 0; i < barSize - chars; i++) p2 += ProgressCharacter;

        Console.ForegroundColor = ConsoleColor.Green;
        Console.Write(p1);
        Console.ForegroundColor = ConsoleColor.DarkGreen;
        Console.Write(p2);

        Console.ResetColor();
        Console.Write("{0}%", (perc * 100).ToString("N2"));
        Console.CursorLeft = left;
    }
    private static DataTable GetDataTableFromCSVFile(string csv_file_path)
    {
        DataTable csvData = new DataTable();
        try
        {
            using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
            {
                csvReader.SetDelimiters(new string[] { "," });
                csvReader.HasFieldsEnclosedInQuotes = true;

                // Read the column names from the header row.
                string[] colFields = csvReader.ReadFields();
                foreach (string column in colFields)
                {
                    DataColumn datecolumn = new DataColumn(column);
                    datecolumn.AllowDBNull = true;
                    csvData.Columns.Add(datecolumn);
                }

                // Read the data rows, converting empty fields to nulls.
                while (!csvReader.EndOfData)
                {
                    string[] fieldData = csvReader.ReadFields();
                    for (int i = 0; i < fieldData.Length; i++)
                    {
                        if (fieldData[i] == "")
                        {
                            fieldData[i] = null;
                        }
                    }
                    csvData.Rows.Add(fieldData);
                }
            }
        }
        catch (Exception)
        {
            // For this demo, ignore parse errors and return whatever was read so far.
        }
        return csvData;
    }
}

 

Now, build out the ASA SQL query with a specific window interval. In this demo I used 'SlidingWindow(Second, <number of intervals>)', which computes over the event hub data based on the time interval specified in the window; a rough sketch of such a query is shown below the screenshot.

ASAQuery
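Since the actual query only appears in the screenshot above, here is a minimal sketch of what such a SlidingWindow aggregation over the accelerometer stream might look like; the input/output aliases and the Coordinate_X/Y/Z columns are assumptions borrowed from the sender code, not the exact query used in the demo.

-- Hypothetical ASA query: average accelerometer readings over a 10-second sliding window
SELECT
    AVG(CAST(Coordinate_X AS float)) AS AvgX,
    AVG(CAST(Coordinate_Y AS float)) AS AvgY,
    AVG(CAST(Coordinate_Z AS float)) AS AvgZ,
    COUNT(*) AS EventCount
INTO output
FROM input
GROUP BY SlidingWindow(second, 10)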

 

Next, implement the processed-output visualization on the PowerBI preview portal by selecting the 'Output' tab of the ASA job. Once you provide the output dataset name and start the ASA job, you will be able to see on the PowerBI portal that the dataset has been created, with a small yellow star icon beside it.

SensorsPowerBI

 

Here is a step-by-step demonstration; the video is available on my YouTube channel.

Microsoft IoT Foundation: Realtime Tweets Streaming into Azure Stream Analytics with PowerBI & PowerBI Designer Preview


Azure Stream Analytics (ASA) is one of the major components of the Microsoft #IoT foundation, and just a month back it got 'PowerBI' as an output connector (in public preview) for visualizing real-time data streaming from Event Hubs into the Stream Analytics hub.

In this demo, we focus on end-to-end real-time tweet analytics: tweets are collected through Java code using the 'Twitter4j' library, stored in OneDrive as a .csv file and in Azure Storage as a block blob, and then streamed into Service Bus Event Hubs for processing. After creating the Stream Analytics job, make sure the input connector is properly set to the 'event hub' data stream, then process the ASA SQL query with a specific 'HoppingWindow(second,3)' or 'SlidingWindow(Minute,10,5)' for overlapping/non-overlapping window frames over the data stream.

Finally, select PowerBI as the output connector and authorize with your organizational account. Once your ASA job starts running, you will be able to see the PowerBI dataset with the name you selected as the output dataset; from there, start building the ASA-connected PowerBI report & dashboard.

First, a good amount of real tweets is collected based on specific keywords like #IoT, #BigData, #Analytics, #Windows10, #Azure, #ASA, #HDI, #PowerBI, #AML, #ADF etc.

The sample tweets look like this:

DateTime,TwitterUserName,ProfileLocation,MorePreciseLocation,Country,TweetID
06/24/2015 07:25:19,CodeNotFound,France,613714525431418880
06/24/2015 07:25:19,sinequa,Paris – NY- London – Frankfurt,613714525385289728
06/24/2015 07:25:20,RavenBayService,Calgary, Alberta,613714527302098944
06/24/2015 07:25:20,eleanorstenner,,613714530112274432
06/24/2015 07:25:21,ISDI_edu,,613714530758230016
06/24/2015 07:25:23,muthamiphilo,Kenya,613714541562740736
06/24/2015 07:25:23,tombee74,ÜT: 48.88773,2.23806,613714541931851776
06/24/2015 07:25:25,EricLibow,,613714547975790592

Now the data is sent to the event hub for real-time processing, and we've written the ASA SQL like this:

CREATE TABLE input(
DateTime nvarchar(MAX),
TwitterUserName nvarchar(MAX),
ProfileLocation nvarchar(MAX),
MorePreciseLocation nvarchar(MAX),
Country nvarchar(MAX),
TweetID nvarchar(MAX))
SELECT input.DateTime, input.TwitterUserName,input.ProfileLocation,
input.MorePreciseLocation,input.Country,count(input.TweetID) as TweetCount
INTO output
FROM input Group By input.DateTime, input.TwitterUserName,input.ProfileLocation,input.MorePreciseLocation,
input.Country, SlidingWindow(second,10)
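For the HoppingWindow variant mentioned earlier, a minimal sketch (assumed, not the exact demo query) of a similar tweet-count aggregation over a 10-second window that hops every 5 seconds could look like this:

-- Hypothetical hopping-window variant: overlapping 10-second windows advancing every 5 seconds
SELECT input.TwitterUserName, COUNT(input.TweetID) AS TweetCount
INTO output
FROM input
GROUP BY input.TwitterUserName, HoppingWindow(second, 10, 5)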

Output

Authorize

Next, build the PowerBI report on the PowerBI preview portal. Once you build the dashboard by pinning charts from the report, it would look something like this.

IoTAnalytics

Analytics

You will be able to visualize real-time updates of the data, like the total tweet count for the specific keywords, the total number of Twitter usernames that tweeted, the total tweet locations, etc.

WorldwideTweet

In another demo, we used the PowerBI Designer Preview tool: processed tweets coming out of the ASA hub are collected into 'Azure Blob Storage' and then picked up in 'PowerBI Designer Preview'.

PBIDesigner

The latest PBI adds support for combo & stacked charts, which we utilized to depict the average tweet count of those specific keywords by location & timeframe over intervals of a few minutes & seconds.

TweetComboStacked

Also, you get support for the PowerQ&A feature, just like 'PowerBI for Office 365', with natural language processing (NLP) backed by Azure Machine Learning.

For example, if I throw a question at this real-world streaming dataset in PowerQ&A:

show tweetcount where profilelocation is bayarea & London, Auckland, India, Bangalore,Paris as stacked column chart

PowerQ&A

After that, save the PBI Designer file as .pbix and upload it to www.powerbi.com under the Get Data -> Local File section. The portal supports uploading PBI Designer files as well as connecting to data sources.

PBI

Upon uploading, build out the dashboard, which supports scheduled refresh on the preview portal itself. Right-click your PBI report on the portal and select Settings to open the scheduled refresh page.

Settings

ScheduleRefresh

Here is the real-time, scheduled-refresh dashboard of Twitter IoT analytics on live tweets.

PBI-Portal

The same PBI dashboards can be viewed from the 'PowerBI app for Windows Store or iOS'. Here is a demonstration.

WP_20150624_22_21_52_Pro

WP_20150624_22_22_20_Pro

Deployment of Apache Oozie 4.1.0 in a Hadoop Cluster & Scheduling an MR Job with Oozie


In this demo, step-by-step instructions are provided to deploy Apache Oozie on Hadoop & to execute a MapReduce job through Oozie.

  1. If we plan to install Oozie 4.0.1 or a prior version, JDK 1.6 is required; if the JDK edition on Ubuntu is greater than or equal to 1.7, then changes need to be made in the pom.xml file.
  2. If we install Oozie 4.1.0 or later, then JDK 1.7 is fine.
  3. The MapReduce job history server needs to be configured & started successfully, and the remaining Hadoop & YARN daemons should be running fine.
  4. Hadoop should be running, i.e. the HDFS, MapReduce & YARN services should be working; Hadoop 2.6.0 is compatible with Oozie 4.1.0.

5. In this video, I've depicted a step-by-step guide to installing Apache Oozie on a Hadoop cluster & starting the Oozie web console.

 

Once the Oozie installation is done successfully, start scheduling a MapReduce job on the Hadoop cluster using Oozie.

6. First, extract the oozie-examples.tar.gz file:

$ cd $OOZIE_HOME
$ tar -xvf oozie-examples.tar.gz

7. Next, edit the job.properties files in the examples directory:

$/usr/local/oozie/oozie-bin$ find examples/ -name "job.properties" -exec sed -i "s/localhost:8020/localhost:9000/g" '{}' \;
$/usr/local/oozie/oozie-bin$ find examples/ -name "job.properties" -exec sed -i "s/localhost:8021/localhost:8032/g" '{}' \;

8. Now, check out the job.properties file located in the directory $OOZIE_HOME/examples/apps/map-reduce.

Oozie-job.properties

 

9. Finally, copy the local files to HDFS to start the MR jobs with Oozie:

$ hadoop fs -put examples examples

$ hdfs dfs -mkdir -p /user/oozieuser/examples/apps/map-reduce/lib

$ hdfs dfs -copyFromLocal $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar /user/oozieuser/examples/apps/map-reduce/lib/hadoop-mapreduce-examples-2.6.0.jar
$ cd /usr/local/oozie/oozie-bin/
$/usr/local/oozie/oozie-bin$ oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/examples/apps/map-reduce/job.properties -run
job: 0000000-150216182818445-oozie-user-W
:/usr/local/oozie/oozie-bin$

10. Once you have submitted the job to the MapReduce node scheduled through Oozie, check the status of the job execution:

$ usr/local/oozie/oozie-bin$ oozie job -oozie http://localhost:11000/oozie -info 0000000-150216182818445-oozie-user-W
$ usr/lib/oozie/oozie-bin$ oozie job -oozie http://localhost:11000/oozie -log 0000000-150216182818445-oozie-user-W

Oozie-job-console

Deployment of Hortonworks Data Platform (HDP 2.2.4) using Apache Ambari 2.0 on Azure Linux VM


Recently, Apache Ambari 2.0 was released with several exciting features like Ambari Stacks, Views & Ambari monitoring and metrics. Using Ambari 2.0, HDP 2.2.4 can be deployed, which contains default support for Apache Spark, Apache Knox, Ambari Metrics, Apache Ranger, etc. (apart from other Hadoop ecosystem components).

In the following demo video, we've depicted the steps of provisioning an Azure Linux CentOS 6.5 node, configuring the node for the Ambari deployment, installing Ambari, starting the Ambari server/agent & finally deploying HDP.

Worldwide Earthquake Data Analysis (Nepal Earthquake) – Microsoft PowerBI


Last weekend, we were all horrified by the terrible earthquake that struck Nepal, India & greater Asia, with a death toll that keeps climbing. The data has several parameters to consider, like depth (in km) of the earthquake, magnitude of the quake, severity of deaths, number of people who died in the quake, number of people who died from the shaking effect, etc.

In this PowerBI demo, we're using the world's dreadful earthquake incidents recorded since 1900.

 

 

On the Excel Power View dashboard, here are some of the latest death-toll analysis reports of the Nepal earthquake 2015.

Earthquake

 

Earthquake-toll

In the latest PowerBI Designer Preview, a worldwide, country-wise earthquake magnitude analysis is represented, sorted by total deaths & quake depth in km.

Worldwide-analysis

Death-analysis

 

Countrywide-report

 

Deployment of Apache Hadoop 2.7.0 on Ubuntu Vivid 15.04 on Azure Linux VM


Recently, on April 21st, the first Apache Hadoop release of 2015 was committed; version 2.7.0 came out as a development edition. Lots of new updates have been added.

  • This release drops support for JDK6 runtime & works with JDK 7+ only.
  • This release is not yet ready for production use; production users should wait for the 2.7.1/2.7.2 release.

In Hadoop Common, for the first time there is support for Azure Blob storage – blobs as a file system for Azure.

Other than that, HDFS has gained support for file truncate, quotas per storage type & files with variable-length blocks. For YARN & MapReduce, several new pluggable features are added, like pluggable YARN authorization, global caching of YARN localized resources & the ability to limit the running MapReduce tasks of a job.

Here is a step-by-step guide on the installation of Apache Hadoop 2.7.0 on an Azure Linux virtual machine (Ubuntu 15.04).

 

 

 

Hortonworks Data Platform Administration: Deployment of Hortonworks HDP 2.2 using Ambari 1.7, Ambari Views, add-ons & Configuration of Yarn Capacity & Fair Scheduler.


For the HDP administration certification, one of the major tasks is to deploy an HDP cluster using Apache Ambari, either on-premises or in any cloud vendor's cluster (e.g. Amazon EC2, Microsoft Azure or Google Cloud Platform). HDP deployment & administration through Apache Ambari is available on both the AWS EC2 & Azure Linux VM platforms.

In this video, we provide step-by-step guidance on the installation of HDP 2.2 using Ambari on an AWS Elastic Compute Cloud large instance, along with the basic steps of VM image creation, password-less SSH authentication, setting up the secure key (.pem) file, installing Ambari on RHEL 6.5 & configuring and starting the service before the HDP installation.

The master & slave nodes are deployed into separate clusters, & a total of seven (7) VMs are utilized for the demo.

https://onedrive.live.com/?cid=7f185b0e5a1ba82e&id=7F185B0E5A1BA82E%2136999&action=Share

 

In the next video, we show the latest updates of Ambari 1.7 on Hortonworks HDP 2.2 clusters: Ambari Views & several new add-ons that make it easy to configure the YARN Capacity Scheduler & Fair Scheduler, job versioning concepts, easy addition of new hosts to the production cluster & downloading additional components such as the Hive settings XML configuration file (e.g. hive-site.xml) to the local system.

 

Detailed sessions are available for candidates looking for Hortonworks HDP administration training (certifications). You can contact us if you are looking for an online, instructor-led training course from real-world industry experts.

 

Hadoop Administration: Installation scripts of Apache Hadoop (2.6.0) on Ubuntu Unicorn as Multi-Node cluster


Recently, I published a quick step-by-step guide on deploying an Apache Hadoop (2.6.0) single-node cluster on an Ubuntu Utopic Unicorn (14.10) image; you can get the full installation video here.

 

Here, the full setup details for an Apache Hadoop (2.6.0) multi-node cluster are provided. The primary hardware requirements needed to run the setup:

1. VMware Player/Workstation(if Windows/Linux) or VMware Fusion(if OSX)

2. More than 4 GB of RAM for primary OS

3. More than 60 GB of Disk space

4. Intel VT-X capable processor.

5. Ubuntu/CentOS/Red Hat/SUSE OS image (as guest OS)

Now, the step-by-step multi-node Hadoop clustering scripts are provided.

 

Check the IP address of each master & slave node:

$ifconfig

Namenode > hadoopmaster > 192.168.23.132

Datanodes > hadoopslave1 > 192.168.23.133
hadoopslave2 > 192.168.23.134
hadoopslave3 > 192.168.23.135

Clone Hadoop Single node cluster as hadoopmaster

Hadoopmaster Node

$ sudo gedit /etc/hosts

hadoopmaster   192.168.23.132
hadoopslave1   192.168.23.133
hadoopslave2   192.168.23.134
hadoopslave3   192.168.23.135

$ sudo gedit /etc/hostname

hadoopmaster

$ cd /usr/local/hadoop/etc/hadoop

$ sudo gedit core-site.xml

replace localhost with hadoopmaster

$ sudo gedit hdfs-site.xml

replace the value with 3 (represents the number of datanodes)

          $ sudo gedit yarn-site.xml

add the following configuration

<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoopmaster:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoopmaster:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoopmaster:8050</value>
</property>
</configuration>

$ sudo gedit mapred-site.xml.template
replace the property name mapreduce.framework.name with mapred.job.tracker

replace the value yarn with hadoopmaster:54311

$ sudo rm -rf /usr/local/hadoop/hadoop_data

Shutdown hadoopmaster node

Clone Hadoopmaster Node as hadoopslave1, hadoopslave2, hadoopslave3

Hadoopslave Node (configuration should be done on each slave node)

$ sudo gedit /etc/hostname

hadoopslave<nodenumberhere>

          $ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode

$ sudo chown -R trainer:trainer /usr/local/hadoop

          $ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

remove dfs.namenode.dir property section

reboot all nodes

Hadoopmaster Node

          $ sudo gedit /usr/local/hadoop/etc/hadoop/masters

hadoopmaster

$ sudo gedit /usr/local/hadoop/etc/hadoop/slaves

remove localhost and add

hadoopslave1
hadoopslave2
hadoopslave3

$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

   remove dfs.datanode.dir property section

          $ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode

$ sudo chown -R trainer:trainer /usr/local/hadoop

$ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub trainer@hadoopmaster

$ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub trainer@hadoopslave1

$ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub trainer@hadoopslave2

$ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub trainer@hadoopslave3

$ ssh hadoopmaster

$ exit

$ ssh hadoopslave1

$ exit

$  ssh hadoopslave2

$ exit

$ ssh hadoopslave3

$ exit

$ hadoop namenode -format

$ start-all.sh

$ jps (check in all 3 datanodes)


For checking the Hadoop web console:

http://hadoopmasteripaddress:8088/
http://hadoopmasteripaddress:50070/
http://hadoopmasteripaddress:50090/
http://hadoopmasteripaddress:50075/

 

Installation Commands of Apache Hadoop 2.6.0 as Single Node Pseudo-Distributed mode on Ubuntu 14.10 (Step by Step)


$ sudo apt-get update

$ sudo apt-get install default-jdk

$ java -version

$ sudo apt-get install ssh

$ sudo apt-get install rsync

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

$ wget -c http://mirror.olnevhost.net/pub/apache/hadoop/common/current/hadoop-2.6.0.tar.gz

$ sudo tar -zxvf hadoop-2.6.0.tar.gz

$ sudo mv hadoop-2.6.0 /usr/local/hadoop

$ update-alternatives --config java

$ sudo gedit ~/.bashrc

#Hadoop Variables
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Now apply the variables.

$ source ~/.bashrc

There are a number of XML files within the Hadoop folder that require editing:

  • mapred-site.xml
  • yarn-site.xml
  • core-site.xml
  • hdfs-site.xml
  • hadoop-env.sh

The files can be found in /usr/local/hadoop/etc/hadoop/. First copy the mapred-site template file over and then edit it.

mapred-site.xml

mapreduce-xml

Next, go to the following path.

$ cd /usr/local/hadoop/etc/hadoop

Add the following text between the configuration tags.

mapred-site.xml.template

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

yarn-site.xml

Add the following text between the configuration tags.

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

core-site.xml

Add the following text between the configuration tags.
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

hdfs-site.xml

Add the following text between the configuration tags.

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoopuser/hadoopspace/hdfs/namenode</value>
</property>

<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoopuser/hadoopspace/hdfs/datanode</value>
</property>

Note: other locations can be used in HDFS by separating the values with a comma, e.g.

file:/home/hadoopuser/hadoopspace/hdfs/datanode, file:/disk2/hadoop/datanode, …

hadoop-env.sh

Add an entry for JAVA_HOME (matching the JDK path used in ~/.bashrc):

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

$ mkdir -p /home/hadoopuser/hadoopspace/hdfs/namenode

$ mkdir -p /home/hadoopuser/hadoopspace/hdfs/datanode

$ sudo chown hadoopuser:hadoopuser -R /usr/local/hadoop

Next format the namenode.

hdfs-format

Issue the following commands.

./start-dfs.sh
./start-yarn.sh

StartDemons

Issue the jps command and verify that the following jobs are running:

jps

At this point Hadoop has been installed and configured.

To verify, type in the terminal:

firefox http://localhost:50070 (namenode)

firefox http://localhost:50075 (datanode)

firefox http://localhost:50090 (checkpoint namenode)

firefox http://localhost:8088 (YARN cluster)

Hadoop-namenode

MapReduce

A lap around the latest PowerBI announcements, Socrata OData feed & real-time fast streaming data analytics


Last month, on 27th February 2015, some awesome new features arrived for Microsoft PowerBI; let's have a quick look. First of all, in this release PowerBI comes out from under the Office 365 & Microsoft Office veil: you can now connect your data not only from Excel workbooks/Azure but also from PowerBI Designer files, SendGrid, Salesforce CRM, Microsoft SQL Server Analysis Services & Azure Stream Analytics (private preview).

In the first demo, I collected real-time data from the White House visitor records directory via the Socrata OData API, using the link http://open.whitehouse.gov/OData.svc/p86s-ychb through the Excel -> Power Query -> OData Feed or Excel -> Data -> OData Feed option.

PowerQuery

 

 

Next, import the data into a PowerPivot table & build out the linked tables to put together the Power View dashboard.

 

White-House

 

Also, you can sign up for the PowerBI public preview dashboard here, but note that the preview is currently available for users in the United States only.

The Power Map tour is compiled along with the latest features, such as Custom Maps in Power Map & a rich set of effects. The Power Map tour on the White House visitor records index analysis is available on YouTube.

Upload the Excel Power View dashboard workbook to the PowerBI public preview portal & you can view the amazing experience, including PowerQ&A, without the Office 365 environment.

PowerBI-PublicPreview

 

The new PowerBI public preview portal offers lots of options for importing data, like SQL Server Analysis Services, Excel workbooks, PowerBI Designer files, SendGrid, Salesforce CRM, Microsoft Dynamics, Marketo, GitHub, Zendesk, etc.

Get-Data

The new PowerBI Designer is available for free download via this link, & some spectacular visuals have been introduced in the Designer preview, like tree charts, gauge, combo, tabular, etc.

Designer

 

 

In the next demo, I extracted real-time 9-1-1 call records index data from http://data.seattle.gov/ & analysed the 911 call records index over 2 days: possible report locations & types of reports across the US & of course over the greater Seattle area.