Predictive Analytics of UK Electoral Decisions using PowerBI for Office 365


Over the last few days there was significant breaking news around the Scottish independence referendum of 2014, with millions of tweets, likes & shares on social media adding up to a huge body of sentiment & prediction about Scotland's future. In this demo, we'll build a similar predictive analysis of UK voting results for 2014 & 2009 using Microsoft PowerBI & Office 365.

Throughout the demo I used the PowerBI components PowerPivot, PowerQuery, PowerView & PowerMap, along with PowerQ&A integrated with Office 365. Let's start by consuming the dataset through PowerQuery's 'online search' feature: I searched for the term 'UK parliament elections prediction' & selected the related OData feed URL.

[Screenshot: PowerQuery online search]

Using the PowerQuery editor, analyse & transform the data before feeding it into the data model.

[Screenshot: voting data in the PowerQuery editor]

Next, after building the data model & defining the appropriate keys between the datasets, build the sample PowerPivot dashboard.

[Screenshot: sample PowerPivot dashboard]

To create PowerView reports, simply click on the PowerView tab & start building the prediction analysis of UK electoral decisions over 2014 & 2009.

[Screenshot: PowerView report]

The predictive analytics of UK electoral decisions for 2014 & 2009 is depicted using the representation data, with stacked bar charts & a data key differentiating the values across the entire set of electoral regions.

Next, click on the 'Map' icon & select 'Launch Power Map' to build a PowerMap 3D visualization of the predicted results over the regions of the United Kingdom.

[Screenshot: Launch Power Map menu]

First create a new 'Tour' & add a layer to start moving through the 3D visualization with realistic dashboard views. For this demo, I used 'electoral regions' as the 'country' field to locate the geography on the map.

[Screenshot: PowerMap 3D tour]

I created a video presentation of the PowerMap 3D visualization tour of the predictive analytics results for the UK over 2014 & 2009.

Next, to check out PowerBI on Office 365, you need either an E3/E4 subscription on your Office 365 tenant, or you can provision a trial account from here.

After provisioning PowerBI for Office 365, you need to add permissions for SharePoint users. Add the 'PowerBI for Office 365' tenant under your subscription, move to the 'sites' category & click on the 'team site' app.

Next, inside the 'team site' portal you will see the option 'site content'; clicking on it takes you to the 'PowerBI' section of the Office 365 site.

[Screenshot: site content option]

[Screenshot: PowerBI section]

Next, after entering the PowerBI tab, add/drag your Excel 2013 workbook containing the PowerView & PowerMap dashboards into the Office 365 portal.

[Screenshot: workbook uploaded to the Office 365 portal]

Now add some natural-language-enhanced PowerQ&A to your analytics dashboard: click the 'Add to PowerQ&A' option & start framing relevant questions to build up a real-time analytics dashboard on Office 365.

For example, in this demo I used the sample query 'show representations on 2014 by representation in 2009' in the PowerQ&A query bar.

[Screenshot: PowerQ&A query bar]

'Show Representations by Electoral Regions on 2014' was used as a search term & portrayed the predicted result like this.

[Screenshot: PowerQ&A predicted result]

Also, visualizing data on the PowerBI site on Office 365 is impressive for real-time analysis across the entire dataset & for collaborating with the team.

[Screenshot: PowerBI site dashboard on Office 365]

Lastly, the real-time predictive analytics report is accessible through the PowerBI app on the Windows Store, which lets you share & collaborate on your analytics results on any device & view them anywhere, anytime.

[Screenshot: PowerBI app on Windows]

An Overview of the Latest Components of Azure HDInsight – Apache Tez, YARN (MapReduce 2.0), Apache Storm & Kafka with HDP 2.1


Azure HDInsight 3.1, built on Hortonworks HDP 2.1, includes several important Hadoop 2.x components: the DAG-based execution framework 'Apache Tez', the next-generation MapReduce 2.0 ('YARN') running on top of HDFS, the real-time stream processing engine 'Apache Storm' & the distributed messaging framework 'Apache Kafka'. In this demo, we'll check some configuration details for each of these components running on an Azure HDI 3.1 cluster.

First, provision an HBase-type HDI cluster through Azure PowerShell.

[Screenshot: HBase HDI cluster provisioning via PowerShell]
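For reference, a minimal Azure PowerShell sketch of this provisioning step might look like the following; the cluster name, storage account & container below are placeholders, not the demo's actual values:

# assumes the subscription is already wired up via Add-AzureAccount
$storageKey = (Get-AzureStorageKey -StorageAccountName "demostorage").Primary
$creds = Get-Credential   # admin username & password for the new cluster
New-AzureHDInsightCluster -Name "demo-hbase-hdi" `
    -ClusterType HBase `
    -Version 3.1 `
    -Location "Southeast Asia" `
    -DefaultStorageAccountName "demostorage.blob.core.windows.net" `
    -DefaultStorageAccountKey $storageKey `
    -DefaultStorageContainerName "demo-hbase-hdi" `
    -ClusterSizeInNodes 4 `
    -Credential $creds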

Next, you can check the provisioned HBase HDI cluster in the Azure Portal & enable RDP on it.

[Screenshot: enabling RDP in the Azure Portal]

Next, on the HDI cluster, check the Hadoop components by browsing the directory 'C:\apps\dist', where you should see all components of HDP 2.1 in place except Apache Storm.

[Screenshot: contents of C:\apps\dist]

Now, Tez 0.4.0.2.1.5.0-2057 comes pre-configured with the HDI 3.1 HBase cluster, so you can check the Hadoop config page to run Hive queries with Tez. For that, on the cluster desktop, check the YARN config page, which shows the YARN node status.

[Screenshot: Hive on Tez configuration]
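As a quick check that Tez is being used, you can also run a Hive query from Azure PowerShell & switch the execution engine per query. A minimal sketch, assuming the placeholder cluster name from above & the hivesampletable sample table that ships with HDInsight:

# point the Hive cmdlets at the provisioned cluster
Use-AzureHDInsightCluster -Name "demo-hbase-hdi"
# run the query on Tez instead of plain MapReduce
Invoke-Hive -Query "set hive.execution.engine=tez; select country, count(*) from hivesampletable group by country;"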

Similarly, check tez-site.xml for the configuration details & DAG node status.

[Screenshot: tez-site.xml]

Next, jump back to the parent directory 'C:\apps\' & type 'Storm' into the Windows Explorer search pane. Copy 'storm-0.9.1.2.1.5.0-2057.zip', paste it into 'C:\apps\dist\' & unzip it. Under the .\bin directory, find the Storm.cmd file, which is needed to run the Storm Zookeeper, Nimbus, Supervisor & UI daemons.

First, configure storm.yaml with the IPv4 address of the HDI cluster, then start the Storm Zookeeper node first, followed by the master (Nimbus) & slave (Supervisor) daemons.
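A minimal storm.yaml for this single-node setup might look like the following; the IPv4 address & local directory are illustrative placeholders, not values from this demo:

# all daemons on one node: zookeeper & nimbus share the cluster's address
storm.zookeeper.servers:
    - "10.0.0.4"
nimbus.host: "10.0.0.4"
storm.local.dir: "C:\\apps\\dist\\storm-0.9.1.2.1.5.0-2057\\data"

Each daemon then runs in its own console via Storm.cmd, e.g. storm.cmd dev-zookeeper, storm.cmd nimbus, storm.cmd supervisor & storm.cmd ui (assuming this build's Storm.cmd exposes the dev-zookeeper command), as the screenshots below show.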

[Screenshot: Storm Zookeeper daemon]

[Screenshot: Storm Nimbus daemon]

Start the Supervisor (Worker) daemon job.

[Screenshot: Storm Supervisor daemon]

Finally, start the UI daemon.

[Screenshot: Storm UI daemon]

The Storm UI can then be viewed in a browser on port 8080.

[Screenshot: Storm UI web interface]

Next, to configure Apache Kafka for distributed message processing, we first need to download a stable version of Kafka; I used Kafka 0.8 here. You can download it as a .zip from the GitHub repository: https://github.com/apache/kafka

After unzipping it, paste it into the same directory, 'C:\apps\dist\', as the other components & start the installation of Apache Kafka 0.8 on Azure HDI.

Before doing so, replace the Windows .bat files under 'C:\apps\dist\kafka-0.8\bin\windows\' with the latest Kafka batch files for Windows, which can be downloaded from here.

Set the Java path on the Hadoop command line or in PowerShell with 'Set Path=C:\apps\dist\java\bin'.

Next, update Scala & the packages with the following commands.

.\sbt.bat update

[Screenshot: sbt update]

Then run the following commands:

.\sbt.bat package
.\sbt.bat assembly-package-dependency

After that, start the Zookeeper server before starting the Kafka server.

.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties

[Screenshot: Zookeeper server start]

Now, start the Kafka server by running the following command.

.\bin\windows\kafka-server-start.bat .\config\server.properties

[Screenshot: Kafka server start]

Next, create a topic to post messages to, using the following command.

.\bin\windows\kafka-create-topic.bat --zookeeper localhost:2181 --replica 1 --partition 1 --topic test

[Screenshot: topic created]

You can check the list of topics using the following command.

.\bin\windows\kafka-list-topic.bat --zookeeper localhost:2181

[Screenshot: topic list]

On getting a success message, you can start posting messages to the Kafka cluster. Before that, start the console producer using this command.

.\bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic test


[Screenshot: console producer sending messages]
Next, start the console-consumer by executing the following command.

.\bin\windows\kafka-console-consumer.bat --zookeeper localhost:2181 --topic test --from-beginning

The following screenshot shows the demo of Apache Kafka 0.8 (producers & consumers) running on an Azure HBase HDI 3.1 cluster.

[Screenshot: Kafka producers & consumers on HDI]

An Overview of HDInsight (Hadoop + HBase) with Integrated PowerShell along with R


Recently, while starting work on predictive analytics with machine learning & R, I felt the need to integrate Azure HDInsight HBase with Azure ML features. In this demo, we'll go through a few basic operations on HDInsight (Hadoop) on Azure with PowerShell 0.8.6.

To start with, we first need to create an Azure storage account, which must be in the same datacenter as the HDInsight cluster (e.g. Southeast Asia for this demo).

[Screenshot: storage account creation script]
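A minimal sketch of that script; the account name is a placeholder:

# create the storage account in the same datacenter as the planned HDI cluster
New-AzureStorageAccount -StorageAccountName "demohdistorage" -Location "Southeast Asia"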

You also need to create a blob container & a storage context object in order to copy raw data (e.g. clickstream data, log data, machine-sensor data) from the local drive to the Azure storage account.

[Screenshot: blob container & storage context script]
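A sketch of the container & context step, continuing with the placeholder account name from above:

# fetch the account key & build a reusable storage context object
$storageKey = (Get-AzureStorageKey -StorageAccountName "demohdistorage").Primary
$context = New-AzureStorageContext -StorageAccountName "demohdistorage" -StorageAccountKey $storageKey
# create the blob container that will hold the raw data
New-AzureStorageContainer -Name "demodata" -Context $context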

 

To copy data from the local drive to the Azure storage container, use the following script.
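A one-cmdlet sketch of that script; the local path & blob name are placeholders:

# upload a local raw-data file into the container created above
Set-AzureStorageBlobContent -File "C:\data\clickstream.log" -Container "demodata" -Blob "raw/clickstream.log" -Context $context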

[Screenshot: copying data to blob storage]

Next, we need to provision the HDInsight cluster by executing the following script.

[Screenshot: cluster provisioning script]
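A minimal Azure PowerShell 0.8.x equivalent of the script in the screenshot might look like this; the names & node count are placeholders:

# bind a 4-node cluster config to the storage account, then provision;
# Get-Credential prompts for the admin username & password manually
New-AzureHDInsightClusterConfig -ClusterSizeInNodes 4 |
    Set-AzureHDInsightDefaultStorage -StorageAccountName "demohdistorage.blob.core.windows.net" `
        -StorageAccountKey $storageKey -StorageContainerName "demodata" |
    New-AzureHDInsightCluster -Name "demo-hdi" -Location "Southeast Asia" -Credential (Get-Credential)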

Upon executing the script, cluster provisioning moves through the accepted, configuring & provisioning phases. You need to assign the username & password manually.

[Screenshot: HDInsight provisioning phases]

[Screenshot: cluster provisioned]

Next, check in the Azure management portal after a few minutes that the provisioning has started.

[Screenshot: Azure management portal]

The details of HDInsight cluster provisioning, along with sample HQL queries, are stored in my GitHub repository. You can get them here.

Now that HBase columnar storage is available as part of the HDInsight offerings, while provisioning a cluster from the portal you need to choose the corresponding cluster type – HBase or Hadoop.

[Screenshot: HDInsight cluster type selection]

Both cluster types (HBase & Hadoop) of HDInsight 3.1 are based entirely on Hortonworks HDP 2.1, which contains the following component versions:

  • Apache Hadoop 2.4
  • Apache HBase 0.98.0
  • Apache Pig 0.12.1
  • Apache Hive 0.13.0
  • Apache Tez 0.4
  • Apache ZooKeeper 3.4.5
  • Hue 2.3.1
  • Storm 0.9.1
  • Apache Oozie 4.0.0
  • Apache Falcon 0.5
  • Apache Sqoop 1.4.4
  • Apache Knox 0.4
  • Apache Flume 1.4.0
  • Apache Accumulo 1.5.1
  • Apache Phoenix 4.0.0
  • Apache Avro 1.7.4
  • Apache Mahout 0.9.0
  • Third party components:
    • Ganglia 3.5.0
    • Ganglia Web 3.5.7
    • Nagios 3.5.0

     

For the big data analytics world, one of the most capable languages now supported with Azure ML is R. You can install the official R builds for Windows, Linux & OS X; for project work, use an R IDE.

R Packages:

R packages are self-contained units of R functionality that can be invoked as functions. A good analogy would be a .jar file in Java. There is a vast library of R packages available for a very wide range of operations, ranging from statistical operations and machine learning to rich graphic visualization and plotting. Every package consists of one or more R functions. An R package is a reusable entity that can be shared and used by others. R users can install the package that contains the functionality they are looking for and start calling its functions. A comprehensive list of these packages is maintained at the Comprehensive R Archive Network (CRAN): http://cran.r-project.org/.
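For instance, installing & loading a package takes one line each; ggplot2 is just an illustrative choice:

# install a package from CRAN, then load it into the session
install.packages("ggplot2")
library(ggplot2)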

Data Modelling with R:

Regression: In statistics, regression is a classic technique to identify the scalar relationship between two or more variables by fitting a straight line to the variable values. That relationship helps predict the variable value for future events. For example, a variable y can be modeled as a linear function of another variable x with the formula y = mx + c, where x is the predictor variable, y is the response variable, m is the slope of the line, and c is the intercept. Sales forecasting of products or services and predicting the price of stocks can be achieved through regression. R provides this regression feature via the lm method, which is present in R by default (a short worked example follows below).

Classification: This is a machine-learning technique used for labeling the set of observations provided as training examples. With this, we can classify the observations into one or more labels. The likelihood of sales, online fraud detection, and cancer classification (for medical science) are common applications of classification problems. Google Mail uses this technique to classify e-mails as spam or not. Classification features are served by glm, glmnet, ksvm, svm, and randomForest in R.

Clustering: This technique is all about organizing similar items into groups from a given collection of items. User segmentation and image compression are the most common applications of clustering; market segmentation, social network analysis, organizing computer clusters, and astronomical data analysis are others. Google News uses these techniques to group similar news items into the same category. Clustering can be achieved through the knn, kmeans, dist, pvclust, and Mclust methods in R.
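As promised above, here is a minimal worked example of the regression case with simulated data:

# simulate data where y is roughly a linear function of x (m = 2.5, c = 0)
x <- 1:100
y <- 2.5 * x + rnorm(100, sd = 10)
model <- lm(y ~ x)                    # fit the least-squares line
coef(model)                           # estimated intercept (c) & slope (m)
predict(model, data.frame(x = 101))   # predict the response for a future event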

Recommendation: Recommendation algorithms are used in recommender systems, which are among the most immediately recognizable machine learning techniques in use today. Web content recommendations may include similar websites, blogs, videos, or related content. Recommendation of online items can also be helpful for cross-selling and up-selling. We have all seen online shopping portals that attempt to recommend books, mobiles, or other items that can be sold on the Web based on the user's past behavior. Amazon is a well-known e-commerce portal that generates 29 percent of sales through recommendation systems. Recommender systems can be implemented via Recommender() with the recommenderlab package in R.
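A minimal recommenderlab sketch; MovieLense is a sample ratings dataset bundled with the package, & 'UBCF' is user-based collaborative filtering:

library(recommenderlab)
data(MovieLense)                                        # user-item ratings matrix
rec <- Recommender(MovieLense[1:500], method = "UBCF")  # train on the first 500 users
top5 <- predict(rec, MovieLense[501:502], n = 5)        # top-5 items for two held-out users
as(top5, "list")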