Anindita's Blog

Mastering in Data Science

September 13, 2019 Leave a comment

The following technical blogs are coming to be covered in Data Science, Machine Learning & Analysis , visualization track. Be an enterprise Data Scientist by following the Data Scientist fast track modules: STAY TUNED!!

A lap around MACHINE LEARNING
Supervised and unsupervised learning
Kernel based methods
Text mining techniques
Performance evaluation

Exploring CATEGORICAL DATA ANALYSIS
Types of categorical data
Generalized linear models
Contingency tables
Simple and multinomial logistic regression models

Evaluation of STOCHASTIC PROCESSES AND SIMULATION
Random Variables and Distributions
Monte Carlo Simulation
Discrete Event Simulation
Variance Reduction Techniques

Data OPTIMIZATION Techniques
Linear Programming
Integer Programming
Multi-criteria Optimization
Goal Programming
AHP (Analytic Hierarchy Process)
Data Envelopment Analysis (DEA)

ECONOMETRIC METHODS in Data Science
Time Series Analysis
GARCH Models
Fixed Effects Estimation
Random Effects Estimation

STATISTICS for DATA SCIENCE
Probability Theory
Statistical Inference
Sampling Theory
Hypothesis Testing
Regression Analysis

Real World Case Studies in Data Science

Social Media Mining with R & Microsoft PowerBI

Experimentation interactive R based visuals with Shiny apps

What’s next with Julia

Filed under Data Science, Data Visualization, Microsoft Azure, Microsoft PowerBI Tagged with Categorical Data Analysis, Data Science, Data Science Fast Track, Data Science Statistics, Econometric models for Data Science, Machine Learning, Random Variables, Social Media Mining, Stochastic Process & Simulation, Word Embeddings

Multicloud Journey- Service comparison of AWS, Azure, GCP

June 21, 2020 Leave a comment

Service Name	AWS Service	Azure Service	GCP Service	Description
Marketplace	AWS Marketplace	Azure Marketplace	GCP Marketplace	Easy-to-deploy and automatically configured third-party applications, including single virtual machine or multiple virtual machine solutions
Compute (Virtual Servers)	EC2 instances	Virtual Machines	Compute Engine	Virtual servers allow users ti provision, manage, maintain OS & server software based on Pay-as-you-go/
Compute (Virtual Servers)	AWS Batch	Azure Batch	GCP Batch	Execute large scale parallel & high performance computing applications.
Compute (Virtual Servers)	AWS Auto-scaling	Azure VM Scale Sets	GCP Compute Engine Managed Instance Groups	Allows you to automatically scale the number of VM instances, based on defined metrices/thresholds scale out or scale in.
Compute (Virtual Servers)	VMWare on AWS	Azure VMWare by CloudSimple	VMware as a service	Redeploy & extend the VMware-based enterprise workloads to Azure by CloudSimple.
Compute (Virtual Servers)	Parallel Cluster	CycleCloud		Create, manage , optimize HPC & big compute clusters at scale.
Containers & Container Orchestrators	Elastic Container Service(ECS) AWS Fargate	Azure Container Instances(ACI)	Cloud Run	ACI is the flastest & Simplest way to run containers in Azure.
Containers & Container Orchestrators	Elastic Container Registry(ECR)	Azure Container Registry (ACR)	Container Registry Artifact Registry	Allows customers to store Docker formatted images. Used to create all types of container deployments on Azure.
Containers & Container Orchestrators	Elastic Kubernetes Service (EKS)	Azure Kubernetes Service (AKS)	Google Kubernetes Engine (GKE)	Deploy orchestrated containerized apps with CNCF Kubernetes at scale.
Containers & Container Orchestrators	AWS App Mesh	Azure Service Fabric Mesh	Anthos Service Mesh	Fully managed service that enables developers to deploy microservices applications without managing virtual machines, storage, or networking.
Containers & Container Orchestrators	EKS & Kubernetes Container Insights Metrices	Azure Monitor for containers	Kubernetes Engine Monitoring	Azure Monitor for containers is designed to monitor the performance of container workloads deployed to AKS, AKS Engine, ACI, Azure Stack.
Serverless (Functions)	AWS Lambda	Azure Functions	Cloud Functions	Provides FaaS (Function as a service) integrating systems & run backend processes in response to events without provisioning compute servers.
Database (Relational DB)	RDS	Azure SQL DB Azure Database for MySQL Azure Database for PostgreSQL	Cloud SQL (SQL Server, MySQL, PostgreSQL)	Managed relational database where scale, security, resiliency are handled by the platform
NoSQL/Document	DynamoDB SimpleDB Amazon DocumentDB	Azure Cosmos DB	Cloud Spanner	Managed relational db service with dynamic schema, security, scale, maintenance are handled by the cloud platform
NoSQL (PaaS)		Azure Cosmos DB	Cloud BigTable Cloud Firestore Firebase Realtime Database	Globally distributed multi-model db which natively supports multiple data-models, key-value, documents, graphs etc.
Caching	AWS ElastiCache	Azure Cache for Redis	Cloud Memorystore Redis Enterprise Cloud	An in-memory based, distributed caching service provides a high performance store typically store used to offload non-transactional work from a database.
Database migration	AWS DMS	Azure DMS	Open Source database Migration Tool/SQL Server Database Migration Tool	end to end migration of database migration schema & data from on-premise to cloud platform.
Networking Cloud Virtual Networking	AWS VPC	Azure Virtual Network (VNET)	GCP Virtual Private Network (VPC)	Provides an isolated, private environment in the cloud. Users have control over their virtual networking environment, including selection of their own IP address range, adding/updating address ranges, creation of subnets, and configuration of route tables and network gateways.
DNS Management	AWS Route 53	Azure DNS Azure Traffic Manager	Cloud DNS	Managing DNS records using the same credentials & billing and support contracts.
Dedicated Network (Hybrid Connectivity)	AWS Direct Connect	Azure ExpressRoute	Cloud Interconnect	Establishes a private network connection from a location to the cloud provider (not over the Internet).
Load Balancing	Network Load Balancer	Azure Load Balancer	Network Load Balancing	Azure Load Balancer load-balances traffic at layer 4 (TCP or UDP).
Load Balancing in Application layer	Application Load Balancer	Application Gateway Azure Front door Azure Traffic Manager	Global Load Balancing	Application Gateway is a layer 7 load balancer. IT takes backends with any IP that is reachable. It supports SSL termination, cookie-based session affinity, and round robin for load-balancing traffic.
Cross-premises connectivity	AWS VPN Gateway	Azure VPN Gateway Azure Virtual WAN	Cloud VPN Gateway	Connects Azure virtual networks to other Azure virtual networks, or customer on-premises networks (site-to-site). Allows end users to connect to Azure services through VPN tunneling (point-to-site).
Hybrid Connectivity	AWS Virtual Private Gateway	Azure VNET Gateway	Cloud Router	Enables dynamic routes exchange
CDN	AWS CloudFront	Azure CDN	Cloud CDN	A content delivery network (CDN) is a distributed network of servers that can efficiently deliver web content to users.
Firewall	AWS WAF	Azure WAF	Cloud Armor	Azure Web Application Firewall (WAF) provides centralized protection of your web applications from common exploits and vulnerabilities.
NAT Gateway	AWS NAT Gateway	Azure Virtual Network NAT	Cloud NAT	Virtual Network NAT (network address translation) provides outbound NAT translations for internet connectivity for virtual networks.
Private Connectivity to PaaS	AWS Private Link	Azure Private Link	VPC Service controls	Provides private connectivity between VPCs, AWS/Azure/GCP services, on-prem apps, securely on the network
Telemetry	VPC Flow Logs	NSG Flow Logs	VPC Flow Logs	Network security group (NSG) flow logs are a feature of Network Watcher that allows you to view information about ingress and egress IP traffic through an NSG.
Telemetry Network logs	VPC Flow Logs	NSG Flow Logs	Firewall Rules Logging	NSG logs are feature of Network Watcher that allows you to view info about traffic ingress & egress.
Telemetry (Monitoring)	AWS CloudWatch, X-Ray	Azure Monitor	Operations	Comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments.
Network Watcher	AWS CloudWatch	Azure Network Watcher	Network Intelligence Center	Azure Network Watcher provides tools to monitor, diagnose, view metrics, and enable or disable logs for resources in an Azure virtual network.
Security & IAM	AWS IAM	Azure AD	Cloud IAM	Allows users to securely control access to services and resources while offering data security and protection. Create and manage users and groups and use permissions to allow and deny access to resources.
IAM (Authentication & Authorization)	AWS IAM	Azure RBAC	Cloud IAM	Role-based access control (RBAC) helps you manage who has access to Azure resources, what they can do with those resources, and what areas they have access to
Multi-factor Authentication	AWS MFA	Azure AD MFA	GCP MFA	Safeguard access to data and applications while meeting user demand for a simple sign-in process.
Auth & Authoriation & Management	AWS Organizations	Azure Management Groups + RBAC	Resource Manager	Structure to organize and manage assets in Azure.
AD Domain Services	AWS Directory Service	Azure AD Domain Services	Managed Service for Microsoft Active Directory (AD)	Provides managed domain services such as domain join, group policy, LDAP, and Kerberos/NTLM authentication that are fully compatible with Windows Server Active Directory.
Identity managed service	AWS Cognito	Azure AD B2C	Firebase Authentication	A highly available, global, identity management service for consumer-facing applications that scales to hundreds of millions of identities.
Management Group	AWS Organizations	Azure Policy Azure Management Groups	Service Account
Encryption	Server side Encryption AWS S3 KMS	Azure Storage Service Encryption	Encryption by default at rest	Azure Storage Service Encryption helps you protect and safeguard your data and meet your organizational security and compliance commitments.
Hardware Security Module (HSM)	CloudHSM, KMS	Azure Key Vault	Cloud KMS	Provides security solution and works with other services by providing a way to manage, create, and control encryption keys stored in hardware security modules (HSM).
Security	AWS Inspector	Azure Security Center	Security Command Center	Automated Security assessment service provides security & compliance of applications.
Web Security with Certificates	AWS Certificate Manager	Azure App Service certificates	Web Security Scanner
Advanced Threat Management	AWS GuardDuty	Azure Advanced Threat Protection	Event Threat Protection	Detect and investigate advanced attacks on-premises and in the cloud.
Auditing	AWS Artifact	Service Trust Portal		Provides access to audit reports, compliance guides, and trust documents from across cloud services.
DDoS Protection	AWS Shield	Azure DDos Protection Service	DDoS Security with GCP Armor	Provides cloud services with protection from distributed denial of services (DDoS) attacks.
Storage (Object)	AWS S3	Azure Blob Storage	Cloud Storage	Object storage service, for use cases including cloud applications, content distribution, backup, archiving, disaster recovery, and big data analytics.
Storage (VHD)	AWS EBS	Azure Managed Disks	Persistant Disk Local SSD	SSD storage optimized for I/O intensive read/write operations. For use as high-performance Azure virtual machine storage.
Storage (File)	AWS EFS	Azure Files, Azure NetApp Files	GCP Filestore	File based storage and hosted NetApp Appliance Storage.
Data Archive	S3 Infrequent Access (IA)	Storage cool tier	Nearline
Deep Data Archive	S3 Glacier, Deep Archive	Storage archive access tier	Coldline	Archive storage has the lowest storage cost and higher data retrieval costs compared to hot and cool storage.
Data Backup	AWS Backup	Azure Backup	GCP Backup	Back up and recover files and folders from the cloud, and provide offsite protection against data loss.
Big Data & Analytics	Redshift	Azure Synapse Analytics (Formerly SQL DW)	GCP BigQuery	Cloud-based Enterprise Data Warehouse (EDW) that uses Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data.
Data warehouse & Lake	Lake Formation	Azure Data Share	Looker	Big data sharing service
Big Data Transformations	EMR	Azure Databricks	Cloud DataFlow	Managed Apache Spark-based analytics platform.
Big Data Transformations	EMR	HDInsight	GCP Dataproc	Managed Hadoop service.
Big Data Transformations	EMR	Azure Data Lake Storage Gen2	BigQuery	Massively scalable, secure data lake functionality built on Azure Blob Storage.
ETL/Data Orchestration	Data Pipeline, Glue	Azure Data Factory	Google Data Fusion	Processes and moves data between different compute and storage services, as well as on-premises data sources at specified intervals. Create, schedule, orchestrate, and manage data pipelines.
Enterprise Data discovery	AWS Glue	Azure Data Catalog	Cloud Data Catalog	A fully managed service that serves as a system of registration and system of discovery for enterprise data sources
NoSQL db	Dynamo DB	Azure Table Storage, Cosmos DB	Cloud Datastore	NoSQL key-value store for rapid development using massive semi-structured datasets.
Visualization & data Streaming	Kinesis Analytics AWS Athena	Azure Stream Analytics ADLA (Data Lake Analytics) ADLS Gen2	BigQuery	Storage and analysis platforms that create insights from large quantities of data, or data that originates from many sources.
Full text searching capability	Cloud Search	Cognitive Search Azure Search	Cloud Search	Delivers full-text search and related search analytics and capabilities.
BI tool for Visualization	Quicksight	PowerBI	Datastudio Looker	Business intelligence tools that build visualizations, perform ad hoc analysis, and develop business insights from data.
AI Hub	AWS SageMaker	Azure Machine Learning	AI Hub	A cloud service to train, deploy, automate, and manage machine learning models.
Bot Capability	Alexa Skills kit	Azure Bot Framework	Dialogflow	Build and connect intelligent bots that interact with your users using text/SMS, Skype, Teams, Slack, Office 365 mail, Twitter, and other popular services.
Conversational AI (Speech)	Lex	Speech Services	AI Building blocks- Conversation	API capable of converting speech to text, understanding intent, and converting text back to speech for natural responsiveness.
Conversational AI (NLP)	Lex	Azure LUIS	AI Building blocks -Language	A machine learning-based service to build natural language understanding into apps, bots, and IoT devices. Quickly create enterprise-ready, custom models that continuously improve.
Conversational AI(Speech to Text & vice versa)	Polly, Transcribe	Speech Services	AI Building blocks – Conversations	Enables both Speech to Text, and Text into Speech capabilities.
Enterprise AI (Computer Vision) (Face, Emotions detections)	Rekognition	Azure Cognitive Services	AI Building Blocks – Cloud AutoML AI Building Blocks – Sight	Customize and embed state-of-the-art computer vision for specific domains. Build frictionless customer experiences, optimize manufacturing processes, accelerate digital marketing campaigns-and more. No machine learning expertise is required.
Deep Learning	TensorFlow with SageMaker	ONNX ML.NET	TensorFlow	open source and cross-platform machine learning framework for both machine learning & AI
Data Science/Deep Learning VM	AWS Deep Learning AMIs	Azure DSVM	Deep Learning VM Image	Pre-Configured environments in the cloud for Data Science and AI Development.
Notebooks	AWS SageMaker Notebook instances	Azure Notebooks	AI Platform Notebooks	Develop and run code from anywhere with Jupyter notebooks on Azure.
Deep Learning Containers	AWS Deep Learning Containers	GPU Support on AKS	Deep Learning Containers	Graphical processing units (GPUs) are often used for compute-intensive workloads such as graphics and visualization workloads.
Automated Data Labeling	Automate Data Labeling with SageMaker	Azure ML – Data Labeling	Data Labeling Service	A central place to create, manage, and monitor labeling projects (public preview). Use it to coordinate data, labels, and team members to efficiently manage labeling tasks
ML Platform compute	AWS SageMaker ML Instance Types	Azure ML Compute Targets	AI Platform Training	Designated compute resource/environment where you run your training script or host your service deployment. This location may be your local machine or a cloud-based compute resource
ML Service Deployments	SageMaker Hosting Services-Model Deployment	Azure ML – Deployments	AI Platform Predictions	Deploy your machine learning model as a web service in the Azure cloud or to Azure IoT Edge devices
Monitor data drift	SageMaker Model Monitor	Azure ML – Data Drift	Continuous Evaluation	Monitor for data drift between the training dataset and inference data of a deployed model
TPU	AWS Inferencia	Azure ML – FPGA	Cloud TPU	FPGAs contain an array of programmable logic blocks, and a hierarchy of reconfigurable interconnects. The interconnects allow these blocks to be configured in various ways after manufacturing.
ML Ops	MLOps with SageMaker	Azure MLOps	GCP KubeFlow	MLOps, or DevOps for machine learning, enables data science and IT teams to collaborate and increase the pace of model development and deployment via monitoring, validation, and governance of machine learning models.
DevOps & App Monitoring	CloudWatch, X-Ray	Azure Monitor	Operations	Maximizes the availability and performance of your applications and services by delivering a comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments.
Code collaborations	Code Build CodeDeploy CodeCommit CodePipeline	Azure DevOps (Azure Board, Azure Pipeline, Azure Build & Release, Azure Repos)	Cloud Source Repositories	A cloud service for collaborating on code development.
Automation	OpsWorks	Azure Automations	Cloud Composer	Automation gives you complete control during deployment, operations, and decommissioning of workloads and resources.
Automated Infra Provisioning	CloudFormation	Azure Resource Manager VM extensions Azure Automation	Cloud Deployment Manager	Provides a way for users to automate the manual, long-running, error-prone, and frequently repeated IT tasks.
CLI, SDK interface	AWS CLI	Azure CLI, PowerShell	PowerShell on GCP GCloud SDK	Built on top of the native REST API across all cloud services, various programming language-specific wrappers provide easier ways to create solutions.
Building of Code	AWS CodeBuild	DevOps Build	Cloud Build	Fully managed build service that supports continuous integration and deployment
Managed Artifacts Repository	AWS CodeArtifact	Azure DevOps Artifacts	Artifact Registry	Add fully integrated package management to your continuous integration/continuous delivery (CI/CD) pipelines with a single click.
IoT Service	AWS IoT	Azure IoT Hub Azure Event Hub	Cloud IoT Core	A cloud gateway for managing bidirectional communication with billions of IoT devices, securely and at scale.
IoT data processing	AWS Kinesis Firehose, Kinesis Streams	Azure Event Hubs Azure Stream Analytics HDInsight Kafka	Cloud IoT core Cloud Pub/Sub GCP Dataflow	Process and route streaming data to subsequent processing engine or storage or database platform.
IoT on Edge	AWS GreenGrass	Azure IoT Edge	Edge TPU	Deploy cloud intelligence directly on IoT devices to run in on-premises scenarios.
IoT Things Graph/Digital Twins	IoT Things Graph	Azure Digital Twins	Device Registry	Create spatial intelligence graphs to model the relationships and interactions between people, places, and devices. Query data from a physical space rather than disparate sensors.
Messaging Storage	AWS SQS	Azure Queue Storage	Cloud Pub/Sub	Provides a managed message queueing service for communicating between decoupled application components.
Reliable Messaging	SQS	Service Bus Queue	Cloud Pub/Sub	Supports a set of cloud-based, message-oriented middleware technologies including reliable message queuing and durable publish/subscribe messaging
Messaging with notification	AWS SNS	Azure Event Grid	Cloud Pub/Sub	A fully managed event routing service that allows for uniform event consumption using a publish/subscribe model.
Cloud Management Advisory	Trusted Advisor	Advisor Azure Security Center	GCP Recommender	Provides analysis of cloud resource configuration and security so subscribers can ensure they’re making use of best practices and optimum configurations.
Billing API	AWS Usage & Billing Report AWS Budgets	Azure Billing API	Cloud Billing	Services to help generate, monitor, forecast, and share billing data for resource usage by time, organization, or product resources
Migrate on-prem workloads	Application Discovery Services	Azure Migrate	Assessment & Migration tool	Assesses on-premises workloads for migration to Azure, performs performance-based sizing, and provides cost estimations.
Telemetry Analysis of lift-shoft	EC2 Systems Manager	Azure Monitor	Operations (formerly StackDriver)	Comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments.
Trace	CloudTrail	Azure Monitor	Cloud Trace	Comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments.
Logging & Performance Monitoring	CloudWatch	Azure Application Insights	StackDriver Debugging/Logging	Application Insights, is an extensible Application Performance Management (APM) service for developers and DevOps professionals.
Cost Management	AWS Cost explorer	Azure Cost Management	GCP Cost Management	Optimize cloud costs while maximizing cloud potential.
Mobile Service	Mobile Hub Mobile SDK Mobile Analytics	Azure Xamarin Apps, App Center	GCP App Engine	Provides backend mobile services for rapid development of mobile solutions, identity management, data synchronization, and storage and notifications across devices.
Device Farm	AWS Device Farm	Azure App Center	Firebase Test Lab	Provides services to support testing mobile applications.
Bulk Data Transfer	Import/Export Disk,	Azure Import/Export	Transfer appliance	A data transport solution that uses secure disks and appliances to transfer large amounts of data. Also offers data protection during transit.
Petabyte to exabyte level data transfer to Cloud	Import/Export Snowball, SnowballEdge, Snowball Mobile	Azure DataBox	Transfer Appliance	Petabyte- to exabyte-scale data transport solution that uses secure data storage devices to transfer large amounts of data to and from Azure
Storage Gateway	AWS Storage Gateway	Azure StoreSimple	Google Cloud Storage	Integrates on-premises IT environments with cloud storage. Automates data management and storage, plus supports disaster recovery.
Data Sync	AWS Data Sync	Azure File Sync	Cloud Data Transfer	Data sync services
Serverless Workflow	AWS SWF	Azure Logic Apps	GCP Composer	Serverless technology for connecting apps, data and devices anywhere, whether on-premises or in the cloud for large ecosystems of SaaS and cloud-based connectors.
Hybrid	AWS Outposts	Azure Stack Azure ARC	GCP Anthos	For customers who want to simplify complex and distributed environments across on-premises, edge and multi-cloud
Media	AWS Elemental Media Convert Elastic Transcoder	Azure Media Services	GCP Anvato Zync Render Game Servers	Cloud-based media workflow platform to index, package, protect, and stream video at scale.
BlockChain	AWS BlockChain	Azure BlockChain Service	Digital Asset	Azure Blockchain Service is a fully managed ledger service that enables users the ability to grow and operate blockchain networks at scale in Azure
App Services	AWS ELB (Elastic BeanStalk)	Azure App Service	GCP App Engine	Managed hosting platform providing easy to use services for deploying and scaling web applications and services.
API Services	API Gateway	Azure API Management	Apigee API platform, API Analytics	A turnkey solution for publishing APIs to external and internal consumers.
Deploy Web apps	LightSail	Azure App Service	Cloud Run, App Engine	Build, deploy, and scale web apps on a fully managed platform.
Backend Serverless computation	AWS STEP Function	Azure Logic Apps	App Engine	Connect apps, data and devices on-premises or in the cloud.

Filed under Azure SQL Database / SQL Server, Microsoft Azure, Multicloud Tagged with BlockChain. Cloud Security, CI/CD, Cloud Infra, Cloud Media Services, Data Analytics, Database, DevOps, Multicloud

PowerShell Script for Assigning Public IP to Azure Virtual Machine

July 24, 2016 Leave a comment

Recently the azure virtual machines provisioned with ‘Resource Manager/Classic‘ deployment model is provisioned with Virtual IP & dynamic IP except Public IP address. Public IP address is necessary for accessing services deployed on VM from local browser.

For example, in Azure Linux VM, if you provision Apache Hadoop/Hortonworks Data Platform/Cloudera Hadoop Distribution , you may need to access the hadoop services from local browser in those addresses like hdfs service : http://<ip address of VM>:50070, http://<ip address of VM>:50075 , MapReduce Service http://<ip address of VM>:8080 etc.

In order to assign public ip for an azure Linux VM , you need to execute the following powershell script.

Get-AzureVM -ServiceName ‘azfilestorage’ -Name ‘azfilestorage’ | Set-AzurePublicIP -PublicIPName ‘linuxvmip’ | Update-AzureVM

Next, update the VM with SSH Port 22 so that , the VM can be accessed through SSH.

Get-AzureVM -Name ‘azfilestorage’ -ServiceName ‘azfilestorage’ | Set-AzureEndpoint -Protocol tcp -Name ‘SSH’ -PublicPort 22 -LocalPort 22 | Update-AzureVM

Next, you can check the public IP address on Azure Portal under ‘IP Address’ section of Azure Virtual Machine like as the screenshot.

Filed under Azure PowerShell, Microsoft Azure

R with PowerBI – A step by step guide approach

June 5, 2016 Leave a comment

A lot of interests are visible everywhere how to integrate R scripts with Microsoft PowerBI dashboards. Here goes a step by step guidance on this.

Lets assume, you have some couple of readymade R code available, for example , with ggplot2 library. Lets find the following scripts performing analytics using CHOL data.

Open R studio or R Package (CRAN) & install ggplot2 library first.
Paste the following R script & execute it.

install.packages(‘ggplot2’)
library(ggplot2)
chol <- read.table(url(“http://assets.datacamp.com/blog_assets/chol.txt”), header = TRUE)
#Take the column “AGE” from the “chol” dataset and make a histogram it
qplot(chol$AGE , geom = “histogram”)
ggplot(data-chol, aes(chol$AGE)) + geom_histogram()

you should be able to see the visuals output like this.

Histogram

3. Next, execute the following pieces of R code to find out the binwidth argument using ‘qplot()‘ function.

qplot(chol$AGE,
geom = “histogram”,
binwidth = 0.5)

4. Lets take help of hist() function in R.

#Lets take help from hist() function
qplot(chol$AGE,
geom=”histogram”,
binwidth = 0.5,
main = “Histogram for Age”,
xlab = “Age”,
fill=I(“blue”))

5. Now, add I() function where nested color.

#Add col argument, I() function where nested color.
qplot(chol$AGE,
geom=”histogram”,
binwidth = 0.5,
main = “Histogram for Age”,
xlab = “Age”,
fill=I(“blue”),
col=I(“red”))

I func.JPG

6. Next, adjust ggplot2 little by the following code.

#Adjusting ggplot
ggplot(data=chol, aes(chol$AGE)) +
geom_histogram(breaks=seq(20, 50, by = 2),
col=”red”,
fill=”green”,
alpha = .2) +
labs(title=”Histogram for Age”) +
labs(x=”Age”, y=”Count”) +
xlim(c(18,52)) +
ylim(c(0,30))

adjustggplot

7. Plot a bar graph with this following code.

#Plotting Bar Graph
qplot(chol$AGE,
geom=”bar”,
binwidth = 0.5,
main = “Bar Graph for Mort”,
xlab = “Mort”,
fill=I(“Red”))

8. Next, open PowerBI desktop tool. You can download it free from this link. Now, click on Get Data tab to start exploring & connect with R dataset.

If you already have R installed in the same system building PowerBI visuals , you just need to paste the R scripts next in the code pen otherwise , you need to install R in the system where you are using the PowerBI desktop like this.

Rexe

9. Next, you can also choose the ‘custom R visual’ in PowerBI desktop visualizations & provide the required R scripts to build visuals & finally click ‘Run’.

10. Build all the R function visuals by following the same steps & finally save the dashboard.

Dashboard

11.You can refresh an R script in Power BI Desktop. When you refresh an R script, Power BI Desktop runs the R script again in the Power BI Desktop environment.

Filed under Hadoop, Microsoft Azure, Microsoft PowerBI, Microsoft PowerBI Visuals Tagged with Big Data, Data Science, Office 365, PowerBI, R

Quick Installation of Single node Datazen Server in Azure Cloud Service & Sample Dashboards

May 1, 2016 Leave a comment

The demo provides step by step guidance on quick setup of datazen server on single node server & connecting with publisher app to build custom visuals.

Pre-requisites for the demo :

An active Azure subscription.
Windows 10 store(for installation of Datazen publisher)

Detailed steps are depicted as follows:

Steps	Screen Shot
1. Go to https://manage.windowsazure.com/ 2. Login with your Live ID. 3. Click +New. 4. Select Compute -> Virtual Machine -> From Gallery. (Fig. 2)
5. Select the Windows Server 2012 R2 Datacenter image. 6. Click the next arrow at the bottom right.
7. Enter the required information for Virtual machine configuration. o Virtual machine name o Choose Basic or Standard Tier (recommended) o Choose A4 as the machine size. A4 has 8 cores, which is the minimum number of cores for a single machine setup. o Enter the machine admin username/password. o Click the next arrow.
8. Enter the required information for Virtual machine configuration. o Select option Create a new cloud service. Make sure the Cloud Service DNS Name is available. o Choose the subscription it should be billed to, the region it should be deployed. Choose the one closest to your location o Leave the Storage Account and Availability Set as is. o Add an HTTP endpoint at a minimum. You may need to scroll to add the endpoint. o Click the next arrow.
9. Select Install VM Agent and leave other unchecked. 10. Click the checkmark to start the deployment process. You’ll see it start the provisioning process in the list of the virtual machines you are responsible for in Azure.
11. Wait for the status to change from Starting to Running in virtual machines.
12. Select your VM then click Connect.
13. Save the RDP file to your local machine.
14. Open the saved Remote Desktop Connection file, then click Connect.
15. Connect to the VM via Remote Desktop and enter the admin username/password.
16. Click Yes to connect to Server.
17. Click Configure this local server in the Server Manager dashboard that appears when you login.
18. Then change the IE Enhanced Security Configuration to Off for Administrators. You can always change it back if you really want to when you’re done. 19. Close the Server Manager.

Section 2: Install the Datazen Server

1. Navigate to the following link and download the Datazen server software onto the VM. You may need to turn off IE Enhanced Security on the server to do so. 2. The Datazen server files download as a zipped file. Extract all the files 3. Open Datazen Enterprise Server.3.0.2562. 4. Click run to start the install process.
5. In the Datazen Enterprise Server Setup click Next.
6. Click Next in the Setup wizard, accepting the terms in the License Agreement and moving through each screen.
7. Click Next on the Features page.
8. Click Next on the Core Service Credentials page.
9. Once you get to the Admin Password page, type a Password for the Datazen admin user. (Fig. 20) This doesn’t have to be the same password as you used for the server. 10. Click Next.
11. On the Authentication page, leave the Authentication Mode as Default. 12. Click Next.
13. On the Repository Encryption page, select Copy to Clipboard, then paste the key into a Notepad file. 14. Save the Notepad file to a safe location. 15. Click Next.
16. On the Instance Id page, select Copy to Clipboard then paste the Id into a Notepad file. 17. Save the Notepad file to a safe location. 18. Click Next.
19. On the Data Acquisition Service Credentials page, leave the credentials as is, then click Next.
20. On the Web Applications IIS Settings page leave the default settings, then click Next.
21. On the Control Panel Email Settings page, leave the default values since this is a test server. 22. Click Next.
23. On the Ready to Install page, click Install and wait until the installation is complete. This might take a few minutes.

Section 3: Configure the Datazen Server

1. Open your browser in your local machine 2. Navigate to http://mycloudservicename.cloudapp.net/cp. Make sure you replace the yourcloudservicename with the name of your cloud service. 3. If you can successfully connect, you should see the Control Panel Log In screen. 4. Enter the username admin and the password you entered in the Setup wizard, then select Log In.
5. You will need to create a new user to start creating dashboard hubs, since you need each hub to have an owner. The owner can NOT be the admin user. Click Create User to create your first user.
6. Enter a value in the top three fields (the email address can be fake if you want) and select Create User.
7. You will now see a new option to Create BI Hub.
8. Enter Hub name whatever you’d like, but make sure you enter the username of the user you just created for owner username. 9. Enter a maximum number of users that can connect to hub. 10. Click Create.
11. Finish the creation of the hub. It will be displayed in the list of available hubs.
12. The new hub will also be shown in the navigation menu at the bottom left of the screen.
13. Click the Server Users link on the left-hand side of the screen.
14. Click Create User.
15. Fill in the fields under Required Info. 16. Click Create User.
17. You will see the user and a Set password link option next to the username. 18. Click on Set password link and then copy the link to your clipboard. Note: This step is only required as the email notification is not set up.
19. Logout as the admin 20. Open a new browser window and paste the URL to reset the password into the address bar. You can now finish setting up that user by entering the password for the account.
21. In the Control Panel Activate User Account screen, then enter the new password, then re-type password 22. Click Activate My Account. (Fig. 40) 23. Logout as this user and log back in as the admin before proceeding.

Section 4: Apply a Custom Branding

1. To add the Wide World Importers brand package to the server, save it locally. The package is provided with this demo.

2. Click on the Branding link on the left-hand side and upload the brand package to the server.

3. Make sure you choose the Server to upload it to.

You will see the Server icon has the Wide World Importers branding associated.

4. To make sure it was applied properly, open a new browser and navigate to the following URL (make sure you replace the mycloudservicename with whatever you named yours)

http://mycloudservicename.cloudapp.net

Your Server Login screen should look as shown on the right, now having the Wide World Importers brand package applied. (Fig. 43)

Section 5: Connect to the Datazen Server with Publisher

1. Open the Datazen Publisher app If this is the first time using the app, you will have the option of connecting the Datazen demo server. We recommend doing that, so you will have some nice demo dashboards to show immediately. 2. To add new server Right-click in the dashboard, then click Connected. (Fig. 44)
3. Click Add New Server Connection (Fig. 45)
4. Provide the following information to connect to a Datazen server. (Fig. 46) Server Address: mycloudservicename.cloudapp.net User name: user name that created Password: provide user password 5. Uncheck Use Secure Connection. 6. Click Connect. (Fig. 46)
7. When connected, you should be able to publish dashboards to your Datazen server. (Fig. 47) 8. You will see a nice dashboard with KPIs for Wide World Importers and Fabrikam Insurance. (Fig. 47)

Filed under Azure SQL Database / SQL Server, Hadoop, Microsoft PowerBI, Microsoft PowerBI Visuals

Resolution of Error: “This project references NuGet package(s) that are missing on this computer. Use NuGet Package Restore to download them. For more information, see http://go.microsoft.com/fwlink/?LinkID=322105. The missing file is ..\packages\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.1.0.0\build\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.props.”

March 21, 2016 Leave a comment

Today I faced this issue during compilation of an asp.net 4.5.2 webforms app on Visual Studio 2015 Enterprise which was built few months back using Visual Studio 2013 Update 4 environment from VS online(TFS).

Problem Statement:

The error was highlighted during building of the project.

“This project references NuGet package(s) that are missing on this computer. Use NuGet Package Restore to download them. For more information, see http://go.microsoft.com/fwlink/?LinkID=322105. The missing file is ..\packages\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.1.0.0\build\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.props.”

Resolution:

The error comes due to obsolete .nuget packages & wrong path indicated on app.csproj or .vbproj file. So, in order to solve the issue,

First clear all folders under ‘Packages’ directory of project main directory.

2. Restart the solution on VS & click on ‘manage nuget package’ from project solution explorer. Update all .nuget packages to latest version.

nuget-package-manager

3. Open the project.csproj / .vbproj file on any text editor & replace the line with the appropriate line of the ‘Microsoft.Net.Compilers.1.1.1’ available in your project directory.

csproj

4. Don’t forget to comment out the following block in the project.csproj/vbproj file like as the screenshot.

<Target Name=”EnsureNuGetPackageBuildImports” BeforeTargets=”PrepareForBuild”>
<PropertyGroup>
<ErrorText>This project references NuGet package(s) that are missing on this computer. Use NuGet Package Restore to download them. For more information, see http://go.microsoft.com/fwlink/?LinkID=322105. The missing file is {0}.</ErrorText>
</PropertyGroup>
<Error Condition=”!Exists(‘..\packages\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.1.0.0\build\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.props’)” Text=”$([System.String]::Format(‘$(ErrorText)’, ‘..\packages\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.1.0.0\build\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.props’))” />
<Error Condition=”!Exists(‘..\packages\Microsoft.Net.Compilers.1.0.0\build\Microsoft.Net.Compilers.props’)” Text=”$([System.String]::Format(‘$(ErrorText)’, ‘..\packages\Microsoft.Net.Compilers.1.0.0\build\Microsoft.Net.Compilers.props’))” />
<Error Condition=”!Exists(‘..\packages\Microsoft.Net.Compilers.1.0.0\build\Microsoft.Net.Compilers.props’)” Text=”$([System.String]::Format(‘$(ErrorText)’, ‘..\packages\Microsoft.Net.Compilers.1.0.0\build\Microsoft.Net.Compilers.props’))” />
</Target>

comment

~Happy Troubleshooting!!

Filed under Hadoop, OData/Code First-Entity Framework/MVC/jQuery Mobile

Azure Stream Analytics & Machine Learning Integration With RealTime Twitter Sentiment Analytics Dashboard on PowerBI

December 27, 2015 Leave a comment

Recently, it has been introduced the integration of ASA & AML available as preview update & it’s possible to add AML web service URL & API key as ‘custom function‘ with ASA input. In this demo, realtime tweets are collected based on keywords like ‘#HappyHolidays2016‘, ‘#MerryChristmas‘, ‘#HappyNewYear2016‘ & those are directly stored on a .csv file saved on OneDrive. Here goes the solution architecture diagram of the POC.

Now, add the Service Bus event hub endpoint as input to the ASA job, while deploy the ‘Twitter Predictive Sentiment Analytics Model‘ & click on ‘Open in Studio‘ to start deploy the model. Don’t forget to run the solution before deploying.

Once the model is deployed, open the ‘Web Service‘ dashboard page to get the model URL & API key, click on default endpoint -> download the excel 2010 or earlier apps. Collect the URL & API key to apply it to ASA function credentials for AML deployment.

Next, create an ASA job & add the event hub credentials where the real world tweets are getting pushed & click on ‘Functions‘ tab of ASA job to add the AML credentials. Provide model name, URL & API key of the model & Once, it’s added, click on Save.

Now, add the following ASA SQL to aggregate the realtime tweets sentiment scores coming out from predictive twitter sentiment model.

Provide the output as Azure Blob storage, add a container name & serialization type as CSV & start the ASA job. Also, start importing data into PowerBI desktop from the ASA output Azure blob storage account.

PowerBI desktop contains in-built power Query to start preparing the ASA output data & processing data types. Choose the AML model sentiment score datatype as decimal type & TweetTexts as Text(String) type.

Start building the ‘Twitter Sentiment Analytics‘ dashboard powered by @AzureStreaming & Azure Machine Learning API with realworld tweet streaming, there’re some cool custom visuals are available on PowerBI. I’ve used some visuals here like ‘wordcloud‘ chart which depicts some of the highly scored positive sentiment contained tweets with most specific keywords like ‘happynewyear2016‘, ‘MerryChristmas‘,’HappyHolidays‘ etc.

While, in the donut chart, the top 10 tweets with most positive sentiment counts are portrayed with the specific sentiment scores coming from AML predictive model experiment integrated with ASA jobs.

~Wish you HappyHolidays 2016!

Filed under Azure HDInsight, Azure Machine Learning, Azure PowerShell, Azure SQL Database / SQL Server, Cortana Analytics Suite, Data Science & Azure IoT Suite, Hadoop, Microsoft Azure, Microsoft PowerBI, Microsoft PowerBI Visuals Tagged with Azure IoT Suite, Azure Machin, AzureStreaming, Cortana Analytics Suite, Microsoft PowerBI, PowerBI Custom Visuals, PowerBI TagCloud, Predictive analytics, R & Python with AML, Sentiment Analytics, Social Sentiment Analysis

A lap around Microsoft Azure IoT Hub with Azure Stream Analytics & IoT Analytics Suite

October 12, 2015 Leave a comment

Last month on #AzureConf 2015, the Azure IoT Suite has been announced to be available for purchase along with the GA release of Azure IoT Hub. The IoT Hub helps to control, monitor & connect thousands of devices to communicate via cloud & talk to each other using suitable protocols. You can connect to your Azure IoT Hub using the IoT Hub SDKs available in different languages like C, C#, Java, Ruby etc. Also, there’re monitoring devices available like device explorer or iothub-explorer. In this demo, Weather Data Analytics is demonstrated using Azure IoT Hub with Stream Analytics powered by Azure IoT Suite & visualized using Azure SQL database with PowerBI.

You can provision your own device into Azure IoT analytics Suite using device explorer or iothub-explorer tool & start bi-directional communication through device-cloud & cloud-device.

First, create your Azure IoT Hub from Azure Preview Portal by selecting New-> Internet of Things -> Azure IoT Hub. Provide hub name, select pricing & scale tier[F1 – free(1/subscription, connect 10 devices, 3000 messages /day), [S1 – standard (50,000 messages/day) & S2- standard(1.5 M messages/day)] for device to cloud communication. Select IoT Hub units, device to cloud partitions, resource group, subscription & finally location of deployment(currently it’s available only in three locations- ‘East Asia’, ‘East US’, ‘North Europe’.

Once the hub is created, next switch to device explorer to start creating a device, for details about to create a device & register, refer to this Github page. After registering the device, move back to ‘Data‘ tab of device explorer tool & click on ‘Monitor‘ button to start receive device-cloud events sent to Azure IoT Hub from device.

The schema for the weather dataset looks like the following data & fresh data collected from various sensors & feed into Azure IoT Hub which can be viewed using Device Explorer tool.

In order to push data from weather data sensor device to Azure IoT hub, the following code snippet needs to be used. The full code-snipped is going to be available on my Github page.

using System;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Data;
using Newtonsoft.Json;
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.FileIO;

namespace Microsoft.Azure.Devices.Client.Samples
{
class Program
{
private const string DeviceConnectionString = “Your device connection-string”;
private static int MESSAGE_COUNT = 5;
static string data = string.Empty;

static void Main(string[] args)
{
try
{
DeviceClient deviceClient = DeviceClient.CreateFromConnectionString(DeviceConnectionString);

if (deviceClient == null)
{
Console.WriteLine(“Failed to create DeviceClient!”);
}
else
{
SendEvent(deviceClient).Wait();
ReceiveCommands(deviceClient).Wait();
}

Console.WriteLine(“Exited!\n”);
}
catch (Exception ex)
{
Console.WriteLine(“Error in sample: {0}”, ex.Message);
}
}

static async Task SendEvent(DeviceClient deviceClient)
{
string[] filePath = Directory.GetFiles(@”\Weblog\”,”*.csv”);
string csv_file_path = string.Empty;
int size = filePath.Length;
for(int i=0; i< size; i++)
{
Console.WriteLine(filePath[i]);
csv_file_path = filePath[i];
}

DataTable csvData = GetDataTableFromCSVFile(csv_file_path);
Console.WriteLine(“Rows count:” + csvData.Rows.Count);
DataTable table = csvData;
foreach(DataRow row in table.Rows)
{
foreach(var item in row.ItemArray)
data = item.ToString();
Console.Write(data);

try
{
foreach(DataRow rows in table.Rows)
{
var info = new WeatherData
{
weatherDate = rows.ItemArray[0].ToString(),
weatherTime = rows.ItemArray[1].ToString(),
apperantTemperature = rows.ItemArray[2].ToString(),
cloudCover = rows.ItemArray[3].ToString(),
dewPoint = rows.ItemArray[4].ToString(),
humidity = rows.ItemArray[5].ToString(),
icon = rows.ItemArray[6].ToString(),
pressure = rows.ItemArray[7].ToString(),
temperature = rows.ItemArray[8].ToString(),
timeInterval = rows.ItemArray[9].ToString(),
visibility = rows.ItemArray[10].ToString(),
windBearing = rows.ItemArray[11].ToString(),
windSpeed = rows.ItemArray[12].ToString(),
latitude = rows.ItemArray[13].ToString(),
longitude = rows.ItemArray[14].ToString()
};

var serializedString = JsonConvert.SerializeObject(info);
var message = data;
Console.WriteLine(“{0}> Sending events: {1}”, DateTime.Now.ToString(), serializedString.ToString());
await deviceClient.SendEventAsync(new Message(Encoding.UTF8.GetBytes(serializedString.ToString())));
}
}

catch(Exception ex)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine(“{0} > Exception: {1}”, DateTime.Now.ToString(), ex.Message);
Console.ResetColor();
}
// Task.Delay(200);

}

Console.WriteLine(“Press Ctrl-C to stop the sender process”);
Console.WriteLine(“Press Enter to start now”);
Console.ReadLine();

//string dataBuffer;

//Console.WriteLine(“Device sending {0} messages to IoTHub…\n”, MESSAGE_COUNT);

//for (int count = 0; count < MESSAGE_COUNT; count++)
//{
// dataBuffer = Guid.NewGuid().ToString();
// Message eventMessage = new Message(Encoding.UTF8.GetBytes(dataBuffer));
// Console.WriteLine(“\t{0}> Sending message: {1}, Data: [{2}]”, DateTime.Now.ToLocalTime(), count, dataBuffer);

// await deviceClient.SendEventAsync(eventMessage);
//}
}

private static DataTable GetDataTableFromCSVFile(string csv_file_path)
{
DataTable csvData = new DataTable();
string data = string.Empty;
try
{
using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
{
csvReader.SetDelimiters(new string[] { “,” });
csvReader.HasFieldsEnclosedInQuotes = true;

//read column names
string[] colFields = csvReader.ReadFields();
foreach (string column in colFields)
{
DataColumn datecolumn = new DataColumn(column);
datecolumn.AllowDBNull = true;
csvData.Columns.Add(datecolumn);
}
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();

for (int i = 0; i < fieldData.Length; i++)
{
if (fieldData[i] == “”)
{
fieldData[i] = null;
}
}
csvData.Rows.Add(fieldData);

}
}
}
catch (Exception ex)
{
Console.WriteLine(“Exception” + ex.Message);
}
return csvData;
}

static async Task ReceiveCommands(DeviceClient deviceClient)
{
Console.WriteLine(“\nDevice waiting for commands from IoTHub…\n”);
Message receivedMessage;
string messageData;

while (true)
{
receivedMessage = await deviceClient.ReceiveAsync(TimeSpan.FromSeconds(1));

if (receivedMessage != null)
{
messageData = Encoding.ASCII.GetString(receivedMessage.GetBytes());
Console.WriteLine(“\t{0}> Received message: {1}”, DateTime.Now.ToLocalTime(), messageData);

await deviceClient.CompleteAsync(receivedMessage);
}
}
}
}
}

You could check output to events sending from device to cloud on console.

Next, start pushing the device data into Azure IoT Hub & monitor the events receiving process through device explorer. Now, start provisioning an Azure Stream Analytics Job on Azure portal. Provide ‘Azure IoT Hub‘ as an input to the job like as the followings.

Now provide Azure Stream Analytics Query to connect incoming unstructured datasets from device to cloud to pass into Azure SQL database. So, first, provision a SQL database on Azure & connect to as output to Stream Analytics job.

create table input(
weatherDate nvarchar(max),
weatherTime datetime,
apperantTemperature nvarchar(max),
cloudCover nvarchar(max),
dewPoint nvarchar(max),
humidity nvarchar(max),
icon nvarchar(max),
pressure nvarchar(max),
temperature nvarchar(max),
timeInterval nvarchar(max),
visibility nvarchar(max),
windBearing nvarchar(max),
windSpeed nvarchar(max),
latitude nvarchar(max),
longitude nvarchar(max)
)
select input.weatherDate, input.weatherTime,input.apperantTemperature,input.cloudCover,
input.dewPoint, input.humidity,input.icon,input.pressure,count(input.temperature) as avgtemperature, input.timeInterval, input.visibility, input.windBearing,
input.windSpeed,input.latitude,input.longitude

into weathersql
from input
group by input.weatherDate, input.weatherTime, input.apperantTemperature,input.cloudCover,
input.dewPoint, input.humidity,input.icon, input.pressure,input.timeInterval,input.visibility, input.windBearing,
input.windSpeed,input.latitude,input.longitude, TumblingWindow(second,2)

Specify the output of ‘WeatherIoT’ ASA job as ‘Azure SQL Database‘, alternatively, you can select any of the rest of the connectors like ‘Event Hub’, ‘DocumentDB’ etc.

Make sure that , to create the necessary database & table first on SQL before adding as output to ASA job. For this demo, I have created the ‘weatheriot‘ table on Azure SQL database. The t-sql query looks like this.

Next, start the ASA job & receive the final Azure IoT hub(device to cloud) data processed to IoT hub ->ASA -> Azure SQL database pipeline. Once you receive data on your Azure SQL table. Start building the PowerBI ‘Weather IoT Data Analytics’ dashboard for visualization & to leverage the power of Azure IoT momentum.

Connect to PowerBI connected through same account of Azure subscription where you provisioned the ASA job & start importing data from Azure SQL database. Create stunning reports using funnel, donut, global map charts with live data refresh.

For this demo, I’ve populated charts on average weather temperature, pressure, humidity, dew point forecasting analysis over specific areas based on latitude & longitude values, plotted & pinned into PowerBI ‘Weather Data Azure IoT Analytics’ dashboard.

Filed under Azure HDInsight, Azure SQL Database / SQL Server, Hadoop, Microsoft Azure, Microsoft PowerBI Tagged with Azure IoT Analytics, Azure IoT Hub, Azure IoT Stream Analytics, Azure IoT Suite, Internet of your Things, Microsoft PowerBI

Deployment of Cloudera Enterprise 5.4.4(CDH 5) on Microsoft Azure Virtual Machine & Running Impala shell as single node cluster

September 20, 2015 Leave a comment

Deployment of Cloudera Enterprise (CDH) 5.4.4 can be implemented directly on Microsoft Azure Virtual Machines & we can start working on Impala shell & Hue itself.

The hosting process is super easy, just need to make sure the following prerequisites & troubleshooting steps should be taken care off.

Prerequisites :

SELinux should be disabled,

Before disabling SELinux you may try sysctl -w vm.swappiness=0.

You have to add the line below in /etc/sysctl.conf to keep your change permanently:

vm.swappiness = 10

Change the root password
Change the hostname in /etc/hosts file
Add ports 7180, 7182, 9000, 9001 open
Passwordless sudo user authentication
Change the /etc/hosts file , the hostname from hosts IP address by $ifconfig

Issue: Cloudera Manager site is not opening on browser after installation & the following error shows on log

cloudera-scm-server dead but pid file exists

Follow the steps:

# service cloudera-scm-server stop

# service cloudera-scm-server-db stop

# rm /var/run/cloudera-scm-server.pid

# service cloudera-scm-server-db start

# service cloudera-scm-server start

Details about the step by step process of deployment of CDH 5 on MS Azure Virtual Machine(RHEL 6.x) can be viewed on YouTube channel.

Filed under Hadoop, Microsoft Azure Tagged with Azure Virtual Network, CDH Impala Azure, Cloudera Enterprise, Cloudera Impala, Cloudera Manager SCM Server, Microsoft Azure

What’s new in Azure Data Catalog

July 15, 2015 Leave a comment

The Azure Data Catalog (aka previously PowerBI Data Catalog) has released in public preview on last monday(July 13th) @WPC15, which typically reveals a new world of storing & connecting #Data across on-prem & azure SQL database. Lets hop into a quick jumpstart on it.

Connect through Azure Data Catalog through this url https://www.azuredatacatalog.com/ by making sure you are logging with your official id & a valid Azure subscription. Currently , it’s free for first 50 users & upto 5000 registered data assets & in standard edition, upto 100 users & available upto 1M registered data assets.

Lets start with the signing of the official id into the portal.

Once it’s provisioned, you will be redirected to this page to launch a windows app of Azure Data Catalog.

It would start downloading the app from clickonce deployed server.

After it downloaded & would prompt to select server , at this point it has capacity to select data from SQL Server Analysis service, Reporting Service, on-prem/Azure SQL database & Oracle db.

For this demo, we used on-prem SQL server database to connect to Azure Data Catalog.

We selected here ‘AdventureWorksLT’ database & pushed total 8 tables like ‘Customer’, ‘Product’, ‘ProductCategory’, ‘ProductDescription’,’ProductModel’, ‘SalesOrderDetail’ etc. Also, you can tags to identify the datasets on data catalog portal.

Next, click on ‘REGISTER’ to register the dataset & optionally, you can include a preview of the data definition as well.

Once the object registration is done, it would allow to view on portal. Click on ‘View Portal’ to check the data catalogs.

Once you click , you would be redirected to data catalog homepage where you can search for your data by object metaname.

in the data catalog object portal, all of the registered metadata & objects would be visible with property tags.

You can also open the registered object datasets in excel to start importing into PowerBI.

Click on ‘Excel’ or ‘Excel(Top 1000)’ to start importing the data into Excel. The resultant data definition would in .odc format.

Once you open it in Excel, it would be prompted to enable custom extension. Click on ‘Enable’.

From Excel, the dataset is imported to latest Microsoft PowerBI Designer Preview app to build up a custom dashboard.

Import the .pbix file on ‘AdventureWorks’ customer details & product analytics to powerbi reports & built up a dashboard.

The PowerBI preview portal dashboard has some updates on tile details filter like extension of custom links.

The PowerBI app for Android is available now, which is useful for quick glance of real-time analytics dashboards specially connected with Stream analytics & updating real time.

Filed under Azure HDInsight, Azure IoT & Data Science, Azure PowerShell, Azure SQL Database / SQL Server, Hadoop, Microsoft Azure, Microsoft PowerBI, Microsoft PowerBI Visuals, Windows Phone 8, Windows Store Universal Apps Tagged with Azure Active Directory, Azure Data Catalog, Azure Data Platform, Microsoft PowerBI, Office 365, PowerBI gateway

← Older posts

Anindita's Blog

PowerShell Script for Assigning Public IP to Azure Virtual Machine

R with PowerBI – A step by step guide approach

Quick Installation of Single node Datazen Server in Azure Cloud Service & Sample Dashboards

Section 2: Install the Datazen Server

Section 3: Configure the Datazen Server

Section 4: Apply a Custom Branding

Section 5: Connect to the Datazen Server with Publisher

A lap around Microsoft Azure IoT Hub with Azure Stream Analytics & IoT Analytics Suite

Deployment of Cloudera Enterprise 5.4.4(CDH 5) on Microsoft Azure Virtual Machine & Running Impala shell as single node cluster

What’s new in Azure Data Catalog

Archives

Categories

Like on Facebook

The Cloud

Recent Posts

Follow me on Twitter

Blog Traffic

Blog Stats

Follow Blog via Email

Proud to be an Indiblogger

Most Valuable Blogger

Share this:

Share this:

Share this:

Share this:

Section 2: Install the Datazen Server

Section 3: Configure the Datazen Server

Section 4: Apply a Custom Branding

Section 5: Connect to the Datazen Server with Publisher

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Archives

Categories

The Cloud

Recent Posts

Follow me on Twitter

Blog Traffic

Blog Stats

Follow Blog via Email

Proud to be an Indiblogger

Most Valuable Blogger