Mastering in Data Science


The following technical topics will be covered in upcoming posts in the Data Science, Machine Learning, Analysis & Visualization track. Become an enterprise Data Scientist by following the Data Scientist fast-track modules. Stay tuned!

A lap around MACHINE LEARNING
Supervised and unsupervised learning
Kernel based methods
Text mining techniques
Performance evaluation

Exploring CATEGORICAL DATA ANALYSIS
Types of categorical data
Generalized linear models
Contingency tables
Simple and multinomial logistic regression models

Evaluation of STOCHASTIC PROCESSES AND SIMULATION
Random Variables and Distributions
Monte Carlo Simulation
Discrete Event Simulation
Variance Reduction Techniques

Data OPTIMIZATION Techniques
Linear Programming
Integer Programming
Multi-criteria Optimization
Goal Programming
AHP (Analytic Hierarchy Process)
Data Envelopment Analysis (DEA)

ECONOMETRIC METHODS in Data Science
Time Series Analysis
GARCH Models
Fixed Effects Estimation
Random Effects Estimation

STATISTICS for DATA SCIENCE
Probability Theory
Statistical Inference
Sampling Theory
Hypothesis Testing
Regression Analysis

Real World Case Studies in Data Science

  • Social Media Mining with R & Microsoft PowerBI
  • Experimenting with interactive R-based visuals in Shiny apps
  • What’s next with Julia

Multicloud Journey - Service comparison of AWS, Azure, GCP


Service Name | AWS Service | Azure Service | GCP Service | Description
Marketplace | AWS Marketplace | Azure Marketplace | GCP Marketplace | Easy-to-deploy and automatically configured third-party applications, including single virtual machine or multiple virtual machine solutions.
Compute (Virtual Servers) | EC2 instances | Virtual Machines | Compute Engine | Virtual servers allow users to provision, manage, and maintain the OS and server software on a pay-as-you-go basis.
Compute (Virtual Servers) | AWS Batch | Azure Batch | GCP Batch | Execute large-scale parallel and high-performance computing applications.
Compute (Virtual Servers) | AWS Auto-scaling | Azure VM Scale Sets | GCP Compute Engine Managed Instance Groups | Automatically scales the number of VM instances out or in, based on defined metrics/thresholds.
Compute (Virtual Servers) | VMware on AWS | Azure VMware by CloudSimple | VMware as a service | Redeploy and extend VMware-based enterprise workloads to Azure by CloudSimple.
Compute (Virtual Servers) | Parallel Cluster | CycleCloud | - | Create, manage, and optimize HPC and big compute clusters at scale.
Containers & Container Orchestrators | Elastic Container Service (ECS), AWS Fargate | Azure Container Instances (ACI) | Cloud Run | ACI is the fastest and simplest way to run containers in Azure.
Containers & Container Orchestrators | Elastic Container Registry (ECR) | Azure Container Registry (ACR) | Container Registry, Artifact Registry | Allows customers to store Docker-formatted images. Used to create all types of container deployments on Azure.
Containers & Container Orchestrators | Elastic Kubernetes Service (EKS) | Azure Kubernetes Service (AKS) | Google Kubernetes Engine (GKE) | Deploy orchestrated containerized apps with CNCF Kubernetes at scale.
Containers & Container Orchestrators | AWS App Mesh | Azure Service Fabric Mesh | Anthos Service Mesh | Fully managed service that enables developers to deploy microservices applications without managing virtual machines, storage, or networking.
Containers & Container Orchestrators | EKS & Kubernetes Container Insights Metrics | Azure Monitor for containers | Kubernetes Engine Monitoring | Azure Monitor for containers is designed to monitor the performance of container workloads deployed to AKS, AKS Engine, ACI, and Azure Stack.
Serverless (Functions) | AWS Lambda | Azure Functions | Cloud Functions | Provides FaaS (Function as a Service) for integrating systems and running backend processes in response to events without provisioning compute servers.
Database (Relational DB) | RDS | Azure SQL DB, Azure Database for MySQL, Azure Database for PostgreSQL | Cloud SQL (SQL Server, MySQL, PostgreSQL) | Managed relational database where scale, security, and resiliency are handled by the platform.
NoSQL/Document | DynamoDB, SimpleDB, Amazon DocumentDB | Azure Cosmos DB | Cloud Spanner | Managed database service with a dynamic schema; security, scale, and maintenance are handled by the cloud platform.
NoSQL (PaaS) | - | Azure Cosmos DB | Cloud BigTable, Cloud Firestore, Firebase Realtime Database | Globally distributed multi-model database that natively supports multiple data models: key-value, documents, graphs, etc.
Caching | AWS ElastiCache | Azure Cache for Redis | Cloud Memorystore, Redis Enterprise Cloud | An in-memory, distributed caching service that provides a high-performance store, typically used to offload non-transactional work from a database.
Database migration | AWS DMS | Azure DMS | Open Source Database Migration Tool/SQL Server Database Migration Tool | End-to-end migration of database schema and data from on-premises to the cloud platform.
Networking (Cloud Virtual Networking) | AWS VPC | Azure Virtual Network (VNET) | GCP Virtual Private Cloud (VPC) | Provides an isolated, private environment in the cloud. Users have control over their virtual networking environment, including selection of their own IP address range, adding/updating address ranges, creation of subnets, and configuration of route tables and network gateways.
DNS Management | AWS Route 53 | Azure DNS, Azure Traffic Manager | Cloud DNS | Manage DNS records using the same credentials, billing, and support contracts.
Dedicated Network (Hybrid Connectivity) | AWS Direct Connect | Azure ExpressRoute | Cloud Interconnect | Establishes a private network connection from a location to the cloud provider (not over the Internet).
Load Balancing | Network Load Balancer | Azure Load Balancer | Network Load Balancing | Azure Load Balancer load-balances traffic at layer 4 (TCP or UDP).
Load Balancing in Application layer | Application Load Balancer | Application Gateway, Azure Front Door, Azure Traffic Manager | Global Load Balancing | Application Gateway is a layer 7 load balancer. It takes backends with any IP that is reachable. It supports SSL termination, cookie-based session affinity, and round robin for load-balancing traffic.
Cross-premises connectivity | AWS VPN Gateway | Azure VPN Gateway, Azure Virtual WAN | Cloud VPN Gateway | Connects Azure virtual networks to other Azure virtual networks, or customer on-premises networks (site-to-site). Allows end users to connect to Azure services through VPN tunneling (point-to-site).
Hybrid Connectivity | AWS Virtual Private Gateway | Azure VNET Gateway | Cloud Router | Enables dynamic route exchange.
CDN | AWS CloudFront | Azure CDN | Cloud CDN | A content delivery network (CDN) is a distributed network of servers that can efficiently deliver web content to users.
Firewall | AWS WAF | Azure WAF | Cloud Armor | Azure Web Application Firewall (WAF) provides centralized protection of your web applications from common exploits and vulnerabilities.
NAT Gateway | AWS NAT Gateway | Azure Virtual Network NAT | Cloud NAT | Virtual Network NAT (network address translation) provides outbound NAT translations for internet connectivity for virtual networks.
Private Connectivity to PaaS | AWS PrivateLink | Azure Private Link | VPC Service Controls | Provides secure private connectivity between VPCs, AWS/Azure/GCP services, and on-premises apps.
Telemetry | VPC Flow Logs | NSG Flow Logs | VPC Flow Logs | Network security group (NSG) flow logs are a feature of Network Watcher that allows you to view information about ingress and egress IP traffic through an NSG.
Telemetry Network logs | VPC Flow Logs | NSG Flow Logs | Firewall Rules Logging | NSG logs are a feature of Network Watcher that allows you to view information about ingress and egress traffic.
Telemetry (Monitoring) | AWS CloudWatch, X-Ray | Azure Monitor | Operations | Comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments.
Network Watcher | AWS CloudWatch | Azure Network Watcher | Network Intelligence Center | Azure Network Watcher provides tools to monitor, diagnose, view metrics, and enable or disable logs for resources in an Azure virtual network.
Security & IAM | AWS IAM | Azure AD | Cloud IAM | Allows users to securely control access to services and resources while offering data security and protection. Create and manage users and groups and use permissions to allow and deny access to resources.
IAM (Authentication & Authorization) | AWS IAM | Azure RBAC | Cloud IAM | Role-based access control (RBAC) helps you manage who has access to Azure resources, what they can do with those resources, and what areas they have access to.
Multi-factor Authentication | AWS MFA | Azure AD MFA | GCP MFA | Safeguard access to data and applications while meeting user demand for a simple sign-in process.
Auth & Authorization & Management | AWS Organizations | Azure Management Groups + RBAC | Resource Manager | Structure to organize and manage assets in Azure.
AD Domain Services | AWS Directory Service | Azure AD Domain Services | Managed Service for Microsoft Active Directory (AD) | Provides managed domain services such as domain join, group policy, LDAP, and Kerberos/NTLM authentication that are fully compatible with Windows Server Active Directory.
Identity managed service | AWS Cognito | Azure AD B2C | Firebase Authentication | A highly available, global identity management service for consumer-facing applications that scales to hundreds of millions of identities.
Management Group | AWS Organizations | Azure Policy, Azure Management Groups | Service Account | -
Encryption | Server-side Encryption, AWS S3 KMS | Azure Storage Service Encryption | Encryption by default at rest | Azure Storage Service Encryption helps you protect and safeguard your data and meet your organizational security and compliance commitments.
Hardware Security Module (HSM) | CloudHSM, KMS | Azure Key Vault | Cloud KMS | Provides a security solution that works with other services by providing a way to manage, create, and control encryption keys stored in hardware security modules (HSM).
Security | AWS Inspector | Azure Security Center | Security Command Center | Automated security assessment service that checks the security and compliance of applications.
Web Security with Certificates | AWS Certificate Manager | Azure App Service certificates | Web Security Scanner | -
Advanced Threat Management | AWS GuardDuty | Azure Advanced Threat Protection | Event Threat Detection | Detect and investigate advanced attacks on-premises and in the cloud.
Auditing | AWS Artifact | Service Trust Portal | - | Provides access to audit reports, compliance guides, and trust documents from across cloud services.
DDoS Protection | AWS Shield | Azure DDoS Protection Service | DDoS Security with GCP Armor | Provides cloud services with protection from distributed denial of service (DDoS) attacks.
Storage (Object) | AWS S3 | Azure Blob Storage | Cloud Storage | Object storage service, for use cases including cloud applications, content distribution, backup, archiving, disaster recovery, and big data analytics.
Storage (VHD) | AWS EBS | Azure Managed Disks | Persistent Disk, Local SSD | SSD storage optimized for I/O-intensive read/write operations. For use as high-performance Azure virtual machine storage.
Storage (File) | AWS EFS | Azure Files, Azure NetApp Files | GCP Filestore | File-based storage and hosted NetApp appliance storage.
Data Archive | S3 Infrequent Access (IA) | Storage cool tier | Nearline | -
Deep Data Archive | S3 Glacier, Deep Archive | Storage archive access tier | Coldline | Archive storage has the lowest storage cost and higher data retrieval costs compared to hot and cool storage.
Data Backup | AWS Backup | Azure Backup | GCP Backup | Back up and recover files and folders from the cloud, and provide offsite protection against data loss.
Big Data & Analytics | Redshift | Azure Synapse Analytics (formerly SQL DW) | GCP BigQuery | Cloud-based Enterprise Data Warehouse (EDW) that uses Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data.
Data warehouse & Lake | Lake Formation | Azure Data Share | Looker | Big data sharing service.
Big Data Transformations | EMR | Azure Databricks | Cloud Dataflow | Managed Apache Spark-based analytics platform.
Big Data Transformations | EMR | HDInsight | GCP Dataproc | Managed Hadoop service.
Big Data Transformations | EMR | Azure Data Lake Storage Gen2 | BigQuery | Massively scalable, secure data lake functionality built on Azure Blob Storage.
ETL/Data Orchestration | Data Pipeline, Glue | Azure Data Factory | Google Data Fusion | Processes and moves data between different compute and storage services, as well as on-premises data sources, at specified intervals. Create, schedule, orchestrate, and manage data pipelines.
Enterprise Data discovery | AWS Glue | Azure Data Catalog | Cloud Data Catalog | A fully managed service that serves as a system of registration and system of discovery for enterprise data sources.
NoSQL db | DynamoDB | Azure Table Storage, Cosmos DB | Cloud Datastore | NoSQL key-value store for rapid development using massive semi-structured datasets.
Visualization & data Streaming | Kinesis Analytics, AWS Athena | Azure Stream Analytics, ADLA (Data Lake Analytics), ADLS Gen2 | BigQuery | Storage and analysis platforms that create insights from large quantities of data, or data that originates from many sources.
Full text searching capability | CloudSearch | Cognitive Search, Azure Search | Cloud Search | Delivers full-text search and related search analytics and capabilities.
BI tool for Visualization | QuickSight | PowerBI | Data Studio, Looker | Business intelligence tools that build visualizations, perform ad hoc analysis, and develop business insights from data.
AI Hub | AWS SageMaker | Azure Machine Learning | AI Hub | A cloud service to train, deploy, automate, and manage machine learning models.
Bot Capability | Alexa Skills Kit | Azure Bot Framework | Dialogflow | Build and connect intelligent bots that interact with your users using text/SMS, Skype, Teams, Slack, Office 365 mail, Twitter, and other popular services.
Conversational AI (Speech) | Lex | Speech Services | AI Building Blocks - Conversation | API capable of converting speech to text, understanding intent, and converting text back to speech for natural responsiveness.
Conversational AI (NLP) | Lex | Azure LUIS | AI Building Blocks - Language | A machine learning-based service to build natural language understanding into apps, bots, and IoT devices. Quickly create enterprise-ready, custom models that continuously improve.
Conversational AI (Speech to Text & vice versa) | Polly, Transcribe | Speech Services | AI Building Blocks - Conversation | Enables both speech-to-text and text-to-speech capabilities.
Enterprise AI (Computer Vision) (Face, Emotions detections) | Rekognition | Azure Cognitive Services | AI Building Blocks - Cloud AutoML, AI Building Blocks - Sight | Customize and embed state-of-the-art computer vision for specific domains. Build frictionless customer experiences, optimize manufacturing processes, accelerate digital marketing campaigns, and more. No machine learning expertise is required.
Deep Learning | TensorFlow with SageMaker | ONNX, ML.NET | TensorFlow | Open source and cross-platform machine learning framework for both machine learning and AI.
Data Science/Deep Learning VM | AWS Deep Learning AMIs | Azure DSVM | Deep Learning VM Image | Pre-configured environments in the cloud for data science and AI development.
Notebooks | AWS SageMaker Notebook instances | Azure Notebooks | AI Platform Notebooks | Develop and run code from anywhere with Jupyter notebooks on Azure.
Deep Learning Containers | AWS Deep Learning Containers | GPU Support on AKS | Deep Learning Containers | Graphics processing units (GPUs) are often used for compute-intensive workloads such as graphics and visualization workloads.
Automated Data Labeling | Automate Data Labeling with SageMaker | Azure ML - Data Labeling | Data Labeling Service | A central place to create, manage, and monitor labeling projects (public preview). Use it to coordinate data, labels, and team members to efficiently manage labeling tasks.
ML Platform compute | AWS SageMaker ML Instance Types | Azure ML Compute Targets | AI Platform Training | Designated compute resource/environment where you run your training script or host your service deployment. This location may be your local machine or a cloud-based compute resource.
ML Service Deployments | SageMaker Hosting Services - Model Deployment | Azure ML - Deployments | AI Platform Predictions | Deploy your machine learning model as a web service in the Azure cloud or to Azure IoT Edge devices.
Monitor data drift | SageMaker Model Monitor | Azure ML - Data Drift | Continuous Evaluation | Monitor for data drift between the training dataset and inference data of a deployed model.
TPU | AWS Inferentia | Azure ML - FPGA | Cloud TPU | FPGAs contain an array of programmable logic blocks and a hierarchy of reconfigurable interconnects. The interconnects allow these blocks to be configured in various ways after manufacturing.
ML Ops | MLOps with SageMaker | Azure MLOps | GCP Kubeflow | MLOps, or DevOps for machine learning, enables data science and IT teams to collaborate and increase the pace of model development and deployment via monitoring, validation, and governance of machine learning models.
DevOps & App Monitoring | CloudWatch, X-Ray | Azure Monitor | Operations | Maximizes the availability and performance of your applications and services by delivering a comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments.
Code collaborations | CodeBuild, CodeDeploy, CodeCommit, CodePipeline | Azure DevOps (Azure Boards, Azure Pipelines, Azure Build & Release, Azure Repos) | Cloud Source Repositories | A cloud service for collaborating on code development.
Automation | OpsWorks | Azure Automation | Cloud Composer | Automation gives you complete control during deployment, operations, and decommissioning of workloads and resources.
Automated Infra Provisioning | CloudFormation | Azure Resource Manager, VM extensions, Azure Automation | Cloud Deployment Manager | Provides a way for users to automate the manual, long-running, error-prone, and frequently repeated IT tasks.
CLI, SDK interface | AWS CLI | Azure CLI, PowerShell | PowerShell on GCP, gcloud SDK | Built on top of the native REST API across all cloud services, various programming language-specific wrappers provide easier ways to create solutions.
Building of Code | AWS CodeBuild | DevOps Build | Cloud Build | Fully managed build service that supports continuous integration and deployment.
Managed Artifacts Repository | AWS CodeArtifact | Azure DevOps Artifacts | Artifact Registry | Add fully integrated package management to your continuous integration/continuous delivery (CI/CD) pipelines with a single click.
IoT Service | AWS IoT | Azure IoT Hub, Azure Event Hub | Cloud IoT Core | A cloud gateway for managing bidirectional communication with billions of IoT devices, securely and at scale.
IoT data processing | AWS Kinesis Firehose, Kinesis Streams | Azure Event Hubs, Azure Stream Analytics, HDInsight Kafka | Cloud IoT Core, Cloud Pub/Sub, GCP Dataflow | Process and route streaming data to a subsequent processing engine, storage, or database platform.
IoT on Edge | AWS Greengrass | Azure IoT Edge | Edge TPU | Deploy cloud intelligence directly on IoT devices to run in on-premises scenarios.
IoT Things Graph/Digital Twins | IoT Things Graph | Azure Digital Twins | Device Registry | Create spatial intelligence graphs to model the relationships and interactions between people, places, and devices. Query data from a physical space rather than disparate sensors.
Messaging Storage | AWS SQS | Azure Queue Storage | Cloud Pub/Sub | Provides a managed message queueing service for communicating between decoupled application components.
Reliable Messaging | SQS | Service Bus Queue | Cloud Pub/Sub | Supports a set of cloud-based, message-oriented middleware technologies including reliable message queuing and durable publish/subscribe messaging.
Messaging with notification | AWS SNS | Azure Event Grid | Cloud Pub/Sub | A fully managed event routing service that allows for uniform event consumption using a publish/subscribe model.
Cloud Management Advisory | Trusted Advisor | Advisor, Azure Security Center | GCP Recommender | Provides analysis of cloud resource configuration and security so subscribers can ensure they're making use of best practices and optimum configurations.
Billing API | AWS Usage & Billing Report, AWS Budgets | Azure Billing API | Cloud Billing | Services to help generate, monitor, forecast, and share billing data for resource usage by time, organization, or product resources.
Migrate on-prem workloads | Application Discovery Services | Azure Migrate | Assessment & Migration tool | Assesses on-premises workloads for migration to Azure, performs performance-based sizing, and provides cost estimations.
Telemetry Analysis of lift-shift | EC2 Systems Manager | Azure Monitor | Operations (formerly StackDriver) | Comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments.
Trace | CloudTrail | Azure Monitor | Cloud Trace | Comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments.
Logging & Performance Monitoring | CloudWatch | Azure Application Insights | StackDriver Debugging/Logging | Application Insights is an extensible Application Performance Management (APM) service for developers and DevOps professionals.
Cost Management | AWS Cost Explorer | Azure Cost Management | GCP Cost Management | Optimize cloud costs while maximizing cloud potential.
Mobile Service | Mobile Hub, Mobile SDK, Mobile Analytics | Azure Xamarin Apps, App Center | GCP App Engine | Provides backend mobile services for rapid development of mobile solutions, identity management, data synchronization, and storage and notifications across devices.
Device Farm | AWS Device Farm | Azure App Center | Firebase Test Lab | Provides services to support testing mobile applications.
Bulk Data Transfer | Import/Export Disk | Azure Import/Export | Transfer Appliance | A data transport solution that uses secure disks and appliances to transfer large amounts of data. Also offers data protection during transit.
Petabyte to exabyte level data transfer to Cloud | Import/Export Snowball, Snowball Edge, Snowmobile | Azure Data Box | Transfer Appliance | Petabyte- to exabyte-scale data transport solution that uses secure data storage devices to transfer large amounts of data to and from Azure.
Storage Gateway | AWS Storage Gateway | Azure StorSimple | Google Cloud Storage | Integrates on-premises IT environments with cloud storage. Automates data management and storage, plus supports disaster recovery.
Data Sync | AWS DataSync | Azure File Sync | Cloud Data Transfer | Data sync services.
Serverless Workflow | AWS SWF | Azure Logic Apps | GCP Composer | Serverless technology for connecting apps, data, and devices anywhere, whether on-premises or in the cloud, for large ecosystems of SaaS and cloud-based connectors.
Hybrid | AWS Outposts | Azure Stack, Azure Arc | GCP Anthos | For customers who want to simplify complex and distributed environments across on-premises, edge, and multi-cloud.
Media | AWS Elemental MediaConvert, Elastic Transcoder | Azure Media Services | GCP Anvato, Zync Render, Game Servers | Cloud-based media workflow platform to index, package, protect, and stream video at scale.
Blockchain | AWS Blockchain | Azure Blockchain Service | Digital Asset | Azure Blockchain Service is a fully managed ledger service that enables users to grow and operate blockchain networks at scale in Azure.
App Services | AWS Elastic Beanstalk | Azure App Service | GCP App Engine | Managed hosting platform providing easy-to-use services for deploying and scaling web applications and services.
API Services | API Gateway | Azure API Management | Apigee API platform, API Analytics | A turnkey solution for publishing APIs to external and internal consumers.
Deploy Web apps | Lightsail | Azure App Service | Cloud Run, App Engine | Build, deploy, and scale web apps on a fully managed platform.
Backend Serverless computation | AWS Step Functions | Azure Logic Apps | App Engine | Connect apps, data, and devices on-premises or in the cloud.
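
A comparison table like this is often used as a lookup: given a service on one cloud, find its counterpart on another. The following is a minimal Python sketch of that idea. It copies only four rows from the table above, and the names SERVICE_MAP and equivalent_service are hypothetical helpers introduced here for illustration, not part of any cloud SDK.

```python
# A few rows copied from the comparison table; extend the mapping as needed.
# SERVICE_MAP and equivalent_service are illustrative names, not a real API.
SERVICE_MAP = {
    "Serverless (Functions)": {
        "aws": "AWS Lambda",
        "azure": "Azure Functions",
        "gcp": "Cloud Functions",
    },
    "CDN": {
        "aws": "AWS CloudFront",
        "azure": "Azure CDN",
        "gcp": "Cloud CDN",
    },
    "Storage (Object)": {
        "aws": "AWS S3",
        "azure": "Azure Blob Storage",
        "gcp": "Cloud Storage",
    },
    "Dedicated Network (Hybrid Connectivity)": {
        "aws": "AWS Direct Connect",
        "azure": "Azure ExpressRoute",
        "gcp": "Cloud Interconnect",
    },
}


def equivalent_service(service_name, source_cloud, target_cloud):
    """Return the target cloud's counterpart of a service, or None if unknown."""
    for row in SERVICE_MAP.values():
        if row.get(source_cloud) == service_name:
            return row.get(target_cloud)
    return None


if __name__ == "__main__":
    print(equivalent_service("AWS Lambda", "aws", "gcp"))
    print(equivalent_service("Azure ExpressRoute", "azure", "aws"))
```

Keeping the category as the dictionary key mirrors the first column of the table, so adding a new row is a single dictionary entry.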

PowerShell Script for Assigning Public IP to Azure Virtual Machine


Azure virtual machines provisioned with the 'Resource Manager' or 'Classic' deployment model now receive a virtual IP and a dynamic IP, but no public IP address. A public IP address is necessary for accessing services deployed on the VM from a local browser.

For example, if you provision Apache Hadoop, Hortonworks Data Platform, or the Cloudera Hadoop distribution on an Azure Linux VM, you may need to access the Hadoop services from a local browser at addresses such as http://<ip address of VM>:50070 and http://<ip address of VM>:50075 for the HDFS service, or http://<ip address of VM>:8080 for the MapReduce service.
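
As a small illustration of the addresses mentioned above, here is a hedged Python sketch that builds the service URLs once the VM has a public IP. The ports 50070, 50075, and 8080 come from the text; the key names, the helper name hadoop_ui_urls, and the sample address 203.0.113.10 (a documentation-only range) are assumptions made for this example, and IPv4 is assumed.

```python
import ipaddress

# Ports taken from the text above. The key names are illustrative labels;
# 50070 and 50075 are the usual NameNode and DataNode web UI ports in Hadoop 2.x.
HADOOP_UI_PORTS = {
    "hdfs_namenode": 50070,
    "hdfs_datanode": 50075,
    "mapreduce": 8080,
}


def hadoop_ui_urls(vm_public_ip):
    """Build browser URLs for the Hadoop web UIs on a VM (IPv4 assumed)."""
    # IPv4Address raises a ValueError subclass for malformed addresses.
    ip = ipaddress.IPv4Address(vm_public_ip)
    return {name: f"http://{ip}:{port}" for name, port in HADOOP_UI_PORTS.items()}


if __name__ == "__main__":
    # 203.0.113.10 is a placeholder; substitute the VM's real public IP.
    for name, url in hadoop_ui_urls("203.0.113.10").items():
        print(name, url)
```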

To assign a public IP to an Azure Linux VM, run the following PowerShell script.

# Attach a public IP named 'linuxvmip' to the VM (classic deployment model cmdlets)
Get-AzureVM -ServiceName 'azfilestorage' -Name 'azfilestorage' | Set-AzurePublicIP -PublicIPName 'linuxvmip' | Update-AzureVM

Next, update the VM's SSH endpoint to use port 22 so that the VM can be accessed through SSH.

# Map public port 22 to local port 22 on the endpoint named 'SSH'
Get-AzureVM -Name 'azfilestorage' -ServiceName 'azfilestorage' | Set-AzureEndpoint -Protocol tcp -Name 'SSH' -PublicPort 22 -LocalPort 22 | Update-AzureVM


Next, you can check the public IP address in the Azure Portal, under the 'IP Address' section of the virtual machine, as shown in the screenshot.

IP-addr

R with PowerBI – A step-by-step guide


There is a lot of interest in how to integrate R scripts with Microsoft PowerBI dashboards. Here is a step-by-step guide.

Let's assume you have some ready-made R code available, for example code that uses the ggplot2 library. The following scripts perform analytics on the CHOL dataset.

  1. Open RStudio or the R console (installed from CRAN) and install the ggplot2 library first.

  2. Paste the following R script & execute it.

install.packages("ggplot2")
library(ggplot2)
# Load the sample "chol" dataset; the first row holds the column names
chol <- read.table(url("http://assets.datacamp.com/blog_assets/chol.txt"), header = TRUE)
# Take the column "AGE" from the "chol" dataset and make a histogram of it
qplot(chol$AGE, geom = "histogram")
# The same histogram built with the full ggplot() syntax
ggplot(data = chol, aes(AGE)) + geom_histogram()

You should see visual output like this.

Histogram

3. Next, execute the following R code to try out the binwidth argument of the 'qplot()' function.

qplot(chol$AGE,
      geom = "histogram",
      binwidth = 0.5)

qplot.JPG

4. Borrow some arguments from R's hist() function: a title (main), an x-axis label (xlab), and a fill color.

# Borrow the main and xlab arguments from hist()
qplot(chol$AGE,
      geom = "histogram",
      binwidth = 0.5,
      main = "Histogram for Age",
      xlab = "Age",
      fill = I("blue"))

hist.JPG

5. Now, add the col argument, with the color wrapped in the I() function.

# Add the col argument; I() passes the color through as a literal value
qplot(chol$AGE,
      geom = "histogram",
      binwidth = 0.5,
      main = "Histogram for Age",
      xlab = "Age",
      fill = I("blue"),
      col = I("red"))

I func.JPG

6. Next, adjust the ggplot2 version of the plot with the following code.

# Adjusting ggplot: explicit breaks, colors, transparency, labels, and axis limits
ggplot(data = chol, aes(AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green",
                 alpha = .2) +
  labs(title = "Histogram for Age") +
  labs(x = "Age", y = "Count") +
  xlim(c(18, 52)) +
  ylim(c(0, 30))

adjustggplot
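
To make the breaks argument in step 6 more concrete, the binning arithmetic can be sketched outside R. The following Python sketch mimics seq(20, 50, by = 2) and counts values per bin. The sample ages are hypothetical (they are not taken from the chol dataset), the helper names are made up for this example, and the bins are treated as half-open intervals [lo, hi), which is a simplifying assumption; ggplot2's own handling of bin edges may differ.

```python
def seq(start, stop, by):
    """Mimic R's seq(start, stop, by = by) for integer arguments."""
    return list(range(start, stop + 1, by))


def bin_counts(values, edges):
    """Count values per bin, treating each bin as the half-open interval [lo, hi)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break  # each value falls into at most one bin
    return counts


# Edges 20, 22, ..., 50 give 16 break points and therefore 15 bins.
edges = seq(20, 50, 2)

# Hypothetical ages for illustration only; values outside the breaks are dropped.
sample_ages = [20, 21, 22, 35, 35, 49, 50, 61]
counts = bin_counts(sample_ages, edges)

if __name__ == "__main__":
    print(len(edges), "break points,", len(counts), "bins")
    print(counts)
```

Values that fall outside the outermost breaks simply do not appear in any bar, which is why choosing breaks that cover the data range matters.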

7. Plot a bar graph with the following code.

# Plotting a bar graph
qplot(chol$AGE,
      geom = "bar",
      binwidth = 0.5,
      main = "Bar Graph for Mort",
      xlab = "Mort",
      fill = I("Red"))

bargraph.JPG

8. Next, open the PowerBI Desktop tool. You can download it for free from this link. Then click the Get Data tab and connect to the R script data source.

Rscript.JPG

If R is already installed on the system where you build the PowerBI visuals, you just need to paste the R script into the script editor. Otherwise, you first need to install R on the system where you run PowerBI Desktop, like this.

Rexe

9. Next, you can also choose the 'custom R visual' in the PowerBI Desktop visualizations pane, provide the required R script to build the visual, and finally click 'Run'.

RPBI.JPG

 

10. Build all the R function visuals by following the same steps & finally save the dashboard.

Dashboard

11. You can refresh an R script in Power BI Desktop. When you refresh an R script, Power BI Desktop runs it again in the Power BI Desktop environment.

Quick Installation of Single node Datazen Server in Azure Cloud Service & Sample Dashboards


This demo provides step-by-step guidance on quickly setting up Datazen Server on a single node and connecting it with the Publisher app to build custom visuals.

Pre-requisites for the demo : 

  1. An active Azure subscription.
  2. Windows 10 Store (for installing the Datazen Publisher app)

Detailed steps are depicted as follows:

Steps Screen Shot
1.     Go to https://manage.windowsazure.com/

2.     Login with your Live ID.

3.     Click +New.

4.     Select Compute -> Virtual Machine -> From Gallery.
(Fig. 2)

VM-provision.JPG
5.     Select the Windows Server 2012 R2 Datacenter image.

6.     Click the next arrow at the bottom right.

 

 

7.     Enter the required information for Virtual machine configuration.

o   Virtual machine name

o   Choose Basic or Standard Tier (recommended)

o   Choose A4 as the machine size.  A4 has 8 cores, which is the minimum number of cores for a single machine setup.

o   Enter the machine admin username/password.

o   Click the next arrow.

 

 
8.     Enter the required information for Virtual machine configuration.

o   Select option Create a new cloud service.

Make sure the Cloud Service DNS Name is available.

o   Choose the subscription it should be billed to and the region it should be deployed in.

Choose the one closest to your location

o   Leave the Storage Account and Availability Set as is.

o   Add an HTTP endpoint at a minimum.

You may need to scroll to add the endpoint.

o   Click the next arrow.

 
9.     Select Install VM Agent and leave the other options unchecked.

10.  Click the checkmark to start the deployment process.

You’ll see it start the provisioning process in the list of the virtual machines you are responsible for in Azure.

 

 
11.   Wait for the status to change from Starting to Running in virtual machines.  
12.   Select your VM then click Connect.  
13.   Save the RDP file to your local machine.  
14.   Open the saved Remote Desktop Connection file, then click Connect.  
15.   Connect to the VM via Remote Desktop and enter the admin username/password.

VM.png
16.   Click Yes to connect to Server.   
17.   Click Configure this local server in the Server Manager dashboard that appears when you login.  
18.  Then change the IE Enhanced Security Configuration to Off for Administrators.

You can always change it back if you really want to when you’re done.

19.  Close the Server Manager.

Server Manager.png

 

 

Section 2: Install the Datazen Server

1.     Navigate to the following link and download the Datazen server software onto the VM. You may need to turn off IE Enhanced Security on the server to do so.

2.     The Datazen server files download as a zipped file. Extract all the files.

3.     Open Datazen Enterprise Server.3.0.2562.

4.     Click run to start the install process.

Datazen Server.JPG
5.     In the Datazen Enterprise Server Setup, click Next.

Finish-DataStudio.JPG
6.     Click Next in the Setup wizard, accepting the terms in the License Agreement and moving through each screen.

 

 
7.     Click Next on the Features page. 

 

 
8.     Click Next on the Core Service Credentials page.  
9.     Once you get to the Admin Password page, type a Password for the Datazen admin user.  (Fig. 20)

This doesn’t have to be the same password as you used for the server.

10.  Click Next. 

ControlPanel-email.JPG
11.  On the Authentication page, leave the Authentication Mode as Default.

12.  Click Next.

 
13.  On the Repository Encryption page, select Copy to Clipboard, then paste the key into a Notepad file.

14.  Save the Notepad file to a safe location.

 

15.  Click Next. 

 
16.  On the Instance Id page, select Copy to Clipboard then paste the Id into a Notepad file.

17.  Save the Notepad file to a safe location.

 

18.  Click Next. 

 
19.  On the Data Acquisition Service Credentials page, leave the credentials as is, then click Next.  
20.  On the Web Applications IIS Settings page leave the default settings, then click Next.   
21.  On the Control Panel Email Settings page, leave the default values since this is a test server.

22.  Click Next. 

 
23.  On the Ready to Install page, click Install and wait until the installation is complete.
This might take a few minutes.
 

Section 3: Configure the Datazen Server

1.   Open your browser on your local machine.

2.   Navigate to http://mycloudservicename.cloudapp.net/cp.

Make sure you replace mycloudservicename with the name of your cloud service.

3.   If you can successfully connect, you should see the Control Panel Log In screen.

4.   Enter the username admin and the password you entered in the Setup wizard, then select  Log In.

Login.JPG
5.   You need to create a new user before you can create dashboard hubs, because each hub must have an owner and the owner cannot be the admin user. Click Create User to create your first user.

 

 

 

UserCreate.JPG
6.   Enter a value in the top three fields (the email address can be fake if you want) and select Create User.  
7.   You will now see a new option to Create BI Hub.

activate-user.JPG
8.   Enter any hub name you like, but make sure you enter the username of the user you just created as the owner username.

9.   Enter the maximum number of users that can connect to the hub.

10.  Click Create. 

BI Hub Created.JPG
11.   Finish the creation of the hub. It will be displayed in the list of available hubs.  
12.  The new hub will also be shown in the navigation menu at the bottom left of the screen.  
13.  Click the Server Users link on the left-hand side of the screen.

 

Server Users.png
14.  Click Create User.   
15.  Fill in the fields under Required Info.

16.  Click Create User.

 
17.  You will see the user and a Set password link option next to the username.

18.  Click the Set password link, then copy the link to your clipboard.
Note: This step is required only because email notification is not set up.

 
19.  Log out as the admin.

20.  Open a new browser window and paste the URL to reset the password into the address bar.

You can now finish setting up that user by entering the password for the account.

 
21.  On the Control Panel Activate User Account screen, enter the new password, then re-type it.

22.  Click Activate My Account. (Fig. 40)

23.  Log out as this user and log back in as the admin before proceeding.

activate-user.JPG

 

Section 4: Apply a Custom Branding

1.  To add the Wide World Importers brand package to the server, save it locally. The package is provided with this demo.

2.  Click on the Branding link on the left-hand side and upload the brand package to the server.

Branding.png
3.  Make sure you choose the Server to upload it to.

You will see the Server icon has the Wide World Importers branding associated.

 

 
4.  To make sure it was applied properly, open a new browser window and navigate to the following URL (replace mycloudservicename with the name of your cloud service):

http://mycloudservicename.cloudapp.net

Your Server Login screen should now show the Wide World Importers brand package applied. (Fig. 43)

Server Login
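The two addresses used in Sections 3 and 4 differ only by the /cp suffix. As a minimal Python sketch (the function name datazen_urls is hypothetical, and mycloudservicename is the same placeholder used above), both can be derived from the cloud service name:

```python
def datazen_urls(cloud_service_name):
    """Build the server login and Control Panel URLs for one cloud service."""
    name = cloud_service_name.strip()
    # Expect only the bare service name, not a full host name or URL.
    if not name or "." in name or "/" in name:
        raise ValueError("expected a bare cloud service name")
    base = "http://{}.cloudapp.net".format(name)
    return {"server_login": base, "control_panel": base + "/cp"}


urls = datazen_urls("mycloudservicename")
print(urls["server_login"])
print(urls["control_panel"])
```

Keeping the name in one place makes it harder to forget replacing the placeholder in one of the two URLs.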

 

Section 5: Connect to the Datazen Server with Publisher

1.  Open the Datazen Publisher app.

If this is the first time you are using the app, you will have the option of connecting to the Datazen demo server.  We recommend doing that, so you will have some nice demo dashboards to show immediately.

2.  To add a new server, right-click in the dashboard, then click Connected. (Fig. 44)

 
3.  Click Add New Server Connection. (Fig. 45)

Demo
4.  Provide the following information to connect to a Datazen server. (Fig. 46)

Server Address: mycloudservicename.cloudapp.net
User name: the user name you created
Password: that user’s password

5.  Uncheck Use Secure Connection.

6.  Click Connect. (Fig. 46)

Datazen Server Login.png
7.  When connected, you should be able to publish dashboards to your Datazen server. (Fig. 47)

8.  You will see a nice dashboard with KPIs for Wide World Importers and Fabrikam Insurance. (Fig. 47)

Datazen Screen

 

 

 

 

 

Resolution of Error: “This project references NuGet package(s) that are missing on this computer. Use NuGet Package Restore to download them. For more information, see http://go.microsoft.com/fwlink/?LinkID=322105. The missing file is ..\packages\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.1.0.0\build\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.props.”


Today I faced this issue while compiling an ASP.NET 4.5.2 Web Forms app in Visual Studio 2015 Enterprise. The app had been built a few months back in a Visual Studio 2013 Update 4 environment and was pulled from VS Online (TFS).

Problem Statement:

The error was raised while building the project.

“This project references NuGet package(s) that are missing on this computer. Use NuGet Package Restore to download them.  For more information, see http://go.microsoft.com/fwlink/?LinkID=322105. The missing file is ..\packages\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.1.0.0\build\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.props.”

Resolution: 

The error is caused by obsolete NuGet packages & a wrong package path in the project’s .csproj or .vbproj file. To solve the issue:

  1. First, delete all folders under the ‘Packages’ directory in the project’s main directory.

Packages

2. Reopen the solution in VS & click on ‘Manage NuGet Packages’ in the project’s Solution Explorer. Update all NuGet packages to the latest version.

nuget-package-manager

 

3. Open the project’s .csproj / .vbproj file in any text editor & replace the line that references the old package with the matching line for the ‘Microsoft.Net.Compilers.1.1.1’ package available in your project directory.

csproj

4. Don’t forget to comment out the following block in the .csproj / .vbproj file, as shown in the screenshot.

<Target Name="EnsureNuGetPackageBuildImports" BeforeTargets="PrepareForBuild">
  <PropertyGroup>
    <ErrorText>This project references NuGet package(s) that are missing on this computer. Use NuGet Package Restore to download them. For more information, see http://go.microsoft.com/fwlink/?LinkID=322105. The missing file is {0}.</ErrorText>
  </PropertyGroup>
  <Error Condition="!Exists('..\packages\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.1.0.0\build\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.props')" Text="$([System.String]::Format('$(ErrorText)', '..\packages\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.1.0.0\build\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.props'))" />
  <Error Condition="!Exists('..\packages\Microsoft.Net.Compilers.1.0.0\build\Microsoft.Net.Compilers.props')" Text="$([System.String]::Format('$(ErrorText)', '..\packages\Microsoft.Net.Compilers.1.0.0\build\Microsoft.Net.Compilers.props'))" />
  <Error Condition="!Exists('..\packages\Microsoft.Net.Compilers.1.0.0\build\Microsoft.Net.Compilers.props')" Text="$([System.String]::Format('$(ErrorText)', '..\packages\Microsoft.Net.Compilers.1.0.0\build\Microsoft.Net.Compilers.props'))" />
</Target>

comment
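Before editing the project file by hand, it can help to list which package paths it references and which of them are actually missing on disk. The following is a minimal Python sketch, not part of the original fix: the helper names are hypothetical, and it simply assumes that every relevant reference starts with ..\packages\ as in the error message above.

```python
import re
from pathlib import Path, PureWindowsPath

# Matches relative NuGet paths such as ..\packages\Some.Package.1.0.0\build\Some.Package.props
PACKAGE_PATH = re.compile(r"""\.\.\\packages\\[^'"<>)]+""")


def referenced_package_paths(project_text):
    """Return each distinct ..\\packages\\... path found in the project text, in order."""
    seen = []
    for path in PACKAGE_PATH.findall(project_text):
        if path not in seen:
            seen.append(path)
    return seen


def missing_package_paths(project_file):
    """Return the referenced package paths that do not exist relative to the project file."""
    project_file = Path(project_file)
    text = project_file.read_text(encoding="utf-8")
    missing = []
    for raw in referenced_package_paths(text):
        # The project file uses Windows separators; rebuild the path portably.
        candidate = project_file.parent.joinpath(*PureWindowsPath(raw).parts)
        if not candidate.exists():
            missing.append(raw)
    return missing


sample = r"""<Error Condition="!Exists('..\packages\Microsoft.Net.Compilers.1.0.0\build\Microsoft.Net.Compilers.props')" />"""
print(referenced_package_paths(sample))
```

Calling missing_package_paths on the real .csproj file lists the stale paths, which are the ones to update in step 3.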

~Happy Troubleshooting!!

Azure Stream Analytics & Machine Learning Integration With RealTime Twitter Sentiment Analytics Dashboard on PowerBI


The integration of Azure Stream Analytics (ASA) & Azure Machine Learning (AML) was recently released as a preview update, so it’s now possible to add an AML web service URL & API key as a ‘custom function’ on an ASA job. In this demo, realtime tweets are collected based on keywords like ‘#HappyHolidays2016’, ‘#MerryChristmas’, ‘#HappyNewYear2016’ & stored directly in a .csv file saved on OneDrive. Here is the solution architecture diagram of the POC.

SolutionArc

 

 

Now, add the Service Bus event hub endpoint as input to the ASA job. Meanwhile, open the ‘Twitter Predictive Sentiment Analytics Model’ & click on ‘Open in Studio’ to start deploying the model. Don’t forget to run the solution before deploying.

AML

 

Once the model is deployed, open the ‘Web Service’ dashboard page to get the model URL & API key: click on the default endpoint & download the ‘Excel 2010 or earlier’ workbook. Collect the URL & API key & apply them to the ASA function credentials for the AML deployment.

DeployedAML
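To sanity-check the URL & API key before wiring them into ASA, a request can be prepared by hand. The Python sketch below only builds the request and does not send it. It assumes the classic AML request-response JSON layout with a bearer-token header; the input name input1, the column name tweet_text & the placeholder endpoint are assumptions, not values taken from this experiment.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, tweets, column="tweet_text"):
    """Build (but do not send) a POST request for an AML request-response endpoint."""
    body = {
        "Inputs": {
            "input1": {"ColumnNames": [column], "Values": [[t] for t in tweets]}
        },
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


request = build_scoring_request(
    "https://example.invalid/score",  # placeholder, not a real endpoint
    "YOUR-API-KEY",
    ["Happy holidays everyone!"],
)
print(request.get_method(), json.loads(request.data)["Inputs"]["input1"]["Values"])
```

Passing the request to urllib.request.urlopen with the real URL & key would return the scored result, assuming the web service expects this input schema.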

Next, create an ASA job, add the credentials of the event hub where the real-world tweets are being pushed & click on the ‘Functions’ tab of the ASA job to add the AML credentials. Provide the model name, URL & API key of the model. Once it’s added, click on Save.

ASA-Functions

 

Now, add the following ASA SQL to aggregate the realtime tweet sentiment scores coming out of the predictive Twitter sentiment model.

Query

 

Set the output to Azure Blob storage, add a container name, choose CSV as the serialization type & start the ASA job. Then start importing data into PowerBI Desktop from the Azure Blob storage account that holds the ASA output.

Output

 

 

PowerBI Desktop contains built-in Power Query to prepare the ASA output data & set data types. Choose the decimal type for the AML model sentiment score & the Text (String) type for TweetTexts.

PBI-AML
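The same typing step can be reproduced outside PowerBI when checking the blob output by hand. Below is a minimal Python sketch: the column name SentimentScore & the sample rows are made up for illustration, and only TweetTexts comes from the text above.

```python
import csv
import io
from decimal import Decimal


def load_scored_tweets(csv_text, text_col="TweetTexts", score_col="SentimentScore"):
    """Parse ASA CSV output, keeping tweets as text and scores as decimals."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({"text": str(record[text_col]), "score": Decimal(record[score_col])})
    return rows


sample = (
    "TweetTexts,SentimentScore\n"
    "Merry Christmas to all,0.93\n"
    "Stuck in traffic again,0.12\n"
)
for row in load_scored_tweets(sample):
    print(row["score"], row["text"])
```

Using Decimal rather than float mirrors the decimal type chosen in Power Query & avoids binary rounding surprises when comparing scores.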

 

Start building the ‘Twitter Sentiment Analytics’ dashboard powered by @AzureStreaming & the Azure Machine Learning API with real-world tweet streaming. There are some cool custom visuals available on PowerBI. I’ve used some of them here, like the ‘wordcloud’ chart, which depicts the tweets with the highest positive sentiment scores along with their most specific keywords like ‘happynewyear2016’, ‘MerryChristmas’, ‘HappyHolidays’ etc.

PBI-visuals

 

In the donut chart, the top 10 tweets with the most positive sentiment are portrayed with their sentiment scores coming from the AML predictive model experiment integrated with the ASA job.

PBI-dashboard
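The top-10 selection behind the donut chart is a plain ranking by score. A small Python sketch of that idea follows; the function name & the sample pairs are hypothetical, and in the dashboard itself the ranking is done by PowerBI.

```python
import heapq


def top_positive(tweets, n=10):
    """Return the n (text, score) pairs with the highest sentiment score."""
    return heapq.nlargest(n, tweets, key=lambda pair: pair[1])


scored = [("Happy new year!", 0.97), ("Meh.", 0.40), ("Merry Christmas", 0.91)]
print(top_positive(scored, n=2))
```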

~Wish you HappyHolidays 2016!

A lap around Microsoft Azure IoT Hub with Azure Stream Analytics & IoT Analytics Suite


Last month at #AzureConf 2015, the Azure IoT Suite was announced as available for purchase, along with the GA release of Azure IoT Hub. IoT Hub helps you connect, monitor & control thousands of devices that communicate via the cloud & talk to each other using suitable protocols. You can connect to your Azure IoT Hub using the IoT Hub SDKs, available in different languages like C, C#, Java, Ruby etc. There are also device monitoring tools available, like Device Explorer or iothub-explorer. In this demo, weather data analytics is demonstrated using Azure IoT Hub with Stream Analytics, powered by the Azure IoT Suite, & visualized using Azure SQL Database with PowerBI.

You can provision your own device into the Azure IoT Suite using the Device Explorer or iothub-explorer tool & start bi-directional communication: device-to-cloud & cloud-to-device.

First, create your Azure IoT Hub from the Azure Preview Portal by selecting New -> Internet of Things -> Azure IoT Hub. Provide the hub name & select the pricing & scale tier for device-to-cloud communication: F1 – free (1 per subscription, connects 10 devices, 3,000 messages/day), S1 – standard (50,000 messages/day) or S2 – standard (1.5 M messages/day). Then select the IoT Hub units, device-to-cloud partitions, resource group, subscription & finally the deployment location (currently it’s available in only three locations: ‘East Asia’, ‘East US’ & ‘North Europe’).

 

IoThubcreate
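To pick a tier & unit count for an expected message volume, the daily quotas listed above can be turned into a quick calculation. This Python sketch assumes the quota scales linearly with the number of IoT Hub units, which is an assumption rather than something stated here; the function name is hypothetical.

```python
import math

# Daily device-to-cloud message quotas, as listed in the text above.
DAILY_QUOTA = {"F1": 3000, "S1": 50000, "S2": 1500000}


def units_needed(tier, messages_per_day):
    """Estimate how many IoT Hub units of a tier cover a daily message volume."""
    quota = DAILY_QUOTA[tier]
    units = max(1, math.ceil(messages_per_day / quota))
    if tier == "F1" and units > 1:
        # The free tier is limited to one hub per subscription.
        raise ValueError("volume exceeds the F1 quota; pick S1 or S2")
    return units


print(units_needed("S1", 120000))
print(units_needed("S2", 120000))
```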

 

Once the hub is created, switch to Device Explorer to create a device. For details on how to create & register a device, refer to this GitHub page. After registering the device, move back to the ‘Data’ tab of the Device Explorer tool & click on the ‘Monitor’ button to start receiving the device-to-cloud events sent to Azure IoT Hub from the device.

DeviceExplorer

 

The schema of the weather dataset looks like the following. Fresh data is collected from various sensors & fed into Azure IoT Hub, where it can be viewed using the Device Explorer tool.

DataSchema

 

To push data from the weather sensor device to Azure IoT Hub, the following code snippet can be used. The full code snippet is going to be available on my GitHub page.

 

using System;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Data;
using Newtonsoft.Json;
using Microsoft.VisualBasic.FileIO;

namespace Microsoft.Azure.Devices.Client.Samples
{
    // One weather reading. The original class is not shown in this post, so this
    // shape is assumed from the fields populated in SendEvent below.
    class WeatherData
    {
        public string weatherDate, weatherTime, apperantTemperature, cloudCover, dewPoint,
            humidity, icon, pressure, temperature, timeInterval, visibility,
            windBearing, windSpeed, latitude, longitude;
    }

    class Program
    {
        private const string DeviceConnectionString = "Your device connection-string";

        static void Main(string[] args)
        {
            try
            {
                DeviceClient deviceClient = DeviceClient.CreateFromConnectionString(DeviceConnectionString);

                if (deviceClient == null)
                {
                    Console.WriteLine("Failed to create DeviceClient!");
                }
                else
                {
                    SendEvent(deviceClient).Wait();
                    ReceiveCommands(deviceClient).Wait();
                }

                Console.WriteLine("Exited!\n");
            }
            catch (Exception ex)
            {
                Console.WriteLine("Error in sample: {0}", ex.Message);
            }
        }

        static async Task SendEvent(DeviceClient deviceClient)
        {
            // List the CSV files in the input folder; the last one found is sent.
            string[] filePath = Directory.GetFiles(@"\Weblog\", "*.csv");
            string csv_file_path = string.Empty;
            foreach (string path in filePath)
            {
                Console.WriteLine(path);
                csv_file_path = path;
            }

            DataTable csvData = GetDataTableFromCSVFile(csv_file_path);
            Console.WriteLine("Rows count:" + csvData.Rows.Count);

            // Send each CSV row exactly once as a JSON message.
            foreach (DataRow row in csvData.Rows)
            {
                try
                {
                    var info = new WeatherData
                    {
                        weatherDate = row[0].ToString(),
                        weatherTime = row[1].ToString(),
                        apperantTemperature = row[2].ToString(),
                        cloudCover = row[3].ToString(),
                        dewPoint = row[4].ToString(),
                        humidity = row[5].ToString(),
                        icon = row[6].ToString(),
                        pressure = row[7].ToString(),
                        temperature = row[8].ToString(),
                        timeInterval = row[9].ToString(),
                        visibility = row[10].ToString(),
                        windBearing = row[11].ToString(),
                        windSpeed = row[12].ToString(),
                        latitude = row[13].ToString(),
                        longitude = row[14].ToString()
                    };

                    string serializedString = JsonConvert.SerializeObject(info);
                    Console.WriteLine("{0}> Sending events: {1}", DateTime.Now.ToString(), serializedString);
                    await deviceClient.SendEventAsync(new Message(Encoding.UTF8.GetBytes(serializedString)));
                }
                catch (Exception ex)
                {
                    Console.ForegroundColor = ConsoleColor.Red;
                    Console.WriteLine("{0} > Exception: {1}", DateTime.Now.ToString(), ex.Message);
                    Console.ResetColor();
                }
            }

            Console.WriteLine("Press Ctrl-C to stop the sender process");
            Console.WriteLine("Press Enter to start now");
            Console.ReadLine();
        }

        // Reads a CSV file (first row = column names) into a DataTable.
        private static DataTable GetDataTableFromCSVFile(string csv_file_path)
        {
            DataTable csvData = new DataTable();
            try
            {
                using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
                {
                    csvReader.SetDelimiters(new string[] { "," });
                    csvReader.HasFieldsEnclosedInQuotes = true;

                    // Read column names
                    string[] colFields = csvReader.ReadFields();
                    foreach (string column in colFields)
                    {
                        DataColumn datecolumn = new DataColumn(column);
                        datecolumn.AllowDBNull = true;
                        csvData.Columns.Add(datecolumn);
                    }

                    while (!csvReader.EndOfData)
                    {
                        string[] fieldData = csvReader.ReadFields();

                        // Store empty cells as null
                        for (int i = 0; i < fieldData.Length; i++)
                        {
                            if (fieldData[i] == "")
                            {
                                fieldData[i] = null;
                            }
                        }
                        csvData.Rows.Add(fieldData);
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("Exception" + ex.Message);
            }
            return csvData;
        }

        // Polls IoT Hub for cloud-to-device messages and completes each one.
        static async Task ReceiveCommands(DeviceClient deviceClient)
        {
            Console.WriteLine("\nDevice waiting for commands from IoTHub...\n");
            Message receivedMessage;
            string messageData;

            while (true)
            {
                receivedMessage = await deviceClient.ReceiveAsync(TimeSpan.FromSeconds(1));

                if (receivedMessage != null)
                {
                    messageData = Encoding.ASCII.GetString(receivedMessage.GetBytes());
                    Console.WriteLine("\t{0}> Received message: {1}", DateTime.Now.ToLocalTime(), messageData);

                    await deviceClient.CompleteAsync(receivedMessage);
                }
            }
        }
    }
}

You can check the events being sent from device to cloud in the console output.

dataconsole
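The payload format can be checked offline without a device or a hub. The Python sketch below mirrors only the row-to-JSON step of the sender: the field names are taken from the code above, while the function name & the sample CSV values are made up for illustration.

```python
import csv
import io
import json

# Field order follows the column indexes used by the sender code.
FIELDS = [
    "weatherDate", "weatherTime", "apperantTemperature", "cloudCover", "dewPoint",
    "humidity", "icon", "pressure", "temperature", "timeInterval", "visibility",
    "windBearing", "windSpeed", "latitude", "longitude",
]


def rows_to_messages(csv_text):
    """Yield one UTF-8 encoded JSON payload per CSV data row (header row skipped)."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    for row in reader:
        # Pad short rows so every field is present; empty cells stay as "".
        values = row + [""] * (len(FIELDS) - len(row))
        yield json.dumps(dict(zip(FIELDS, values))).encode("utf-8")


sample_csv = (
    ",".join(FIELDS) + "\n"
    "2015-10-01,10:00:00,7.5,0.4,3.2,0.81,cloudy,1012.3,8.1,hourly,9.7,210,4.2,22.57,88.36\n"
)
messages = list(rows_to_messages(sample_csv))
print(messages[0].decode("utf-8"))
```

Each element of messages corresponds to one message body, as the sender passes to SendEventAsync.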

Next, start pushing the device data into Azure IoT Hub & monitor the event-receiving process through Device Explorer. Now, provision an Azure Stream Analytics job on the Azure portal. Provide ‘Azure IoT Hub’ as an input to the job, as follows.

SAJob

 

input

IoTHubinput

Now provide the Azure Stream Analytics query that passes the incoming unstructured device-to-cloud data into Azure SQL Database. So, first, provision a SQL database on Azure & connect it as the output of the Stream Analytics job.

create table input(
    weatherDate nvarchar(max),
    weatherTime datetime,
    apperantTemperature nvarchar(max),
    cloudCover nvarchar(max),
    dewPoint nvarchar(max),
    humidity nvarchar(max),
    icon nvarchar(max),
    pressure nvarchar(max),
    temperature nvarchar(max),
    timeInterval nvarchar(max),
    visibility nvarchar(max),
    windBearing nvarchar(max),
    windSpeed nvarchar(max),
    latitude nvarchar(max),
    longitude nvarchar(max)
);

-- temperature arrives as text, so cast it to float before averaging
select input.weatherDate, input.weatherTime, input.apperantTemperature, input.cloudCover,
    input.dewPoint, input.humidity, input.icon, input.pressure,
    avg(cast(input.temperature as float)) as avgtemperature,
    input.timeInterval, input.visibility, input.windBearing,
    input.windSpeed, input.latitude, input.longitude
into weathersql
from input
group by input.weatherDate, input.weatherTime, input.apperantTemperature, input.cloudCover,
    input.dewPoint, input.humidity, input.icon, input.pressure, input.timeInterval,
    input.visibility, input.windBearing, input.windSpeed, input.latitude, input.longitude,
    TumblingWindow(second, 2)

ASA-sql
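The TumblingWindow(second, 2) clause groups events into fixed, non-overlapping 2-second windows. As a rough Python sketch of that windowing idea (not of ASA itself: the function name & sample readings are made up, and this sketch labels each window by its start time, whereas ASA reports the window end):

```python
from collections import defaultdict


def tumbling_average(events, window_seconds=2):
    """Average values per fixed window; events are (timestamp_seconds, value) pairs."""
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Every event falls into exactly one window, identified by its start time.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


readings = [(0.2, 10.0), (1.7, 12.0), (2.1, 20.0), (3.9, 22.0), (4.0, 30.0)]
print(tumbling_average(readings))
```

Because the windows never overlap, each reading contributes to exactly one average, which is what distinguishes a tumbling window from a hopping or sliding one.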

Specify the output of the ‘WeatherIoT’ ASA job as ‘Azure SQL Database’. Alternatively, you can select any of the other connectors, like ‘Event Hub’, ‘DocumentDB’ etc.

SAOutput

 

Make sure to create the necessary database & table on SQL first, before adding it as output to the ASA job. For this demo, I have created the ‘weatheriot’ table on Azure SQL Database. The T-SQL query looks like this.

iotsql

 

Next, start the ASA job & receive the final Azure IoT Hub (device-to-cloud) data processed through the IoT Hub -> ASA -> Azure SQL Database pipeline. Once you receive data in your Azure SQL table, start building the PowerBI ‘Weather IoT Data Analytics’ dashboard for visualization & to leverage the power of Azure IoT momentum.

SQLoutput

Connect to PowerBI with the same account as the Azure subscription where you provisioned the ASA job & start importing data from Azure SQL Database. Create stunning reports using funnel, donut & global map charts with live data refresh.

WeatherData

For this demo, I’ve populated charts on average weather temperature, pressure, humidity & dew point forecasting analysis over specific areas based on latitude & longitude values, plotted & pinned to the PowerBI ‘Weather Data Azure IoT Analytics’ dashboard.

WeatherData-analysis

 

Deployment of Cloudera Enterprise 5.4.4(CDH 5) on Microsoft Azure Virtual Machine & Running Impala shell as single node cluster


Cloudera Enterprise (CDH) 5.4.4 can be deployed directly on Microsoft Azure Virtual Machines, & we can start working with the Impala shell & Hue right away.

The hosting process is super easy; just make sure the following prerequisites & troubleshooting steps are taken care of.

Prerequisites :

  1. SELinux should be disabled.

Before disabling SELinux, you may try sysctl -w vm.swappiness=10.

To make the change permanent, add the line below to /etc/sysctl.conf:

vm.swappiness = 10

  2. Change the root password
  3. Change the hostname in the /etc/hosts file
  4. Open ports 7180, 7182, 9000 & 9001
  5. Set up passwordless sudo user authentication
  6. In the /etc/hosts file, map the hostname to the host’s IP address (as reported by $ifconfig)
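One way to confirm the persisted swappiness value is to parse the sysctl configuration text. The Python sketch below uses a hypothetical helper name & an inline sample. It assumes the usual key = value layout of /etc/sysctl.conf, with lines starting with # or ; treated as comments.

```python
def read_sysctl_settings(conf_text):
    """Parse sysctl.conf-style text into a {key: value} dictionary."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


sample_conf = "# kernel tuning\nvm.swappiness = 10\nnet.ipv4.ip_forward=0\n"
print(read_sysctl_settings(sample_conf).get("vm.swappiness"))
```

Reading the real file with open("/etc/sysctl.conf").read() & passing the text in would show whether the setting survives a reboot.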

 Issue: The Cloudera Manager site does not open in the browser after installation & the following error shows in the log:

cloudera-scm-server dead but pid file exists

Follow the steps:

# service cloudera-scm-server stop

# service cloudera-scm-server-db stop

# rm /var/run/cloudera-scm-server.pid

# service cloudera-scm-server-db start

# service cloudera-scm-server start
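The "dead but pid file exists" state means the pid file names a process that is no longer running. A minimal Python sketch of that check follows, for Linux hosts; the helper names are hypothetical & it only reports the state without deleting anything.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def is_stale_pid_file(path):
    """Return True if the pid file exists but its process is gone."""
    try:
        with open(path) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # no readable pid file: nothing to clean up
    return not pid_is_running(pid)


print(is_stale_pid_file("/var/run/cloudera-scm-server.pid"))
```

If this prints True, the pid file is stale & removing it as in the steps above is what lets the service start again.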

The step-by-step process of deploying CDH 5 on an MS Azure Virtual Machine (RHEL 6.x) can be viewed on the YouTube channel.

What’s new in Azure Data Catalog


The Azure Data Catalog (previously PowerBI Data Catalog) was released in public preview last Monday (July 13th) @WPC15. It reveals a new world of storing & connecting #Data across on-prem & Azure SQL databases. Let’s hop into a quick jumpstart on it.

Connect to Azure Data Catalog through this URL, https://www.azuredatacatalog.com/, making sure you log in with your official ID & a valid Azure subscription. Currently, it’s free for the first 50 users & up to 5,000 registered data assets; the standard edition allows up to 100 users & up to 1M registered data assets.

Provision

 

Let’s start by signing in to the portal with the official ID.

Signin

Once it’s provisioned, you will be redirected to this page to launch the Windows app of Azure Data Catalog.

AzureDC

 

It starts downloading the app from the ClickOnce deployment server.

ADCapp

 

After it’s downloaded, it prompts you to select a server. At this point it can select data from SQL Server Analysis Services, Reporting Services, on-prem/Azure SQL databases & Oracle DB.

Servers

For this demo, we used an on-prem SQL Server database to connect to Azure Data Catalog.

Catalog

Here we selected the ‘AdventureWorksLT’ database & pushed a total of 8 tables like ‘Customer’, ‘Product’, ‘ProductCategory’, ‘ProductDescription’, ‘ProductModel’, ‘SalesOrderDetail’ etc. You can also add tags to identify the datasets on the data catalog portal.

metadata-tag

Next, click on ‘REGISTER’ to register the dataset & optionally, you can include a preview of the data definition as well.

Object-registration

 

Once the object registration is done, the objects can be viewed on the portal. Click on ‘View Portal’ to check the data catalogs.

Portal

Once you click, you will be redirected to the data catalog homepage, where you can search for your data by object metaname.

Search

 

SearchData

In the data catalog object portal, all of the registered metadata & objects are visible with property tags.

Properties

You can also open the registered object datasets in excel to start importing into PowerBI.

opendata

Click on ‘Excel’ or ‘Excel(Top 1000)’ to start importing the data into Excel. The resulting data definition is in .odc format.

SaveCustomer

 

Once you open it in Excel, you will be prompted to enable the custom extension. Click on ‘Enable’.

Security

From Excel, the dataset is imported into the latest Microsoft PowerBI Designer Preview app to build a custom dashboard.

ADC-PowerBI

Log in to https://app.powerbi.com & click on ‘File’ to get data from a .pbix file.

PowerBI

Import the .pbix file with the ‘AdventureWorks’ customer details & product analytics into PowerBI reports & build a dashboard.

Uploading

The PowerBI preview portal dashboard has some updates to the tile details filter, like the addition of custom links.

PowerBI-filter

 

The PowerBI app for Android is available now, which is useful for a quick glance at real-time analytics dashboards, especially those connected with Stream Analytics & updating in real time.

WP_20150715_14_07_48_Pro

WP_20150715_14_13_33_Pro

AdventureWorks-ADC