How to run the Cloudera Hadoop (CDH4) QuickStart VM (CentOS 6.2) on Windows with VMware Workstation


To run the Cloudera Hadoop CDH4 VM on Windows, you first need to download the QuickStart VM from here in the format that matches your virtualization product (i.e. VMware, VirtualBox or KVM). It requires a 64-bit host OS. The VM runs CentOS 6.2 and includes CDH 4.3, Cloudera Manager 4.6, Cloudera Impala 1.0.1 and Cloudera Search 0.9 Beta.

For this demo, I used the VMware version of the Cloudera QuickStart VM, running on a Windows 8 64-bit host OS.

A few points to note:

  • This is a 64-bit VM. It requires a 64-bit host OS and a virtualization product that supports a
    64-bit guest OS.
  • The VM uses 4 GB of RAM in total. The total system memory you need varies with the size
    of your data set and with the other processes that are running.
  • The demo VM file is approximately 2 GB. Feel free to mirror it internally or externally to minimize
    bandwidth usage.
  • To use the VMware VM, you need a player compatible with Workstation 8.x or higher: Player 4.x or higher, ESXi 5.x or higher, or Fusion 4.x or higher. Older versions of Workstation can be used to create a new VM from the same virtual disk (VMDK file), but some features of VMware Tools won’t be available.
  • After downloading the Cloudera VM, extract it and select the virtual machine configuration (.vmx) file.
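
As a rough sketch (it is not part of the Cloudera download), the requirements listed above can be turned into a small pre-flight check in Python 3. The function names are hypothetical, and the rule that the host should have more RAM than the 4 GB the VM uses is my own assumption. Host RAM is passed in by hand so that the sketch needs only the standard library.

```python
import platform
import shutil

VM_RAM_GB = 4       # RAM used by the VM, from the notes above
VM_DOWNLOAD_GB = 2  # approximate size of the VM download, from the notes above


def host_is_64bit():
    """Best-effort check for a 64-bit x86 host (hypothetical helper)."""
    return platform.machine().lower() in ("amd64", "x86_64")


def free_disk_gb(path="."):
    """Free space, in GB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


def check_host(is_64bit, host_ram_gb, free_gb):
    """Return a list of problems; an empty list means the host looks fine."""
    problems = []
    if not is_64bit:
        problems.append("host OS is not 64-bit")
    # Assumption: the host needs more RAM than the VM itself uses.
    if host_ram_gb <= VM_RAM_GB:
        problems.append("host RAM should exceed %d GB" % VM_RAM_GB)
    if free_gb < VM_DOWNLOAD_GB:
        problems.append("less than %d GB of free disk space" % VM_DOWNLOAD_GB)
    return problems


if __name__ == "__main__":
    # Host RAM is typed in by hand (8 GB here) to keep the sketch stdlib-only.
    for problem in check_host(host_is_64bit(), 8, free_disk_gb()):
        print("WARNING:", problem)
```

Run it from the folder where you plan to extract the VM, so that the disk check looks at the right drive. The extracted VM needs more space than the download itself, so treat the 2 GB figure as a lower bound.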

  • Open the .vmx file in VMware Workstation and start the VM.

  • To start working with the Hadoop console, click Hue and log in with the default ID ‘admin’ and password ‘admin’.

  • Similarly, log in to the Cloudera Manager console with the default user ID ‘admin’ and password ‘admin’ to check the health of the Hadoop cluster.
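
Cloudera Manager also exposes a REST API, so the same login can be scripted. The following is only a sketch in Python 3: it assumes Cloudera Manager is listening on localhost at its default port 7180, that the `/api/v1/clusters` endpoint is available in your version, and that the default ‘admin’/‘admin’ credentials are unchanged. The helper names are hypothetical.

```python
import base64
import json
import urllib.request


def build_cm_request(host="localhost", port=7180, user="admin", password="admin"):
    """Build an HTTP Basic-auth request for the Cloudera Manager cluster list."""
    url = "http://%s:%d/api/v1/clusters" % (host, port)
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def list_clusters(request, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Needs a running VM; adjust host and port if you call it from Windows.
    print(list_clusters(build_cm_request()))
```

If you run this from the Windows host rather than inside the VM, replace `localhost` with the VM's IP address.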

  • You can check the Hadoop NameNode details, including the cluster summary, the NameNode logs and the HDFS file system.
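
The HDFS contents shown in the NameNode pages can also be read over WebHDFS. This is a hedged sketch in Python 3, not something the VM ships with: it assumes WebHDFS is enabled and that the NameNode web port is 50070, and the helper names are hypothetical.

```python
import json
import urllib.request


def liststatus_url(path="/", host="localhost", port=50070):
    """URL of the WebHDFS LISTSTATUS operation for an HDFS path."""
    return "http://%s:%d/webhdfs/v1%s?op=LISTSTATUS" % (host, port, path)


def parse_liststatus(body):
    """Extract (name, type) pairs from a LISTSTATUS JSON response body."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(status["pathSuffix"], status["type"]) for status in statuses]


def list_hdfs(path="/", timeout=10):
    """List an HDFS directory through WebHDFS (needs a running VM)."""
    with urllib.request.urlopen(liststatus_url(path), timeout=timeout) as response:
        return parse_liststatus(response.read().decode("utf-8"))


if __name__ == "__main__":
    for name, kind in list_hdfs("/"):
        print(kind, name)
```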

  • The Cloudera Hadoop (CDH4) VM ships with Eclipse already integrated with Apache Hadoop, so you can write MapReduce jobs with ease.

  • Once you open the Hadoop-integrated Eclipse, a default MapReduce Java project is available, which runs on Java SE 1.6.
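
If you are new to MapReduce, it may help to see what such a job computes before opening the Java project. The sketch below is plain Python 3, not the Java project bundled in Eclipse and not Hadoop code: it only imitates the map, shuffle and reduce phases of a word count in memory. All function names are my own.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in memory over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hello Hadoop", "hello CDH4"]))
```

In a real Hadoop job the map and reduce steps run on different nodes and the framework performs the shuffle for you; here everything happens in a single process.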

About Anindita
Anindita Basak works as a Big Data Cloud Consultant at Microsoft. She has worked in multiple MNCs as a Developer and Senior Developer on Microsoft Azure, Data Platform, IoT, BI, data visualization, data warehousing and ETL, and of course on the Hadoop platform. She has worked both as an FTE and as a v- employee in the Azure platform teams of Microsoft. She is passionate about .NET, Java, Python and data science. She is also an active Big Data and Cloud trainer and would love to share her experience in the IT training industry. She is an author, forum contributor, blogger and technical reviewer of various books on Big Data Hadoop, HDInsight, IoT, Data Science, SQL Server PDW and Power BI.

6 Responses to how to execute Cloudera Hadoop(CDH4) Quickstart VM(CentOS 6.2) in Windows with VMware workstation

  1. romaintech says:

    Nice blog post and great description!

    For information, the Job Tracker and Namenode have a much nicer interface in the Hue Job Browser and File Browser apps. Some screenshots are on the official website: http://gethue.com!

  2. abhisg2007 says:

    Hi Anindita,

    I am very new to Hadoop and I don’t have any idea how to install the Cloudera QuickStart VM on my Windows 8.0 machine.

    Can you please let me know everything that is required for the installation, as I am not familiar with VMware Workstation?

    It would be a great help if you could send me the step-by-step instructions.

    Thanks,
    abhi.sg2007

  3. shafitrumboo says:

    Hi Anindita,

    Thanks for your post!
    I’m a .NET guy with 8 years of experience, but I will not hesitate to learn other technologies that fit a concept best. I would appreciate it if you could help me get started with Hadoop. I have found HDInsight, Cloudera, etc. Can you help me choose a platform?

    • imcuteani says:

      Hello,

      Thanks for the comment.

      As you mentioned, as a .NET developer you must be familiar with the Azure platform. In that sense, the best product to start with would be Azure HDInsight, which is Hortonworks HDP installed on a Windows Server 2012 cluster. It’s a pre-configured Hadoop platform on Windows.

      Also, if you would like to play with real-world use cases on the Hortonworks Sandbox, you can try the HDP Sandbox for VMware/Hyper-V. Download: http://hortonworks.com/products/hortonworks-sandbox/#install

      If you are fairly confident with Linux (preferably Ubuntu), go ahead with Cloudera CDH.
      Once you are confident with Hadoop, try installing and configuring an Apache Hadoop cluster on Ubuntu.

      Hope this helps.

      Thanks,
      Anindita

  4. Smita Bajaj says:

    Hi Anindita,
    I am trying to launch quickstart VM CDH5.4.0 but it boots to a blank page and I am not sure of the solution.

    I have checked that Virtualization is enabled in BIOS settings.
    I have tried using VMware workstation as well but the result is the same.

    Can you help me resolve this please ?

    Regards,
    Smita
