Monday, October 1, 2018

Cloudera 5.15

UPDATED FOR 5.15

Cloudera has released 5.15, and as usual we will update the installation info accordingly.

19th July 2018 => PrathamOS TreeBeard V2 is released, so we are updating the content accordingly...
__________________________________________________________________________

Read about Capacity Planning HERE
__________________________________________________________________________

Let's now understand Node Enlistment.
Download the Excel sheet shown below HERE


Check the video below for reference. Direct link HERE


We will set up a 4-node cluster in this simulation, although the process remains the same as shown above.

The host machine will be PrathamOS Adyah BaseLine with Updates 1.2+.

To showcase PrathamOS TreeBeard V2, two nodes will be VMware-based and the remaining two will be VirtualBox images.

Run the commands below to extract the downloaded .7z files.

7z x Release_Jul2018_TreeBeard_V2_CentOS7_64BIT_VirtualBox.7z -o.
7z x Release_Jul2018_TreeBeard_V2_CentOS7_64BIT_VMware.7z -o.

Copies of the extracted folders can be saved for faster access next time around.
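
For instance (the folder names and backup path below are illustrative, assuming the archives extract to folders named after them):

mkdir -p ~/vm-image-backups
cp -r Release_Jul2018_TreeBeard_V2_CentOS7_64BIT_VirtualBox ~/vm-image-backups/
cp -r Release_Jul2018_TreeBeard_V2_CentOS7_64BIT_VMware ~/vm-image-backups/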

Get the latest RepoMaker...

rm -f PrathamOSBigDataRepoMaker && \
wget http://bit.ly/PrathamOSBigDataRepoMaker && \
sudo rm -f /opt/essentials/RepoMaker.sh && \
sudo mv PrathamOSBigDataRepoMaker /opt/essentials/RepoMaker.sh && \
sudo chmod 777 /opt/essentials/RepoMaker.sh
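
To confirm the script landed in place with the expected permissions (a quick sanity check, not part of the original steps):

ls -l /opt/essentials/RepoMaker.sh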

Check the video below for the installation.



Testing the Installation


sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100 
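
If the job runs to completion, the final line of the driver output should look roughly like the following (the exact value depends on the map and sample counts passed in):

Estimated value of Pi is 3.14800000000000000000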

BENCHMARK


* TeraGen
* TeraSort
* TeraValidate
* TestDFSIO

TeraSort Benchmark Suite: the standard Hadoop benchmark.
It exercises both the HDFS and MapReduce layers and consists of three MapReduce programs:

1. Generating the input data via TeraGen.
2. Running the actual TeraSort on the input data.
3. Validating the sorted output data via TeraValidate.

TeraGen:
Each row is 100 bytes, so to generate 1 GB of data the number of rows is 10,000,000.
To change the block size of the generated data, pass the argument "-D dfs.block.size=<sizeinbytes>" (see the example after the base command below).

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar teragen 10000000 /hadoop/teragen
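
For example, to generate the same data with a 512 MB block size (512 MB = 536870912 bytes; the value and output path here are just illustrative):

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar teragen -D dfs.block.size=536870912 10000000 /hadoop/teragen_512mb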

TeraSort:
The data generated by TeraGen is the input for TeraSort.

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar terasort /hadoop/teragen /hadoop/terasort
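
To sanity-check that the sort produced output before validating it (a quick look, not part of the benchmark itself):

sudo -u hdfs hdfs dfs -ls /hadoop/terasort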

TeraValidate:
TeraValidate checks the sorted output of TeraSort, ensuring that the keys are in order within each file.

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar teravalidate /hadoop/terasort /hadoop/teravalidate
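
TeraValidate writes its report into the output directory; if the report contains only a checksum and no error records, the data is correctly sorted. Part file names vary by version, so a wildcard is used here:

sudo -u hdfs hdfs dfs -cat /hadoop/teravalidate/part-*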

TestDFSIO:
The TestDFSIO benchmark is an I/O test, i.e., a read and write test for HDFS.
It is useful for measuring the read/write speed of HDFS (i.e., of all the DataNode disks) and for understanding how fast the cluster is in terms of I/O.

Write Test:
Three files of 100 MB each are written, i.e., 300 MB in total is written to HDFS.
TestDFSIO writes the files to /benchmarks/TestDFSIO on HDFS, and the benchmark results are stored in a local file, TestDFSIO_results.log.

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-test-2.6.0-mr1-cdh5.15.0.jar TestDFSIO -write -nrFiles 3 -size 100MB
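
After the write test finishes, the summary can be read from the local results file; expect figures such as throughput and average IO rate in MB/sec:

cat TestDFSIO_results.log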

Read Test:
The read test reads back the files created by the write test, so run it only after the write test completes.

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-test-2.6.0-mr1-cdh5.15.0.jar TestDFSIO -read -nrFiles 3 -size 100MB
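
Once benchmarking is done, the test data under /benchmarks/TestDFSIO can be removed with TestDFSIO's -clean option:

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-test-2.6.0-mr1-cdh5.15.0.jar TestDFSIO -clean
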
__________________________________________________________________________