UPDATED FOR 5.15
Cloudera has released CDH 5.15, and as usual we will update the installation info accordingly. 19th July 2018: PrathamOS TreeBeard V2 is released, so the content below has been updated to match.
__________________________________________________________________________
Read about Capacity Planning HERE
__________________________________________________________________________
Let's now understand Node Enlistment.
Download the Excel sheet shown below HERE
Check the video below for reference. Direct link HERE
We will set up a 4-node cluster in this simulation, although the process remains the same as described above.
Host Machine will be PrathamOS Adyah BaseLine with Updates 1.2+
To showcase PrathamOS TreeBeard V2, two nodes will be VMware-based and the remaining two will be VirtualBox images.
Run the commands below to extract the downloaded .7z files:
7z x Release_Jul2018_TreeBeard_V2_CentOS7_64BIT_VirtualBox.7z -o.
7z x Release_Jul2018_TreeBeard_V2_CentOS7_64BIT_VMware.7z -o.
Copies of the extracted folders can be saved for faster access next time around.
Get the latest RepoMaker:
rm -f PrathamOSBigDataRepoMaker && \
wget http://bit.ly/PrathamOSBigDataRepoMaker && \
sudo rm -f /opt/essentials/RepoMaker.sh && \
sudo mv PrathamOSBigDataRepoMaker /opt/essentials/RepoMaker.sh && \
sudo chmod 777 /opt/essentials/RepoMaker.sh
Check the video below for Installation
Testing the Installation
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
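If the cluster is healthy, the job should finish with a line like "Estimated value of Pi is ...". A minimal sketch to capture and check for that line, assuming the same parcel path as the command above:

```shell
#!/bin/sh
# Sketch: run the Pi smoke test and look for the estimate line in its output.
# The parcel path matches the command above; adjust for your CDH layout.
OUT=$(sudo -u hdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  pi 10 100 2>&1)
if echo "$OUT" | grep -q "Estimated value of Pi"; then
  echo "MapReduce smoke test passed"
else
  echo "Smoke test failed; check the YARN application logs"
fi
```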
BENCHMARK
* Teragen
* Terasort
* Teravalidate
* TestDFSIO
The TeraSort benchmark suite is Hadoop's standard sort benchmark. It has three stages:
1. Generating the input data via TeraGen.
2. Running the actual TeraSort on the input data.
3. Validating the sorted output data via TeraValidate.
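The three stages above can be chained into one script; a minimal sketch, assuming the parcel paths and row count used in the individual commands later in this post:

```shell
#!/bin/sh
# Sketch: run the three TeraSort stages in order. Each stage's output
# directory is the next stage's input. Adjust paths for your cluster.
set -e
JAR=/opt/cloudera/parcels/CDH/jars/hadoop-examples.jar
sudo -u hdfs hadoop jar "$JAR" teragen 10000000 /hadoop/teragen
sudo -u hdfs hadoop jar "$JAR" terasort /hadoop/teragen /hadoop/terasort
sudo -u hdfs hadoop jar "$JAR" teravalidate /hadoop/terasort /hadoop/teravalidate
```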
Teragen:
Each row is 100 bytes, so to generate 1 GB of data the number of rows is 10,000,000 (1,000,000,000 / 100).
To change the block size of the generated data, pass the argument "-D dfs.block.size=<size in bytes>".
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar teragen 10000000 /hadoop/teragen
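The row-count arithmetic, plus the block-size override, can be sketched as below. The 128 MB block size is an assumed example value, not a recommendation:

```shell
# TeraGen argument arithmetic: each row is 100 bytes, so the row count is
# the target size divided by 100. Block size here is a hypothetical example.
TARGET_BYTES=1000000000                # 1 GB (decimal)
ROWS=$((TARGET_BYTES / 100))           # 10000000 rows
BLOCK_SIZE=$((128 * 1024 * 1024))      # 134217728 bytes
echo "teragen -D dfs.block.size=$BLOCK_SIZE $ROWS /hadoop/teragen"
```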
Terasort:
The data generated by TeraGen is the input for TeraSort.
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar terasort /hadoop/teragen /hadoop/terasort
Teravalidate:
TeraValidate checks the sorted output of TeraSort, ensuring that the keys are sorted within each file.
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar teravalidate /hadoop/terasort /hadoop/teravalidate
TestDFSIO:
The TestDFSIO benchmark is an I/O test, i.e. a read and write test for HDFS.
It is useful for measuring the read/write speed of HDFS, i.e. all the DataNode disks, and for understanding how fast the cluster is in terms of I/O.
Write Test:
Three files of 100 MB each, i.e. 300 MB in total, are written to HDFS.
TestDFSIO writes the files to /benchmarks/TestDFSIO on HDFS, and the benchmark results are stored in a local file, TestDFSIO_results.log.
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-test-2.6.0-mr1-cdh5.15.0.jar TestDFSIO -write -nrFiles 3 -size 100MB
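A quick sanity check on the sizes: the total data written is simply the file count times the per-file size, using the values from the command above:

```shell
# Total data written = number of files x per-file size
# (values match -nrFiles 3 -size 100MB above).
NR_FILES=3
FILE_MB=100
TOTAL_MB=$((NR_FILES * FILE_MB))
echo "Total written: ${TOTAL_MB} MB"   # 300 MB
```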
Read Test:
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-test-2.6.0-mr1-cdh5.15.0.jar TestDFSIO -read -nrFiles 3 -size 100MB
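Between runs it can help to clear TestDFSIO's benchmark data from HDFS. A sketch using TestDFSIO's built-in -clean switch, with the same jar path as above:

```shell
# Sketch: remove the /benchmarks/TestDFSIO data on HDFS between runs.
sudo -u hdfs hadoop jar \
  /opt/cloudera/parcels/CDH/jars/hadoop-test-2.6.0-mr1-cdh5.15.0.jar \
  TestDFSIO -clean
```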
__________________________________________________________________________


