Monday, July 31, 2017

Procedure to change the JournalNode edits directory (Hadoop NameNode issue)

Procedure to change the JournalNode edits directory:

1. Add the new drive/disk (/data06) and create the /data06/dfs/jn folder for the JournalNode on all nodes where the JournalNode service is installed.
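A minimal sketch, assuming the new disk is already mounted at /data06:
# create the new JournalNode edits directory (run on every JournalNode host)
mkdir -p /data06/dfs/jn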

2. Update the hdfs-site.xml file on all nodes where the JournalNode service is installed:
 
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data06/dfs/jn</value>
</property>
 

3. Copy the existing edits folder from /data01/dfs/jn/mas (considering this folder holds the old edit logs) to the new directory on all nodes where the JournalNode service is installed (copying the edit logs will take some time), and change the permissions as below:
cp -ar /data01/dfs/jn/* /data06/dfs/jn/
chown -R hdfs:hdfs /data06
chmod 700 /data06/dfs

4. Stop the standby NameNode and JournalNode:
service hadoop-hdfs-namenode stop
service hadoop-hdfs-journalnode stop

5. Start the JournalNode and the standby NameNode. Both services should be up and running:
service hadoop-hdfs-journalnode start
service hadoop-hdfs-namenode start

6. Stop the active NameNode and JournalNode. The standby NameNode should become active now:
service hadoop-hdfs-namenode stop
service hadoop-hdfs-journalnode stop

7. Start the JournalNode and the (previously active) NameNode:
service hadoop-hdfs-journalnode start
service hadoop-hdfs-namenode start

8. Restart the third JournalNode:
service hadoop-hdfs-journalnode stop
service hadoop-hdfs-journalnode start

Hive table not found even though the table exists

If you get a "table not found" error in Hive even though you dropped and re-created the table before inserting data into it, you can fix it with one of the two options below.

Option 1: set SYNC_DDL=1; (only if you are using Impala)
Option 2: create the table with a CREATE TABLE ... AS SELECT statement (see the sketch below)
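A minimal sketch of both options run through impala-shell; the table names my_table and staging_table are hypothetical:

# Option 1 (Impala only): SYNC_DDL makes DDL statements wait until every impalad sees the change
impala-shell -q "SET SYNC_DDL=1; DROP TABLE IF EXISTS my_table; CREATE TABLE my_table (id INT); INSERT INTO my_table SELECT id FROM staging_table;"

# Option 2: create and populate the table in a single CTAS statement instead of a separate DROP/CREATE plus INSERT
impala-shell -q "CREATE TABLE my_table AS SELECT * FROM staging_table;"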

HDFS Balancer and DataNode Space Usage Considerations:

Symptoms
It may take days or weeks to re-distribute data among DataNodes.
Newly added nodes use less space compared to existing DataNodes.
Some DataNodes use more space than other DataNodes in a cluster.

Resolution:
The balancer is useful immediately after adding new nodes to the cluster. You might also want to consider running it as a cron job every week or so (a sample crontab entry is shown below).
sudo -u hdfs hdfs balancer -threshold 10
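A hedged example of such a cron job; the schedule is illustrative, and the entry goes in the hdfs user's crontab:
# run the balancer every Sunday at 2 AM
0 2 * * 0 hdfs balancer -threshold 10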

You can also start the balancer in the background by running the following command:
/usr/lib/hadoop/bin/start-balancer.sh

Hadoop NameNode goes down or crashes randomly

ERROR / Symptoms:
2017-07-27 18:50:32,405 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 5605ms to send a batch of 12 edits (841 bytes) to remote journal 206.46.37.113:8485
2017-07-27 18:50:38,693 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 6212ms to send a batch of 18 edits (1245 bytes) to remote journal 206.46.37.113:8485
2017-07-27 18:50:48,318 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 9492ms to send a batch of 4 edits (1049 bytes) to remote journal 206.46.37.112:8485
2017-07-27 18:51:16,909 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 11283ms to send a batch of 8 edits (1672 bytes) to remote journal 206.46.37.112:8485
2017-07-27 18:51:22,765 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 206.46.37.114:8485 failed to write txns 32683796-32683796. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 34 is less than the last promised epoch 35
…
…
2017-07-27 18:51:30,993 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [206.46.37.112:8485, 206.46.37.113:8485, 206.46.37.114:8485], stream=QuorumOutputStream starting at txid 32683792))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
206.46.37.114:8485: IPC's epoch 34 is less than the last promised epoch 35

Workaround:
The default value of dfs.qjournal.write-txns.timeout.ms is 20000; it is the write timeout in milliseconds when writing to a quorum of remote journals.

Increase dfs.qjournal.write-txns.timeout.ms from the default 20000 to 30000.

Procedure to apply the dfs.qjournal.write-txns.timeout.ms property:
1. Add the property below to the hdfs-site.xml file on the hosts where the NameNodes are installed (typically core worker 1 and core worker 2):

<property>
    <name>dfs.qjournal.write-txns.timeout.ms</name>
    <value>30000</value>
</property>

2. Restart the standby NameNode. The standby NameNode should come back up and be running:
service hadoop-hdfs-namenode stop
service hadoop-hdfs-namenode start
service hadoop-hdfs-namenode status

3. Restart the active NameNode once the standby is running:
service hadoop-hdfs-namenode stop
service hadoop-hdfs-namenode start
service hadoop-hdfs-namenode status

Friday, July 21, 2017

Tune YARN - CPU assignment by YARN should limit the number of CPUs for each job

Tune YARN

You can limit CPU usage using yarn.scheduler.maximum-allocation-vcores and yarn.nodemanager.resource.cpu-vcores in yarn-site.xml. Descriptions are below, followed by a sample yarn-site.xml snippet.

yarn.scheduler.maximum-allocation-vcores: controls the maximum vcores that any submitted job can request.

yarn.nodemanager.resource.cpu-vcores: controls how many vcores can be scheduled on a particular NodeManager instance.
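A minimal yarn-site.xml sketch; the values 4 (max vcores per container request) and 8 (vcores offered by each NodeManager) are illustrative and should be sized to your hardware:

<property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>4</value>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
</property>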

Monday, July 10, 2017

Apache Spark Mlib - Sparse Vector

What is an Apache Spark Sparse Vector?
A vector is a one-dimensional array of elements, so in a programming language a vector is typically implemented as a one-dimensional array. A vector is said to be sparse when many of its elements have zero values. When we write programs, it is not a good idea from a storage perspective to store all of these zero values in the array.
So the best way to represent a sparse vector is to store only the locations and values of the non-zero elements.

Example: 3 1.2 2800 6.3 6000 10.0 50000 5.7
This denotes that at position:
  • 3 the value is 1.2
  • 2800 the value is 6.3
  • 6000 the value is 10.0
  • 50000 the value is 5.7
When you use a sparse vector in a programming language you also need to specify a size. In the above example there are 4 stored (non-zero) entries, and the declared size of the vector must be at least 50001, one more than the largest index.
Representation of Sparse Vector in Spark
The Vectors class of org.apache.spark.mllib.linalg has multiple factory methods to create your own dense and sparse vectors.
The simplest way to create a sparse vector is:
sparse(int size, int[] indices, double[] values)
This method creates a sparse vector where the first argument is the size, the second is the indices where values exist, and the last is the values at those indices.
The rest of the elements of the vector are zero.
Example:
Let’s say we have to create the vector {0.0, 5.0, 0.0, 3.0, 4.0}. Using the sparse vector API of Spark, this can be created as shown below:
Vector sparseVector = Vectors.sparse(5, new int[] {1,3, 4}, new double[] {5.0,3.0, 4.0});
The same vector can also be created using the dense vector API:
Vector denseVector = Vectors.dense(0.0, 5.0, 0.0, 3.0, 4.0);
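As a quick sanity check (a minimal sketch; the class name and the print statement are only for illustration), both vectors expand to the same array of values:

import java.util.Arrays;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

public class SparseVectorExample {
    public static void main(String[] args) {
        // Sparse: size 5, with non-zero values only at indices 1, 3 and 4
        Vector sparseVector = Vectors.sparse(5, new int[] {1, 3, 4}, new double[] {5.0, 3.0, 4.0});
        // Dense: every element is stored explicitly, including the zeros
        Vector denseVector = Vectors.dense(0.0, 5.0, 0.0, 3.0, 4.0);

        // Both representations hold the same logical values
        System.out.println(Arrays.equals(sparseVector.toArray(), denseVector.toArray())); // prints true
    }
}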