Thursday, August 17, 2017
Data node maintenance without triggering hdfs re-balancing
By default, if a DataNode does not send a heartbeat for 10.5 minutes [1], the NameNode considers it dead and the following steps are taken [2]:

* The NameNode determines which blocks were stored on the failed DataNode.
* The NameNode locates other DataNodes that hold copies of those blocks.
* Those DataNodes are instructed to copy the blocks to other DataNodes, so that the configured replication factor (e.g. replica=3, as mentioned above) is restored.

The 10.5-minute timeout is derived as: 2 * dfs.namenode.heartbeat.recheck-interval (300000 ms by default) + 10 * dfs.heartbeat.interval (3 s by default).

You can increase dfs.namenode.heartbeat.recheck-interval in the "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml" in Cloudera Manager, then perform a rolling restart of the NameNodes for the change to take effect. Increasing this value is not usually recommended, however, since it adds risk: the replica count is not maintained while a DataNode is unavailable.

A better solution is the HDFS Maintenance State [3], but this requires CDH 5.11 or later, so if our records are up to date you would need to upgrade your CDH version.

I would also recommend returning each node to service before moving on to the next one, to limit the cluster and network I/O spent re-replicating blocks from the node in maintenance.

LINKS: [1] dfs.namenode.stale.datanode.interval => Time interval for marking a DataNode as "stale", i.e., if the NameNode has not received a heartbeat message from a DataNode for more than this interval, the DataNode is marked and treated as "stale". The stale interval cannot be too small, since that would cause too-frequent changes of stale state; a minimum stale interval (by default 3 times the heartbeat interval) is therefore enforced, and the configured value cannot be less than that minimum.
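To see where the 10.5 minutes comes from, here is a minimal sketch of the timeout arithmetic using the default values quoted above:

```python
# Sketch: how the NameNode's dead-node timeout is derived from the two
# settings discussed above. Values are the defaults mentioned in the post.
recheck_interval_ms = 300_000   # dfs.namenode.heartbeat.recheck-interval
heartbeat_interval_s = 3        # dfs.heartbeat.interval

# A DataNode is marked dead after:
#   2 * recheck-interval + 10 * heartbeat-interval
timeout_s = 2 * (recheck_interval_ms / 1000) + 10 * heartbeat_interval_s
print(timeout_s)       # 630.0 seconds
print(timeout_s / 60)  # 10.5 minutes
```

Raising dfs.namenode.heartbeat.recheck-interval therefore moves the timeout by twice the amount you add to it.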
A stale DataNode is avoided during lease/block recovery. It can be conditionally avoided for reads (see dfs.namenode.avoid.read.stale.datanode) and for writes (see dfs.namenode.avoid.write.stale.datanode).

[2] https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_mc_dn.html#concept_y12_knh_m4
[3] https://blog.cloudera.com/blog/2017/05/hdfs-maintenance-state/
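As a sketch of the maintenance-state approach from [3]: assuming the JSON host-file format read by CombinedHostFileManager (set dfs.namenode.hosts.provider.classname to org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager and point dfs.hosts at the file), a node can be placed in maintenance with an entry like the one below. The hostname and expiry timestamp are placeholders:

```json
[
  {
    "hostName": "dn1.example.com",
    "adminState": "IN_MAINTENANCE",
    "maintenanceExpireTimeInMS": 1503000000000
  }
]
```

After editing the file, run `hdfs dfsadmin -refreshNodes` to apply it; the NameNode then suppresses re-replication for that node until maintenance ends or the expiry time passes.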