HDFS Balancer and DataNode Space Usage Considerations:
Symptoms:
It may take days or weeks to re-distribute data among DataNodes.
Newly added nodes use less space compared to existing DataNodes.
Some DataNodes use more space than other DataNodes in a cluster.
Resolution:
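The balancer is most useful right after new nodes are added to the cluster. You might also consider running it as a cron job every week or so. The -threshold value is a percentage: with -threshold 10, the balancer moves blocks until the utilization of each DataNode is within 10 percentage points of the overall cluster utilization. Run it as the hdfs user: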
sudo -u hdfs hdfs balancer -threshold 10
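To judge whether a balancer run is worthwhile, it can help to see which DataNodes fall outside the threshold. The following is only a rough sketch: the function name unbalanced_nodes and its input format (one line per DataNode: name, used bytes, capacity bytes) are assumptions made for illustration, not part of Hadoop. You would need to collect those figures yourself, for example from the output of hdfs dfsadmin -report.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): list DataNodes whose utilization
# differs from the overall cluster utilization by more than THRESHOLD
# percentage points.
# Input on stdin (assumed format): "name used_bytes capacity_bytes" per line.
# Output: "name node_pct cluster_pct" for each node outside the threshold.
unbalanced_nodes() {
    threshold="$1"
    awk -v t="$threshold" '
        { name[NR] = $1; used[NR] = $2; cap[NR] = $3; tu += $2; tc += $3 }
        END {
            if (tc <= 0) exit 1                 # no capacity reported
            cluster = 100 * tu / tc             # cluster-wide utilization in %
            for (i = 1; i <= NR; i++) {
                if (cap[i] <= 0) continue       # skip nodes with no capacity
                node = 100 * used[i] / cap[i]   # per-node utilization in %
                diff = node - cluster
                if (diff < 0) diff = -diff
                if (diff > t) printf "%s %.1f %.1f\n", name[i], node, cluster
            }
        }'
}
```

For example, piping three lines such as "dn1 60 100", "dn2 55 100" and "dn3 20 100" into unbalanced_nodes 10 reports dn1 and dn3, because the cluster utilization is 45% and only dn2 is within 10 points of it.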
You can start the balancer in the background by running the following command (the path depends on how Hadoop was installed):
/usr/lib/hadoop/bin/start-balancer.sh
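The weekly cron job suggested above could look roughly like the fragment below. This is a sketch under assumptions: the file name, the schedule, and the log path are made up for illustration, and it assumes the hdfs command is on the PATH that cron uses for the hdfs user.

```shell
# /etc/cron.d/hdfs-balancer  (hypothetical file name)
# Fields: minute hour day-of-month month day-of-week user command
# Run the balancer every Sunday at 02:00 as the hdfs user.
# The log path below is an assumption; point it at a directory that exists.
0 2 * * 0 hdfs hdfs balancer -threshold 10 >> /var/log/hadoop-hdfs/balancer-cron.log 2>&1
```

Running it at a quiet hour keeps the block movement from competing with regular jobs for network bandwidth.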