Symptoms:
- It may take days or weeks to re-distribute data among DataNodes.
- Newly added nodes use less space compared to existing DataNodes.
- Some DataNodes use more space than other DataNodes in the cluster.
Cause:
- Uneven distribution of space usage, depending on how data is written to the cluster and whether new DataNodes have been added.
- DataNodes limit the number of threads used for balancing so that balancing does not consume all of the resources of the DataNode or the cluster. By default the number of threads is 5. This setting was not configurable prior to Apache Hadoop 2.5.0 (prior to CDH 5.2.0); HDFS-6595 makes it configurable in CDH 5.2.0 and later releases.
- By default, dfs.datanode.balance.bandwidthPerSec is set to 10 MiB/sec, which limits how much network bandwidth is dedicated to the balancer.
- The HDFS Balancer exiting prematurely due to a known bug: HDFS-6621 (Hadoop Balancer prematurely exits iterations) in CDH 5.1.3 and earlier.
1. How do I run the HDFS Balancer?
Only one HDFS Balancer is needed; it can be run from any node in the cluster and is commonly configured to run on a NameNode host.
The Balancer runs until it is manually stopped or until it considers the cluster balanced.
To run it from Cloudera Manager, navigate to Cloudera Manager > HDFS > Instances > Balancer and run Actions > Rebalance.
To run it from the command line on any HDFS node, use: sudo -u hdfs hdfs balancer -threshold 10
Note: It is NOT recommended to run the HDFS Balancer frequently on a cluster with HBase. Frequent balancing causes HBase to lose block locality and impacts performance.
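For reference, a minimal command-line sketch for running the Balancer in the background so it keeps going after the terminal is closed. The log path /tmp/balancer.log is a hypothetical example; adjust it as needed.
# Start the Balancer with a 10% threshold and keep its output for later review.
sudo -u hdfs nohup hdfs balancer -threshold 10 > /tmp/balancer.log 2>&1 &
# To stop it before it finishes, kill the Balancer process, for example:
# sudo pkill -f org.apache.hadoop.hdfs.server.balancer.Balancer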
2. How do I schedule the HDFS Balancer to run periodically?
The Balancer cannot be scheduled with Cloudera Manager. To run it automatically, run it from the command line using an OS scheduler such as cron, as in the sketch below.
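As an illustration only, a hedged sketch of a cron entry that starts the Balancer every Saturday at 02:00. The file name /etc/cron.d/hdfs-balancer, the log path, and the /usr/bin/hdfs binary location are assumptions; adjust them for your environment.
# /etc/cron.d/hdfs-balancer -- cron.d entries include a user field (here: hdfs).
# min hour dom mon dow user command
0 2 * * 6 hdfs /usr/bin/hdfs balancer -threshold 10 >> /var/log/hdfs-balancer.log 2>&1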
3. Why is one DataNode using more HDFS space than others?
There are multiple ways to bring files into HDFS from outside and distribute them across multiple nodes. Two examples and their impact follow:
A) Copying to HDFS from a DataNode host
Files are copied to a local filesystem or are located on an NFS mount on a DataNode host. From the DataNode host the files are copied to HDFS with the hdfs dfs -copyFromLocal command. Example:
hdfs dfs -copyFromLocal /mnt/myfile1 /files
When writing to HDFS, the first replica of each block is written to the local DataNode, and the subsequent replicas of each block are written to other, randomly selected DataNodes. As a result, much of the data ends up on the node performing the copy. This is helpful for services that need data locality, but it can unbalance a DataNode when data is loaded directly into HDFS this way.
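To see where the replicas of a file actually landed, hdfs fsck can list block locations. A minimal sketch, assuming the file from the example above ended up at /files/myfile1:
# List each block of the file and the DataNodes holding its replicas.
sudo -u hdfs hdfs fsck /files/myfile1 -files -blocks -locations
When the copy was performed on a DataNode host, the first replica of every block is typically reported on that same host.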
B) Copying from a Node outside the HDFS Cluster (Edge Node / Gateway Node)
When writing to HDFS from an edge node (or any node that has the HDFS client but does not run a DataNode role), the NameNode directs the client to write each block replica to randomly selected DataNodes, providing a more even distribution of blocks across the cluster.
An edge node must have network access to all of the DataNodes. If a firewall blocks access to some of the DataNodes, writes will time out, affecting performance and data distribution.
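A quick TCP probe from the edge node can confirm that the DataNode data-transfer port is reachable. A hedged sketch, assuming the default port 50010 used by CDH 5 (newer releases use 9866; check dfs.datanode.address for your cluster) and hypothetical hostnames:
# A timeout or refused connection for some hosts points at firewall or network issues.
for dn in datanode1.example.com datanode2.example.com datanode3.example.com; do
  nc -vz -w 5 "$dn" 50010
done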
Note: If you have inputs arriving as individual or small batches of events, you can also look at Apache Flume to automatically collect them directly and reliably into your cluster.
4. Why are newly added nodes using less space than existing DataNodes?
When new DataNodes are added to a cluster, the existing data is not automatically re-distributed between existing and newly-added DataNodes. The new nodes only receive their share of new block replicas as data is written to the cluster. To re-distribute data between existing and new DataNodes, the HDFS Balancer must be run.
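To compare how full each DataNode is before and after balancing, the DFS Used% figures in the dfsadmin report are usually enough. A minimal sketch:
# Print each DataNode name together with its DFS Used% line.
sudo -u hdfs hdfs dfsadmin -report | grep -E '^Name:|^DFS Used%:'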
5. After replacing failed disks, why does HDFS not balance data among all the disks within a DataNode?
The HDFS Balancer does not re-balance data among the disks within a DataNode; it only re-balances data among DataNodes. After running the HDFS Balancer, the percentage of total disk space used by your DataNodes will be balanced across the cluster, but within a DataNode there may still be disks where data is not evenly distributed.
By default, DataNodes are configured to write to their disks in a round-robin fashion. If disks of different capacities are used, or if disks fail and are replaced, the round-robin algorithm keeps writing to each disk in turn, resulting in a different percentage of capacity used on each disk.
To distribute writes evenly as a percentage of each drive's capacity, change the volume choosing policy (dfs.datanode.fsdataset.volume.choosing.policy) to Available Space. If using Cloudera Manager:
Navigate to HDFS > Configuration > DataNode
Change DataNode Volume Choosing Policy from Round Robin to Available Space
Click Save Changes
Restart the DataNodes
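To confirm that a DataNode picked up the new policy after the restart, you can inspect the hdfs-site.xml that Cloudera Manager generated for the DataNode role. A hedged sketch, assuming the /var/run/cloudera-scm-agent/process layout referenced later in this article:
# Find the newest DataNode process directory and check the policy value; with
# Available Space selected it should reference AvailableSpaceVolumeChoosingPolicy.
DN_DIR=$(ls -dt /var/run/cloudera-scm-agent/process/*-hdfs-DATANODE | head -1)
grep -A1 'dfs.datanode.fsdataset.volume.choosing.policy' "$DN_DIR/hdfs-site.xml"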
The disks will not immediately balance with the Available Space choosing policy, but writes will be distributed more evenly over time. Work to re-balance disks within a DataNode is tracked in HDFS-1312; at the time this article was written that work was not complete and there was no estimate of when it would be available.
6. How do I determine the HDFS Balancer's re-balancing speed?
Log in to the host running the Balancer and find the Balancer process directory with the highest number (#####): /var/run/cloudera-scm-agent/process/#####-hdfs-BALANCER/logs
The stdout.log file in that directory records the Balancer's progress.
Run the following command to extract the Balancer's statistics from the stdout.log file:
awk '/^[A-Z][a-zA-Z]/' stdout.log > balancer_progress.txt
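A small sketch that locates the most recent Balancer process directory automatically (assuming the Cloudera Manager agent path above) and runs the same extraction:
# Pick the newest *-hdfs-BALANCER process directory and extract the progress lines.
BAL_DIR=$(ls -dt /var/run/cloudera-scm-agent/process/*-hdfs-BALANCER | head -1)
awk '/^[A-Z][a-zA-Z]/' "$BAL_DIR/logs/stdout.log" > balancer_progress.txt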
The extracted lines form a table with one row per iteration; the Balancer reports the columns Time Stamp, Iteration#, Bytes Already Moved, Bytes Left To Move, and Bytes Being Moved.
On each iteration (2nd column) the Balancer determines how much data it plans to move (5th column) and decrements the amount of data left to move (4th column). In this case, it took ~84 hours to transfer 9.71 TB of data, which is roughly 115.6 GB/hr (about 32 MB/sec).
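As a back-of-the-envelope check of those figures (decimal units assumed), the rate can be reproduced with bc:
# 9.71 TB moved over ~84 hours:
echo 'scale=2; 9710 / 84' | bc                 # ~115.6 GB per hour
echo 'scale=2; 9710 * 1000 / 84 / 3600' | bc   # ~32 MB per second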
Depending on your cluster load, performance can be tuned as shown below.
7. How do I tune HDFS Balancer performance?
Key Points To Remember:
The HDFS Balancer role must be present (to verify, navigate to Cloudera Manager > HDFS > Instances and look for "Balancer") and must be run manually to re-distribute data among DataNodes.
HDFS Balancer does NOT re-distribute data between disks within a DataNode.
The default threshold is 10%, which means the Balancer ONLY re-distributes data among DataNodes whose storage usage differs from the overall (mean) usage of the cluster by more than 10%. For example, if the cluster's mean usage is 60%, DataNodes between 50% and 70% used are already considered balanced.
The cluster can still be used while the HDFS Balancer is running, and balancing does not have a major impact on cluster performance.
Navigate to Cloudera Manager > HDFS > Configuration and search for dfs.datanode.balance.bandwidthPerSec. This determines the maximum bandwidth each DataNode uses for balancing. The default is 10 MiB/sec. Cloudera recommends setting it to 100 MiB/sec and increasing it as needed based on your network infrastructure. If set too high, the Balancer may saturate your network.
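If you prefer not to wait for a configuration change and restart, the balancing bandwidth can also be raised at runtime with dfsadmin. The value is given in bytes per second and is not persistent; DataNodes fall back to dfs.datanode.balance.bandwidthPerSec after a restart. A minimal sketch for 100 MiB/sec:
# 100 MiB/sec = 100 * 1024 * 1024 = 104857600 bytes per second.
sudo -u hdfs hdfs dfsadmin -setBalancerBandwidth 104857600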
Navigate to Cloudera Manager > HDFS > Configuration, search for DataNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml and add the following property (as an XML snippet, or via the name/value editor, depending on your Cloudera Manager version):
<property>
  <name>dfs.datanode.balance.max.concurrent.moves</name>
  <value>200</value>
</property>
Also search for Balancer Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml and add the same property there.
This property controls the maximum concurrent block moves performed by a DataNode.
By default, dfs.datanode.balance.max.concurrent.moves is set to 5. DataNodes limit the number of threads used for balancing so that balancing does not consume all of the resources of the DataNode or the cluster.
Temporarily increase dfs.datanode.balance.max.concurrent.moves to a higher number, such as 200. Once the cluster is balanced, reduce the value back to the default.
Note: For the changes to take effect, perform a Rolling Restart of HDFS and Deploy Client Configuration. Once HDFS has restarted, stop the Balancer (if it was run from the CLI) and re-run it.