Tuesday, July 16, 2024

HDFS - Edit logs and FsImage

What is DistCp? It is a tool for copying very large amounts of data to and from Hadoop file systems in parallel. It runs the copy as a MapReduce job.
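As a rough illustration, DistCp is normally launched from the command line as `hadoop distcp <source> <destination>`. The Python sketch below only assembles and optionally runs such a command. The helper names `build_distcp_command` and `run_distcp`, the NameNode hosts `nn1` and `nn2`, and the paths are hypothetical placeholders. The only option it uses is `-m`, which caps the number of simultaneous copy tasks.

```python
import subprocess


def build_distcp_command(source, destination, max_maps=None):
    """Return the argv list for a `hadoop distcp` call (hypothetical helper)."""
    command = ["hadoop", "distcp"]
    if max_maps is not None:
        # -m limits how many map tasks copy data at the same time.
        command += ["-m", str(max_maps)]
    command += [source, destination]
    return command


def run_distcp(source, destination, max_maps=None):
    """Run DistCp; this needs a configured Hadoop client on the PATH."""
    argv = build_distcp_command(source, destination, max_maps)
    return subprocess.run(argv, check=True)


# Placeholder cluster addresses, for illustration only.
example = build_distcp_command(
    "hdfs://nn1:8020/source", "hdfs://nn2:8020/destination", max_maps=20
)
print(" ".join(example))
```

Only `build_distcp_command` is exercised here, so the snippet runs without a Hadoop installation. `run_distcp` would need a real cluster.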

Default replication factor? The default replication factor is 3. No two replicas of a block are stored on the same DataNode. Under the default rack-aware placement, two of the three replicas are placed on different nodes of one rack, and the remaining replica is placed on a different rack. A replication factor of at least three is advised so that one copy stays safe even if an entire rack fails.
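The placement rule can be pictured with a small Python sketch. It is a simplified illustration, not HDFS's actual placement code: the function `place_three_replicas` and the rack and node names are made up. It only enforces the two properties stated above: no DataNode holds two replicas, and the three replicas span exactly two racks.

```python
import random


def place_three_replicas(nodes_by_rack, rng=random):
    """Choose three (rack, node) pairs: two on one rack, one on another."""
    # The shared rack must offer two distinct nodes.
    racks_with_two = [rack for rack, nodes in nodes_by_rack.items() if len(nodes) >= 2]
    if not racks_with_two:
        raise ValueError("no rack has two DataNodes")
    shared_rack = rng.choice(racks_with_two)

    # The third replica goes to any other rack that has a node.
    other_racks = [
        rack for rack, nodes in nodes_by_rack.items() if rack != shared_rack and nodes
    ]
    if not other_racks:
        raise ValueError("a second rack with at least one DataNode is required")
    other_rack = rng.choice(other_racks)

    # sample() never returns the same node twice.
    pair = rng.sample(nodes_by_rack[shared_rack], 2)
    single = rng.choice(nodes_by_rack[other_rack])
    return [(shared_rack, pair[0]), (shared_rack, pair[1]), (other_rack, single)]


# Hypothetical two-rack cluster.
cluster = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5"]}
placement = place_three_replicas(cluster, random.Random(0))
print(placement)
```

If one whole rack is lost, at least one replica is still on the surviving rack. That is the reason for spreading the copies this way.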

The NameNode keeps its metadata in two places: on disk (the FsImage and the edit logs) and in RAM (the live copy of the namespace that it serves requests from).

Edit logs: a record of every transaction made after the last FsImage file was created, such as blocks added, blocks deleted, and replication changes. These updates are periodically merged into the FsImage.

FsImage: a file stored on the NameNode's local file system that contains the complete directory structure of HDFS. The NameNode reads it into memory when it starts and then replays the edit logs on top of it.
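How the two files work together can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HDFS's on-disk format or code: the class `ToyNameNode`, its methods, and the two operations `add` and `delete` are invented for illustration.

```python
class ToyNameNode:
    """Toy model of NameNode metadata (illustrative only)."""

    def __init__(self):
        self.fsimage = {}    # last checkpoint: path -> list of block ids
        self.edit_log = []   # transactions recorded since that checkpoint
        self.namespace = {}  # in-memory view used to answer requests

    @staticmethod
    def _apply(state, op, path, blocks):
        if op == "add":
            state[path] = list(blocks)
        else:  # "delete"
            state.pop(path, None)

    def record(self, op, path, blocks=()):
        """Append the transaction to the edit log, then update memory."""
        if op not in ("add", "delete"):
            raise ValueError(f"unknown operation: {op}")
        self.edit_log.append((op, path, tuple(blocks)))
        self._apply(self.namespace, op, path, blocks)

    def checkpoint(self):
        """Merge the edit log into a new FsImage and start an empty log."""
        merged = {path: list(blocks) for path, blocks in self.fsimage.items()}
        for op, path, blocks in self.edit_log:
            self._apply(merged, op, path, blocks)
        self.fsimage = merged
        self.edit_log = []

    def restart(self):
        """Rebuild memory: load the FsImage, then replay the edit log."""
        self.namespace = {path: list(blocks) for path, blocks in self.fsimage.items()}
        for op, path, blocks in self.edit_log:
            self._apply(self.namespace, op, path, blocks)


nn = ToyNameNode()
nn.record("add", "/data/a.txt", ["blk_1", "blk_2"])
nn.checkpoint()                     # FsImage now contains a.txt
nn.record("add", "/data/b.txt", ["blk_3"])
nn.record("delete", "/data/a.txt")  # only in the edit log so far
nn.restart()
print(nn.namespace)  # {'/data/b.txt': ['blk_3']}
```

After the restart the namespace reflects the checkpoint plus the two later edits. This shows why both files are needed to recover the latest state.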

