HDFS - Edit logs and FsImage
What is DistCp? - DistCp (distributed copy) is a tool for copying very large amounts of data to and from Hadoop file systems in parallel. It runs the copy as a MapReduce job, so the work is spread across the cluster.
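To make the shape of a DistCp invocation concrete, here is a small sketch that only builds the command line and does not run it. The helper name `distcp_command` and the NameNode addresses are made up for illustration; `-update` and `-m` (maximum number of map tasks) are standard DistCp options.

```python
import shlex


def distcp_command(src, dst, max_maps=None, update=False):
    """Build the argument list for a `hadoop distcp` call (hypothetical helper)."""
    cmd = ["hadoop", "distcp"]
    if update:
        # Copy only files that are missing or differ at the destination.
        cmd.append("-update")
    if max_maps is not None:
        # Upper bound on the number of parallel map tasks doing the copy.
        cmd += ["-m", str(max_maps)]
    cmd += [src, dst]
    return cmd


# Hypothetical source and destination clusters.
cmd = distcp_command(
    "hdfs://nn1:8020/source",
    "hdfs://nn2:8020/destination",
    max_maps=20,
)
print(shlex.join(cmd))
```

On a machine with a Hadoop client installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.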
Default replication factor? By default, the replication factor is 3. No two replicas of a block are stored on the same DataNode. Typically, two of the three replicas end up on the same rack (on different nodes) and the remaining one is placed on a different rack. It is advised to set the replication factor to at least three so that one copy is always safe, even if an entire rack fails.
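The placement rule can be illustrated with a toy model. This is not HDFS code: it assumes the policy as described in recent Hadoop documentation (first replica on the writer's node, second on a node in a remote rack, third on another node in that same remote rack), and the `topology` dictionary and node names are hypothetical.

```python
import random


def place_replicas(topology, writer_node, rng=random):
    """Pick DataNodes for 3 replicas of one block (toy model, not HDFS code).

    topology: dict mapping rack name -> list of DataNode names.
    Assumes at least one rack other than the writer's has two or more nodes.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = rack_of[writer_node]

    # Replica 1: the DataNode the writer runs on.
    first = writer_node

    # Replica 2: a node on a different rack, so a rack failure cannot
    # destroy every copy.
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    ]
    remote_rack = rng.choice(remote_racks)
    second = rng.choice(topology[remote_rack])

    # Replica 3: a different node on the same remote rack as replica 2.
    third = rng.choice([n for n in topology[remote_rack] if n != second])

    return [first, second, third]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
print(place_replicas(topology, "dn1"))
```

In every run the three replicas land on three distinct DataNodes spread over two racks, which matches the two properties stated above.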
The NameNode keeps its metadata in two places: on disk (the FsImage and the edit logs) and in RAM.
Edit logs: the transaction log of every change made since the last FsImage file was created, such as added blocks, deleted blocks, and replication changes. These updates are periodically merged into the FsImage.
FsImage: a file stored on the NameNode's local (OS) file system that contains the complete directory structure of HDFS. The NameNode reads it into memory when it starts.
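How the two pieces work together can be sketched with a toy model. The class `ToyNameNode`, its operation names, and the "path to replication factor" namespace are simplifications invented for illustration; real HDFS uses binary on-disk formats and a much richer namespace.

```python
class ToyNameNode:
    """Toy model of FsImage + edit log (not the real HDFS data structures)."""

    def __init__(self):
        self.fsimage = {}    # persisted snapshot: path -> replication factor
        self.edit_log = []   # changes recorded since the last checkpoint
        self.namespace = {}  # in-memory view served to clients

    @staticmethod
    def _apply(state, op):
        kind, path, value = op
        if kind in ("create", "set_replication"):
            state[path] = value
        elif kind == "delete":
            state.pop(path, None)

    def log(self, kind, path, value=None):
        """Record a change in the edit log and apply it in memory."""
        op = (kind, path, value)
        self.edit_log.append(op)
        self._apply(self.namespace, op)

    def checkpoint(self):
        """Merge the edit log into a new FsImage and start a fresh log."""
        new_image = dict(self.fsimage)
        for op in self.edit_log:
            self._apply(new_image, op)
        self.fsimage = new_image
        self.edit_log = []

    def restart(self):
        """Rebuild memory: load the FsImage, then replay pending edits."""
        self.namespace = dict(self.fsimage)
        for op in self.edit_log:
            self._apply(self.namespace, op)


nn = ToyNameNode()
nn.log("create", "/data/a.txt", 3)
nn.checkpoint()                        # FsImage now contains /data/a.txt
nn.log("set_replication", "/data/a.txt", 2)
nn.restart()                           # FsImage + replayed edit log
print(nn.namespace)
```

The point of the split is that appending to the edit log is cheap, while rewriting the whole FsImage on every change would not be; the periodic merge keeps the log short so that startup replay stays fast.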