Sangala Shekhar Reddy: Flume Master HA

Wednesday, November 9, 2016

Flume Master HA

http://archive.cloudera.com/cdh/3/flume/UserGuide/#_running_in_distributed_mode

Running in Distributed Mode:
-------------------------------------------
Distributed mode runs the Flume Master on several machines. Therefore the configuration described below should be done on every Master machine, except where noted.

Running the Flume Master in distributed mode provides better fault tolerance than in standalone mode, and scalability for hundreds of nodes.

Configuring machines to run as part of a distributed Flume Master is nearly as simple as standalone mode. As before, flume.master.servers needs to be set, this time to a list of machines:

<property>
  <name>flume.master.servers</name>
  <value>masterA,masterB,masterC</value>
</property>

How many machines do I need? The distributed Flume Master continues to work correctly as long as more than half of the physical machines running it are still working. Therefore, if you want to survive one fault, you need three machines (because 3 - 1 = 2 > 3/2). For every extra fault you want to tolerate, add another two machines, so for two faults you need five machines. Note that having an even number of machines doesn’t make the Flume Master any more fault-tolerant: four machines tolerate only one failure, because if two were to fail, only two would be left functioning, which is not more than half of four. Common deployments should be well served by three or five machines.
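The sizing rule above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of Flume itself; the function names tolerated_failures and machines_needed are made up for illustration.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Largest number of crashed masters that still leaves more than half alive."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    # We need ensemble_size - failures > ensemble_size / 2,
    # which is the same as failures <= (ensemble_size - 1) // 2.
    return (ensemble_size - 1) // 2


def machines_needed(failures_to_tolerate: int) -> int:
    """Smallest ensemble that survives the given number of faults."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must not be negative")
    # Each extra fault costs two more machines.
    return 2 * failures_to_tolerate + 1


for size in range(1, 6):
    print(size, "machines tolerate", tolerated_failures(size), "failure(s)")
```

Running it shows that three and four machines both tolerate one failure, which is why the even-sized ensemble buys nothing extra.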

The final property to set is not the same on every machine - every node in the Flume Master must have a unique value for flume.master.serverid.

Note: flume.master.serverid is the only Flume Master property that must be different on every machine in the ensemble.

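masterA:

<property>
  <name>flume.master.serverid</name>
  <value>0</value>
</property>

masterB:

<property>
  <name>flume.master.serverid</name>
  <value>1</value>
</property>

masterC:

<property>
  <name>flume.master.serverid</name>
  <value>2</value>
</property>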

The value for flume.master.serverid for each node is the index of that node’s hostname in the list in flume.master.servers, starting at 0. For example, masterB has index 1 in that list. The purpose of this property is to allow each node to uniquely identify itself to the other nodes in the Flume Master.
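Because the server id is simply a position in the server list, it can be derived mechanically rather than typed by hand on each machine. Here is a small Python sketch of that mapping; server_ids is a hypothetical helper, not a Flume tool, and it assumes the list is the same comma-separated string used for flume.master.servers.

```python
from typing import Dict


def server_ids(master_servers: str) -> Dict[str, int]:
    """Map each hostname to its 0-based index in the comma-separated list."""
    hosts = [host.strip() for host in master_servers.split(",") if host.strip()]
    if len(set(hosts)) != len(hosts):
        # A repeated hostname would give two nodes the same identity.
        raise ValueError("hostnames in the server list must be unique")
    return {host: index for index, host in enumerate(hosts)}


ids = server_ids("masterA,masterB,masterC")
for host, server_id in ids.items():
    print(host, "-> flume.master.serverid =", server_id)
```

The order of the list matters: every machine must use the same list in the same order, or the ids will not line up.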

This is all the configuration required to start a three-node distributed Flume Master. To test this out, we can start the Master process on all three machines:

[flume@masterA] flume master

[flume@masterB] flume master

[flume@masterC] flume master

Each Master process will initially try to contact all other nodes in the ensemble. Until more than half of the nodes (in this case, two) are alive and contactable, the configuration store will be unable to start, and the Flume Master will not be able to read or write configuration data.

You can check the current state of the ensemble by inspecting the web page of any of the Flume Master machines, which by default is found at, for example, http://masterA:35871.
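If you would rather not open each page by hand, a short script can at least tell you which Master web pages respond. This is a rough Python sketch under stated assumptions: status_urls and is_reachable are invented names, the port is the default quoted above, and the check only confirms that the page answers over HTTP. It does not read the ensemble state shown on the page.

```python
import urllib.request
from typing import List

DEFAULT_STATUS_PORT = 35871  # default Master web port quoted in the text


def status_urls(master_servers: str, port: int = DEFAULT_STATUS_PORT) -> List[str]:
    """Build one status-page URL per hostname in the comma-separated list."""
    hosts = [host.strip() for host in master_servers.split(",") if host.strip()]
    return [f"http://{host}:{port}" for host in hosts]


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # Covers refused connections, DNS failures, timeouts and HTTP errors.
        return False
```

For the three-node example you would loop over status_urls("masterA,masterB,masterC") and call is_reachable on each URL.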
