Friday, April 27, 2018

How to control the number of Reducers for a particular Hive query to optimize query performance

Hive employs a simple algorithm for determining the number of Reducers to use: it divides the total size of the input data set by a configurable per-reducer data target, capped at a configurable maximum. The number of Reducers can also be set explicitly by the developer.

-- In order to change the average load for a reducer (in bytes)
-- CDH uses a default of 67108864 (64 MB)
-- That is, if the total size of the data set is 1 GB (1024 MB / 64 MB), then 16 reducers will be used
set hive.exec.reducers.bytes.per.reducer=67108864;
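For example, raising the per-reducer target reduces the number of reducers Hive launches for a given input. The sketch below assumes a hypothetical ~1 GB `sales` table with a `category` column:

```sql
-- Target roughly 256 MB per reducer, so a ~1 GB input gets about 4 reducers
set hive.exec.reducers.bytes.per.reducer=268435456;

SELECT category, COUNT(*) AS cnt
FROM sales
GROUP BY category;
```

Lowering the value has the opposite effect, spreading the same input across more, smaller reducers.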

-- In order to set a constant number of reducers
-- Typically set to a prime number close to the number of available hosts
-- Default value is -1, which tells Hive to figure out the number of reducers automatically
set mapreduce.job.reduces=-1;
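A sketch of forcing a fixed reducer count for one query and then restoring automatic estimation; the `orders` table and its columns are hypothetical:

```sql
-- Force exactly 7 reducers for this query (a prime near a hypothetical 8-host cluster)
set mapreduce.job.reduces=7;

SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id;

-- Restore automatic reducer estimation for subsequent queries
set mapreduce.job.reduces=-1;
```

Because `set` applies for the rest of the session, resetting to -1 afterwards avoids accidentally pinning the reducer count for unrelated queries.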

-- Hive will use this as the maximum number of reducers when automatically determining the number of reducers
-- Default value is 1099
set hive.exec.reducers.max=1099;
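The cap is useful on shared clusters where one large query should not consume every reducer slot. A sketch, with a hypothetical `page_views` table:

```sql
-- Keep automatic estimation, but never launch more than 100 reducers,
-- even if the input size would otherwise call for more
set hive.exec.reducers.max=100;

SELECT dt, COUNT(DISTINCT user_id) AS users
FROM page_views
GROUP BY dt;
```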
