Wednesday, November 16, 2016

Hive Filter to Skip .tmp Files While Querying

1. Create a class that implements PathFilter
package com.hivefilter;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Rejects hidden files (names starting with "_" or ".") and temporary
// files (names ending with ".tmp") so that queries do not read them.
public class FileFilterExcludeTmpFiles implements PathFilter {
        @Override
        public boolean accept(Path p) {
                // Only the last path component (the file name) is checked.
                String name = p.getName();
                return !name.startsWith("_") && !name.startsWith(".") && !name.endsWith(".tmp");
        }
}
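
To see which files the filter lets through, here is a minimal standalone sketch that applies the same condition to plain file names. It does not depend on Hadoop: the class TmpFilterDemo, its methods, and the sample file names are made up for illustration, and acceptName stands in for calling accept on a Path whose getName() returns that string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the Hive setup itself.
public class TmpFilterDemo {
    // Same condition as FileFilterExcludeTmpFiles.accept, applied to a bare file name.
    static boolean acceptName(String name) {
        return !name.startsWith("_") && !name.startsWith(".") && !name.endsWith(".tmp");
    }

    // Returns only the names the filter would accept, in their original order.
    static List<String> visible(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String n : names) {
            if (acceptName(n)) {
                kept.add(n);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample names are assumptions chosen to exercise each rule.
        List<String> names = Arrays.asList(
                "part-00000", "_SUCCESS", ".hidden", "events.log.tmp", "000000_0");
        System.out.println(visible(names)); // prints [part-00000, 000000_0]
    }
}
```

Note that only the start and end of the name are checked, so a file such as 000000_0 is still read even though it contains an underscore.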

2. Build a jar file containing the class, copy the jar to /usr/lib/hive/lib, and add the property below to /etc/hive/conf/hive-site.xml. The property value must be the fully qualified name of the class from step 1.

 
          <property>
            <name>mapred.input.pathFilter.class</name>
            <value>com.hivefilter.FileFilterExcludeTmpFiles</value>
          </property>
