Wednesday, November 9, 2016

scheduling workflows using the Oozie coordinator

1. Create a folder in HDFS and a lib folder under it
hadoop fs -mkdir /user/oozie/shekhar
hadoop fs -mkdir /user/oozie/shekhar/lib
2. Copy the workflow.xml and coordinator.xml files to the HDFS location /user/oozie/shekhar
hadoop fs -put workflow.xml /user/oozie/shekhar/
hadoop fs -put coordinator.xml /user/oozie/shekhar/
3. Copy your jars to the HDFS location /user/oozie/shekhar/lib (a quick check of the final layout is shown after step 5)
hadoop fs -put sparkanalitics-1.jar /user/oozie/shekhar/lib/
4. Create the coordinatorjob.properties and simpleSparkTest.sh files in the current directory (not in HDFS)
5. Submit the job using the oozie command from the current directory
$ oozie job -oozie http://localhost:11000/oozie -config coordinatorjob.properties -submit
job: 0000673-120823182447665-oozie-hado-C
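
A quick way to verify the HDFS layout from steps 1-3 (you should see workflow.xml, coordinator.xml, and lib/sparkanalitics-1.jar):
$ hadoop fs -ls -R /user/oozie/shekhar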

Logs are generated in the /root/oozie-oozi/0000673-120823182447665-oozie-hado-C folder
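
The same logs can also be pulled through the Oozie CLI:
$ oozie job -oozie http://localhost:11000/oozie -log 0000673-120823182447665-oozie-hado-C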

Suspending a Coordinator Job
$ oozie job -oozie http://localhost:11000/oozie -suspend 0000673-120823182447665-oozie-hado-C
Resuming a Coordinator Job
$ oozie job -oozie http://localhost:11000/oozie -resume 0000673-120823182447665-oozie-hado-C
Killing a Coordinator Job
$ oozie job -oozie http://localhost:11000/oozie -kill 0000673-120823182447665-oozie-hado-C
Rerunning a Coordinator Action or Multiple Actions
$ oozie job -oozie http://localhost:11000/oozie -rerun 0000673-120823182447665-oozie-hado-C [-nocleanup]
[-refresh] [-action 1,3-5] [-date 2012-01-01T01:00Z::2012-05-31T23:59Z, 2012-11-10T01:00Z, 2012-12-31T22:00Z]
Either -action or -date is required for a rerun; if neither is given, an exception is thrown.
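
For example, to rerun only the first action of this coordinator and refresh its input dependencies:
$ oozie job -oozie http://localhost:11000/oozie -rerun 0000673-120823182447665-oozie-hado-C -refresh -action 1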

Checking the Status of a Coordinator/Workflow job or a Coordinator Action
$ oozie job -oozie http://localhost:11000/oozie -info 0000673-120823182447665-oozie-hado-C

The info option can display information about a workflow job, a coordinator job, or a coordinator action.
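
To inspect a single coordinator action rather than the whole job, append the action number to the job id:
$ oozie job -oozie http://localhost:11000/oozie -info 0000673-120823182447665-oozie-hado-C@1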
============================
workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="ssh-spark-wf">
   <!-- the app and node names here are arbitrary; host and script path are the ones used throughout this post -->
   <start to="ssh-node"/>
   <action name="ssh-node">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
             <host>root@cooper.alcatel.com</host>
             <command>/var/lib/hadoop-hdfs/simpleSparkTest.sh</command>
        </ssh>
        <ok to="end"/>
        <error to="fail"/>
   </action>
   <kill name="fail">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
   </kill>
   <end name="end"/>
</workflow-app>
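
Before uploading the XML to HDFS, you can check it against the Oozie schema with the CLI:
$ oozie validate workflow.xml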
=============================
coordinator.xml

<coordinator-app name="spark-coord" frequency="${frequency}"
                 start="${startTime}" end="${endTime}" timezone="${timezone}"
                 xmlns="uri:oozie:coordinator:0.1">
   <action>
        <workflow>
             <app-path>${workflowPath}</app-path>
        </workflow>
   </action>
</coordinator-app>
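
As an alternative to the plain number of minutes passed in via the frequency property (see below), the frequency attribute also accepts Oozie's coordinator EL functions, e.g.:
frequency="${coord:minutes(5)}"
frequency="${coord:days(1)}"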
============================
simpleSparkTest.sh

#!/bin/bash
cd /var/lib/hadoop-hdfs
spark-submit --jars ./utils-common-1.0.0.jar --master yarn --class alu.ausdc.analitics.sparkEx.log4j.Main sparkanalitics-1.jar 20-01-2015
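
The workflow's SSH action runs this script on the remote host, so it must be present there at /var/lib/hadoop-hdfs/simpleSparkTest.sh and be executable:
$ chmod +x /var/lib/hadoop-hdfs/simpleSparkTest.sh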
===============================

coordinatorjob.properties

# run every 5 minutes (a plain-number coordinator frequency is in minutes, not seconds)
frequency=5
startTime=2012-08-31T20\:20Z
endTime=2013-08-31T20\:20Z
timezone=GMT+0530

nameNode=hdfs://cooper.alcatel.com:8020
jobTracker=cooper.alcatel.com:8021
queueName=default
inputDir=${nameNode}/data.in
outputDir=${nameNode}/out
user.name=training
# give workflow xml path here
workflowPath=${nameNode}/user/oozie/shekhar
# give coordinator xml path here
oozie.coord.application.path=${nameNode}/user/oozie/shekhar
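
Note that -submit only creates the coordinator job in PREP state; start it afterwards with -start <job-id>, or submit and start in one step with -run:
$ oozie job -oozie http://localhost:11000/oozie -config coordinatorjob.properties -run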
