Friday, May 26, 2017

Apache Phoenix Bulk Loading

Here is an example:
Create a us_population.sql file:
CREATE TABLE IF NOT EXISTS us_population ( state CHAR(2) NOT NULL, city VARCHAR NOT NULL, population BIGINT CONSTRAINT my_pk PRIMARY KEY (state, city));
Create a us_population.csv file:
NY,New York,8143197
CA,Los Angeles,3844829
IL,Chicago,2842518
TX,Houston,2016582
PA,Philadelphia,1463281
AZ,Phoenix,1461575
TX,San Antonio,1256509
CA,San Diego,1255540
TX,Dallas,1213825
CA,San Jose,912332
Create a us_population_queries.sql file:
SELECT state AS "State", COUNT(city) AS "City Count", SUM(population) AS "Population Sum" FROM us_population GROUP BY state ORDER BY SUM(population) DESC;
Execute the following command from a terminal:
/opt//psql.py us_population.sql us_population.csv us_population_queries.sql
Output:
csv columns from database.
CSV Upsert complete. 10 rows upserted
Time: 0.181 sec(s)
St  City Count  Population Sum
--  ----------  --------------
NY  1           8143197
CA  3           6012701
TX  3           4486916
IL  1           2842518
PA  1           1463281
AZ  1           1461575
Time: 0.03 sec(s)
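psql.py upserts rows through a single client connection; for larger files Phoenix also ships a MapReduce-based bulk loader. A minimal sketch of its invocation (the client jar path and the HDFS input location are assumptions for this example; zookeeper_url is your quorum):
hadoop jar /opt/phoenix/phoenix-4.7.0-HBase-1.0-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table US_POPULATION \
  --input /tmp/us_population.csv \
  --zookeeper zookeeper_url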
For more information, see the Apache Phoenix bulk-loading documentation.

HBase Tools

Apache Phoenix

Apache Phoenix is a SQL interface on top of HBase; see the Apache Phoenix documentation for more information.
You can connect by executing the following command from the bin directory:
python sqlline.py zookeeper_url
The Phoenix client JDBC driver ships with the Phoenix distribution. You can download and set up SQuirreL as your SQL client so you can issue ad hoc SQL against your HBase cluster.
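For the SQuirreL setup, the driver settings would look roughly like this (the quorum address is a placeholder):
Driver class name: org.apache.phoenix.jdbc.PhoenixDriver
Example JDBC URL:  jdbc:phoenix:zookeeper_url
Extra classpath:   the Phoenix client jar from your installation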

HBase Stargate

Stargate is the REST interface of HBase; see the HBase REST documentation for more information.
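Assuming the REST server runs on its default port 8080, a quick smoke test with curl might look like this (host and table name are placeholders):
hbase rest start -p 8080 &
curl -H "Accept: application/json" http://localhost:8080/version
curl -H "Accept: application/json" http://localhost:8080/US_POPULATION/schema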

HBase ImportTsv

ImportTsv is a utility that loads data in TSV format into HBase. It can be used for HBase tables that are not managed by Phoenix; see the HBase ImportTsv documentation for more information, and the sketch below.
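A sketch of a typical invocation, loading a comma-separated file into a table t1 with a column family f (the table name, column family, and paths are assumptions for this example):
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  '-Dimporttsv.separator=,' \
  -Dimporttsv.columns=HBASE_ROW_KEY,f:city,f:population \
  t1 /user/hbase/us_population.csv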

Enabling JMX for Tomcat

Create (or append to) the file /opt/mas/tomcat/bin/setenv.sh:
#!/bin/sh
CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote.port=1701 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false \
 -Dcom.sun.management.jmxremote.password.file=/opt/mas/jmx/tomcat.password -Dcom.sun.management.jmxremote.access.file=/opt/mas/jmx/jmx.access"
Then run the following commands:
chown tomcat:tomcat /opt/mas/jmx/tomcat.password /opt/mas/tomcat/bin/setenv.sh
/etc/init.d/tomcat_8080 stop
/etc/init.d/tomcat_8080 start
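For reference, the two JMX files above pair a role name with a password and with a permission. A minimal sketch (the role name and password are placeholders, not values from this setup); note that the JVM refuses to start remote JMX unless the password file is readable only by its owner:
# /opt/mas/jmx/tomcat.password -- format: <role> <password>
monitorRole changeme
# /opt/mas/jmx/jmx.access -- format: <role> readonly|readwrite
monitorRole readonly

chmod 600 /opt/mas/jmx/tomcat.password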

Enabling JMX metrics for Spark


  1. For basic JVM metrics, append the following to the file /etc/spark/conf.mas/spark-env.sh:
# Enable HistoryServer JMX connections on port 9991
HADOOP_JOB_HISTORYSERVER_OPTS="-Dcom.sun.management.jmxremote.port=9991 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.password.file=/opt/mas/jmx/mapred.password -Dcom.sun.management.jmxremote.access.file=/opt/mas/jmx/jmx.access"


  2. For the Spark Master web UI, update spark-env.sh:
           export SPARK_MASTER_WEBUI_PORT=18080
  3. For additional metrics, update metrics.properties; with the JmxSink enabled, driver and executor metrics are registered as MBeans in each Spark JVM (see the sketch after this list):
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
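The JmxSink only registers the MBeans inside each Spark JVM; to reach the driver's MBeans remotely you can pass the standard JMX flags through spark.driver.extraJavaOptions. A sketch, assuming a test setup with authentication disabled (the port, class, and jar names are placeholders):
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dcom.sun.management.jmxremote.port=9992 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false" \
  --class com.example.MyApp myapp.jar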

Wednesday, May 17, 2017

Configuring Zabbix Monitoring for All Hadoop Services (ZooKeeper, Spark, NameNode, DataNode, Job History Server, HDFS JournalNode, Hive, and HBase)

The notes below document the Zabbix monitoring configuration for all Hadoop services (ZooKeeper, Spark, NameNode, DataNode, Job History Server, HDFS JournalNode, Hive, and HBase), along with the respective file changes.
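Once the JMX ports below are exposed, Zabbix polls them through the Zabbix Java gateway using JMX item keys. For example, two standard JVM MBean attributes that work against any of these services:
jmx["java.lang:type=Memory","HeapMemoryUsage.used"]
jmx["java.lang:type=Threading","ThreadCount"]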

For ZooKeeper:
vi /etc/default/zookeeper
export SERVER_JVMFLAGS="-Dcom.sun.management.jmxremote.port=1703 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.password.file=/opt/shekhar/jmx/zookeeper.password -Dcom.sun.management.jmxremote.access.file=/opt/shekhar/jmx/jmx.access"
---------------------------------
Spark:
vi /etc/spark/conf/metrics.properties
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
---------------------------------
For NameNode, DataNode, Job History Server, and HDFS JournalNode:
vi /etc/hadoop/conf/hadoop-env.sh

# Use YARN for all hadoop commands
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

# Enable NameNode JMX connections on port 9980
HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote.port=9980 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.password.file=/opt/shekhar/jmx/hdfs.password -Dcom.sun.management.jmxremote.access.file=/opt/shekhar/jmx/jmx.access"

# Enable DataNode JMX connections on port 9981
HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote.port=9981 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.password.file=/opt/shekhar/jmx/hdfs.password -Dcom.sun.management.jmxremote.access.file=/opt/shekhar/jmx/jmx.access"

# Enable JournalNode JMX connections on port 1704
HADOOP_JOURNALNODE_OPTS="${HADOOP_JOURNALNODE_OPTS} -Dcom.sun.management.jmxremote.port=1704 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.password.file=/opt/shekhar/jmx/hdfs.password -Dcom.sun.management.jmxremote.access.file=/opt/shekhar/jmx/jmx.access"

# Enable Job History Server JMX connections on port 9991
HADOOP_JOB_HISTORYSERVER_OPTS="${HADOOP_JOB_HISTORYSERVER_OPTS} -Dcom.sun.management.jmxremote.port=9991 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.password.file=/opt/shekhar/jmx/mapred.password -Dcom.sun.management.jmxremote.access.file=/opt/shekhar/jmx/jmx.access"

----------------------------
For ResourceManager and NodeManager:
vi /etc/hadoop/conf/yarn-env.sh
YARN_RESOURCEMANAGER_OPTS="-Dcom.sun.management.jmxremote.port=9983 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.password.file=/opt/shekhar/jmx/yarn.password -Dcom.sun.management.jmxremote.access.file=/opt/shekhar/jmx/jmx.access"

YARN_NODEMANAGER_OPTS="-Dcom.sun.management.jmxremote.port=9984 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.password.file=/opt/shekhar/jmx/yarn.password -Dcom.sun.management.jmxremote.access.file=/opt/shekhar/jmx/jmx.access"
---------------------------------
For Hive:

vi /etc/hive/conf/hive-env.sh
JMX_OPTS="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.password.file=/opt/shekhar/jmx/hive.password -Dcom.sun.management.jmxremote.access.file=/opt/shekhar/jmx/jmx.access"
if [ "$SERVICE" = "hiveserver2" ]; then
  export HADOOP_OPTS="$HADOOP_OPTS $JMX_OPTS -Dcom.sun.management.jmxremote.port=1705 -Dproc_hiveserver"
elif [ "$SERVICE" = "metastore" ]; then
  export HADOOP_OPTS="$HADOOP_OPTS $JMX_OPTS -Dcom.sun.management.jmxremote.port=1706 -Dproc_metastore"
fi
----------------------------------
For HBase:

vi /etc/hbase/conf/hbase-env.sh
HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.password.file=/opt/shekhar/jmx/hbase.password -Dcom.sun.management.jmxremote.access.file=/opt/shekhar/jmx/jmx.access"
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
#export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"
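After restarting the services, a quick way to confirm that each JMX port is actually listening (ports taken from the snippets above):
for port in 1703 9980 9981 1704 9991 9983 9984 1705 1706 10101 10102 10103 10105; do
  netstat -ltn | grep -q ":$port " && echo "port $port listening" || echo "port $port NOT listening"
done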


Phoenix 4.7 integration issue with Spark 2.1.0

Steps to integrate Phoenix 4.7 with Spark 2.1.0:
Step 1. Copy the files below into /usr/lib/spark/lib:
phoenix-core-4.7.0-HBase-1.0.jar
phoenix-server-4.7.0-HBase-1.0.jar
phoenix-spark-4.7.0-HBase-1.0.jar
phoenix-HBase-client.jar
Step 2. Create the following soft links in /usr/lib/spark/lib (commands sketched after the list):
phoenix-HBase-client.jar -> /usr/lib/spark/lib/phoenix-server-client-4.7.0-HBase-1.0.jar
phoenix-spark-HBase.jar -> /usr/lib/spark/lib/phoenix-spark-4.7.0-HBase-1.0.jar
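The links in step 2 can be created with ln -s; a sketch, assuming the jars are already in place:
cd /usr/lib/spark/lib
ln -s phoenix-server-client-4.7.0-HBase-1.0.jar phoenix-HBase-client.jar
ln -s phoenix-spark-4.7.0-HBase-1.0.jar phoenix-spark-HBase.jar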
I had an integration issue with the jackson-core-XXX.jar files: Spark 2.1 uses version 2.6.5 of the Jackson APIs, but phoenix-core uses 1.9.13. The phoenix-core module's pom.xml had the manifest option <addClasspath>true</addClasspath>, so the generated MANIFEST file had the content below, with Class-Path configured against the older Jackson version, and at run time I started getting exceptions about missing methods. I solved the issue by removing <addClasspath>true</addClasspath> and rebuilding the phoenix-core module; the Class-Path attribute is then no longer generated in the manifest, so the Spark versions of the jars are used at run time.
Manifest file with <addClasspath>true</addClasspath>:
Manifest-Version: 1.0
Implementation-Title: Phoenix Core
Implementation-Version: 4.7.0-HBase-1.0
Archiver-Version: Plexus Archiver
Built-By: mas_user
Specification-Vendor: Apache Software Foundation
Specification-Title: Phoenix Core
Implementation-Vendor-Id: org.apache.phoenix
Class-Path: tephra-api-0.7.0.jar tephra-core-0.7.0.jar gson-2.2.4.jar
guice-3.0.jar javax.inject-1.jar aopalliance-1.0.jar guice-assistedin
ject-3.0.jar libthrift-0.9.0.jar fastutil-6.5.6.jar twill-common-0.6.
0-incubating.jar twill-core-0.6.0-incubating.jar twill-api-0.6.0-incu
bating.jar asm-all-5.0.2.jar twill-discovery-api-0.6.0-incubating.jar
twill-discovery-core-0.6.0-incubating.jar twill-zookeeper-0.6.0-incu
bating.jar metrics-core-3.1.0.jar tephra-hbase-compat-1.0-0.7.0.jar a
ntlr-3.5.jar ST4-4.0.7.jar antlr-runtime-3.5.jar stringtemplate-3.2.1
.jar antlr-2.7.7.jar jline-2.11.jar sqlline-1.1.8.jar guava-13.0.1.ja
r joda-time-1.6.jar annotations-1.3.2.jar jackson-core-asl-1.9.13.jar
jackson-mapper-asl-1.9.13.jar junit-4.12.jar hamcrest-core-1.3.jar p
rotobuf-java-2.5.0.jar httpclient-4.0.1.jar httpcore-4.0.1.jar common
s-logging-1.2.jar log4j-1.2.17.jar slf4j-api-1.6.4.jar snappy-0.3.jar
htrace-core-3.1.0-incubating.jar netty-all-4.0.23.Final.jar commons-
codec-1.7.jar commons-collections-3.2.1.jar commons-csv-1.0.jar jsr30
5-2.0.1.jar slf4j-log4j12-1.7.5.jar hadoop-auth-2.6.0-cdh5.4.8.jar ap
acheds-kerberos-codec-2.0.0-M15.jar apacheds-i18n-2.0.0-M15.jar api-a
sn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar curator-framework-2.7.1.
jar hadoop-core-2.6.0-mr1-cdh5.4.8.jar hsqldb-1.8.0.10.jar core-3.1.1
.jar hadoop-hdfs-2.6.0-cdh5.4.8.jar commons-daemon-1.0.13.jar leveldb
jni-all-1.8.jar hadoop-hdfs-2.6.0-cdh5.4.8-tests.jar findbugs-annotat
ions-1.3.9-1.jar jersey-client-1.9.jar commons-cli-1.2.jar commons-ma
th-2.1.jar commons-lang-2.6.jar hbase-common-1.0.0-cdh5.4.8.jar hbase
-annotations-1.0.0-cdh5.4.8.jar commons-io-2.4.jar jetty-util-6.1.26.
cloudera.4.jar hbase-protocol-1.0.0-cdh5.4.8.jar hbase-client-1.0.0-c
dh5.4.8.jar netty-3.6.6.Final.jar zookeeper-3.4.5-cdh5.4.8.jar jcodin
gs-1.0.8.jar hbase-server-1.0.0-cdh5.4.8.jar commons-httpclient-3.1.j
ar jersey-core-1.9.jar jersey-server-1.9.jar asm-3.1.jar jetty-6.1.26
.cloudera.4.jar jetty-sslengine-6.1.26.cloudera.4.jar jsp-2.1-6.1.14.
jar jsp-api-2.1-6.1.14.jar servlet-api-2.5-6.1.14.jar jasper-compiler
5.5.23.jar jasper-runtime-5.5.23.jar jamon-runtime-2.3.1.jar jersey
json-1.9.jar jettison-1.1.jar jaxb-impl-2.2.3-1.jar jaxb-api-2.1.jar
activation-1.1.jar hbase-server-1.0.0-cdh5.4.8-tests.jar hadoop-commo
n-2.6.0-cdh5.4.8.jar commons-math3-3.1.1.jar xmlenc-0.52.jar commons-
net-3.1.jar servlet-api-2.5.jar jsp-api-2.1.jar commons-el-1.0.jar je
ts3t-0.9.0.jar java-xmlbuilder-0.4.jar commons-configuration-1.6.jar
commons-digester-1.8.jar commons-beanutils-1.7.0.jar commons-beanutil
s-core-1.8.0.jar avro-1.7.6-cdh5.4.8.jar paranamer-2.3.jar jsch-0.1.4
2.jar curator-client-2.7.1.jar curator-recipes-2.7.1.jar htrace-core-
3.0.4.jar commons-compress-1.4.1.jar xz-1.0.jar hadoop-annotations-2.
6.0-cdh5.4.8.jar hadoop-mapreduce-client-core-2.6.0-cdh5.4.8.jar hado
op-yarn-common-2.6.0-cdh5.4.8.jar jersey-guice-1.9.jar guice-servlet-
3.0.jar hadoop-yarn-api-2.6.0-cdh5.4.8.jar joni-2.1.2.jar
Implementation-Vendor: Apache Software Foundation
Created-By: Apache Maven 3.3.9
Build-Jdk: 1.8.0_65
Specification-Version: 4.7.0-HBase-1.0
Manifest file without <addClasspath>true</addClasspath>:
Manifest-Version: 1.0
Implementation-Title: Phoenix Core
Implementation-Version: 4.7.0-HBase-1.0
Archiver-Version: Plexus Archiver
Built-By: mas_user
Specification-Vendor: Apache Software Foundation
Specification-Title: Phoenix Core
Implementation-Vendor-Id: org.apache.phoenix
Implementation-Vendor: Apache Software Foundation
Created-By: Apache Maven 3.3.9
Build-Jdk: 1.8.0_65
Specification-Version: 4.7.0-HBase-1.0
Thanks,
Shekhar Reddy