If you are like me and love to have everything you develop against running locally in a mini integration environment, read on.
Here, we attempt to get some pretty heavyweight stuff working locally on your Mac, namely:
- Hadoop (Hadoop2/HDFS)
- YARN (So you can submit MR jobs)
- Spark (We will illustrate with the Spark shell, but it should work in YARN mode as well)
- Hive (So we can create some tables and play with it)
We will use the latest stable Cloudera distribution and work off the jars. Most of the methodology is borrowed from here; we just link the four pieces together nicely in this blog.
Download Stuff
First of all, make sure you have Java 7/8 installed, with the JAVA_HOME variable set up to point to the correct location. You have to download the CDH tarballs for Hadoop, Zookeeper and Hive from the tarball page (CDH 5.4.x page) and untar them under a folder (referred to as CDH_HOME going forward), renaming the Hadoop and Zookeeper directories to hadoop and zookeeper.
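For reference, the untar/rename steps can look roughly like this (just a sketch; the exact tarball names depend on the CDH version you grabbed, and the renames are only there to match the paths used below):

$ cd $HOME/bin/cdh/5.4.7
$ tar -xzf hadoop-2.6.0-cdh5.4.7.tar.gz && mv hadoop-2.6.0-cdh5.4.7 hadoop
$ tar -xzf zookeeper-3.4.5-cdh5.4.7.tar.gz && mv zookeeper-3.4.5-cdh5.4.7 zookeeper
$ tar -xzf hive-1.1.0-cdh5.4.7.tar.gz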
$ ls $HOME/bin/cdh/5.4.7
hadoop  hadoop-2.6.0-cdh5.4.7.tar.gz  hive-1.1.0-cdh5.4.7  hive-1.1.0-cdh5.4.7.tar.gz  zookeeper  zookeeper-3.4.5-cdh5.4.7.tar.gz
While you are at it, also grab a Spark release (pre-built for Hadoop 2.6.x) from here, and untar it to a directory like the one below, which we will call $SPARK_INSTALL.
$ ls $HOME/bin/spark-1.5.0-bin-hadoop2.6/
CHANGES.txt  LICENSE  NOTICE  R  README.md  RELEASE  bin  conf  data  ec2  examples  lib  python  sbin
You may also want to set up a bunch of environment variables early on, for use later.
CDH="5.4.7" export HADOOP_HOME="$HOME/bin/cdh/${CDH}/hadoop" export ZK_HOME="$HOME/bin/cdh/${CDH}/zookeeper" export SPARK_INSTALL="$HOME/bin/spark-1.5.0-bin-hadoop2.6" export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZK_HOME}/bin:${SPARK_INSTALL}/bin:${PATH}
Tip 1: If you are using jenv to manage your Java versions, then you might need the following additional lines in your .bashrc/.bash_profile.
eval "$(jenv init -)" export JAVA_HOME="$HOME/.jenv/versions/`jenv version-name`" alias jenv_set_java_home='export JAVA_HOME="$HOME/.jenv/versions/`jenv version-name`"'
Tip 2: Don't accidentally name your Spark install dir SPARK_HOME; Hive does things with that variable which you may not like.
Setup Hadoop/YARN
The page we pointed to before is already an excellent resource for doing this; I will just point out some additional configs I had to add as I brought in Hive, to make things easier to debug.
To etc/hadoop/core-site.xml (to let Hive queries impersonate users):
<property> <name>hadoop.proxyuser.root.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.root.groups</name> <value>*</value> </property>
To etc/hadoop/yarn-site.xml (to let Hive queries leave a debuggable log):
<property> <name>yarn.nodemanager.remote-app-log-dir</name> <value>hdfs://localhost:8020/tmp/yarn-logs</value> <description>Where to aggregate logs to.</description> </property> <property> <name>yarn.log-aggregation.retain-seconds</name> <value>3600</value> <description>Number of seconds to retain logs for</description> </property>
Make sure you can start HDFS & YARN locally.
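A minimal start-up sequence looks roughly like this (a sketch; the format step is for the very first run only, and the start scripts live under $HADOOP_HOME/sbin, which we put on the PATH above):

$ hdfs namenode -format      # first run only, wipes HDFS metadata
$ start-dfs.sh               # NameNode, DataNode, SecondaryNameNode
$ start-yarn.sh              # ResourceManager, NodeManager
$ jps                        # verify the daemons came up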
Setup Hive
Go into the CDH_HOME/hive-1.1.0-cdh5.4.7 folder and follow the quickstart to build Hive. Basically, a command like the one below:
mvn clean package -Phadoop-2,dist
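If I remember right, the packaged distribution lands under packaging/target; treat the exact path below as an assumption and check your build output. That is the directory the Hive commands further down are run from:

$ cd packaging/target/apache-hive-1.1.0-cdh5.4.7-bin/apache-hive-1.1.0-cdh5.4.7-bin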
Once you are past the basic steps of the quickstart, make a hive-site.xml like the one below and copy it to your Hadoop install.
$ cat $HADOOP_HOME/etc/hadoop/hive-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:10000</value>
  </property>
</configuration>
Once this is done, you should be able to start a metastore server
[apache-hive-1.1.0-cdh5.4.7-bin]$ bin/hive --service metastore -p 10000
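To double check the metastore is listening on the Thrift port, from another terminal (lsof is just what is handy on a Mac; any port check works):

$ lsof -nP -iTCP:10000 -sTCP:LISTEN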
Open up a CLI (create a table & run a small query).
[apache-hive-1.1.0-cdh5.4.7-bin]$ bin/hive --hiveconf hive.metastore.uris=thrift://localhost:10000
readlink: illegal option -- f
usage: readlink [-n] [file ...]
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> CREATE TABLE pokes (foo INT, bar STRING);
OK
Time taken: 0.651 seconds
hive> select count(*) from pokes;
Query ID = vinoth_20160523115454_527e550c-7318-4ffc-a49f-248ca119c5a8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-05-23 11:54:28,958 Stage-1 map = 0%,  reduce = 0%
2016-05-23 11:54:34,119 Stage-1 map = 100%,  reduce = 0%
2016-05-23 11:54:39,249 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_1464029642280_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1  HDFS Read: 6476 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
0
Time taken: 18.299 seconds, Fetched: 1 row(s)
hive>
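The count comes back 0 because pokes is empty; if you want a non-trivial result, the sample data from the Hive quickstart can be loaded and the query re-run (the kv1.txt path is relative to the Hive dist directory, so adjust if you are running from elsewhere):

hive> LOAD DATA LOCAL INPATH 'examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
hive> select count(*) from pokes;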
Setup Spark
Spark is super simple: just point Spark at the Hadoop installation, which now has not only the Hadoop configs but also the Hive config (this is why we copied hive-site.xml over before).

$ cd $SPARK_INSTALL
$ export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
$ spark-shell --driver-class-path $HADOOP_CONF_DIR

scala> sqlContext.sql("show tables").show()
scala> sqlContext.sql("describe pokes").show()
scala> sqlContext.sql("select count(*) from pokes").show()
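To try the same thing in YARN mode (as promised up top), the shell can be pointed at YARN instead of local mode; on Spark 1.5 that means the yarn-client master (a sketch, assuming HADOOP_CONF_DIR is exported as above):

$ spark-shell --master yarn-client --driver-class-path $HADOOP_CONF_DIR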
Voila!! (Not really a quick thing to do, but once you have done it once, you can set up a debugger etc. and it's all golden.)