1. The prebuilt binary package cannot read Hive data, so you need to download the Spark source package and compile it yourself.
Download the source package: spark-2.3.3.tgz
Extract it: tar -xzvf spark-2.3.3.tgz
Enter the source directory: cd spark-2.3.3
./dev/make-distribution.sh --tgz --name with-hive -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.7 -Phive -Phive-thriftserver -DskipTests
--name: name suffix for the built distribution
-Phadoop-2.7 -Dhadoop.version=2.7.7: match the Hadoop version installed on the system
-Phive -Phive-thriftserver: enable Hive support
-DskipTests: skip the tests during the build
The resulting package: spark-2.3.3-bin-with-hive.tgz
2. Extract the compiled package spark-2.3.3-bin-with-hive.tgz and create a symlink to the extracted directory:
tar -xvzf spark-2.3.3-bin-with-hive.tgz
ln -s spark-2.3.3-bin-with-hive spark
3. Set the environment variables
cd ./spark/conf/
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
export JAVA_HOME=/usr/lib/jvm/default-java
export CLASSPATH=$CLASSPATH:/usr/local/hive/lib
export SCALA_HOME=/usr/local/scala
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/usr/local/hive/lib/mysql-connector-java-5.1.40-bin.jar
4. Copy Hive's hive-site.xml into Spark's conf directory
cp hive/conf/hive-site.xml spark/conf/
5. Launch spark-shell
cd spark/
./bin/spark-shell
scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext
If the import succeeds, it shows that Spark can now read data from Hive.
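In spark-shell the `spark` session should already be created with Hive support, so the import above is only a quick check. For reference, a minimal sketch of the equivalent setup in a standalone application (the application name is just a placeholder, not from this walkthrough):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SparkReadHive")   // hypothetical application name
  .enableHiveSupport()        // picks up hive-site.xml from Spark's conf directory
  .getOrCreate()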
6. Read data from Hive
./bin/spark-shell
sql("show databases;").show(2)