Setting Up a Spark Development Environment on Ubuntu 16.04
This guide assumes Hadoop is already installed and adds Spark on top of it.
For detailed Hadoop installation steps, see the linked guide.
If you have not installed the JDK yet, install it now; it is also required when installing Hadoop.
A few notes on this initial setup first:
1. Install the JDK: download jdk-8u111-linux-x64.tar.gz and extract it to /opt/jdk1.8.0_111
Download URL: http://www.oracle.com/technetwork/java/javase/downloads/index.html
1) Configure the environment:
sudo vim /etc/profile
Append the following at the end of the file:
export JAVA_HOME=/opt/jdk1.8.0_111
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:$PATH
2) Run source /etc/profile so the changes take effect.
3) Verify that Java is installed: java -version
If the Java version information is printed, the installation succeeded.
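If you want an extra check beyond java -version, a trivial test program also confirms that javac and java both work; the file name JdkCheck.java is only an illustrative choice, not part of the original setup:
// JdkCheck.java - minimal check that the JDK compiler and runtime are usable
public class JdkCheck {
    public static void main(String[] args) {
        // Prints the runtime version reported by the JVM; it should match the installed JDK (1.8.0_111 here)
        System.out.println("Java version: " + System.getProperty("java.version"));
    }
}
Compile it with javac JdkCheck.java and run it with java JdkCheck.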
Now on to the Spark installation itself.
Scala is widely used with Spark, so install it as well.
1. Install Scala: download scala-2.12.3.tgz.
Download URL:
After downloading, I extracted it to:
/usr/local/scala-2.12.3
Once extracted, configure the environment variables by running the following commands:
sudo gedit ~/.bashrc
# scala
export SCALA_HOME=/usr/local/scala-2.12.3
export PATH=$PATH:$SCALA_HOME/bin
source ~/.bashrc
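To confirm the Scala setup took effect, you can open a new terminal and run scala -version; it should report the 2.12.3 release extracted above.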
Step 2: Install Spark
Download URL:
After downloading, extract the archive.
In my case the location is:
/usr/local/spark-2.2.0-bin-hadoop2.7
After extraction, configure the environment variables:
1) Environment configuration: run sudo vim /etc/profile and append at the end:
export SPARK_HOME=/usr/local/spark-2.2.0-bin-hadoop2.7
2) Run source /etc/profile so the changes take effect.
3) Test the installation. Open a terminal and switch to Spark's bin directory:
cd /usr/local/spark-2.2.0-bin-hadoop2.7/bin/
Run ./spark-shell to open the Scala shell connected to Spark. If no errors appear during startup and a scala> prompt comes up, the launch succeeded.
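As a quick smoke test, typing sc.version at the scala> prompt should print 2.2.0, and while spark-shell is running the driver's web UI is normally available at http://localhost:4040.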
Here is the example source code (a Java WordCount):
package com.xiaoming.sparkdemo;

import java.util.Arrays;
import java.util.Iterator;
import java.util.regex.Pattern;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

public class WordCount {

    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("wc");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> text = sc.textFile("hdfs://192.168.56.128:9000/user/wangxiaoming/input/bank/892/1200/20170425");

        // Split each line into words; in Spark 2.x FlatMapFunction returns an Iterator
        JavaRDD<String> words = text.flatMap(new FlatMapFunction<String, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Iterator<String> call(String line) throws Exception {
                return Arrays.asList(SPACE.split(line)).iterator();
            }
        });

        // Map each word to a (word, 1) pair
        JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<String, Integer> call(String word) throws Exception {
                return new Tuple2<String, Integer>(word, 1);
            }
        });

        // Sum the counts for each word
        JavaPairRDD<String, Integer> results = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Integer call(Integer value1, Integer value2) throws Exception {
                return value1 + value2;
            }
        });

        // Swap to (count, word) so the pairs can be sorted by count
        JavaPairRDD<Integer, String> temp = results.mapToPair(new PairFunction<Tuple2<String, Integer>, Integer, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<Integer, String> call(Tuple2<String, Integer> tuple) throws Exception {
                return new Tuple2<Integer, String>(tuple._2, tuple._1);
            }
        });

        // Sort by count in descending order, then swap back to (word, count)
        JavaPairRDD<String, Integer> sorted = temp.sortByKey(false).mapToPair(new PairFunction<Tuple2<Integer, String>, String, Integer>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<String, Integer> call(Tuple2<Integer, String> tuple) throws Exception {
                return new Tuple2<String, Integer>(tuple._2, tuple._1);
            }
        });

        // Print each (word, count) pair
        sorted.foreach(new VoidFunction<Tuple2<String, Integer>>() {
            private static final long serialVersionUID = 1L;

            @Override
            public void call(Tuple2<String, Integer> tuple) throws Exception {
                System.out.println("word:" + tuple._1 + " count:" + tuple._2);
            }
        });

        sc.close();
    }
}
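To run this class against the installation above, one common route is to package it into a jar and hand it to spark-submit from Spark's bin directory; the jar name sparkdemo.jar is only a placeholder for whatever your build actually produces:
./spark-submit --class com.xiaoming.sparkdemo.WordCount --master local sparkdemo.jar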
vim spark-env.sh
Add the following setting:
export SPARK_MASTER_IP=192.168.56.128
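For reference, SPARK_MASTER_IP is the address the standalone master binds to; Spark 2.x also accepts the newer SPARK_MASTER_HOST name for the same setting, so either should work here.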
cp spark-defaults.conf.template spark-defaults.conf
Add the following settings (192.168.56.128 is the local machine's IP):
spark.master.ip 192.168.56.128
spark.master spark://192.168.56.128:7077
spark.driver.bindAddress 192.168.56.128
spark.driver.host 192.168.56.128
cp slaves.template slaves
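A note on the two driver settings above: spark.driver.host is the address the driver advertises to the master and executors, while spark.driver.bindAddress is the local interface it actually binds to; on a single-machine pseudo-distributed setup, pointing both at the machine's IP is the simplest choice.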
vim slaves
Add the following line (the local IP, which makes this a pseudo-distributed setup):
192.168.56.128
Finally, start the standalone master:
sh start-master.sh
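Once the master is up, its web UI should be reachable at http://192.168.56.128:8080 (8080 is the default master UI port), and the worker listed in the slaves file can then be started with the companion start-slaves.sh script from the same sbin directory.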