Setting Up a Spark Development Environment on Ubuntu 16.04 (Pseudo-Distributed), Plus a Spark WordCount



Ubuntu 16.04 Spark Development Environment Setup


This guide assumes Hadoop is already installed and covers installing Spark on top of it.

For the Hadoop installation itself, refer to the linked guide.

If you have not installed a JDK yet, do so now; you needed one to install Hadoop as well.

A brief recap of that preliminary work:

1. Install the JDK: download jdk-8u111-linux-x64.tar.gz and extract it to /opt/jdk1.8.0_111.

Download: http://www.oracle.com/technetwork/java/javase/downloads/index.html

1) Environment configuration:

sudo vim /etc/profile

Append at the end of the file:

export JAVA_HOME=/opt/jdk1.8.0_111

export JRE_HOME=${JAVA_HOME}/jre

export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib

export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:$PATH

2) Run source /etc/profile so the changes take effect.

3) Verify that Java installed correctly: java -version

If the Java version information appears, the installation succeeded.
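As an extra sanity check, a tiny Java program can confirm which JVM actually runs your code and whether JAVA_HOME is visible to processes. This snippet is my own illustration, not part of the original guide:

public class CheckJdk {
    public static void main(String[] args) {
        // Version of the JVM executing this program, e.g. 1.8.0_111
        System.out.println("java.version = " + System.getProperty("java.version"));
        // JAVA_HOME as exported in /etc/profile; null if the variable is not set
        System.out.println("JAVA_HOME    = " + System.getenv("JAVA_HOME"));
    }
}

Compile and run it with javac CheckJdk.java followed by java CheckJdk; both values should match the JDK you just installed.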

Now on to the Spark installation proper.


Scala is widely used and needs to be installed as well.


1. Install Scala: download scala-2.12.3.tgz (matching the version used in the paths below).

Download:

http://www.scala-lang.org/



After downloading, I extracted it to:

/usr/local/scala-2.12.3

With that done, configure the environment variables by running:

sudo gedit ~/.bashrc


#scala
export SCALA_HOME=/usr/local/scala-2.12.3
export PATH=$PATH:$SCALA_HOME/bin

Then apply the changes:

source ~/.bashrc
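Note that Spark (installed in the next step) ships its own scala-library, independent of the Scala installed above. If you ever need to confirm which Scala version your Java applications actually see on the classpath, a small check like the following should work; this is my own sketch, and it assumes scala-library is on both the compile and runtime classpath:

public class CheckScala {
    public static void main(String[] args) {
        // scala.util.Properties is a Scala object; the Scala compiler emits
        // static forwarders, so its methods are callable from Java directly.
        // Prints something like "version 2.11.8" for spark-2.2.0.
        System.out.println(scala.util.Properties.versionString());
    }
}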


Step 2: Install Spark


Download:

http://spark.apache.org/


After downloading, extract it; in my case to:

/usr/local/spark-2.2.0-bin-hadoop2.7


After extracting, configure the environment variables:

    1) Environment configuration:
    sudo vim /etc/profile
    Append at the end of the file:

     export SPARK_HOME=/usr/local/spark-2.2.0-bin-hadoop2.7

    2) Run source /etc/profile so the changes take effect.

    3) Test the installation:

    Open a terminal and switch to Spark's bin directory:

    cd /usr/local/spark-2.2.0-bin-hadoop2.7/bin/

    Run ./spark-shell to open the Scala-to-Spark shell. If startup completes without errors and you reach a scala> prompt, Spark started successfully.



Now check the web UI: open localhost:4040 in a browser (this application UI is served only while spark-shell or another Spark application is running).
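You can also verify the installation programmatically instead of through the shell. Below is a minimal local-mode smoke test of my own (not from the original guide); while it runs, the same web UI is served on port 4040:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SmokeTest {
    public static void main(String[] args) {
        // local[*] runs Spark inside this JVM on all cores,
        // so no master or worker daemons are needed yet.
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("smoke-test");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Distribute a tiny dataset and count it; printing 10 means
        // the installation works end to end.
        long count = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)).count();
        System.out.println("count = " + count);

        sc.close();
    }
}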


Now for the source code of a Java WordCount:

package com.xiaoming.sparkdemo;

import java.util.Arrays;
import java.util.Iterator;
import java.util.regex.Pattern;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

public class WordCount {

    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) throws Exception {

        SparkConf conf = new SparkConf().setMaster("local").setAppName("wc");
        JavaSparkContext sc = new JavaSparkContext(conf);
        
        JavaRDD<String> text = sc.textFile("hdfs://192.168.56.128:9000/user/wangxiaoming/input/bank/892/1200/20170425");
        JavaRDD<String> words = text.flatMap(new FlatMapFunction<String,String>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Iterator<String> call(String line) throws Exception {
                // split the line into words; Spark 2.x expects an Iterator here
                return Arrays.asList(SPACE.split(line)).iterator();
            }
        });
        
        JavaPairRDD<String,Integer> pairs = words.mapToPair(new PairFunction<String,String,Integer>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Tuple2<String,Integer> call(String word) throws Exception {
                return new Tuple2<String,Integer>(word,1);
            }
        });
        
        JavaPairRDD<String,Integer> results = pairs.reduceByKey(new Function2<Integer,Integer,Integer>() {            
            private static final long serialVersionUID = 1L;
            @Override
            public Integer call(Integer value1,Integer value2) throws Exception {
                return value1 + value2;
            }
        });
        
        JavaPairRDD<Integer,String> temp = results.mapToPair(new PairFunction<Tuple2<String,Integer>,Integer,String>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Tuple2<Integer,String> call(Tuple2<String,Integer> tuple)
                    throws Exception {
                // swap to (count, word) so the pairs can be sorted by count
                return new Tuple2<Integer,String>(tuple._2,tuple._1);
            }
        });
        
        JavaPairRDD<String,Integer> sorted = temp.sortByKey(false).mapToPair(new PairFunction<Tuple2<Integer,String>,String,Integer>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Tuple2<String,Integer> call(Tuple2<Integer,String> tuple)
                    throws Exception {
                // sort by count descending, then swap back to (word, count)
                return new Tuple2<String,Integer>(tuple._2,tuple._1);
            }
        });
        
        sorted.foreach(new VoidFunction<Tuple2<String,Integer>>() {
            private static final long serialVersionUID = 1L;
            @Override
            public void call(Tuple2<String,Integer> tuple) throws Exception {
                System.out.println("word:" + tuple._1 + " count:" + tuple._2);
            }
        });
        
        sc.close();
    }
}
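Since spark-2.2.0 runs on Java 8, the same pipeline can be written much more compactly with lambdas instead of anonymous inner classes. The following rewrite is my own sketch of the identical logic (the input path is passed as a program argument rather than hard-coded):

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCountLambda {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("wc-lambda");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> text = sc.textFile(args[0]); // e.g. an hdfs:// path as above

        JavaPairRDD<String, Integer> counts = text
                .flatMap(line -> Arrays.asList(line.split(" ")).iterator()) // one record per word
                .mapToPair(word -> new Tuple2<>(word, 1))                   // (word, 1)
                .reduceByKey((a, b) -> a + b);                              // sum the 1s per word

        // Swap to (count, word), sort by count descending, swap back, print.
        counts.mapToPair(Tuple2::swap)
              .sortByKey(false)
              .mapToPair(Tuple2::swap)
              .foreach(t -> System.out.println("word:" + t._1 + " count:" + t._2));

        sc.close();
    }
}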


Next, configure the pseudo-distributed mode. In Spark's conf directory, create spark-env.sh from its template and edit it:

cp spark-env.sh.template spark-env.sh
vim spark-env.sh

Add the following:

export SPARK_MASTER_IP=192.168.56.128



cp spark-defaults.conf.template spark-defaults.conf

Add the following configuration (use your own machine's IP in place of 192.168.56.128; the master IP itself was already set via SPARK_MASTER_IP above):

spark.master                     spark://192.168.56.128:7077

spark.driver.bindAddress         192.168.56.128

spark.driver.host                192.168.56.128

cp slaves.template slaves
vim slaves

Add the following (the machine's own IP, which makes this a single-node, pseudo-distributed cluster):

192.168.56.128

Finally, start the master and the worker listed in slaves from Spark's sbin directory:

sh start-master.sh
sh start-slaves.sh
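With the master and worker up, an application can target the standalone cluster instead of local mode. Here is a minimal check of my own, mirroring the spark-defaults.conf values above in code (values set on SparkConf override the file):

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ClusterCheck {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("cluster-check")
                .setMaster("spark://192.168.56.128:7077")    // the standalone master started above
                .set("spark.driver.host", "192.168.56.128"); // address executors use to reach the driver

        JavaSparkContext sc = new JavaSparkContext(conf);
        // A trivial job; sum = 6 confirms the cluster accepts and runs work.
        int sum = sc.parallelize(Arrays.asList(1, 2, 3)).reduce((a, b) -> a + b);
        System.out.println("sum = " + sum);
        sc.close();
    }
}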

