Setting Up a Distributed Hadoop and Spark Cluster on CentOS


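1. Software versions, IP addresses, hostnames, and the hosts file

(1) Software versions: CentOS 7.0; JDK 8u141; Hadoop 2.7.3; Scala 2.11.8; Spark 2.2.0.

(2) IP addresses: 192.168.106.128 (master node); 192.168.106.129 (slave node); 192.168.106.130 (slave node).

Configure each server's static IP address as follows: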

vim /etc/sysconfig/network-scripts/ifcfg-ens33
BOOTPROTO=static
ONBOOT=yes #activate the interface at boot
IPADDR=192.168.106.xxx
GATEWAY=192.168.106.2
NETMASK=255.255.255.0
DNS1=192.168.106.2
NM_CONTROLLED=no #if yes, NetworkManager manages the interface and changes take effect immediately
service network restart

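(3) Set the hostname on each node: hostnamectl set-hostname "hostname", where "hostname" is Master, Slave1, or Slave2 so that it matches the hosts file.

(4) Edit /etc/hosts on every node: 192.168.106.128 Master; 192.168.106.129 Slave1; 192.168.106.130 Slave2.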
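
Because step (4) has to be repeated on all three nodes, a small idempotent helper can avoid duplicate lines when the step is re-run. The following is a minimal sketch, not a standard tool: the function name add_host_entry and its argument order are illustrative assumptions, and writing to /etc/hosts requires root.

```shell
# add_host_entry FILE IP NAME
# Appends "IP NAME" to FILE unless some non-comment line already maps NAME.
# FILE would be /etc/hosts on a real node; the name and signature of this
# helper are hypothetical.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # Look for NAME as a whole field after the address column.
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }' "$hosts_file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Example (as root):
#   add_host_entry /etc/hosts 192.168.106.128 Master
#   add_host_entry /etc/hosts 192.168.106.129 Slave1
#   add_host_entry /etc/hosts 192.168.106.130 Slave2
```

Running the same three calls a second time leaves the file unchanged, which makes the step safe to script.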


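2. Configuring passwordless SSH login

(1) yum install openssh-server; yum install openssh-clients

(2) ssh-keygen -t rsa -P ''

(3) sudo vim /etc/ssh/sshd_config, as follows:

RSAAuthentication yes #enable RSA authentication (applies to SSH protocol 1; deprecated in newer OpenSSH releases)
PubkeyAuthentication yes #enable public/private key pair authentication
AuthorizedKeysFile .ssh/authorized_keys #path to the authorized public keys file

(4) Collect and distribute the public keys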

scp ssw@Slave1:~/.ssh/id_rsa.pub ~/.ssh/slave1_rsa.pub
scp ssw@Slave2:~/.ssh/id_rsa.pub ~/.ssh/slave2_rsa.pub
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/slave1_rsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/slave2_rsa.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys ssw@Slave1:~/.ssh/
scp ~/.ssh/authorized_keys ssw@Slave2:~/.ssh/

(5) chmod 700 ~/.ssh; chmod 600 ~/.ssh/authorized_keys; sudo service sshd restart

Note: step (4) is performed only on Master; all other steps are performed on Master, Slave1, and Slave2.
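
The cat commands in step (4) append blindly, so re-running them duplicates keys in authorized_keys. As a rough sketch of one way to avoid that, the helper below merges key files while skipping lines that are already present, then applies the 600 permission from step (5). The function name merge_pubkeys is a hypothetical one introduced here for illustration.

```shell
# merge_pubkeys OUT KEYFILE...
# Appends every non-empty line of each KEYFILE to OUT unless that exact line
# is already there, then restricts OUT to owner read/write.
merge_pubkeys() {
    out=$1
    shift
    touch "$out"
    for keyfile in "$@"; do
        # "|| [ -n ... ]" also handles a last line without a trailing newline.
        while IFS= read -r line || [ -n "$line" ]; do
            [ -z "$line" ] && continue
            grep -qxF -- "$line" "$out" || printf '%s\n' "$line" >> "$out"
        done < "$keyfile"
    done
    chmod 600 "$out"
}

# Example on Master, after fetching the slaves' keys with scp:
#   merge_pubkeys ~/.ssh/authorized_keys ~/.ssh/id_rsa.pub \
#       ~/.ssh/slave1_rsa.pub ~/.ssh/slave2_rsa.pub
```

Once the merged file has been copied to the slaves, running ssh Slave1 hostname from Master should return without a password prompt if the setup worked.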


3. Setting up Java and Scala

(1) Java environment

Edit /etc/profile as follows:

export JAVA_HOME=/usr/local/jdk1.8.0_141
export JRE_HOME=/usr/local/jdk1.8.0_141/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH

(2) Scala environment

Edit /etc/profile as follows:

export SCALA_HOME=/usr/local/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
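
After editing /etc/profile, the new variables only take effect in shells that have re-read it (for example via source /etc/profile). The sketch below shows one way to confirm that a *_HOME variable is set and points at a real directory before moving on; check_home is a hypothetical helper name, not part of the JDK or Scala distributions.

```shell
# check_home VAR
# Succeeds and prints VAR=value when the variable named VAR is set and refers
# to an existing directory; otherwise prints a diagnostic to stderr and fails.
check_home() {
    var_name=$1
    # Indirect lookup of the variable whose name was passed in.
    eval "var_value=\${$var_name:-}"
    if [ -z "$var_value" ]; then
        echo "$var_name is not set" >&2
        return 1
    fi
    if [ ! -d "$var_value" ]; then
        echo "$var_name points to a missing directory: $var_value" >&2
        return 1
    fi
    echo "$var_name=$var_value"
}

# Example:
#   source /etc/profile
#   check_home JAVA_HOME && java -version
#   check_home SCALA_HOME && scala -version
```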


4. Setting up Hadoop

(1) Edit /etc/profile as follows:

export HADOOP_HOME=/opt/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_ROOT_LOGGER=INFO,console
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

(2) Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh as follows:

export JAVA_HOME=/usr/local/jdk1.8.0_141
(3) Edit $HADOOP_HOME/etc/hadoop/slaves as follows:
Slave1
Slave2
(4) Edit $HADOOP_HOME/etc/hadoop/core-site.xml as follows:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.7.3/tmp</value>
    </property>
</configuration>
(5) Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml as follows:
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:9001</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop-2.7.3/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop-2.7.3/hdfs/data</value>
    </property>
</configuration>
(6) Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml (create it from mapred-site.xml.template if it does not exist) as follows:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>
(7) Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml as follows:
<configuration>

<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master:8088</value>
    </property>
</configuration>
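
A typo in any of these XML files tends to surface only when a daemon fails to start, so it can be worth reading a few values back before going further. The following is a rough sketch of a lookup helper; get_prop is a hypothetical name, and it assumes every <name> and <value> element sits on its own line as in the files above (it is not a general XML parser). On a node where Hadoop is already on the PATH, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves.

```shell
# get_prop FILE NAME
# Prints the <value> that follows <name>NAME</name> in a Hadoop *-site.xml
# file. Prints nothing if the property is absent.
get_prop() {
    awk -v want="$2" '
        /<name>/ {
            line = $0
            sub(/.*<name>/, "", line)
            sub(/<\/name>.*/, "", line)
            hit = (line == want)   # remember whether this is the wanted key
            next
        }
        hit && /<value>/ {
            line = $0
            sub(/.*<value>/, "", line)
            sub(/<\/value>.*/, "", line)
            print line
            exit
        }' "$1"
}

# Example:
#   get_prop "$HADOOP_HOME/etc/hadoop/core-site.xml" fs.defaultFS
#   get_prop "$HADOOP_HOME/etc/hadoop/yarn-site.xml" yarn.resourcemanager.hostname
```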
(8) Copy the configured Hadoop directory to Slave1 and Slave2
scp -r /opt/hadoop-2.7.3 root@Slave1:/opt
scp -r /opt/hadoop-2.7.3 root@Slave2:/opt
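
With more than two slaves, the copy step is easier to maintain as a loop. The sketch below wraps the same scp invocation in a function with a dry-run switch so the commands can be inspected first; push_dir and the DRY_RUN variable are hypothetical names introduced for illustration.

```shell
# push_dir DIR HOST...
# Copies DIR into the same parent directory on each HOST as root, mirroring
# the scp commands above. With DRY_RUN=1 the commands are printed instead of
# being executed.
push_dir() {
    dir=$1
    shift
    parent=$(dirname "$dir")
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp -r $dir root@$host:$parent"
        else
            scp -r "$dir" "root@$host:$parent" || return 1
        fi
    done
}

# Example:
#   DRY_RUN=1 push_dir /opt/hadoop-2.7.3 Slave1 Slave2   # preview
#   push_dir /opt/hadoop-2.7.3 Slave1 Slave2             # copy
```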
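(9) Start the Hadoop cluster
sudo chmod -R a+w /opt/hadoop-2.7.3
hdfs namenode -format   # first start only; reformatting erases the existing HDFS metadata
/opt/hadoop-2.7.3/sbin/start-all.sh
(10) Verify that the cluster started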

Running jps on each node should list the following processes. Master: SecondaryNameNode; ResourceManager; NameNode. Slaves: NodeManager; DataNode.
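
Comparing jps output against the expected daemon list by eye is error-prone, so the check can be scripted. The following is a minimal sketch: missing_daemons is a hypothetical helper that reads jps-style output ("PID ClassName" per line) on standard input and reports which expected daemons are absent.

```shell
# missing_daemons NAME...
# Reads jps output on stdin and prints each expected daemon NAME that does
# not appear in it. Returns non-zero if at least one daemon is missing.
missing_daemons() {
    running=$(awk '{ print $2 }')   # keep only the class-name column
    rc=0
    for name in "$@"; do
        if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}

# Example on Master:
#   jps | missing_daemons NameNode SecondaryNameNode ResourceManager
# Example on a slave:
#   jps | missing_daemons DataNode NodeManager
```

An empty result with exit status 0 means every expected daemon is running on that node.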


5. Setting up Spark

(1) Edit /etc/profile as follows:

export SPARK_HOME=/opt/spark-2.2.0
export PATH=$PATH:$SPARK_HOME/bin
(2) Edit $SPARK_HOME/conf/spark-env.sh as follows:
export JAVA_HOME=/usr/local/jdk1.8.0_141
export SCALA_HOME=/usr/local/scala-2.11.8
export HADOOP_HOME=/opt/hadoop-2.7.3
export SPARK_HOME=/opt/spark-2.2.0
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop
export SPARK_MASTER_HOST=192.168.106.128
export SPARK_DRIVER_MEMORY=1G
(3) Edit $SPARK_HOME/conf/slaves as follows (listing Master here also starts a Worker on the master node):
Master
Slave1
Slave2
(4) Copy the configured Spark directory to Slave1 and Slave2
scp -r /opt/spark-2.2.0 root@Slave1:/opt
scp -r /opt/spark-2.2.0 root@Slave2:/opt
(5) Start the Spark cluster
/opt/spark-2.2.0/sbin/start-all.sh

(6) Verify that the cluster started

Running jps on each node should list the following processes. Master: Master, Worker. Slaves: Worker.
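
Seeing the Master and Worker processes confirms that the daemons are up, but not that jobs can actually run. One common smoke test is to submit the SparkPi example bundled with Spark. The sketch below only builds the command line so it can be reviewed before running; spark_pi_cmd is a hypothetical helper, and it assumes the default standalone master port 7077 and the example jar name shipped with the prebuilt Spark 2.2.0 (Scala 2.11) package. Adjust both if your installation differs.

```shell
# spark_pi_cmd [MASTER_HOST]
# Prints a spark-submit command that runs the bundled SparkPi example against
# the standalone master on MASTER_HOST (default: Master).
# Assumptions: port 7077 and the jar name spark-examples_2.11-2.2.0.jar.
spark_pi_cmd() {
    master_host=${1:-Master}
    spark_home=${SPARK_HOME:-/opt/spark-2.2.0}
    echo "$spark_home/bin/spark-submit" \
        "--master spark://$master_host:7077" \
        "--class org.apache.spark.examples.SparkPi" \
        "$spark_home/examples/jars/spark-examples_2.11-2.2.0.jar 10"
}

# Example:
#   spark_pi_cmd             # print the command
#   eval "$(spark_pi_cmd)"   # run it on Master
```

If the job completes, its output should include a line beginning with "Pi is roughly".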


References:

[1] Hadoop 2.7.3 + Spark 2.1.0 fully distributed cluster setup: http://www.cnblogs.com/zengxiaoliang/p/6478859.html
