Use VMware to create two virtual machines, which serve as the master node and the slave node for this experiment.
- First build one VM (master) with 2 GB of RAM, a 30 GB disk, and 64-bit CentOS 6.7, using NAT networking (I tried Bridged and Host-Only modes before; NAT turned out to be the most convenient, and it is also the VM's default network mode). After confirming the network works, clone the master machine and name the clone slave.
Hostname | IP address
---|---
master | 192.168.229.130
slave | 192.168.229.131
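This walkthrough assumes each VM keeps a fixed address. A minimal sketch of pinning a static IP on CentOS 6, assuming the interface is eth0 and the NAT gateway is 192.168.229.2 (both are assumptions; check your own VMware NAT settings):
[root@localhost ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static          # static instead of dhcp, so the address never changes
IPADDR=192.168.229.130    # use 192.168.229.131 on slave
NETMASK=255.255.255.0
GATEWAY=192.168.229.2     # assumption: the default VMware NAT gateway (x.x.x.2)
DNS1=192.168.229.2
[root@localhost ~]# service network restart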
Configure hosts and hostname
master
[root@localhost ~]# vi /etc/hosts
The default content:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
Replace it with:
127.0.0.1 localhost
192.168.229.130 master
192.168.229.131 slave
Save and exit.
[root@localhost ~]# vi /etc/sysconfig/network
Change:
NETWORKING=yes
HOSTNAME=localhost.localdomain
to:
NETWORKING=yes
HOSTNAME=master
Save and exit.
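The HOSTNAME line in /etc/sysconfig/network only takes effect at the next boot. As a sketch, you can also set the hostname for the running session so you don't have to reboot right away (the file edit above is still what makes it permanent):
[root@localhost ~]# hostname master    # applies to the current session only
[root@localhost ~]# hostname
master
Log out and back in to see the new name in the shell prompt; do the same on slave with `hostname slave`.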
slave
[root@localhost ~]# vi /etc/hosts
The default content:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
Replace it with:
127.0.0.1 localhost
192.168.229.130 master
192.168.229.131 slave
Save and exit.
[root@localhost ~]# vi /etc/sysconfig/network
Change:
NETWORKING=yes
HOSTNAME=localhost.localdomain
to:
NETWORKING=yes
HOSTNAME=slave
Save and exit.
Disable SELinux
master
[root@master ~]# vim /etc/selinux/config
Change:
SELINUX=enforcing
to:
SELINUX=disabled
Save and exit.
[root@master ~]# getenforce
Enforcing
The config-file change only takes effect after a reboot (which is why getenforce still reports Enforcing), so reboot the machine.
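If you don't want to wait for the reboot, a quick sketch of switching SELinux to permissive mode for the current session (the config-file edit above is still what survives reboots):
[root@master ~]# setenforce 0    # 0 = permissive, for this boot only
[root@master ~]# getenforce
Permissive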
slave
[root@slave ~]# vim /etc/selinux/config
Change:
SELINUX=enforcing
to:
SELINUX=disabled
Save and exit, then reboot.
Disable the firewall
CentOS ships with a set of default iptables rules.
master
[root@master ~]# iptables -F; /etc/init.d/iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[  OK  ]
After rebooting the system, you can confirm with `iptables -nvL` that the rules have been cleared.
slave
[root@slave ~]# iptables -F; /etc/init.d/iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[  OK  ]
`iptables -nvL` lists the current rules. `iptables -F` flushes the current rules, but only temporarily: after a system reboot or an iptables service restart, the previously saved rules are loaded again, so you need `/etc/init.d/iptables save` to persist the change. As the command output above shows, the firewall rules are saved to /etc/sysconfig/iptables.
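An alternative sketch that stops the firewall service outright and keeps it from starting at boot, using standard CentOS 6 service management:
[root@master ~]# service iptables stop     # stop the firewall now
[root@master ~]# chkconfig iptables off    # don't start it at boot
[root@master ~]# service iptables status
iptables: Firewall is not running.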
Configure passwordless SSH login
master
[root@master ~]# ssh-keygen
Press Enter at every prompt.
[root@master ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@master ~]# scp ~/.ssh/authorized_keys slave:~/.ssh/
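As a side note, ssh-copy-id can do the append-and-copy in a single step; a sketch (it prompts for the slave root password once):
[root@master ~]# ssh-copy-id root@slave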
slave
[root@slave ~]# ls .ssh/
authorized_keys
master
[root@master ~]# ssh slave
[root@slave ~]# exit
[root@master ~]#
- Test passwordless login from master to slave; the first connection asks you to type `yes` to continue.
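A sketch for verifying that the key really is being used: with BatchMode, ssh fails instead of falling back to a password prompt.
[root@master ~]# ssh -o BatchMode=yes slave hostname
slave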
Install the JDK
Check whether a suitable JDK version is already installed:
# java -version
If a suitable JDK version is not installed, remove the bundled JDK first, then install.
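A sketch of finding and removing the bundled OpenJDK on CentOS 6 (the exact package names are an assumption; run the query first and remove what it actually reports):
# rpm -qa | grep -i -E 'jdk|java'                         # list installed Java packages
# yum -y remove java-1.7.0-openjdk java-1.6.0-openjdk     # remove the packages found above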
master
[root@master ~]# wget http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz?AuthParam=1480051498_4f2fdb0325a457f4c7d33a69355b3560
[root@master ~]# mv jdk-7u79-linux-x64.tar.gz\?AuthParam\=1480051498_4f2fdb0325a457f4c7d33a69355b3560 jdk-7u79-linux-x64.tar.gz
[root@master ~]# tar zxvf jdk-7u79-linux-x64.tar.gz
[root@master ~]# mv jdk1.7.0_79 /usr/local/
Set the JDK environment variables
[root@master ~]# vi /etc/profile.d/java.sh
export JAVA_HOME=/usr/local/jdk1.7.0_79
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Save and exit.
[root@master ~]# source /etc/profile.d/java.sh
[root@master ~]# java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
[root@master ~]# scp jdk-7u79-linux-x64.tar.gz slave:/root/
[root@master ~]# scp /etc/profile.d/java.sh slave:/etc/profile.d/
slave
[root@slave ~]# tar zxvf jdk-7u79-linux-x64.tar.gz
[root@slave ~]# mv jdk1.7.0_79 /usr/local/
Set the JDK environment variables
[root@slave ~]# source /etc/profile.d/java.sh
[root@slave ~]# java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
Install Hadoop
master
[root@master ~]# wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
[root@master ~]# tar zxvf hadoop-2.7.2.tar.gz
[root@master ~]# mv hadoop-2.7.2 /usr/local/
[root@master ~]# ls /usr/local/
bin  etc  games  hadoop-2.7.2  include  jdk1.7.0_79  lib  lib64  libexec  sbin  share  src
[root@master ~]# ls /usr/local/hadoop-2.7.2/
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
[root@master ~]# mkdir /usr/local/hadoop-2.7.2/tmp /usr/local/hadoop-2.7.2/dfs /usr/local/hadoop-2.7.2/dfs/data /usr/local/hadoop-2.7.2/dfs/name
- /usr/local/hadoop-2.7.2/tmp stores temporarily generated files
- /usr/local/hadoop-2.7.2/dfs stores cluster data
- /usr/local/hadoop-2.7.2/dfs/data stores the actual data blocks
- /usr/local/hadoop-2.7.2/dfs/name stores the filesystem metadata
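As a sketch, the same four directories can be created in one command with mkdir -p and shell brace expansion:
[root@master ~]# mkdir -p /usr/local/hadoop-2.7.2/{tmp,dfs/{name,data}}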
[root@master ~]# ls /usr/local/hadoop-2.7.2/
bin  dfs  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share  tmp
[root@master ~]# rsync -av /usr/local/hadoop-2.7.2 slave:/usr/local
slave
[root@slave ~]# ls /usr/local/hadoop-2.7.2
bin  dfs  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share  tmp
Configure Hadoop
master
[root@master ~]# vi /usr/local/hadoop-2.7.2/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop-2.7.2/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
</configuration>
Save and exit.
[root@master ~]# vi /usr/local/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop-2.7.2/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop-2.7.2/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
Save and exit.
- Note: the property dfs.replication sets the number of replicas for each HDFS block, i.e. how many copies of each file HDFS stores. The default is 3; if you leave it unchanged with fewer than 3 DataNodes, HDFS reports errors. My experiment has only one master and one slave (a single DataNode), so the value is 1.
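A sketch for checking the effective replication of a file once the cluster is up: `hadoop fs -stat` with the %r format prints the replication factor (/some/file is a placeholder path):
[root@master ~]# hadoop fs -stat %r /some/file
1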
[root@master ~]# mv /usr/local/hadoop-2.7.2/etc/hadoop/mapred-site.xml.template /usr/local/hadoop-2.7.2/etc/hadoop/mapred-site.xml
[root@master ~]# vi /usr/local/hadoop-2.7.2/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
Save and exit.
[root@master ~]# vi /usr/local/hadoop-2.7.2/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>
Save and exit.
[root@master ~]# vi /usr/local/hadoop-2.7.2/etc/hadoop/hadoop-env.sh
Change:
export JAVA_HOME=${JAVA_HOME}
to:
export JAVA_HOME=/usr/local/jdk1.7.0_79
Save and exit.
[root@master ~]# vi /usr/local/hadoop-2.7.2/etc/hadoop/yarn-env.sh
Change:
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
to:
export JAVA_HOME=/usr/local/jdk1.7.0_79
Save and exit.
[root@master ~]# vi /usr/local/hadoop-2.7.2/etc/hadoop/mapred-env.sh
Change:
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
to:
export JAVA_HOME=/usr/local/jdk1.7.0_79
Save and exit.
[root@master ~]# vi /usr/local/hadoop-2.7.2/etc/hadoop/slaves
Change its content from
localhost
to
slave
- This file lists all slave nodes, one hostname per line (see the multi-node sketch below).
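As a sketch, with more DataNodes the slaves file would simply list each one; slave1 and slave2 here are hypothetical hosts that would also need /etc/hosts entries and the same setup as slave:
[root@master ~]# cat /usr/local/hadoop-2.7.2/etc/hadoop/slaves
slave1
slave2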
[root@master ~]# rsync -av /usr/local/hadoop-2.7.2/etc/ slave:/usr/local/hadoop-2.7.2/etc/
Set the Hadoop environment variables
[root@master ~]# vi /etc/profile.d/hadoop.sh
export HADOOP_HOME=/usr/local/hadoop-2.7.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Save and exit.
[root@master ~]# source /etc/profile.d/hadoop.sh
[root@master ~]# hadoop version
Hadoop 2.7.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41
Compiled by jenkins on 2016-01-26T00:08Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /usr/local/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar
[root@master ~]# scp /etc/profile.d/hadoop.sh slave:/etc/profile.d/
slave
Set the Hadoop environment variables
[root@slave ~]# source /etc/profile.d/hadoop.sh
[root@slave ~]# hadoop version
Hadoop 2.7.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41
Compiled by jenkins on 2016-01-26T00:08Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /usr/local/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar
Run Hadoop
master
[root@master ~]# /usr/local/hadoop-2.7.2/bin/hdfs namenode -format
[root@master ~]# echo $?
0
- When running the -format command, avoid a mismatch between the NameNode's namespace ID and the DataNodes' namespace ID. Each format run generates fresh name, data, and temp metadata, so formatting repeatedly easily produces inconsistent IDs and keeps Hadoop from running normally. If you do format again, first delete the old data and temp files on both the DataNodes and the NameNode (a cleanup sketch follows this list).
- Format only once if you can. The NameNode format command can be executed repeatedly, but doing so destroys all existing filesystem data. Format only when the Hadoop cluster is shut down and you genuinely intend to format; in most other situations the operation quickly and irreversibly deletes all data on HDFS, and on a large cluster it takes even longer to run.
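A sketch of the cleanup described above, run on both nodes before a re-format, assuming the directories created earlier in this walkthrough:
[root@master ~]# rm -rf /usr/local/hadoop-2.7.2/tmp/* /usr/local/hadoop-2.7.2/dfs/name/* /usr/local/hadoop-2.7.2/dfs/data/*
[root@slave ~]# rm -rf /usr/local/hadoop-2.7.2/tmp/* /usr/local/hadoop-2.7.2/dfs/name/* /usr/local/hadoop-2.7.2/dfs/data/*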
[root@master ~]# /usr/local/hadoop-2.7.2/sbin/start-all.sh
[root@master ~]# jps
5560 ResourceManager
5239 NameNode
5631 Jps
5415 SecondaryNameNode
slave
[root@slave ~]# jps
5231 DataNode
5444 Jps
5320 NodeManager
master
Use the Web UI to check whether the cluster started successfully:
- Open master:50070 in a browser to check that the NameNode and DataNode are healthy.
- Open master:8088 to check that YARN is healthy.
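A command-line sketch of the same check: `hdfs dfsadmin -report` lists the live DataNodes, so with this setup you would expect a count of 1:
[root@master ~]# hdfs dfsadmin -report | grep 'Live datanodes'
Live datanodes (1):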
Run the PI example to verify the cluster
[root@master ~]# cd /usr/local/hadoop-2.7.2/
[root@master hadoop-2.7.2]# bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 10 10
The computed result is printed at the end.
If all of the steps above work, the cluster has started correctly.
Stop the services (stop the Hadoop cluster before shutting the machines down)
[root@master ~]# /usr/local/hadoop-2.7.2/sbin/stop-all.sh
Active Nodes shows 0
Solution:
Change /etc/hosts on every machine (master and slave) to:
127.0.0.1 localhost
192.168.229.130 master
192.168.229.131 slave
The file previously contained other entries; after deleting them, stop the services and start them again, and Active Nodes shows 1.
copyFromLocal: Cannot create directory /123/. Name node is in safe mode
If you get the message `copyFromLocal: Cannot create directory /123/. Name node is in safe mode.`, safe mode is enabled.
Solution:
[root@master ~]# /usr/local/hadoop-2.7.2/bin/hdfs dfsadmin -safemode leave
- Safe mode
As an additional safeguard, the NameNode keeps the HDFS filesystem in read-only mode until it confirms that the number of blocks reported by the DataNodes has reached the replication threshold. Normally it is enough for all DataNodes to report their block status. If some DataNodes have failed, however, the NameNode must schedule re-replication of part of the blocks before the cluster meets the conditions for leaving safe mode.
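A sketch for inspecting safe mode before forcing an exit; `-safemode get` belongs to the same dfsadmin command used above:
[root@master ~]# hdfs dfsadmin -safemode get
Safe mode is OFF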
Hadoop has tormented me a thousand times, yet I still treat Hadoop like my first love. (I dug this pit for myself, so I'll fill it in even if I have to do it on my knees.)