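I. Preface
Our company recently needed to deploy a complete OpenStack HA cluster, and choosing the back-end storage left us at a crossroads.
After several months of searching, collecting, and studying, I found that none of the Ceph deployment write-ups online was both complete and detailed, so I spent many days putting together a full solution for a Ceph distributed storage cluster.
As for why we settled on Ceph: I did plenty of searching on Baidu and Google along the way. I have to sigh here: Baidu, how am I supposed to love you? Getting past the firewall became unavoidable.
Straight to the point. The notes are organized below so I can look them up when I forget.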
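II. Background
There is no need to repeat a full introduction to Ceph here. Ceph is a distributed file system developed by Sage Weil during his PhD at the University of California, Santa Cruz, and it was the subject of his dissertation. Ceph dates back to 2004; after years of refinement and fixes, the first stable release came out in 2012. If you are interested, see his biography: https://en.wikipedia.org/wiki/Sage_Weil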
III. Choosing
Based on our workload we compared three systems: Ceph, GlusterFS, and Swift. Choosing is a painful business!
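1. Ceph:
1) Supports a POSIX interface
2) Open source
3) Files are striped into chunks; each chunk is an object, and the objects are stored on different storage servers.
4) Redundancy through multiple replicas
5) High reliability provided by the data replicas
6) When a node fails, data is migrated automatically and the replicas are re-created.
7) Easy to scale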
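2. GlusterFS:
1) Open source
2) No single point of failure
3) Supports a POSIX interface
4) Cluster translators (the core of GlusterFS clustered storage) come in three types: AFR, DHT, and Stripe.
5) Redundancy: data mirroring provides reliability.
6) When a node, hardware component, disk, or network fails, the system handles the failure automatically without administrator intervention.
7) Supports a trash (recycle bin) feature
8) Easy to scale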
3. Swift:
1) Open source
2) Eventual consistency
3) Accessed through a RESTful HTTP API (Swift has no native POSIX interface)
4) When cluster hardware fails, Swift falls back so that data access stays highly available.
5) Stable
6) Swift is the best fit for object storage
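Conclusion: because our use case includes a block storage requirement, Ceph is the better overall fit. (If the use case were purely object storage, Swift would be the better choice.)
IV. Hardware Environment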
+-----------+--------------+------+----------+--------+---------------------------+-------------------+--------------+
| Hostname  | IP Address   | Role | CPU      | Memory | System Disk               | Ceph Storage Disk | Journal Disk |
+-----------+--------------+------+----------+--------+---------------------------+-------------------+--------------+
| ceph-adm  | 192.168.0.59 | adm  | 4 cores  | 4GB    | 2*300GB SAS 15.7k(raid 1) | -                 | -            |
| ceph-mon1 | 192.168.0.60 | mon  | 32 cores | 64GB   | 2*600GB SAS 15.7k(raid 1) | -                 | -            |
| ceph-mon2 | 192.168.0.61 | mon  | 32 cores | 64GB   | 2*600GB SAS 15.7k(raid 1) | -                 | -            |
| ceph-mon3 | 192.168.0.62 | mon  | 32 cores | 64GB   | 2*600GB SAS 15.7k(raid 1) | -                 | -            |
| ceph-osd1 | 192.168.0.63 | osd  | 24 cores | 64GB   | 2*240GB SSD (raid 1)      | 10*4TB SAS        | 2*480GB SSD  |
| ceph-osd2 | 192.168.0.64 | osd  | 24 cores | 64GB   | 2*240GB SSD (raid 1)      | 10*4TB SAS        | 2*480GB SSD  |
+-----------+--------------+------+----------+--------+---------------------------+-------------------+--------------+
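Notes:
Ceph monitors form a majority quorum, so an odd number of monitor nodes is recommended, with at least 3 for a highly available cluster.
adm server: two 300GB SAS disks in RAID 1 hold the operating system; this node is used to operate and manage Ceph.
mon servers: two 600GB SAS disks in RAID 1 hold the operating system; these nodes monitor Ceph.
osd servers: two 240GB SSDs in RAID 1 hold the operating system. Ten 4TB disks provide the Ceph storage, one OSD per disk.
Each OSD needs one journal, so 10 disks need 10 journals. We use two 480GB SSDs for the journals, each split into 5 equal partitions, so that each partition serves as the journal for one OSD disk.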
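The journal layout in these notes comes down to simple arithmetic. Here is a minimal sketch using the figures from the hardware table (two 480GB SSDs, ten OSD disks per node); the function names are my own and purely illustrative.

```shell
#!/bin/sh
# Hypothetical helpers: how many journal partitions each SSD must hold,
# and how large each partition can be (integer GB, rounded down).

journals_per_ssd() {
    # $1 = number of OSD data disks, $2 = number of journal SSDs
    # Round up so every OSD gets a journal even if the split is uneven.
    echo $(( ($1 + $2 - 1) / $2 ))
}

journal_size_gb() {
    # $1 = SSD capacity in GB, $2 = journal partitions on that SSD
    echo $(( $1 / $2 ))
}

per_ssd=$(journals_per_ssd 10 2)
size=$(journal_size_gb 480 "$per_ssd")
echo "journal partitions per SSD: ${per_ssd}"
echo "size of each journal partition: ${size} GB"
```

With the numbers above this works out to 5 partitions per SSD at 96 GB each.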
V. Software Environment
Operating system: CentOS-7-x86_64-1611
Ceph: Jewel release
VI. Basic Configuration (steps 1-6 on all nodes)
1. Disable SELinux
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
reboot
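Before rebooting, it may be worth checking that the sed substitution actually matched. The sketch below runs the same substitution on a scratch copy instead of the real config file. The function name and the scratch file are illustrative, and it assumes GNU sed as shipped with CentOS 7.

```shell
#!/bin/sh
# Apply the same substitution as above to the file given as $1.
disable_selinux_in() {
    sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' "$1"
}

# Dry run on a scratch file instead of /etc/selinux/config.
scratch=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$scratch"
disable_selinux_in "$scratch"
selinux_mode=$(grep '^SELINUX=' "$scratch")
echo "$selinux_mode"
rm -f "$scratch"
```

Note that the substitution only matches a file that currently says SELINUX=enforcing; a host already set to permissive would be left unchanged.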
2. Time synchronization
yum -y install ntp ntpdate vim wget
vim /etc/ntp.conf
server 192.168.0.20 iburst
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
systemctl stop ntpd
ntpdate 192.168.0.20
hwclock -w
systemctl restart ntpd
ntpq -p
Note: 192.168.0.20 is a locally built NTP server; I will publish detailed notes on setting it up later.
3. Configure hostname resolution
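Since this edit has to be repeated on every node, it can be scripted instead of done by hand in vim. The sketch below is a rough take on that, demonstrated on a scratch file. The function name is made up, and it assumes GNU sed and the stock CentOS pool entries.

```shell
#!/bin/sh
# Comment out the default pool servers and add the local NTP server.
# $1 = path to ntp.conf, $2 = address of the local NTP server
use_local_ntp() {
    # Prefix every active "server ...pool.ntp.org" line with '#'.
    sed -i 's/^server \(.*pool\.ntp\.org.*\)$/#server \1/' "$1"
    # Append the local server only if it is not already configured.
    grep -q "^server $2 " "$1" || echo "server $2 iburst" >> "$1"
}

# Dry run on a scratch file instead of /etc/ntp.conf.
scratch=$(mktemp)
printf 'server 0.centos.pool.ntp.org iburst\nserver 1.centos.pool.ntp.org iburst\n' > "$scratch"
use_local_ntp "$scratch" 192.168.0.20
ntp_conf=$(cat "$scratch")
echo "$ntp_conf"
rm -f "$scratch"
```

On a real node the call would be use_local_ntp /etc/ntp.conf 192.168.0.20, followed by the ntpdate and restart commands shown in this step.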
vim /etc/hosts
192.168.0.59 ceph-adm
192.168.0.60 ceph-mon1
192.168.0.61 ceph-mon2
192.168.0.62 ceph-mon3
192.168.0.63 ceph-osd1
192.168.0.64 ceph-osd2
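The same six entries go into /etc/hosts on every node, so a small idempotent helper saves some typing. This is only a sketch: the function names are mine, and it writes to a scratch file standing in for /etc/hosts.

```shell
#!/bin/sh
# Append one "IP hostname" entry to a hosts file unless the hostname
# is already present.
# $1 = hosts file, $2 = IP address, $3 = hostname
add_host() {
    grep -qwF "$3" "$1" || echo "$2 $3" >> "$1"
}

# Add all cluster nodes from this article to the hosts file given as $1.
add_cluster_hosts() {
    while read -r ip name; do
        add_host "$1" "$ip" "$name"
    done <<EOF
192.168.0.59 ceph-adm
192.168.0.60 ceph-mon1
192.168.0.61 ceph-mon2
192.168.0.62 ceph-mon3
192.168.0.63 ceph-osd1
192.168.0.64 ceph-osd2
EOF
}

scratch=$(mktemp)            # stand-in for /etc/hosts
add_cluster_hosts "$scratch"
add_cluster_hosts "$scratch" # second run adds nothing
hosts_count=$(grep -c . "$scratch")
echo "entries written: ${hosts_count}"
rm -f "$scratch"
```

Running it twice still leaves six entries, so it is safe to re-run on a node that was already configured.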
4. Install and configure firewalld (not installed by default on this system)
yum -y install firewalld firewall-config
systemctl start firewalld
systemctl enable firewalld
firewall-cmd --zone=public --add-port=6789/tcp --permanent
firewall-cmd --zone=public --add-port=6800-7100/tcp --permanent
firewall-cmd --reload
firewall-cmd --zone=public --list-all
5. Prepare the Ceph repositories
yum clean all
rm -rf /etc/yum.repos.d/*.repo
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
sed -i '/aliyuncs/d' /etc/yum.repos.d/CentOS-Base.repo
sed -i '/aliyuncs/d' /etc/yum.repos.d/epel.repo
sed -i 's/$releasever/7/g' /etc/yum.repos.d/CentOS-Base.repo
vim /etc/yum.repos.d/ceph.repo
[ceph]
name=ceph
baseurl=http://mirrors.163.com/ceph/rpm-jewel/el7/x86_64/
gpgcheck=0
[ceph-noarch]
name=cephnoarch
baseurl=http://mirrors.163.com/ceph/rpm-jewel/el7/noarch/
gpgcheck=0
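Instead of typing the repo file into vim on each node, it can be generated with a heredoc. A minimal sketch follows. The function name and its parameters are my own; the mirror URL pattern is the one used in this step.

```shell
#!/bin/sh
# Write a ceph.repo for the given release to the given path.
# $1 = output path, $2 = Ceph release name (e.g. jewel)
write_ceph_repo() {
    cat > "$1" <<EOF
[ceph]
name=ceph
baseurl=http://mirrors.163.com/ceph/rpm-$2/el7/x86_64/
gpgcheck=0
[ceph-noarch]
name=cephnoarch
baseurl=http://mirrors.163.com/ceph/rpm-$2/el7/noarch/
gpgcheck=0
EOF
}

# Dry run: write to a scratch file and show the resulting baseurl lines.
scratch=$(mktemp)
write_ceph_repo "$scratch" jewel
repo_urls=$(grep '^baseurl=' "$scratch")
echo "$repo_urls"
rm -f "$scratch"
```

On a real node the target would be /etc/yum.repos.d/ceph.repo.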
6. Update the system
yum update -y
reboot
7. Passwordless SSH access (configure on ceph-adm)
ssh-keygen
ssh-copy-id root@ceph-mon1
ssh-copy-id root@ceph-mon2
ssh-copy-id root@ceph-mon3
ssh-copy-id root@ceph-osd1
ssh-copy-id root@ceph-osd2
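With more nodes, the repeated ssh-copy-id lines are easier to manage as a loop. The sketch below only prints the commands; the NODES variable and the function name are illustrative.

```shell
#!/bin/sh
# Print (rather than run) one ssh-copy-id command per node.
NODES="ceph-mon1 ceph-mon2 ceph-mon3 ceph-osd1 ceph-osd2"

copy_id_commands() {
    for node in $NODES; do
        echo "ssh-copy-id root@${node}"
    done
}

copy_id_commands
```

Once the printed list looks right, piping it to a shell (copy_id_commands | sh) would run the commands one after another, prompting for each root password.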
VII. Configuring the Ceph Cluster (steps 1-6 on ceph-adm)
1. Install ceph-deploy
yum -y install ceph-deploy
2. Create the Ceph working directory
mkdir /etc/ceph
cd /etc/ceph
3. Initialize the cluster and tell ceph-deploy which nodes are monitor nodes.
ceph-deploy new ceph-mon1 ceph-mon2 ceph-mon3
4. Install the Ceph binary packages on every Ceph node
ceph-deploy install --no-adjust-repos ceph-adm ceph-mon1 ceph-mon2 ceph-mon3 ceph-osd1 ceph-osd2
5. Edit the Ceph configuration file: add public network = 192.168.0.0/24 to ceph.conf
cat ceph.conf
[global]
fsid = 1017c790-f1f0-497e-9935-7c726f56396d
mon_initial_members = ceph-mon1,ceph-mon2,ceph-mon3
mon_host = 192.168.0.60,192.168.0.61,192.168.0.62
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 192.168.0.0/24
6. Deploy the initial monitors
ceph-deploy mon create-initial
7. Check the OSD disks on ceph-osd1 and ceph-osd2 (this test lab uses ten 100GB SAS disks and two 240GB SAS disks per node)
ceph-deploy disk list ceph-osd1 ceph-osd2
lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
fd0      2:0    1     4K  0 disk
sda      8:0    0   300G  0 disk
├─sda1   8:1    0   500M  0 part /boot
├─sda2   8:2    0   100G  0 part /var
├─sda3   8:3    0    16G  0 part [SWAP]
├─sda4   8:4    0     1K  0 part
└─sda5   8:5    0 183.5G  0 part /
sdb      8:16   0   100G  0 disk
sdc      8:32   0   100G  0 disk
sdd      8:48   0   100G  0 disk
sde      8:64   0   100G  0 disk
sdf      8:80   0   100G  0 disk
sdg      8:96   0   100G  0 disk
sdh      8:112  0   100G  0 disk
sdi      8:128  0   100G  0 disk
sdj      8:144  0   100G  0 disk
sdk      8:160  0   100G  0 disk
sdl      8:176  0   240G  0 disk
sdm      8:192  0   240G  0 disk
sr0     11:0    1  1024M  0 rom
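8. On ceph-osd1, create a script that builds XFS file systems in bulk (we use XFS for workloads with huge numbers of small files, or very large volumes of fairly large files).
Name the script parted.sh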
vim parted.sh
#!/bin/bash
# Partition and format the OSD data disks, then partition the journal SSDs.
set -e
if [ ! -x "/sbin/parted" ]; then
    echo "This script requires /sbin/parted to run!" >&2
    exit 1
fi

# Data disks: one GPT partition spanning the whole disk, formatted as XFS.
DISKS="b c d e f g h i j k"
for i in ${DISKS}; do
    echo "Creating partitions on /dev/sd${i} ..."
    parted -a optimal --script /dev/sd${i} -- mktable gpt
    parted -a optimal --script /dev/sd${i} -- mkpart primary xfs 0% 100%
    sleep 1
    #echo "Formatting /dev/sd${i}1 ..."
    mkfs.xfs -f /dev/sd${i}1 &    # format in the background, in parallel
done
wait    # block until every background mkfs.xfs has finished

# Journal SSDs: five partitions per disk, one per OSD journal.
DISKS="l m"
for i in ${DISKS}; do
    parted -s /dev/sd${i} mklabel gpt
    parted -s /dev/sd${i} mkpart primary 0% 20%
    parted -s /dev/sd${i} mkpart primary 21% 40%
    parted -s /dev/sd${i} mkpart primary 41% 60%
    parted -s /dev/sd${i} mkpart primary 61% 80%
    parted -s /dev/sd${i} mkpart primary 81% 100%
done
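The journal loop in parted.sh leaves a 1% gap between neighbouring partitions (20% to 21%, and so on). If you would rather have contiguous, equal partitions, the boundaries can be computed. The following is only a sketch: partition_bounds is a name I made up, and the parted commands are printed, not executed.

```shell
#!/bin/sh
# Print "start% end%" boundaries that split a disk into $1 equal,
# contiguous partitions (no gaps between them).
partition_bounds() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$(( i * 100 / n ))% $(( (i + 1) * 100 / n ))%"
        i=$(( i + 1 ))
    done
}

# Dry run: show the parted commands for one journal SSD (/dev/sdl).
partition_bounds 5 | while read -r start end; do
    echo "parted -s /dev/sdl mkpart primary ${start} ${end}"
done
```

For 5 partitions this yields 0% to 20%, 20% to 40%, and so on up to 100%.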
9. On ceph-osd1, make the script executable
chmod -R 755 /root/parted.sh
10. On ceph-osd1, copy parted.sh to ceph-osd2
scp -p /root/parted.sh ceph-osd2:/root/parted.sh
11. Run ./parted.sh on both ceph-osd1 and ceph-osd2
./parted.sh
12. From ceph-adm, check that the XFS file systems were created on the disks of ceph-osd1 and ceph-osd2
ceph-deploy disk list ceph-osd1 ceph-osd2
# Expected output (ceph-osd2 omitted):
[ceph-osd1][INFO ] Running command: /usr/sbin/ceph-disk list
[ceph-osd1][DEBUG ] /dev/sda :
[ceph-osd1][DEBUG ] /dev/sda4 other,0x5
[ceph-osd1][DEBUG ] /dev/sda3 swap,swap
[ceph-osd1][DEBUG ] /dev/sda5 other,xfs,mounted on /
[ceph-osd1][DEBUG ] /dev/sda1 other,mounted on /boot
[ceph-osd1][DEBUG ] /dev/sda2 other,mounted on /var
[ceph-osd1][DEBUG ] /dev/sdb :
[ceph-osd1][DEBUG ] /dev/sdb1 other,xfs
[ceph-osd1][DEBUG ] /dev/sdc :
[ceph-osd1][DEBUG ] /dev/sdc1 other,xfs
[ceph-osd1][DEBUG ] /dev/sdd :
[ceph-osd1][DEBUG ] /dev/sdd1 other,xfs
[ceph-osd1][DEBUG ] /dev/sde :
[ceph-osd1][DEBUG ] /dev/sde1 other,xfs
[ceph-osd1][DEBUG ] /dev/sdf :
[ceph-osd1][DEBUG ] /dev/sdf1 other,xfs
[ceph-osd1][DEBUG ] /dev/sdg :
[ceph-osd1][DEBUG ] /dev/sdg1 other,xfs
[ceph-osd1][DEBUG ] /dev/sdh :
[ceph-osd1][DEBUG ] /dev/sdh1 other,xfs
[ceph-osd1][DEBUG ] /dev/sdi :
[ceph-osd1][DEBUG ] /dev/sdi1 other,xfs
[ceph-osd1][DEBUG ] /dev/sdj :
[ceph-osd1][DEBUG ] /dev/sdj1 other,xfs
[ceph-osd1][DEBUG ] /dev/sdk :
[ceph-osd1][DEBUG ] /dev/sdk1 other,xfs
[ceph-osd1][DEBUG ] /dev/sdl :
[ceph-osd1][DEBUG ] /dev/sdl1 other,ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[ceph-osd1][DEBUG ] /dev/sdl2 other,ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[ceph-osd1][DEBUG ] /dev/sdl3 other,ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[ceph-osd1][DEBUG ] /dev/sdl4 other,ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[ceph-osd1][DEBUG ] /dev/sdl5 other,ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[ceph-osd1][DEBUG ] /dev/sdm :
[ceph-osd1][DEBUG ] /dev/sdm1 other,ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[ceph-osd1][DEBUG ] /dev/sdm2 other,ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[ceph-osd1][DEBUG ] /dev/sdm3 other,ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[ceph-osd1][DEBUG ] /dev/sdm4 other,ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[ceph-osd1][DEBUG ] /dev/sdm5 other,ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[ceph-osd1][DEBUG ] /dev/sr0 other,unknown
VIII. Creating the Storage Nodes
1. On ceph-adm, create the storage nodes ceph-osd1 and ceph-osd2
For ceph-osd1:
ceph-deploy disk zap ceph-osd1:sdb ceph-osd1:sdc ceph-osd1:sdd ceph-osd1:sde ceph-osd1:sdf ceph-osd1:sdg ceph-osd1:sdh ceph-osd1:sdi ceph-osd1:sdj ceph-osd1:sdk
ceph-deploy osd create ceph-osd1:sdb ceph-osd1:sdc ceph-osd1:sdd ceph-osd1:sde ceph-osd1:sdf ceph-osd1:sdg ceph-osd1:sdh ceph-osd1:sdi ceph-osd1:sdj ceph-osd1:sdk
For ceph-osd2:
ceph-deploy disk zap ceph-osd2:sdb ceph-osd2:sdc ceph-osd2:sdd ceph-osd2:sde ceph-osd2:sdf ceph-osd2:sdg ceph-osd2:sdh ceph-osd2:sdi ceph-osd2:sdj ceph-osd2:sdk
ceph-deploy osd create ceph-osd2:sdb ceph-osd2:sdc ceph-osd2:sdd ceph-osd2:sde ceph-osd2:sdf ceph-osd2:sdg ceph-osd2:sdh ceph-osd2:sdi ceph-osd2:sdj ceph-osd2:sdk
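One thing to note about the commands above: they pass only HOST:DISK, so as far as I can tell the journal ends up on the data disk itself and the SSD partitions from parted.sh are never referenced. The ceph-deploy version of the Jewel era also accepts HOST:DISK:JOURNAL. The sketch below builds such an argument list, pairing sdb..sdk with sdl1..sdl5 and sdm1..sdm5 in order. That pairing and the function name are my assumptions, and the command is only printed.

```shell
#!/bin/sh
# Build HOST:DATA_DISK:JOURNAL_PARTITION arguments for one OSD host.
# Data disks sdb..sdk are paired in order with sdl1..sdl5, then sdm1..sdm5.
osd_args() {
    host=$1
    # Reuse the positional parameters as a queue of journal partitions.
    set -- /dev/sdl1 /dev/sdl2 /dev/sdl3 /dev/sdl4 /dev/sdl5 \
           /dev/sdm1 /dev/sdm2 /dev/sdm3 /dev/sdm4 /dev/sdm5
    for disk in b c d e f g h i j k; do
        printf '%s:sd%s:%s ' "$host" "$disk" "$1"
        shift
    done
    echo
}

# Dry run: print the command instead of executing it.
echo "ceph-deploy osd create $(osd_args ceph-osd1)"
```

Review the printed command against your own disk layout before running anything like it, since the device names differ from machine to machine.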
2. On ceph-adm, push the configuration file to the other nodes so that every node has the same Ceph configuration.
ceph-deploy --overwrite-conf admin ceph-adm ceph-mon1 ceph-mon2 ceph-mon3 ceph-osd1 ceph-osd2
IX. Testing
1. Check the Ceph status
[root@ceph-adm ceph]# ceph -s
    cluster 0659c156-c46a-4c83-bcc1-786219991e81
     health HEALTH_OK
     monmap e1: 3 mons at {ceph-mon1=192.168.0.60:6789/0,ceph-mon2=192.168.0.61:6789/0,ceph-mon3=192.168.0.62:6789/0}
            election epoch 8, quorum 0,1,2 ceph-mon1,ceph-mon2,ceph-mon3
     osdmap e108: 20 osds: 20 up, 20 in
            flags sortbitwise,require_jewel_osds
      pgmap v338: 512 pgs, 1 pools, 0 bytes data, 0 objects
            715 MB used, 1898 GB / 1899 GB avail
                 512 active+clean
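For scripted checks, the two things I look at in this output are the health line and whether every OSD is up and in. The sketch below is a rough parser for the Jewel-style plain-text layout shown in this step. The function name is my own, and newer Ceph releases format the status differently.

```shell
#!/bin/sh
# Read `ceph -s` style text on stdin and succeed only if health is
# HEALTH_OK and every OSD is both up and in.
cluster_ok() {
    awk '
        { gsub(/,/, " ") }   # tolerate "20 up, 20 in" and "20 up,20 in"
        $1 == "health" && $2 == "HEALTH_OK" { healthy = 1 }
        $1 == "osdmap" {
            # e.g. "osdmap e108: 20 osds: 20 up 20 in"
            total = $3; up = $5; inn = $7
            if (total == up && total == inn) osds_ok = 1
        }
        END { exit !(healthy && osds_ok) }
    '
}

# Demo with two lines taken from the status output.
if cluster_ok <<EOF
     health HEALTH_OK
     osdmap e108: 20 osds: 20 up, 20 in
EOF
then
    echo "cluster is healthy"
else
    echo "cluster needs attention"
fi
```

On the admin node the live check would be ceph -s | cluster_ok.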
2. Check the Ceph version
[root@ceph-adm ceph]# ceph -v
ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
3. Check cluster usage
[root@ceph-adm ceph]# ceph df
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    1899G     1898G         715M          0.04
POOLS:
    NAME     ID     USED     %USED     MAX AVAIL     OBJECTS
    rbd      0         0         0          949G           0
4. Check the Ceph monitor status
[root@ceph-adm ceph]# ceph mon stat
e1: 3 mons at {ceph-mon1=192.168.0.60:6789/0,ceph-mon2=192.168.0.61:6789/0,ceph-mon3=192.168.0.62:6789/0}, election epoch 8, quorum 0,1,2 ceph-mon1,ceph-mon2,ceph-mon3
5. View the OSD CRUSH tree
[root@ceph-adm ceph]# ceph osd tree
ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 1.85394 root default
-2 0.92697     host ceph-osd1
 0 0.09270         osd.0           up  1.00000          1.00000
 1 0.09270         osd.1           up  1.00000          1.00000
 2 0.09270         osd.2           up  1.00000          1.00000
 3 0.09270         osd.3           up  1.00000          1.00000
 4 0.09270         osd.4           up  1.00000          1.00000
 5 0.09270         osd.5           up  1.00000          1.00000
 6 0.09270         osd.6           up  1.00000          1.00000
 7 0.09270         osd.7           up  1.00000          1.00000
 8 0.09270         osd.8           up  1.00000          1.00000
 9 0.09270         osd.9           up  1.00000          1.00000
-3 0.92697     host ceph-osd2
10 0.09270         osd.10          up  1.00000          1.00000
11 0.09270         osd.11          up  1.00000          1.00000
12 0.09270         osd.12          up  1.00000          1.00000
13 0.09270         osd.13          up  1.00000          1.00000
14 0.09270         osd.14          up  1.00000          1.00000
15 0.09270         osd.15          up  1.00000          1.00000
16 0.09270         osd.16          up  1.00000          1.00000
17 0.09270         osd.17          up  1.00000          1.00000
18 0.09270         osd.18          up  1.00000          1.00000
19 0.09270         osd.19          up  1.00000          1.00000
6. Dump the Ceph monitor map
[root@ceph-adm ceph]# ceph mon dump
dumped monmap epoch 1
epoch 1
fsid 0659c156-c46a-4c83-bcc1-786219991e81
last_changed 2017-07-20 13:17:49.202587
created 2017-07-20 13:17:49.202587
0: 192.168.0.60:6789/0 mon.ceph-mon1
1: 192.168.0.61:6789/0 mon.ceph-mon2
2: 192.168.0.62:6789/0 mon.ceph-mon3
7. Check the monitor quorum status
[root@ceph-adm ceph]# ceph quorum_status --format json-pretty
{
"election_epoch": 8,
"quorum": [
0,
1,
2
],
"quorum_names": [
"ceph-mon1",
"ceph-mon2",
"ceph-mon3"
],
"quorum_leader_name": "ceph-mon1",
"monmap": {
"epoch": 1,
"fsid": "0659c156-c46a-4c83-bcc1-786219991e81",
"modified": "2017-07-20 13:17:49.202587",
"created": "2017-07-20 13:17:49.202587",
"mons": [
{
"rank": 0,
"name": "ceph-mon1",
"addr": "192.168.0.60:6789\/0"
},
{
"rank": 1,
"name": "ceph-mon2",
"addr": "192.168.0.61:6789\/0"
},
{
"rank": 2,
"name": "ceph-mon3",
"addr": "192.168.0.62:6789\/0"
}
]
}
}
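The quorum shown here is a simple majority of the monitors, which is also why the hardware notes call for an odd number of them. A quick sketch of the arithmetic (the function names are illustrative):

```shell
#!/bin/sh
# Monitors needed for a majority quorum, and how many can fail
# while a majority still remains.
quorum_size()      { echo $(( $1 / 2 + 1 )); }
tolerated_losses() { echo $(( $1 - ($1 / 2 + 1) )); }

for mons in 1 2 3 4 5; do
    echo "${mons} mons: quorum=$(quorum_size "$mons"), can lose $(tolerated_losses "$mons")"
done
```

Going from 3 to 4 monitors raises the quorum size from 2 to 3 without tolerating any additional failure, so the fourth monitor buys nothing; 5 monitors tolerate 2 failures.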
8. Check the Ceph PG status
[root@ceph-adm ceph]# ceph pg stat
v338: 512 pgs: 512 active+clean; 0 bytes data, 715 MB used, 1898 GB / 1899 GB avail
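That completes the Ceph cluster. Next up: how to configure Nova, Glance, and Cinder on top of Ceph as a unified storage back end.
X. Ceph as the OpenStack Storage Back End
1. Configure the Glance image service to use the Ceph cluster as its back end
To be continued...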
For study and discussion, add me on QQ: 337972603