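airflow is a data workflow management tool open-sourced by Airbnb. For usage, see the official site: http://pythonhosted.org/airflow/
This post shares the installation process and the pitfalls encountered along the way.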
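Installing airflow
(To avoid affecting other programs, the existing python2.6.6 is not replaced. Versions 2.6 and 2.7 are meant to coexist, and pip, virtualenv and similar tools should exist only under python27.)
To install a separate python2.7, pass a different directory as --prefix when running configure; make install then installs into that prefix directory instead of /usr/local/bin.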
1. Download the python2.7.11 source from https://www.python.org/downloads/source/
2. Build and install from source
su - root
cd /usr/local/
tar -zxvf Python-2.7.11.tgz
mv Python-2.7.11 python27
cd python27
./configure --prefix=/usr/local/python27    # change to your own path
make
make install
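3. Install setuptools (it must be installed under python27; the server cannot reach the internet, so the source archive is downloaded separately)
tar zvxf setuptools-23.1.0.tar.gz
cd setuptools-23.1.0/
/usr/local/python27/python setup.py install
4. Install pip (it must be installed under python27; the server cannot reach the internet, so the source archive is downloaded separately) (the PyPI index can be pointed at the Douban mirror)
tar zvxf pip-8.1.2.tar.gz
cd pip-8.1.2/
/usr/local/python27/python setup.py install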
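5. Install virtualenv. For other installation methods, see the official site: https://virtualenv.pypa.io/en/latest/index.html
tar zvxf virtualenv-15.0.2.tar.gz
cd virtualenv-15.0.2/
/usr/local/python27/python setup.py install
virtualenv also has to be installed once under python2.6; otherwise the virtualenv command cannot be run from the python2.6 environment to create a python2.7 virtualenv.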
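6. Running the virtualenv command requires network access, so a proxy still has to be configured. CCProxy is used here.
Set the following environment variables on the Linux host:
export https_proxy=xxx.xxx.xxx.xxx:808
export http_proxy=xxx.xxx.xxx.xxx:808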
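Because the next step fails without network access, it can help to confirm that both variables are actually visible to child processes. The following is only a minimal sketch under that assumption; the helper name missing_proxies is made up for illustration and is not part of any tool mentioned here.

```python
import os


def missing_proxies(environ=None):
    """Return the proxy variable names that are unset or empty."""
    if environ is None:
        environ = os.environ
    names = ("http_proxy", "https_proxy")
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_proxies()
    if missing:
        print("not set: %s" % ", ".join(missing))
    else:
        print("both proxy variables are set")
```

If a name is reported as not set, the export was probably done in a different shell session from the one running virtualenv.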
7. Create an isolated environment with virtualenv
virtualenv --python=/usr/local/python27/bin/python airflowenv
After running source airflowenv/bin/activate, the shell uses python2.7.
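One way to confirm that the activated environment really runs python2.7 is to ask the interpreter itself. This is a minimal sketch rather than part of the original setup; the helper name is_python27 is hypothetical. Run it with the python found inside airflowenv after activation.

```python
import sys


def is_python27(version_info=sys.version_info):
    """Return True when the given version tuple is a 2.7.x release."""
    # Only major and minor are compared, so any 2.7 patch level passes.
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    # sys.executable shows which binary is running the script.
    print("interpreter: %s" % sys.executable)
    print("is python2.7: %s" % is_python27())
```

If the reported interpreter path does not point into airflowenv, the environment was most likely not activated in the current shell.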
8. Install MySQL (not covered here)
9. As root, install mysql-devel: yum install mysql-devel
10. Install MySQL-python: download MySQL-python-1.2.5.zip from the official Python site and unpack it
source airflowenv/bin/activate
cd MySQL-python-1.2.5
python setup.py install
11. Install gevent
source airflowenv/bin/activate
pip install gevent
12. Install airflow
source airflowenv/bin/activate
export AIRFLOW_HOME=~/airflow    # change to your own path
pip install airflow
# initialize the database
airflow initdb
13. Edit the configuration file: vi $AIRFLOW_HOME/airflow.cfg
This includes adding the MySQL connection and setting the executor. Adjust the other parameters as needed.
executor = LocalExecutor
sql_alchemy_conn = mysql://username:password@ip:port/dbname
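A password containing characters such as @ or / has to be percent-encoded before it is placed in a SQLAlchemy URL, otherwise the URL is parsed incorrectly. The sketch below builds a value in the sql_alchemy_conn format shown above. The function name mysql_conn_url and all sample values are hypothetical and serve only as illustration.

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2.7


def mysql_conn_url(username, password, host, port, dbname):
    """Build a mysql:// URL with the credentials percent-encoded."""
    return "mysql://%s:%s@%s:%d/%s" % (
        quote_plus(username),
        quote_plus(password),
        host,
        port,
        dbname,
    )


if __name__ == "__main__":
    # Sample values only; replace them with the real connection details.
    print(mysql_conn_url("airflow", "p@ss/w0rd", "10.0.0.5", 3306, "airflow"))
```

The printed string can be pasted after sql_alchemy_conn = in airflow.cfg.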
14. Run airflow initdb again. This time the tables are created in MySQL.
15. Install supervisor and use it to start airflow. If airflow dies, supervisor restarts it automatically.
source airflowenv/bin/activate
pip install supervisor
Edit supervisord.conf to specify the program to start and the log output path:
[program:airflow_scheduler]
command=/xxx/airflowenv/bin/airflow scheduler
stdout_logfile=/tmp/airflow_scheduler.log
Start it with the following command:
supervisord -c /xxx/xxx/airflow/supervisord.conf
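Since supervisord.conf is an INI-style file, the program sections can be listed with the standard library before supervisord is started, which catches a missing command line early. This is a rough sketch, not a supervisor feature; program_commands and SAMPLE_CONF are hypothetical names, and the sample text simply repeats the section from step 15.

```python
import io

try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2.7

SAMPLE_CONF = u"""\
[program:airflow_scheduler]
command=/xxx/airflowenv/bin/airflow scheduler
stdout_logfile=/tmp/airflow_scheduler.log
"""


def program_commands(conf_text):
    """Map each [program:NAME] section to its command value."""
    parser = configparser.RawConfigParser()
    # read_file is the Python 3 name; readfp is the Python 2.7 equivalent.
    reader = getattr(parser, "read_file", None) or parser.readfp
    reader(io.StringIO(conf_text))
    commands = {}
    for section in parser.sections():
        if section.startswith("program:") and parser.has_option(section, "command"):
            commands[section[len("program:"):]] = parser.get(section, "command")
    return commands


if __name__ == "__main__":
    print(program_commands(SAMPLE_CONF))
```

A program section that does not appear in the result has no command entry and would not be started as intended.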
Problems encountered during installation
1. airflow initdb fails
(airflowenv)root@127.0.0.1:/xxx/xxx/airflowenv/bin$ airflow initdb
Traceback (most recent call last):
  File "/xxx/xxx/airflowenv/bin/airflow", line 4, in <module>
    from airflow import configuration
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/airflow/__init__.py", line 31, in <module>
    from airflow.models import DAG
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/airflow/models.py", line 56, in <module>
    from airflow import settings, utils
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/airflow/settings.py", line 76, in <module>
    engine = create_engine(SQL_ALCHEMY_CONN, **engine_args)
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/sqlalchemy/engine/__init__.py", line 386, in create_engine
    return strategy.create(*args, **kwargs)
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 75, in create
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 92, in dbapi
    return __import__('MySQLdb')
ImportError: No module named MySQLdb
The MySQL-python module is missing. Download MySQL-python-1.2.5.zip from the official site, unpack it, then run:
cd MySQL-python-1.2.5
python setup.py install
2. Installing MySQL-python (needed before airflow initdb can run) fails at compile time:
_mysql.c:36:23: error: my_config.h: No such file or directory
_mysql.c:38:19: error: mysql.h: No such file or directory
_mysql.c:39:26: error: mysqld_error.h: No such file or directory
_mysql.c:40:20: error: errmsg.h: No such file or directory
The Linux host is missing the mysql-devel package. Run yum install mysql-devel, or download the mysql-devel rpm manually and install it yourself.
3. Starting the webserver with airflow webserver -p 8080 fails
Error: class uri 'gevent' invalid or not found:
[Traceback (most recent call last):
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/gunicorn/util.py", line 140, in load_class
    mod = import_module('.'.join(components))
  File "/xxx/xxx/software/python27/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/gunicorn/workers/ggevent.py", line 22, in <module>
    raise RuntimeError("You need gevent installed to use this worker.")
RuntimeError: You need gevent installed to use this worker.
]
Install gevent with pip:
pip install gevent
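Problems 1 and 3 both come down to a module that cannot be imported inside airflowenv, so a short pre-flight check can surface them before airflow is started. The following is a minimal sketch; the helper name missing_modules is made up for illustration, and the module list reflects only the two failures described above.

```python
import importlib


def missing_modules(names):
    """Return the module names from the list that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # MySQLdb is provided by MySQL-python; gevent is needed by the webserver worker.
    print(missing_modules(["MySQLdb", "gevent"]))
```

An empty list means both modules are importable from the interpreter that ran the check, so it should be run with the python inside airflowenv.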