背景
而对于sql引擎,基本都是大同小异的,负责sql文法解析,语意分析,指定查询树,优化查询树,再到最终的执行,客户端返回结果。
而presto的也跟一般的是一样的。
架构如下:
准备
1.postgresql-9.5.3
2.MysqL-5.0.7
3.hadoop-2.5.2
4.hive-1.2.1
5.presto-server-0.147
6.presto-cli-0.147-executable.jar
且注意系统要求:
Mac OS X or Linux Java 8 Update 60 or higher (8u60+),64-bit Maven 3.3.9+ (for building) Python 2.4+ (for running with the launcher script)
环境搭建_1
MysqL,postgresql都是在windows这边搭建的,直接就可以使用。
hadoop-2.5.2的搭建手顺之前的博文中已经记载了,此处不再说明。
hive的环境,解压后就可以使用了。
这里主要说一下hive的两种CLI工具:
1.hive shell
2.beeline
现在官网标记beeline是new,hive shell是older。建议使用beeline,结果的现实比较直观易懂。跟MysqL的比较像。
beeline的启动,如果hive使用默认的debery数据库的话,请使用下面的方式启动
./bin/beeline -u jdbc:hive2://另外,derby只能同时一个用户使用,否则会报错如下所示:
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /home/myProject/apache-hive-1.2.1-bin/Metastore_db. at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source) at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source) at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source) at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source) at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source) at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source) at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source) ... 83 more Error applying authorization policy on hive configuration: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.Metadata.SessionHiveMetaStoreClient
处理方法也很简单:
rm -rf derby.log Metastore_db
环境搭建_2
presto
- 解压presto-server-0.147.tar.gz
- mkdir presto-server-0.147/etc
- mkdir presto-server-0.147/catalog
- vim etc/node.properties
node.environment=production node.id=1 node.data-dir=/home/myProject/presto-server-0.147/data
- etc/jvm.config
-server -Xmx16G -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:+UseGCOverheadLimit -XX:+ExplicitGCInvokesConcurrent -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=kill -9 %p
- etc/config.properties
coordinator=true node-scheduler.include-coordinator=true http-server.http.port=8080 query.max-memory=5GB query.max-memory-per-node=1GB discovery-server.enabled=true discovery.uri=http://your-hive-IP:8080
- etc/log.properties
com.facebook.presto=INFO
- 下载presto-cli-0.147-executable.jar
- 将其放置在bin目录下
- 赋权限
chmod +x presto-cli-0.147-executable.jar
- etc/catalog/MysqL.properties
connector.name=MysqL connection-url=jdbc:MysqL://your-MysqL-location-IP:3306 connection-user=your-MysqL-username connection-password=your-MysqL-password
- etc/catalog/postgresql.properties
connector.name=postgresql connection-url=jdbc:postgresql://your-postgresql-location-ip/postgres connection-user=your-postgres-username connection-password=your-postgresql-password
- etc/catalog/hive.properties
connector.name=hive-hadoop2 hive.Metastore.uri=thrift://your-hive-ip:9083 hive.config.resources=/etc/hadoop/core-site.xml,/etc/hadoop/hdfs-site.xml
connector.name的选取参照如下信息hive-hadoop1: Apache Hadoop 1.x hive-hadoop2: Apache Hadoop 2.x hive-cdh4: Cloudera CDH 4 hive-cdh5: Cloudera CDH 5
启动
- bin/launcher start
- ./presto --server localhost:8080--cataloghive --schema default
结果
presto:test_hive> select * from MysqL.sqoop.t1; id | int_col | char_col ----+---------+---------- 1 | 1 | a 2 | 2 | b 4 | 4 | d 3 | 3 | c 5 | 5 | e (5 rows) Query 20160520_101400_00009_k46dt,FINISHED,1 node http://localhost:8080/query.html?20160520_101400_00009_k46dt Splits: 2 total,0 done (0.00%) cpu Time: 0.0s total,0 rows/s,0B/s,100% active Per Node: 0.0 parallelism,0B/s Parallelism: 0.0 0:29 [0 rows,0B] [0 rows/s,0B/s]
postgresql
presto:test_hive> select * from postgresql.public.test; id | name ----+------ 1 | lily 2 | Tom 3 | Jim (3 rows) Query 20160520_101503_00010_k46dt,1 node http://localhost:8080/query.html?20160520_101503_00010_k46dt Splits: 2 total,0% active Per Node: 0.0 parallelism,0B/s Parallelism: 0.0 0:02 [0 rows,0B/s]
presto:test_hive> select id,char_col from MysqL.sqoop.t1 union select id,name from postgresql.public.test; id | char_col ----+---------- 1 | lily 2 | Tom 3 | Jim 1 | a 2 | b 4 | d 3 | c 5 | e (8 rows) Query 20160520_101532_00011_k46dt,1 node http://localhost:8080/query.html?20160520_101532_00011_k46dt Splits: 6 total,2 done (33.33%) cpu Time: 0.0s total,107 rows/s,17% active Per Node: 0.0 parallelism,0B/s Parallelism: 0.0 0:28 [3 rows,0B/s]
hive
--未完待续---