presto-0.147+postgresql-9.5.3+msql-5.0.7+hadoop-2.5.2+hive-1.2.1环境构筑以及测试

前端之家收集整理的这篇文章主要介绍了presto-0.147+postgresql-9.5.3+msql-5.0.7+hadoop-2.5.2+hive-1.2.1环境构筑以及测试前端之家小编觉得挺不错的,现在分享给大家,也给大家做个参考。

背景

每个支持sql数据库,都有一个强大的sql引擎。

而对于sql引擎,基本都是大同小异的,负责sql文法解析,语意分析,指定查询树,优化查询树,再到最终的执行,客户端返回结果。

presto的也跟一般的是一样的。

架构如下:


准备

1.postgresql-9.5.3

2.MysqL-5.0.7

3.hadoop-2.5.2

4.hive-1.2.1

5.presto-server-0.147

6.presto-cli-0.147-executable.jar


且注意系统要求:

Mac OS X or Linux
Java 8 Update 60 or higher (8u60+),64-bit
Maven 3.3.9+ (for building)
Python 2.4+ (for running with the launcher script)


环境搭建_1

MysqL,postgresql都是在windows这边搭建的,直接就可以使用。

hadoop-2.5.2的搭建手顺之前的博文中已经记载了,此处不再说明。

hive的环境,解压后就可以使用了。

这里主要说一下hive的两种CLI工具:

1.hive shell

2.beeline

现在官网标记beeline是new,hive shell是older。建议使用beeline,结果的现实比较直观易懂。跟MysqL的比较像。

beeline的启动,如果hive使用默认的debery数据库的话,请使用下面的方式启动

./bin/beeline -u jdbc:hive2://
另外,derby只能同时一个用户使用,否则会报错如下所示:
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /home/myProject/apache-hive-1.2.1-bin/Metastore_db.
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
        at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
        ... 83 more
Error applying authorization policy on hive configuration: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.Metadata.SessionHiveMetaStoreClient

处理方法也很简单:
rm -rf derby.log Metastore_db


环境搭建_2

presto

  1. 解压presto-server-0.147.tar.gz
  2. mkdir presto-server-0.147/etc
  3. mkdir presto-server-0.147/catalog
  4. vim etc/node.properties
    node.environment=production
    node.id=1
    node.data-dir=/home/myProject/presto-server-0.147/data
    

  5. etc/jvm.config
    -server
    -Xmx16G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:OnOutOfMemoryError=kill -9 %p
    

  6. etc/config.properties
    coordinator=true
    node-scheduler.include-coordinator=true
    http-server.http.port=8080
    query.max-memory=5GB
    query.max-memory-per-node=1GB
    discovery-server.enabled=true
    discovery.uri=http://your-hive-IP:8080
    

  7. etc/log.properties
    com.facebook.presto=INFO
    
  8. 下载presto-cli-0.147-executable.jar
  9. 将其放置在bin目录下
  10. 赋权限
    chmod +x presto-cli-0.147-executable.jar

  11. etc/catalog/MysqL.properties
    connector.name=MysqL
    connection-url=jdbc:MysqL://your-MysqL-location-IP:3306
    connection-user=your-MysqL-username
    connection-password=your-MysqL-password
    
  12. etc/catalog/postgresql.properties
    connector.name=postgresql
    connection-url=jdbc:postgresql://your-postgresql-location-ip/postgres
    connection-user=your-postgres-username
    connection-password=your-postgresql-password
    
  13. etc/catalog/hive.properties
    connector.name=hive-hadoop2
    hive.Metastore.uri=thrift://your-hive-ip:9083
    hive.config.resources=/etc/hadoop/core-site.xml,/etc/hadoop/hdfs-site.xml
    
    connector.name的选取参照如下信息
    hive-hadoop1: Apache Hadoop 1.x
    hive-hadoop2: Apache Hadoop 2.x
    hive-cdh4: Cloudera CDH 4
    hive-cdh5: Cloudera CDH 5


启动

  1. bin/launcher start
  2. ./presto --server localhost:8080--cataloghive --schema default


结果

MysqL

presto:test_hive> select * from MysqL.sqoop.t1;
 id | int_col | char_col 
----+---------+----------
  1 |       1 | a        
  2 |       2 | b        
  4 |       4 | d        
  3 |       3 | c        
  5 |       5 | e        
(5 rows)

Query 20160520_101400_00009_k46dt,FINISHED,1 node
http://localhost:8080/query.html?20160520_101400_00009_k46dt
Splits: 2 total,0 done (0.00%)
cpu Time: 0.0s total,0 rows/s,0B/s,100% active
Per Node: 0.0 parallelism,0B/s
Parallelism: 0.0
0:29 [0 rows,0B] [0 rows/s,0B/s]


postgresql

presto:test_hive> select * from postgresql.public.test;
 id | name 
----+------
  1 | lily 
  2 | Tom  
  3 | Jim  
(3 rows)

Query 20160520_101503_00010_k46dt,1 node
http://localhost:8080/query.html?20160520_101503_00010_k46dt
Splits: 2 total,0% active
Per Node: 0.0 parallelism,0B/s
Parallelism: 0.0
0:02 [0 rows,0B/s]


MysqL&postgresql

presto:test_hive> select id,char_col from MysqL.sqoop.t1 union select id,name from postgresql.public.test;
 id | char_col 
----+----------
  1 | lily     
  2 | Tom      
  3 | Jim      
  1 | a        
  2 | b        
  4 | d        
  3 | c        
  5 | e        
(8 rows)

Query 20160520_101532_00011_k46dt,1 node
http://localhost:8080/query.html?20160520_101532_00011_k46dt
Splits: 6 total,2 done (33.33%)
cpu Time: 0.0s total,107 rows/s,17% active
Per Node: 0.0 parallelism,0B/s
Parallelism: 0.0
0:28 [3 rows,0B/s]


hive


--未完待续---

猜你在找的Postgre SQL相关文章