目录[-]
pg_prewarm
预加载关系/索引到PG缓存中,在Postgresql中可能吗?当然可以了,感谢Robert Hass,他提交一些补丁到社区,期待它能够在PG 9.2或者PG 9.3中可行。然而,你可以使用这个补丁在PG 9.1做一些测试。
pg_prewarm
有三种模式:
- PREFETCH: 异步获取数据块到操作系统缓存中,而不是PG缓冲(提示只是操作系统缓存)
- READ:读取所有的数据块到虚拟缓存中,然后写到操作系统缓存中(提示只是操作系统缓存)
- BUFFER:读取所有的或者一些数据块到数据库缓冲中。
我把pg_prewarm补丁加入到PG源码安装中,你需要调整你的每步安装。
注意:应用pg_prewarm补丁之前安装PG
1. 下载补丁,放在/usr/local/src
http://archives.postgresql.org/pgsql-hackers/2012-03/binRVNreQMnK4.bin
补丁附加邮件地址
2. 在下载之后,到PG源码目录,然后执行下面几步。
3. 上面的命令会在$PGPATH/contrib/extension目录里创建文件。现在准备添加contrib模块了。
EXTENSION pg_prewarm;
EXTENSION
postgres=# \dx
List
of
installed extensions
Name
| Version |
Schema
| Description
----------------+---------+------------+----------------------------------------
pg_buffercache | 1.0 |
public
| examine the shared buffer cache
pg_prewarm | 1.0 |
| prewarm relation data
pgfincore | 1.1.1 |
| examine
and
manage the os buffer cache
(4
rows
)
local
/src/postgres-9.1.3/doc/src/sqml
[root@localhost sgml]# ll pgpre*
-rw-r
--r-- 1 root root 2481 Apr 10 10:15 pgprewarm.sgml</span>
dstat
它是vmstat,netstat,top等工具的组合到一起成了一个"dstat"linux命令。当数据库表现不正常时,从操作系统级别了解语句,我们会打开好几个终端来显示进程,内存,磁盘读写,网络信息,但是在这些窗口切换是十分痛苦的。所以,dstat有几个选项来帮助显示所有的命令在一个输出窗口中。
安装:
Dstat下载连接:(RHEL 6)
wget http://pkgs.repoforge.org/dstat/dstat-0.7.2-1.el6.rfx.noarch.rpm
或者
yum install dstat
文档:http://dag.wieers.com/home-made/dstat/
Linux ftools
在被设计在现代的linux操作系统中用来调用像mincore,fallocate,fadivse等的工具。Ftools它会帮助你找出那些文件在操作系统缓存中。使用perl/python脚本,你可以获得操作系统页面缓存信息在pg_class.relfilenode对象中。pg_fincore就是建立在它之上的。你可以使用pgfincore或者ftools脚本。安装:
从这个连接下载.tar.gz https://github.com/david415/python-ftools cd python-ftools python setup.py build export PYTHONPATH=build/lib.linux-x86_64-2.5 python setup.py install 注意:你应该在安装python-ftools之前就已经安装好了。
现在,我使用例子来检验这些工具。在这个例子中,有一个表,它有一个索引和序列(sequence),大小为100多MB。
postgres=# \d+ cache Table "public.cache" Column | Type | Modifiers | Storage | Description --------+---------+-----------------------------------------+----------+------------- name | text | | extended | code | integer | | plain | id | integer | default nextval('icache_seq'::regclass) | plain | Indexes: "icache" btree (code) Has OIDs: no使用查询来了解这表,序列和它的索引所占的大小.
postgres=# SELECT c.relname AS object_name,CASE when c.relkind='r' then 'table' when c.relkind='i' then 'index' when c.relkind='S' then 'sequence' else 'others' END AS type,pg_relation_size(c.relname::text) AS size,pg_size_pretty(pg_relation_size(c.relname::text)) AS pretty_size FROM pg_class c JOIN pg_roles r ON r.oid = c.relowner LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE (c.relkind = ANY (ARRAY['r'::"char",'i'::"char",'S'::"char",''::"char"])) AND n.nspname = 'public'; object_name | type | size | pretty_size -------------+----------+----------+------------- icache_seq | sequence | 8192 | 8192 bytes cache | table | 83492864 | 80 MB icache | index | 35962880 | 34 MB (3 rows) Total object size 'cache' postgres=# select pg_size_pretty(pg_total_relation_size('cache')); pg_size_pretty ---------------- 114 MB (1 row)我已经写了联合pgfincore和pg_buffercache的一个简单查询来获得PG缓冲和操作系统页面缓存的信息。我会在这个查询贯穿整个例子,仅仅复制这个查询就好了。
select rpad(c.relname,30,' ') as Object_Name,case when c.relkind='r' then 'Table' when c.relkind='i' then 'Index' else 'Other' end as Object_Type,rpad(count(*)::text,5,' ') as "PG_Buffer_Cache_usage(8KB)",split_part(pgfincore(c.relname::text)::text,','::text,5) as "OS_Cache_usage(4KB)" from pg_class c inner join pg_buffercache b on b.relfilenode=c.relfilenode inner join pg_database d on (b.reldatabase=d.oid and d.datname=current_database() and c.relnamespace=(select oid from pg_namespace where nspname='public')) group by c.relname,c.relkind order by "PG_Buffer_Cache_usage(8KB)" desc limit 10; object_name | object_type | PG_Buffer_Cache_usage(8KB) | OS_Cache_usage(4KB) -------------+-------------+----------------------------+--------------------- (0 rows) 注意: 我已经刷新PG缓冲和操作系统页面缓存。所以,缓存/缓冲没有任何数据.
使用pg_prewarm预加载关系/索引:
在之前,我刷新整个"Cache"表的顺序遍历查询和之前预加载关系/索引的时间。postgres=# explain analyze select * from cache ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Seq Scan on cache (cost=0.00..26192.00 rows=1600000 width=19) (actual time=0.033..354.691 rows=1600000 loops=1) Total runtime: 427.769 ms (2 rows)现在让我们使用pg_prewarm来预加载关系/索引/序列,然后查看查询计划。
postgres=# select pg_prewarm('cache','main','buffer',null,null); pg_prewarm ------------ 10192 (1 row) postgres=# select pg_prewarm('icache',null); pg_prewarm ------------ 4390 (1 row) Output of combined buffers: object_name | object_type | PG_Buffer_Cache_usage(8KB) | OS_Cache_usage(4KB) -------------+-------------+----------------------------+--------------------- icache | Index | 4390 | 8780 cache | Table | 10192 | 20384 (2 rows)
pgfincore 输出:
postgres=# select relname,5) as "In_OS_Cache" from pg_class c where relname ilike '%cache%'; relname | In_OS_Cache ------------+------------- icache_seq | 2 cache | 20384 icache | 8780 (3 rows) or for each object. postgres=# select * from pgfincore('cache'); relpath | segment | os_page_size | rel_os_pages | pages_mem | group_mem | os_pages_free | databit ------------------+---------+--------------+--------------+-----------+-----------+---------------+--------- base/12780/16790 | 0 | 4096 | 20384 | 20384 | 1 | 316451 | (1 row)To retrieve similar information using python-ftools script you need to know objects relfilenode number,check below.
使用python-ftools脚本也会获得相似的信息,你应该知道relfilenode对象的编号。像下面一样查看。
postgres=# select relfilenode,relname from pg_class where relname ilike '%cache%'; relfilenode | relname -------------+---------------- 16787 | icache_seq /// 你执行的序列 16790 | cache /// 表 16796 | icache /// 索引 (3 rows)使用python-ftools脚本
有趣吧....!!!!.
现在比较一下预加载表到缓冲之后的explain plan
postgres=# explain analyze select * from cache ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Seq Scan on cache (cost=0.00..26192.00 rows=1600000 width=19) (actual time=0.016..141.804 rows=1600000 loops=1) Total runtime: 215.100 ms (2 rows)
怎样刷新在操作系统缓存中的关系/索引 ?
使用pgfadvise,你可以预加载或者刷新关系到操作系统缓存。获得更多信息,在终端中使用\df pgfadvise*来显示所有与pgfadvise相关的函数。下面是刷新操作系统缓存的一个例子。postgres=# select * from pgfadvise_dontneed('cache'); relpath | os_page_size | rel_os_pages | os_pages_free ------------------+--------------+--------------+--------------- base/12780/16790 | 4096 | 20384 | 178145 (1 row) postgres=# select * from pgfadvise_dontneed('icache'); relpath | os_page_size | rel_os_pages | os_pages_free ------------------+--------------+--------------+--------------- base/12780/16796 | 4096 | 8780 | 187166 (1 row) postgres=# select relname,5) as "In_OS_Cache" from pg_class c where relname ilike '%cache%'; relname | In_OS_Cache ------------+------------- icache_seq | 0 cache | 0 icache | 0 (3 rows)通过使用dstat,这些信息显示在一个窗口中,如你可以查看读写比例。更多信息使用 dstat --list
dstat -s --top-io --top-bio --top-mem
使用pg_prewarm的range功能预加载随机块.
假设,因为一些原因,你想刷新服务器,但是有一个很大的表它在缓存中,而且执行的很好。在刷新中,缓存 中就没有了你的表了,要回到刷新前的状态,你不得不知道多少表块在缓存中以及使用pg_prewarm的range选项来预加载它。我通过pg_buffercache来查询已创建的表,然后我使用pg_prewarm的range选项来发送块。通过这样,共享缓存就像先前加载到缓存中一样回来了。请看例子
select c.relname,count(*) as buffers from pg_class c
inner join pg_buffercache b on b.relfilenode=c.relfilenode and c.relname ilike '%cache%'
inner join pg_database d on (b.reldatabase=d.oid and d.datname=current_database())
group by c.relname
order by buffers desc;
relname | buffers
---------+---------
cache | 10192
icache | 4390
(2 rows)
Note: These are the blocks in buffer.
postgres=# create table blocks_in_buff (relation,fork,block) as select c.oid::regclass::text,case b.relforknumber when 0 then 'main' when 1 then 'fsm' when 2 then 'vm' end,b.relblocknumber from pg_buffercache b,pg_class c,pg_database d where b.relfilenode = c.relfilenode and b.reldatabase = d.oid and d.datname = current_database() and b.relforknumber in (0,1,2);
SELECT 14716
刷新服务器以及从"blocks_in_buff"表中查看预加载和表相关的随机块的缓存。
postgres=# select sum(pg_prewarm(relation,block,block)) from blocks_in_buff;
sum
-------
14716
(1 row)
postgres=# select c.relname,count(*) as buffers from pg_class c
inner join pg_buffercache b on b.relfilenode=c.relfilenode and c.relname ilike '%cache%'
inner join pg_database d on (b.reldatabase=d.oid and d.datname=current_database())
group by c.relname
order by buffers desc;
relname | buffers
---------+---------
cache | 10192
icache | 4390
(2 rows)
看,我的共享缓存又回来工作了。