通过各个监控维度的排行,可以知道你到底需要关注或者优先关注哪些服务器.
本文测试环境 :
OS : RHEL5
DB : Postgresql 9.1.2
集中之后对报告进行分析,找出需要关注的服务器来.
一、首先看看我们要收集什么信息 :
我这里举了一个简单的例子,收集前一天sar的统计报告. 如下 :
1. sar -b
反映系统的每秒读写IO请求等,详情
@H_403_29@@H_403_29@ tps@H_403_29@ Total number of transfers per second that were issued to physical devices. A transfer is an I/O@H_403_29@ request to a physical device. Multiple logical requests can be combined into a single I/O request@H_403_29@@H_403_29@ rtps@H_403_29@@H_403_29@ wtps@H_403_29@@H_403_29@ Total amount of data read from the devices in blocks per second. Blocks are equivalent to sec-@H_403_29@ tors with 2.4 kernels and newer and therefore have a size of 512 bytes. With older kernels, a@H_403_29@
2. sar -B
反映每秒系统写入或从磁盘读出的page数,详情
@H_403_29@@H_403_29@ Total number of kilobytes the system paged in from disk per second. Note: With old kernels@H_403_29@@H_403_29@ Total number of kilobytes the system paged out to disk per second. Note: With old kernels@H_403_29@@H_403_29@ Number of page faults (major + minor) made by the system per second (post 2.5 kernels only).@H_403_29@ This is not a count of page faults that generate I/O, because some page faults can be resolved@H_403_29@
3. sar -c
反映系统每秒创建的进程数,如果这个数字很大可能是应用程序连接数据库是短连接,并且请求数据库频繁,而Postgresql采用的是客户端连接过来fork新进程然后这个新进程与客户端进行交互的模式,因此这种情况会造成数据库服务器大量的关闭和创建进程,sar -c能反映这种情况. 使用短连接还有一个坏处就是当系统中使用到sequence,并且这个sequence被大量的这种短连接进程请求,那么它设置的sequence cache没有效果并且会造成大量的跳号.
详情
4. sar -q
反映系统的负载,详情
5. sar -r
反映系统的内存和SWAP的使用情况,详情
@H_403_29@ -r Report memory and swap space utilization statistics. The following values are displayed:@H_403_29@@H_403_29@ kbmemfree@H_403_29@@H_403_29@ kbmemused@H_403_29@ Amount of used memory in kilobytes. This does not take into account memory used by the kernel@H_403_29@@H_403_29@@H_403_29@ kbbuffers@H_403_29@@H_403_29@ kbcached@H_403_29@@H_403_29@ kbswpfree@H_403_29@@H_403_29@ kbswpused@H_403_29@@H_403_29@@H_403_29@ kbswpcad@H_403_29@ Amount of cached swap memory in kilobytes. This is memory that once was swapped out, is swapped@H_403_29@ back in but still also is in the swap area (if memory is needed it doesn’t need to be swapped out
6. sar -R
反映每秒被free的内存,新增给buffer的内存,新增给cache的内存. 详情
@H_403_29@@H_403_29@ Number of memory pages freed by the system per second. A negative value represents a number of@H_403_29@ pages allocated by the system. Note that a page has a size of 4 kB or 8 kB according to the@H_403_29@@H_403_29@ Number of additional memory pages used as buffers by the system per second. A negative value@H_403_29@@H_403_29@ Number of additional memory pages cached by the system per second. A negative value means fewer
7. sar -u
反映系统cpu在user,nice,system,iowait,steal,idle的分配比例. 详情
@H_403_29@@H_403_29@ Percentage of cpu utilization that occurred while executing at the user level (application).@H_403_29@@H_403_29@ Percentage of cpu utilization that occurred while executing at the user level with nice priority.@H_403_29@@H_403_29@ Percentage of cpu utilization that occurred while executing at the system level (kernel).@H_403_29@@H_403_29@ Percentage of time that the cpu or cpus were idle during which the system had an outstanding disk@H_403_29@@H_403_29@ Show the percentage of time spent in involuntary wait by the virtual cpu or cpus while the hyper-@H_403_29@@H_403_29@ Percentage of time that the cpu or cpus were idle and the system did not have an outstanding disk@H_403_29@@H_403_29@ Note: On SMP machines a processor that does not have any activity at all (0.00 for every field) is a
8. sar -v
inode,file 或其他内核表的报告,详情
@H_403_29@ -v Report status of inode, file and other kernel tables. The following values are displayed:@H_403_29@@H_403_29@ dentunusd@H_403_29@@H_403_29@@H_403_29@@H_403_29@@H_403_29@ Percentage of allocated super block handlers with regard to the maximum number of super block@H_403_29@@H_403_29@@H_403_29@ Percentage of allocated disk quota entries with regard to the maximum number of cached disk quota@H_403_29@@H_403_29@@H_403_29@ Percentage of queued RT signals with regard to the maximum number of RT signals that can be
9. sar -w
反映每秒上下文的切换数量,详情
10. sar -W
反映SWAP每秒被换进或换出的数量,详情
二、接下来看看数据库表的设计 :
创建序列,函数 和表 :
@H_403_29@create sequence seq_server_id start with 1 increment by 1;
create or replace function get_info (i_id int) returns text as $BODY$declarev_info text;beginselect info into v_info from server where id=i_id;return v_info;exceptionwhen others thenreturn @H_566_2301@'no info';end$BODY$ language plpgsql;
@H_403_29@declare@H_403_29@begin@H_403_29@(select * from (select server_id,row_number() over (partition by server_id order by s_date desc) from sar_context where s_date=current_date-1) t1@H_403_29@(select * from (select server_id,row_number() over (partition by server_id order by s_date desc) from sar_context where s_date=current_date-1) t1@H_403_29@@H_403_29@(select * from (select server_id,row_number() over (partition by server_id order by s_date desc) from sar_cpu where s_date=current_date-1) t1@H_403_29@return query select s1.ip||@H_566_2301@',row_number() over (partition by server_id order by s_date desc) from sar_cpu where s_date=current_date-1) t1@H_403_29@@H_403_29@(select * from (select server_id,row_number() over (partition by server_id order by s_date desc) from sar_inode where s_date=current_date-1) t1@H_403_29@return query select s1.ip||@H_566_2301@',row_number() over (partition by server_id order by s_date desc) from sar_inode where s_date=current_date-1) t1@H_403_29@@H_403_29@(select * from (select server_id,row_number() over (partition by server_id order by s_date desc) from sar_io where s_date=current_date-1) t1@H_403_29@return query select s1.ip||@H_566_2301@',row_number() over (partition by server_id order by s_date desc) from sar_io where s_date=current_date-1) t1@H_403_29@@H_403_29@(select * from (select server_id,row_number() over (partition by server_id order by s_date desc) from sar_load where s_date=current_date-1) t1@H_403_29@return query select s1.ip||@H_566_2301@',row_number() over (partition by server_id order by s_date desc) from sar_load where s_date=current_date-1) t1@H_403_29@@H_403_29@(select * from (select server_id,row_number() over (partition by server_id order by s_date desc) from sar_mem where s_date=current_date-1) t1@H_403_29@return query select s1.ip||@H_566_2301@',row_number() over (partition by server_id order by s_date desc) from sar_mem where s_date=current_date-1) t1@H_403_29@@H_403_29@(select * from (select server_id,row_number() over (partition by server_id order by s_date desc) from sar_mem_swap where s_date=current_date-1) t1@H_403_29@return query select s1.ip||@H_566_2301@',row_number() over (partition by server_id order by s_date desc) from sar_mem_swap where s_date=current_date-1) t1@H_403_29@@H_403_29@(select * from (select server_id,row_number() over (partition by server_id order by s_date desc) from sar_page where s_date=current_date-1) t1@H_403_29@return query select s1.ip||@H_566_2301@',row_number() over (partition by server_id order by s_date desc) from sar_page where s_date=current_date-1) t1@H_403_29@@H_403_29@(select * from (select server_id,row_number() over (partition by server_id order by s_date desc) from sar_proc where s_date=current_date-1) t1@H_403_29@return query select s1.ip||@H_566_2301@',row_number() over (partition by server_id order by s_date desc) from sar_proc where s_date=current_date-1) t1@H_403_29@@H_403_29@(select * from (select server_id,row_number() over (partition by server_id order by s_date desc) from sar_swap where s_date=current_date-1) t1@H_403_29@return query select s1.ip||@H_566_2301@',row_number() over (partition by server_id order by s_date desc) from sar_swap where s_date=current_date-1) t1@H_403_29@@H_403_29@end
sar信息存放表 :
@H_403_29@create table sar_io@H_403_29@@H_403_29@create table sar_page@H_403_29@@H_403_29@create table sar_proc@H_403_29@@H_403_29@create table sar_load@H_403_29@@H_403_29@create table sar_mem_swap@H_403_29@@H_403_29@create table sar_mem@H_403_29@@H_403_29@@H_403_29@create table sar_inode@H_403_29@@H_403_29@create table sar_context@H_403_29@@H_403_29@create table sar_swap
三、在需要收集sar报告的操作系统中配置如下程序用来收集sar信息 :
配置 ~/.pgpass文件
@H_403_29@10.10.10.1:1931:sar:sar:DIGOAL
@H_403_29@chmod 400 ~/.pgpass
编写sar_collect.sh脚本,用于收集昨天的SAR报告
@H_403_29@vi /home/postgres/sar_collect.sh
@H_403_29@#!/bin/bash # 环境变量,数据库连接,避免风暴随机等待60秒内 . /home/postgres/.bash_profile DB_URL=@H_566_2301@"-h 10.10.10.1 -p 1931 -U sar -d sar" sleep $(($RANDOM%60)) NET_DEV=@H_566_2301@"`/sbin/route -n|grep UG|awk '{print $8}'|head -n 1`" IP_ADDR=@H_566_2301@"'`/sbin/ip addr show $NET_DEV|grep inet|grep "global $NET_DEV$@H_566_2301@"|awk '{print $2}'`'" SAR_FILE=@H_566_2301@"/var/log/sa/sa`date -d -1day +%d`" SAR_DATE=@H_566_2301@"'`date -d -1day +%Y-%m-%d`'" SERVER_ID=@H_566_2301@"`psql -A -t $DB_URL -c "select * from get_server_id($IP_ADDR)@H_566_2301@"`" # sar -b,sar_io tps rtps wtps bread/s bwrtn/s sql=@H_566_2301@`sar -b -f $SAR_FILE|grep -E 'AM[ ]+([0-9]+|\.+|all|-)|PM[ ]+([0-9]+|\.+|all|-)'|awk '{print "insert into sar_io(server_id,s_date,s_time,tps,rtps,wtps,bread_p_s,bwrtn_p_s) values('$SERVER_ID','$SAR_DATE',","\47"$1$2"\47,$3",$4",$5",$6",$7");"}'` psql $DB_URL -c @H_566_2301@"$sql" # sar -B,sar_page pgpgin/s pgpgout/s fault/s majflt/s sql=@H_566_2301@`sar -B -f $SAR_FILE|grep -E 'AM[ ]+([0-9]+|\.+|all|-)|PM[ ]+([0-9]+|\.+|all|-)'|awk '{print "insert into sar_page(server_id,pgpgin_p_s,pgpgout_p_s,fault_p_s,majflt_p_s) values('$SERVER_ID',$6");"}'` psql $DB_URL -c @H_566_2301@"$sql" # sar -c,sar_proc proc/s sql=@H_566_2301@`sar -c -f $SAR_FILE|grep -E 'AM[ ]+([0-9]+|\.+|all|-)|PM[ ]+([0-9]+|\.+|all|-)'|awk '{print "insert into sar_proc(server_id,proc_p_s) values('$SERVER_ID',$3");"}'` psql $DB_URL -c @H_566_2301@"$sql" # sar -q,sar_load runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 sql=@H_566_2301@`sar -q -f $SAR_FILE|grep -E 'AM[ ]+([0-9]+|\.+|all|-)|PM[ ]+([0-9]+|\.+|all|-)'|awk '{print "insert into sar_load(server_id,runq_sz,plist_sz,ldavg_1,ldavg_5,ldavg_15) values('$SERVER_ID',$7");"}'` psql $DB_URL -c @H_566_2301@"$sql" # sar -r,sar_mem_swap kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad sql=@H_566_2301@`sar -r -f $SAR_FILE|grep -E 'AM[ ]+([0-9]+|\.+|all|-)|PM[ ]+([0-9]+|\.+|all|-)'|awk '{print "insert into sar_mem_swap(server_id,kbmemfree,kbmemused,percnt_memused,kbbuffers,kbcached,kbswpfree,kbswpused,percnt_swpused,kbswpcad) values('$SERVER_ID',$7",$8",$9",$10",$11");"}'` psql $DB_URL -c @H_566_2301@"$sql" # sar -R,sar_mem frmpg/s bufpg/s campg/s sql=@H_566_2301@`sar -R -f $SAR_FILE|grep -E 'AM[ ]+([0-9]+|\.+|all|-)|PM[ ]+([0-9]+|\.+|all|-)'|awk '{print "insert into sar_mem(server_id,frmpg_p_s,bufpg_p_s,campg_p_s) values('$SERVER_ID',$5");"}'` psql $DB_URL -c @H_566_2301@"$sql" # sar -u,sar_cpu %user %nice %system %iowait %steal %idle sql=@H_566_2301@`sar -u -f $SAR_FILE|grep -E 'AM[ ]+([0-9]+|\.+|all|-)|PM[ ]+([0-9]+|\.+|all|-)'|awk '{print "insert into sar_cpu(server_id,percnt_user,percnt_nice,percnt_system,percnt_iowait,percnt_steal,percnt_idle) values('$SERVER_ID',$9");"}'` psql $DB_URL -c @H_566_2301@"$sql" # sar -v,sar_inode dentunusd file-sz inode-sz super-sz %super-sz dquot-sz %dquot-sz rtsig-sz %rtsig-sz sql=@H_566_2301@`sar -v -f $SAR_FILE|grep -E 'AM[ ]+([0-9]+|\.+|all|-)|PM[ ]+([0-9]+|\.+|all|-)'|awk '{print "insert into sar_inode(server_id,dentunusd,file_sz,inode_sz,super_sz,percnt_super_sz,dquot_sz,percnt_dquot_sz,rtsig_sz,percnt_rtsig_sz) values('$SERVER_ID',$11");"}'` psql $DB_URL -c @H_566_2301@"$sql" # sar -w,sar_context cswch/s sql=@H_566_2301@`sar -w -f $SAR_FILE|grep -E 'AM[ ]+([0-9]+|\.+|all|-)|PM[ ]+([0-9]+|\.+|all|-)'|awk '{print "insert into sar_context(server_id,cswch_p_s) values('$SERVER_ID',$3");"}'` psql $DB_URL -c @H_566_2301@"$sql" # sar -W,sar_swap pswpin/s pswpout/s sql=@H_566_2301@`sar -W -f $SAR_FILE|grep -E 'AM[ ]+([0-9]+|\.+|all|-)|PM[ ]+([0-9]+|\.+|all|-)'|awk '{print "insert into sar_swap(server_id,pswpin_p_s,pswpout_p_s) values('$SERVER_ID',$4");"}'` psql $DB_URL -c @H_566_2301@"$sql"@H_403_29@@H_403_29@# Author : Digoal.Zhou@H_403_29@# THE END
修改权限 :
chmod 500 sar_collect.sh
新建执行计划,
@H_403_29@crontab -e
@H_403_29@# 昨天15分钟最大负载排名前10,后一条为平均值排行 # 负载过高需要关注这些服务器上运行的数据库和业务是否健康,例如是否需要建索引,是否需要使用绑定变量等. select get_ip(server_id),* from (select *,row_number() over (partition by server_id order by ldavg_15 desc) from sar_load where s_date=current_date-1) t where row_number=1 order by ldavg_15 desc limit 10; select get_info(server_id),get_ip(server_id),round(avg(ldavg_15),2) ldavg_15 from sar_load where s_date=current_date-1 group by server_id,s_date order by ldavg_15 desc limit 10; # 昨天最大读请求数排名前10,后一条为平均值排行 # 读请求过高需要关注这些服务器上运行的数据库和业务是否健康,是否需要加内存,是否需要对存储性能扩容等. select get_ip(server_id),row_number() over (partition by server_id order by rtps desc) from sar_io where s_date=current_date-1) t where row_number=1 order by rtps desc limit 10; select get_info(server_id),round(avg(rtps),2) rtps from sar_io where s_date=current_date-1 group by server_id,s_date order by rtps desc limit 10; # 昨天最大写请求数排名前10,后一条为平均值排行 # 写请求过高需要关注这些服务器上运行的数据库和业务是否健康,例如是否需要减少索引,是否需要使用异步IO,是否需要对存储性能进行扩容等. select get_ip(server_id),row_number() over (partition by server_id order by wtps desc) from sar_io where s_date=current_date-1) t where row_number=1 order by wtps desc limit 10; select get_info(server_id),round(avg(wtps),2) wtps from sar_io where s_date=current_date-1 group by server_id,s_date order by wtps desc limit 10; # 昨天最大iowait排名前10,后一条为平均值排行 # iowait过高需要关注这些服务器上运行的数据库和业务是否健康,例如是否需要加内存,是否需要将常用数据放入内存,是否需要对存储性能进行扩容等. select get_ip(server_id),row_number() over (partition by server_id order by percnt_iowait desc) from sar_cpu where s_date=current_date-1) t where row_number=1 order by percnt_iowait desc limit 10; select get_info(server_id),round(avg(percnt_iowait),2) percnt_iowait from sar_cpu where s_date=current_date-1 group by server_id,s_date order by percnt_iowait desc limit 10; # 昨天最大swap页进出排名前10,后一条为平均值排行 # swap也进出过高需要关注这些服务器上运行的数据库和业务是否健康,是否需要将常用数据放入内存等. select get_ip(server_id),row_number() over (partition by server_id order by pswpin_p_s+pswpout_p_s desc) from sar_swap where s_date=current_date-1) t where row_number=1 order by pswpin_p_s+pswpout_p_s desc limit 10; select get_info(server_id),round(avg(pswpin_p_s+pswpout_p_s),2) pswpin_out_p_s from sar_swap where s_date=current_date-1 group by server_id,s_date order by pswpin_out_p_s desc limit 10; # 昨天最大SWAP使用比例前10,后一条为平均值排行 # SWAP使用率过高需要关注这些服务器上运行的数据库和业务是否健康,是否需要调整数据库参数,是否需要使用大页等. select get_ip(server_id),row_number() over (partition by server_id order by percnt_swpused desc) from sar_mem_swap where s_date=current_date-1) t where row_number=1 order by percnt_swpused desc limit 10; select get_info(server_id),round(avg(percnt_swpused),2) percnt_swpused from sar_mem_swap where s_date=current_date-1 group by server_id,s_date order by percnt_swpused desc limit 10; # 昨天每秒新建进程排名前10,后一条为平均值排行 # 每秒新建进程数过高需要关注这些服务器上运行的数据库和业务是否健康,例如是否需要加个数据库连接池使用长连接,Oracle是否需要使用共享连接,应用程序是否可以将短连接改成长连接的模式等. select get_ip(server_id),row_number() over (partition by server_id order by proc_p_s desc) from sar_proc where s_date=current_date-1) t where row_number=1 order by proc_p_s desc limit 10; select get_info(server_id),round(avg(proc_p_s),2) proc_p_s from sar_proc where s_date=current_date-1 group by server_id,s_date order by proc_p_s desc limit 10;
报告如图 :
五、最后通过邮件将报告发送给自己 :
发送邮件脚本 :
@H_403_29@#!/bin/bash@H_403_29@. /home/postgres/.bash_profile@H_403_29@@H_403_29@@H_403_29@echo -e @H_566_2301@`date +%F\ %T` >@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n---- WeeklyAvgValue TOP10: ----\n" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n1. ldavg_15 TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),get_ip(server_id),round(avg(ldavg_15),2) ldavg_15 from sar_load where s_date<=current_date-1 and s_date>=current_date-7 group by server_id order by ldavg_15 desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n2. rtps TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),round(avg(rtps),2) rtps from sar_io where s_date<=current_date-1 and s_date>=current_date-7 group by server_id order by rtps desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n3. wtps TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),round(avg(wtps),2) wtps from sar_io where s_date<=current_date-1 and s_date>=current_date-7 group by server_id order by wtps desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n4. iowait TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),round(avg(percnt_iowait),2) percnt_iowait from sar_cpu where s_date<=current_date-1 and s_date>=current_date-7 group by server_id order by percnt_iowait desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n5. swap_page_in_out TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),round(avg(pswpin_p_s+pswpout_p_s),2) pswpin_out_p_s from sar_swap where s_date<=current_date-1 and s_date>=current_date-7 group by server_id order by pswpin_out_p_s desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n6. swap_usage TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),round(avg(percnt_swpused),2) percnt_swpused from sar_mem_swap where s_date<=current_date-1 and s_date>=current_date-7 group by server_id order by percnt_swpused desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n7. newproc_p_s TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),round(avg(proc_p_s),2) proc_p_s from sar_proc where s_date<=current_date-1 and s_date>=current_date-7 group by server_id order by proc_p_s desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n---- DailyAvgValue TOP10: ----\n" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n1. ldavg_15 TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),2) ldavg_15 from sar_load where s_date=current_date-1 group by server_id order by ldavg_15 desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n2. rtps TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),2) rtps from sar_io where s_date=current_date-1 group by server_id order by rtps desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n3. wtps TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),2) wtps from sar_io where s_date=current_date-1 group by server_id order by wtps desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n4. iowait TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),2) percnt_iowait from sar_cpu where s_date=current_date-1 group by server_id order by percnt_iowait desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n5. swap_page_in_out TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),2) pswpin_out_p_s from sar_swap where s_date=current_date-1 group by server_id order by pswpin_out_p_s desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n6. swap_usage TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),2) percnt_swpused from sar_mem_swap where s_date=current_date-1 group by server_id order by percnt_swpused desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n7. newproc_p_s TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),2) proc_p_s from sar_proc where s_date=current_date-1 group by server_id order by proc_p_s desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n---- DailyMaxValue TOP10: ----\n" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n1. ldavg_15 TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),ldavg_15 from (select *,row_number() over (partition by server_id order by ldavg_15 desc) from sar_load where s_date=current_date-1) t where row_number=1 order by ldavg_15 desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n2. rtps TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),bwrtn_p_s from (select *,row_number() over (partition by server_id order by rtps desc) from sar_io where s_date=current_date-1) t where row_number=1 order by rtps desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n3. wtps TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),row_number() over (partition by server_id order by wtps desc) from sar_io where s_date=current_date-1) t where row_number=1 order by wtps desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n4. iowait TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),percnt_idle from (select *,row_number() over (partition by server_id order by percnt_iowait desc) from sar_cpu where s_date=current_date-1) t where row_number=1 order by percnt_iowait desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n5. swap_page_in_out TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),pswpout_p_s from (select *,row_number() over (partition by server_id order by pswpin_p_s+pswpout_p_s desc) from sar_swap where s_date=current_date-1) t where row_number=1 order by pswpin_p_s+pswpout_p_s desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n6. swap_usage TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),kbswpcad from (select *,row_number() over (partition by server_id order by percnt_swpused desc) from sar_mem_swap where s_date=current_date-1) t where row_number=1 order by percnt_swpused desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n7. newproc_p_s TOP10 :\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select get_info(server_id),proc_p_s from (select *,row_number() over (partition by server_id order by proc_p_s desc) from sar_proc where s_date=current_date-1) t where row_number=1 order by proc_p_s desc limit 10;" >>@H_566_2301@/tmp/sar_report.log echo -e @H_566_2301@"\n---- get_server_nodata_yesterday: ----\n" >>@H_566_2301@/tmp/sar_report.log psql -h 127.0.0.1 sar sar -c @H_566_2301@"select * from get_server_nodata_yesterday();" >>@H_566_2301@/tmp/sar_report.log@H_403_29@@H_403_29@# Author : Digoal.Zhou@H_403_29@# THE END
六、其他,
1. 展现可以通过WEB形式来做,这里只是举了个简单的收集和统计的例子,未设计WEB开发.
2. 1000台服务器一天收集的这些sar日志数据量约200MB. 按照现在的硬盘容量,放几年没问题.
【参考】
man sar