linux – PostgreSQL缓慢提交性能

前端之家收集整理的这篇文章主要介绍了linux – PostgreSQL缓慢提交性能前端之家小编觉得挺不错的,现在分享给大家,也给大家做个参考。
我们在Postgresql配置方面遇到了一些问题.在一些基准测试之后,我发现非常简单的查询需要相对较长的时间,在进一步检查后,看起来实际的COMMIT命令确实很慢.

我使用下表进行了一个非常简单的测试:

CREATE TABLE test (
    id serial primary key,foo varchar(16),);

打开所有语句后,我运行了以下查询10000次:

BEGIN;
INSERT INTO test (a) VALUES ('bar');
COMMIT;

BEGIN和INSERT正在采用< 1ms完成,但COMMIT平均需要22ms才能完成. 在我自己的PC上运行相同的基准测试,速度要慢得多,产生的BEGIN和INSERT语句的平均值相同,但平均COMMIT约为0.4ms(快20倍). 经过一些阅读后,我尝试使用pg_test_fsync工具来解决问题.在服务器上我得到这些结果:

$./pg_test_fsync -o 1024
1024 operations per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order,except fdatasync
is Linux's default)
        open_datasync                      14.875 ops/sec
        fdatasync                          11.920 ops/sec
        fsync                              30.524 ops/sec
        fsync_writethrough                            n/a
        open_sync                          30.425 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order,except fdatasync
is Linux's default)
        open_datasync                      19.956 ops/sec
        fdatasync                          23.299 ops/sec
        fsync                              21.955 ops/sec
        fsync_writethrough                            n/a
        open_sync                           3.619 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
        16kB open_sync write                5.923 ops/sec
         8kB open_sync writes               3.120 ops/sec
         4kB open_sync writes              10.246 ops/sec
         2kB open_sync writes               1.787 ops/sec
         1kB open_sync writes               0.830 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar,fsync() can sync data written
on a different descriptor.)
        write,fsync,close                34.371 ops/sec
        write,close,fsync                36.527 ops/sec

Non-Sync'ed 8kB writes:
        write                           248302.619 ops/sec

在我自己的电脑上,我得到:

$./pg_test_fsync -o 1024
1024 operations per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order,except fdatasync
is Linux's default)
        open_datasync                      69.862 ops/sec
        fdatasync                          68.871 ops/sec
        fsync                              34.593 ops/sec
        fsync_writethrough                            n/a
        open_sync                          26.595 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order,except fdatasync
is Linux's default)
        open_datasync                      26.872 ops/sec
        fdatasync                          59.056 ops/sec
        fsync                              34.031 ops/sec
        fsync_writethrough                            n/a
        open_sync                          17.284 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
        16kB open_sync write                7.412 ops/sec
         8kB open_sync writes               3.942 ops/sec
         4kB open_sync writes               8.700 ops/sec
         2kB open_sync writes               4.161 ops/sec
         1kB open_sync writes               1.492 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar,close                35.086 ops/sec
        write,fsync                34.043 ops/sec

Non-Sync'ed 8kB writes:
        write                           240544.985 ops/sec

服务器的配置:

cpu: Intel(R) Core(TM) i7-3770 cpu @ 3.40GHz
RAM: 32GB
Disk: 2x 2TB SATA disk in Software RAID 1

用于比较的机器是带有16GB RAM和普通SATA磁盘的i5(没有raid).

更多信息:

>操作系统:ubuntu服务器12.10
>内核:Linux … 3.5.0-22-generic#34-Ubuntu SMP Tue Jan 8 21:47:00 UTC 2013 x86_64 x86_64 x86_64 GNU / Linux
>软件RAID 1
>文件系统是ext4
>未指定其他安装选项.
> Postgres版本9.1
> Linux mdadm raid

dump2efs的输出

dumpe2fs 1.42.5 (29-Jul-2012)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          16e30b20-0531-4bcc-877a-818e1f5d5fb2
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              182329344
Block count:              729289039
Reserved block count:     36464451
Free blocks:              609235080
Free inodes:              182228152
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      850
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   256
RAID stride:              1
Flex block group size:    16
Filesystem created:       Sat Jan 19 12:42:19 2013
Last mount time:          Wed Jan 23 16:23:11 2013
Last write time:          Sat Jan 19 12:46:13 2013
Mount count:              8
Maximum mount count:      30
Last checked:             Sat Jan 19 12:42:19 2013
Check interval:           15552000 (6 months)
Next check after:         Thu Jul 18 13:42:19 2013
Lifetime writes:          257 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:           128
Journal inode:            8
First orphan inode:       17304375
Default directory hash:   half_md4
Directory Hash Seed:      a71fa518-7696-4a28-bd89-b21c10d4265b
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x000df5a4
Journal start:            31733

Mdadm – 详细输出

/dev/md2:
        Version : 1.2
  Creation Time : Sat Jan 19 12:42:05 2013
     Raid Level : raid1
     Array Size : 2917156159 (2782.02 GiB 2987.17 GB)
  Used Dev Size : 2917156159 (2782.02 GiB 2987.17 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Fri Mar 22 11:16:45 2013
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : rescue:2
           UUID : d87b98e7:d584a4ed:5dac7907:ae5639b0
         Events : 38

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3

更新2013-03-25:
我对两个磁盘都进行了长时间的智能测试,结果没有问题.两个磁盘均来自Seagate,型号:ST3000DM001-9YN166.

更新2013-03-27:
我在完全空闲的机器上运行了最新版本(9.2.3)的pg_test_fsync:

$./pg_test_fsync -s 3
3 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order,except fdatasync
is Linux's default)
        open_datasync                      39.650 ops/sec
        fdatasync                          34.283 ops/sec
        fsync                              19.309 ops/sec
        fsync_writethrough                            n/a
        open_sync                          55.271 ops/sec

它比之前略好,但仍然令人遗憾.两个磁盘上的分区对齐:

$sudo parted /dev/sdb unit s print
Model: ATA ST3000DM001-9YN1 (scsi)
Disk /dev/sdb: 5860533168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start      End          Size         File system  Name  Flags
 4      2048s      4095s        2048s                           bios_grub
 1      4096s      25169919s    25165824s                       raid
 2      25169920s  26218495s    1048576s                        raid
 3      26218496s  5860533134s  5834314639s                     raid

挂载-v输出

$mount -v | grep ^/dev/
/dev/md2 on / type ext4 (rw,noatime)
/dev/md1 on /boot type ext3 (rw)

md2设备正用于测试.要破坏交换分区并在各个磁盘上运行pg_test_fsync.

如果我在两个磁盘上单独运行pg_test_fsync,我得到大致相同的性能,分区是用noatime挂载的:

$pg_test_fsync/pg_test_fsync -s 3

3 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order,except fdatasync
is Linux's default)
        open_datasync                      75.111 ops/sec
        fdatasync                          71.925 ops/sec
        fsync                              37.352 ops/sec
        fsync_writethrough                            n/a
        open_sync                          33.746 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order,except fdatasync
is Linux's default)
        open_datasync                      38.204 ops/sec
        fdatasync                          49.907 ops/sec
        fsync                              32.126 ops/sec
        fsync_writethrough                            n/a
        open_sync                          13.642 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write          25.325 ops/sec
         2 *  8kB open_sync writes         12.539 ops/sec
         4 *  4kB open_sync writes          6.207 ops/sec
         8 *  2kB open_sync writes          3.098 ops/sec
        16 *  1kB open_sync writes          1.208 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar,close                27.275 ops/sec
        write,fsync                20.561 ops/sec

Non-Sync'ed 8kB writes:
        write                           562902.020 ops/sec

在阵列和单个磁盘上运行测试几次后,数字似乎变化很大.最坏的情况是性能大约是我在这里发布的50%(第一次测试大约30次操作/秒).这是正常的吗?机器一直处于空闲状态.

此外,根据dmesg输出,控制器处于AHCI模式.

解决方法

服务器具有令人难以置信的,令人难以置信的,非常慢的fsync性能.您的软件RAID 1设置存在严重问题.可怕的fsync性能几乎肯定是导致性能问题的原因.

桌面只有非常慢的fsync.

通过设置synchronous_commit = off并设置commit_delay,您可以以崩溃后丢失一些数据为代价来解决性能问题.但是,你确实需要在服务器上整理磁盘性能,这是令人不快的.

为了比较,这是我在笔记本电脑上得到的东西(i7,8GB RAM,中档128G SSD,9.2中的pg_test_fsync):

Compare file sync methods using one 8kB write:

        open_datasync                    4445.744 ops/sec
        fdatasync                        4225.793 ops/sec
        fsync                            2742.679 ops/sec
        fsync_writethrough                            n/a
        open_sync                        2907.265 ops/sec

不可否认,这款SSD可能不是硬电源丢失安全的,但是当我们谈论服务器成本时,它并不像一个体面的电源故障安全SSD成本很高.

猜你在找的Linux相关文章