Original article by freas_1990. Please credit the source when reposting: http://www.jb51.cc/article/p-zzldlbjr-yu.html
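In the Oracle world, the buffer and buffer handle concepts that were once so hotly debated on itpub have gradually cooled off. Even at its peak, that discussion never got past the official documentation and a fair amount of guesswork.
Oracle is now being phased out at internet companies (mainly Alibaba), and open source technology is gaining ground ever more visibly.
To commemorate that bygone excitement, let's look at what a buffer descriptor is from the open source postgresql side.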
    /*
     * struct sbufdesc -- shared buffer cache metadata for a single
     *                    shared buffer descriptor.
     *
     *  We keep the name of the database and relation in which this
     *  buffer appears in order to avoid a catalog lookup on cache
     *  flush if we don't have the reldesc in the cache.  It is also
     *  possible that the relation to which this buffer belongs is
     *  not visible to all backends at the time that it gets flushed.
     *  Dbname, relname, dbid, and relid are enough to determine where
     *  to put the buffer, for all storage managers.
     */
    struct sbufdesc {
        Buffer       freeNext;    /* link for freelist chain */
        Buffer       freePrev;
        SHMEM_OFFSET data;        /* pointer to data in buf pool */

        /* tag and id must be together for table lookup to work */
        BufferTag    tag;         /* file/block identifier */
        int          buf_id;      /* maps global desc to local desc */

        BufFlags     flags;       /* described below */
        int16        bufsmgr;     /* storage manager id for buffer */
        unsigned     refcount;    /* # of times buffer is pinned */

        char sb_dbname[NAMEDATALEN+1];   /* name of db in which buf belongs */
        char sb_relname[NAMEDATALEN+1];  /* name of reln */
    #ifdef HAS_TEST_AND_SET
        /* can afford a dedicated lock if test-and-set locks are available */
        slock_t      io_in_progress_lock;
    #endif /* HAS_TEST_AND_SET */

        /*
         * I padded this structure to a power of 2 (128 bytes on a MIPS) because
         * BufferDescriptorGetBuffer is called a billion times and it does an
         * C pointer subtraction (i.e., "x - y" -> array index of x relative
         * to y, which is calculated using division by struct size).  Integer
         * ".div" hits you for 35 cycles, as opposed to a 1-cycle "sra" ...
         * this hack cut 10% off of the time to create the Wisconsin database!
         * It eats up more shared memory, of course, but we're (allegedly)
         * going to make some of these types bigger soon anyway...
         * -pma 1/2/93
         */

        /* NO spinlock */

    #if defined(PORTNAME_ultrix4)
        char sb_pad[60];    /* no slock_t */
    #endif /* mips */

        /* HAS_TEST_AND_SET -- platform dependent size */

    #if defined(PORTNAME_aix)
        char sb_pad[44];    /* typedef unsigned int slock_t; */
    #endif /* aix */
    #if defined(PORTNAME_alpha)
        char sb_pad[40];    /* typedef msemaphore slock_t; */
    #endif /* alpha */
    #if defined(PORTNAME_hpux)
        char sb_pad[44];    /* typedef struct { int sem[4]; } slock_t; */
    #endif /* hpux */
    #if defined(PORTNAME_irix5)
        char sb_pad[44];    /* typedef abilock_t slock_t; */
    #endif /* irix5 */
    #if defined(PORTNAME_next)
        char sb_pad[56];    /* typedef struct mutex slock_t; */
    #endif /* next */

        /* HAS_TEST_AND_SET -- default 1 byte spinlock */
    #if defined(PORTNAME_BSD44_derived) || \
        defined(PORTNAME_bsdi) || \
        defined(PORTNAME_bsdi_2_1) || \
        defined(PORTNAME_i386_solaris) || \
        defined(PORTNAME_linux) || \
        defined(PORTNAME_sparc) || \
        defined(PORTNAME_sparc_solaris)
        char sb_pad[56];    /* has slock_t */
    #endif /* 1 byte slock_t */
    };
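The padding comment inside the struct is worth a small experiment. Below is a minimal sketch of the idea, not the real descriptor: `DemoDesc`, its 72-byte payload and the two helper functions are made-up names and sizes chosen only so that the total comes out at 128 bytes. It shows that the array index obtained by pointer subtraction equals the byte distance shifted right by 7 once the struct size is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: 72 bytes of payload padded out to 128 bytes.
 * The sizes are illustrative, they are not taken from struct sbufdesc. */
typedef struct DemoDesc {
    char payload[72];
    char pad[56];
} DemoDesc;

/* Compile-time check (C11): the struct is exactly 128 = 2^7 bytes. */
_Static_assert(sizeof(DemoDesc) == 128, "DemoDesc must be 128 bytes");

/* Plain pointer subtraction: the compiler divides the byte distance
 * by sizeof(DemoDesc) to obtain the array index. */
static ptrdiff_t demo_index(const DemoDesc *base, const DemoDesc *d)
{
    return d - base;
}

/* The same computation spelled out by hand: with a power-of-two size
 * the division can be replaced by a right shift. */
static ptrdiff_t demo_index_shift(const DemoDesc *base, const DemoDesc *d)
{
    return ((const char *)d - (const char *)base) >> 7;
}
```

A modern compiler will normally pick the shift on its own when the size is a power of two, which is exactly why the padding pays off; the second function only makes that explicit.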
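The first two fields, freeNext and freePrev, are the classic links of a doubly linked list (here, the freelist chain), so I won't dwell on them.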
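For readers who want to see the links in action anyway, here is a minimal sketch of a freelist threaded through a descriptor array. It assumes that the links are integer indices into the array and that one extra slot acts as the list head; `DemoBuffer`, `DemoFreeDesc`, `DEMO_HEAD` and the three functions are hypothetical names, not the postgresql ones.

```c
#include <assert.h>

#define DEMO_NBUFFERS 4
#define DEMO_HEAD     DEMO_NBUFFERS   /* extra slot used as the list head */

typedef int DemoBuffer;               /* assumption: a link is an array index */

typedef struct DemoFreeDesc {
    DemoBuffer freeNext;
    DemoBuffer freePrev;
} DemoFreeDesc;

static DemoFreeDesc demo_descs[DEMO_NBUFFERS + 1];

/* An empty circular list: the head points at itself in both directions. */
static void demo_freelist_init(void)
{
    demo_descs[DEMO_HEAD].freeNext = DEMO_HEAD;
    demo_descs[DEMO_HEAD].freePrev = DEMO_HEAD;
}

/* Append descriptor b at the tail of the freelist. */
static void demo_freelist_add(DemoBuffer b)
{
    DemoBuffer tail = demo_descs[DEMO_HEAD].freePrev;

    demo_descs[b].freeNext = DEMO_HEAD;
    demo_descs[b].freePrev = tail;
    demo_descs[tail].freeNext = b;
    demo_descs[DEMO_HEAD].freePrev = b;
}

/* Unlink descriptor b in O(1); this is what the back link buys us. */
static void demo_freelist_remove(DemoBuffer b)
{
    DemoBuffer next = demo_descs[b].freeNext;
    DemoBuffer prev = demo_descs[b].freePrev;

    assert(b != DEMO_HEAD);
    demo_descs[prev].freeNext = next;
    demo_descs[next].freePrev = prev;
    demo_descs[b].freeNext = -1;      /* mark as "not on the freelist" */
    demo_descs[b].freePrev = -1;
}
```

Using indices rather than raw pointers keeps the links meaningful in every process that maps the array, which fits a structure that lives in shared memory.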
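SHMEM_OFFSET data; records where the current buffer lives in memory. Because the buffer sits in shared memory, this "address" is not a raw pointer but a value of type "unsigned long" (an offset, as the type name suggests; see "postgresql shared memory: slice").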
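Why store an unsigned long instead of a pointer? A plausible reason is that each process may map the shared segment at a different virtual address, so only the distance from the segment base is meaningful to everyone. The sketch below illustrates that conversion under this assumption; `DemoShmemOffset`, `demo_region`, `demo_make_offset` and `demo_make_ptr` are hypothetical names, and a static array stands in for the real shared segment.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long DemoShmemOffset;   /* same underlying type the article mentions */

/* Stand-in for the mapped shared memory segment (assumption: 1 KB). */
static char demo_region[1024];

/* Turn a pointer inside the segment into an offset from its base. */
static DemoShmemOffset demo_make_offset(const void *base, const void *ptr)
{
    return (DemoShmemOffset)((const char *)ptr - (const char *)base);
}

/* Turn an offset back into a pointer, using this process's own base. */
static void *demo_make_ptr(void *base, DemoShmemOffset off)
{
    return (char *)base + off;
}
```

Each process would pass its own base address to `demo_make_ptr`, so the same stored offset resolves to the same buffer everywhere.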
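The refcount field is an old concept, one that already came up in the redis source code analysis series: it is the number of times this buffer (or memory object) is currently referenced. The buffer pin in postgresql (or Oracle) originates here.
A nonzero refcount means the buffer is in use right now, that is, it is pinned and has to stay where it is.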
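The pin rule can be sketched in a few lines. This is only an illustration of the counting logic: `DemoPinDesc` and the three functions are hypothetical, and the sketch deliberately leaves out the locking that a real shared buffer pool needs around these updates.

```c
#include <assert.h>

typedef struct DemoPinDesc {
    unsigned refcount;            /* # of times the buffer is pinned */
} DemoPinDesc;

/* Pin: one more user holds on to the buffer. */
static void demo_pin(DemoPinDesc *d)
{
    d->refcount++;
}

/* Unpin: a user lets go; unpinning an unpinned buffer is a bug. */
static void demo_unpin(DemoPinDesc *d)
{
    assert(d->refcount > 0);
    d->refcount--;
}

/* Only a buffer nobody has pinned may be handed out for reuse. */
static int demo_is_evictable(const DemoPinDesc *d)
{
    return d->refcount == 0;
}
```

Every `demo_pin` has to be matched by exactly one `demo_unpin`; a forgotten unpin leaves the buffer stuck in the pool for good.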
        /* HAS_TEST_AND_SET -- default 1 byte spinlock */
    #if defined(PORTNAME_BSD44_derived) || \
        defined(PORTNAME_bsdi) || \
        defined(PORTNAME_bsdi_2_1) || \
        defined(PORTNAME_i386_solaris) || \
        defined(PORTNAME_linux) || \
        defined(PORTNAME_sparc) || \
        defined(PORTNAME_sparc_solaris)
        char sb_pad[56];    /* has slock_t */
    #endif /* 1 byte slock_t */
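On each of these platforms the "spinlock" (the slock_t field io_in_progress_lock) takes up just 1 byte, and sb_pad[56] fills the remaining space so that the whole struct still rounds up to a power of two.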
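To make "test-and-set spinlock" concrete, here is a minimal sketch built on the C11 `atomic_flag` type. It is not the slock_t implementation from the postgresql tree, which is platform-specific; the `demo_*` function names are hypothetical, and `atomic_flag` is not guaranteed to be exactly 1 byte on every compiler.

```c
#include <assert.h>
#include <stdatomic.h>

/* Try once: returns 1 if we took the lock, 0 if someone already holds it.
 * atomic_flag_test_and_set returns the previous value, so "was clear"
 * means the lock is now ours. */
static int demo_try_lock(atomic_flag *l)
{
    return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

/* Spin until the flag was observed clear and set by us. */
static void demo_lock(atomic_flag *l)
{
    while (!demo_try_lock(l))
        ;   /* busy-wait */
}

/* Release: clear the flag so the next test-and-set succeeds. */
static void demo_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}
```

A lock is declared as `atomic_flag l = ATOMIC_FLAG_INIT;` and the protected section sits between `demo_lock(&l)` and `demo_unlock(&l)`. Busy-waiting only makes sense when the lock is held very briefly, which is why the source comment calls a dedicated lock something you "can afford" when test-and-set is available.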
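One more thing in passing. Oracle 10g adopted a mutex mechanism, and it was compared against the earlier latch mechanism. In my humble opinion, passing judgment on mutexes versus latches from official marketing documents alone, without ever having read the source code, is plain nonsense.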