/* Compile: gcc -pthread -m32 -ansi x.c */
#include <stdio.h>
#include <inttypes.h>
#include <pthread.h>

static volatile uint64_t v = 0;

void *func (void *x)
{
    __sync_add_and_fetch (&v,1);
    return x;
}

int main (void)
{
    pthread_t t;
    pthread_create (&t,NULL,func,NULL);
    pthread_join (t,NULL);
    printf ("v = %"PRIu64"\n",v);
    return 0;
}
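I have a uint64_t variable that I want to increment atomically, because it is a counter in a multi-threaded program.
To achieve atomicity I use GCC's atomic builtins.
If I compile for an amd64 system (-m64), the generated assembly is easy to understand.
By using lock addq, the processor guarantees that the increment is atomic.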
400660: f0 48 83 05 d7 09 20 lock addq $0x1,0x2009d7(%rip)
But the same C code produces much more complicated assembly on an ia32 system (-m32):
804855a: a1 28 a0 04 08          mov    0x804a028,%eax
804855f: 8b 15 2c a0 04 08       mov    0x804a02c,%edx
8048565: 89 c1                   mov    %eax,%ecx
8048567: 89 d3                   mov    %edx,%ebx
8048569: 83 c1 01                add    $0x1,%ecx
804856c: 83 d3 00                adc    $0x0,%ebx
804856f: 89 ce                   mov    %ecx,%esi
8048571: 89 d9                   mov    %ebx,%ecx
8048573: 89 f3                   mov    %esi,%ebx
8048575: f0 0f c7 0d 28 a0 04    lock cmpxchg8b 0x804a028
804857c: 08
804857d: 75 e6                   jne    8048565 <func+0x15>
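Here is what I don't understand:
lock cmpxchg8b ensures that the changed value is written only if the expected value is still present at the target address. The compare-and-swap is guaranteed to happen atomically.
But what guarantees that the reads of the variable at 0x804855a and 0x804855f are atomic?
It probably doesn't matter if there was a "dirty read", but could someone outline a short proof that there is no problem?
Further: why does the generated code jump back to 0x8048565 and not 0x804855a? I am fairly sure this is only correct if the other writers also only increment the variable. Is that an implied requirement of the __sync_add_and_fetch function?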
Solution
A word or doubleword operand that crosses a 4-byte boundary or a
quadword operand that crosses an 8-byte boundary is considered
unaligned and requires two separate memory bus cycles for access.
Vol. 3A, 8.1.1:
The Pentium processor (and newer processors since) guarantees that the
following additional memory operations will always be carried out
atomically:
• Reading or writing a quadword aligned on a 64-bit boundary
• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus
The P6 family processors (and newer processors since) guarantee that the
following additional memory operation will always be carried out atomically:
• Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line
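So, because the variable is aligned, it can be read in one cycle, and it fits within a single cache line, which makes the read atomic. Note that the listing above loads the value with two separate 32-bit mov instructions. Even if that pair observed a torn value, nothing wrong would be stored: lock cmpxchg8b compares EDX:EAX against the real contents of memory, and on a mismatch it stores nothing and reloads EDX:EAX atomically.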
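The two conditions quoted from the manual (8-byte alignment, or fitting inside one cache line) can be checked directly on an address. The following is only a sketch: the names `counter`, `is_aligned_8`, and `fits_in_cache_line` are hypothetical, the `aligned` attribute is a GCC/Clang extension, and the cache line size is passed in by the caller as an assumption (64 bytes is common on x86 but not guaranteed).

```c
#include <stdint.h>

/* Hypothetical counter, explicitly aligned to 8 bytes (GCC/Clang attribute). */
static volatile uint64_t counter __attribute__((aligned(8))) = 0;

/* Returns 1 if p lies on an 8-byte boundary, 0 otherwise. */
static int is_aligned_8(const volatile void *p)
{
    return ((uintptr_t)p % 8u) == 0;
}

/* Returns 1 if the 8 bytes starting at p all fall inside one cache line
 * of line_size bytes. line_size is an assumption supplied by the caller. */
static int fits_in_cache_line(const volatile void *p, unsigned line_size)
{
    uintptr_t first = (uintptr_t)p;
    uintptr_t last = first + 7u;   /* address of the last of the 8 bytes */
    return (first / line_size) == (last / line_size);
}
```

An 8-byte-aligned quadword always satisfies the second check for any line size that is a multiple of 8, so forcing the alignment is the simpler way to meet the manual's conditions.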
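The code jumps back to 0x8048565 rather than 0x804855a because the value does not need to be loaded again: if CMPXCHG8B fails, it loads the current contents of the destination into EDX:EAX.
The description of CMPXCHG8B from the Intel ISA manual, Vol. 2A: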
Compare EDX:EAX with m64. If equal,set ZF and load ECX:EBX into m64.
Else,clear ZF and load m64 into EDX:EAX.
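To make the quoted description concrete, here is a plain C model of what the instruction does. It is deliberately NOT atomic and is not how the hardware is implemented. It only mirrors the two branches of the manual text. The function name `cmpxchg8b_model` and its parameters are hypothetical: `edx_eax` stands for the EDX:EAX register pair, `ecx_ebx` for ECX:EBX, and the return value for ZF.

```c
#include <stdint.h>

/* Illustrative, non-atomic model of the CMPXCHG8B semantics quoted above.
 * Returns the resulting ZF: 1 if the store happened, 0 otherwise. */
static int cmpxchg8b_model(uint64_t *m64, uint64_t *edx_eax, uint64_t ecx_ebx)
{
    if (*m64 == *edx_eax) {
        *m64 = ecx_ebx;    /* equal: store ECX:EBX into memory */
        return 1;          /* ZF set */
    }
    *edx_eax = *m64;       /* not equal: load memory into EDX:EAX */
    return 0;              /* ZF cleared */
}
```

The failure branch is what lets the retry loop skip the initial loads: after a failed attempt, the expected value already holds the current contents of memory.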
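So the code only needs to increment the newly returned value and try again. This holds no matter what the other writers stored, because each retry starts from the value actually observed in memory.
It is easier to follow written as C code: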
value = dest;
while (!CAS8B(&dest, value, value + 1)) {
    value = dest;   /* CAS failed: pick up the current value and retry */
}
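The same retry loop can be sketched with a real GCC builtin in place of the CAS8B placeholder. `__sync_val_compare_and_swap` returns the value that was in memory before the operation, so a failed attempt hands back the current value, much like CMPXCHG8B reloading EDX:EAX. The names `cas_increment`, `worker`, `run_workers`, and `INCREMENTS_PER_THREAD` are hypothetical, and the code assumes GCC or Clang with pthreads available.

```c
#include <stdint.h>
#include <pthread.h>

#define INCREMENTS_PER_THREAD 100000
#define MAX_THREADS 16

static volatile uint64_t counter = 0;

/* Atomically adds 1 to *dest using a CAS retry loop; returns the new value. */
static uint64_t cas_increment(volatile uint64_t *dest)
{
    /* Initial read: may be stale (or torn on a 32-bit target); the CAS
     * below detects any mismatch, so this is harmless. */
    uint64_t expected = *dest;
    for (;;) {
        uint64_t seen = __sync_val_compare_and_swap(dest, expected, expected + 1);
        if (seen == expected)
            return expected + 1;   /* the store happened */
        expected = seen;           /* reuse the value the failed CAS returned */
    }
}

static void *worker(void *arg)
{
    int i;
    for (i = 0; i < INCREMENTS_PER_THREAD; i++)
        cas_increment(&counter);
    return arg;
}

/* Runs nthreads workers (clamped to MAX_THREADS), returns the final count. */
static uint64_t run_workers(int nthreads)
{
    pthread_t threads[MAX_THREADS];
    int i;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    counter = 0;
    for (i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

Calling `run_workers(4)` should yield 4 * INCREMENTS_PER_THREAD if no increment is lost. In practice `__sync_add_and_fetch` from the question already does this work, so the explicit loop is only useful for seeing the mechanism.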