Atomics From Scratch
TODOs:
- Explain what the assembly is
- What test and set it
- The importance of atomics
- What’s the failure mode here
- How libc++ implements atomics
- Discuss gcc intrinsics
- Build our own atomics
- Build our own spinlock
- Add RAII
If you’ve worked with multi-threaded code, you may have seen something like:
std::atomic_flag flag=ATOMIC_FLAG_INIT;
flag.clear(std::memory_order_release);
bool x=flag.test_and_set();
where you modify flag concurrently safely. If you look at the the
disassembly, you’ll see this compiled down to the following:
mov BYTE PTR [rbp-17], 0
mov DWORD PTR [rbp-12], 3
mov eax, DWORD PTR [rbp-12]
mov esi, 65535
mov edi, eax
call std::operator&(std::memory_order, std::__memory_order_modifier)
mov DWORD PTR [rbp-16], eax
cmp DWORD PTR [rbp-16], 1
sete al
movzx eax, al
test rax, rax
je .L4
mov ecx, OFFSET FLAT:.LC0
mov edx, OFFSET FLAT:.LC1
mov esi, 286
mov edi, OFFSET FLAT:.LC2
call std::__glibcxx_assert_fail(char const*, int, char const*, char const*)
.L4:
cmp DWORD PTR [rbp-16], 2
sete al
movzx eax, al
test rax, rax
je .L5
mov ecx, OFFSET FLAT:.LC3
mov edx, OFFSET FLAT:.LC1
mov esi, 287
mov edi, OFFSET FLAT:.LC2
call std::__glibcxx_assert_fail(char const*, int, char const*, char const*)
.L5:
cmp DWORD PTR [rbp-16], 4
sete al
movzx eax, al
test rax, rax
je .L6
mov ecx, OFFSET FLAT:.LC4
mov edx, OFFSET FLAT:.LC1
mov esi, 288
mov edi, OFFSET FLAT:.LC2
call std::__glibcxx_assert_fail(char const*, int, char const*, char const*)
.L6:
lea rax, [rbp-17]
mov edx, 0
xchg dl, BYTE PTR [rax]
nop
mov DWORD PTR [rbp-8], 5
lea rdx, [rbp-17]
mov eax, 1
xchg al, BYTE PTR [rdx]
nop
mov BYTE PTR [rbp-1], al
First, let’s review some theory with test and set and how that hardware m works at a high level
TODO
You could imagine an interleaving such as:
TODO
We could actually build a spinlock in gcc like:
.globl
Here’s what xchg does:
TODO
namespace kanrab {
template<typename T>
struct atomic {
T element;
};
}
We could actually constrain what’s being instantiated with
namespace {
extern test_and_set_spinlock(u8 flag);
}
namespace kanrab {
template<typename T>
concept AtomicTypes = requires { std::is_same_v<T, bool>};
template<AtomicTypes AtomicT>
struct atomic {
AtomicT element;
void lock();
void unlock();
}
}
auto flag = kanrab::atomic<bool> {};
And here is how the spinlock is created in assembly:
.text
.globl test_and_set_spinlock
.type test_and_set_spinlock, @function
test_and_set_spinlock:
spin:
xchg al, byte ptr [rdi]
cmp al, 0
jnz spin
ret
And now you could have threads TODO
But actually, there’s one more thing. We want to take advantage of RAII, and when we TODO