Atomics From Scratch

TODOs:

Explain what the assembly is
What test and set it
The importance of atomics
- What’s the failure mode here
How libc++ implements atomics
Discuss gcc intrinsics
Build our own atomics
Build our own spinlock
Add RAII

If you’ve worked with multi-threaded code, you may have seen something like:

  std::atomic_flag flag=ATOMIC_FLAG_INIT;
  flag.clear(std::memory_order_release);
  bool x=flag.test_and_set();

where you modify flag concurrently safely. If you look at the the disassembly, you’ll see this compiled down to the following:

        mov     BYTE PTR [rbp-17], 0
        mov     DWORD PTR [rbp-12], 3
        mov     eax, DWORD PTR [rbp-12]
        mov     esi, 65535
        mov     edi, eax
        call    std::operator&(std::memory_order, std::__memory_order_modifier)
        mov     DWORD PTR [rbp-16], eax
        cmp     DWORD PTR [rbp-16], 1
        sete    al
        movzx   eax, al
        test    rax, rax
        je      .L4
        mov     ecx, OFFSET FLAT:.LC0
        mov     edx, OFFSET FLAT:.LC1
        mov     esi, 286
        mov     edi, OFFSET FLAT:.LC2
        call    std::__glibcxx_assert_fail(char const*, int, char const*, char const*)
.L4:
        cmp     DWORD PTR [rbp-16], 2
        sete    al
        movzx   eax, al
        test    rax, rax
        je      .L5
        mov     ecx, OFFSET FLAT:.LC3
        mov     edx, OFFSET FLAT:.LC1
        mov     esi, 287
        mov     edi, OFFSET FLAT:.LC2
        call    std::__glibcxx_assert_fail(char const*, int, char const*, char const*)
.L5:
        cmp     DWORD PTR [rbp-16], 4
        sete    al
        movzx   eax, al
        test    rax, rax
        je      .L6
        mov     ecx, OFFSET FLAT:.LC4
        mov     edx, OFFSET FLAT:.LC1
        mov     esi, 288
        mov     edi, OFFSET FLAT:.LC2
        call    std::__glibcxx_assert_fail(char const*, int, char const*, char const*)
.L6:
        lea     rax, [rbp-17]
        mov     edx, 0
        xchg    dl, BYTE PTR [rax]
        nop
        mov     DWORD PTR [rbp-8], 5
        lea     rdx, [rbp-17]
        mov     eax, 1
        xchg    al, BYTE PTR [rdx]
        nop
        mov     BYTE PTR [rbp-1], al

First, let’s review some theory with test and set and how that hardware m works at a high level

TODO

You could imagine an interleaving such as:

TODO

We could actually build a spinlock in gcc like:

.globl

Here’s what xchg does:

TODO

namespace kanrab {
    template<typename T>
    struct atomic {
        T element;
    };
}

We could actually constrain what’s being instantiated with


namespace {
    extern test_and_set_spinlock(u8 flag);
}

namespace kanrab {
template<typename T>
concept AtomicTypes = requires { std::is_same_v<T, bool>};

template<AtomicTypes AtomicT>
struct atomic {
    AtomicT element;
    void lock();
    void unlock();
}
}

auto flag = kanrab::atomic<bool> {};

And here is how the spinlock is created in assembly:

.text
.globl test_and_set_spinlock
.type test_and_set_spinlock, @function
test_and_set_spinlock:
spin:
xchg al, byte ptr [rdi]
cmp al, 0
jnz spin
ret

And now you could have threads TODO

But actually, there’s one more thing. We want to take advantage of RAII, and when we TODO