Understanding the SBCL entry/exit assembly boilerplate code

Question Description

BACKGROUND

When I compile a trivial identity function with 64-bit Steel Bank Common Lisp (SBCL) on Windows:

(defun a (x)
  (declare (fixnum x))
  (declare (optimize (speed 3) (safety 0)))
  (the fixnum x))

I get the following disassembly:

* (disassemble 'a)

; disassembly for A
; Size: 13 bytes
; 02D7DFA6:       84042500000F20   TEST AL, [#x200F0000]      ; safepoint
                                                              ; no-arg-parsing entry point
;       AD:       488BE5           MOV RSP, RBP
;       B0:       F8               CLC
;       B1:       5D               POP RBP
;       B2:       C3               RET

I understand that the lines:

mov rsp, rbp
pop rbp
ret  

perform the standard return-from-function sequence, but I don’t understand why these lines are there:

TEST AL, [#x200F0000]  // My understanding is that this sets the flags based on the bitwise AND of AL and the byte at memory address 0x200F0000

and

CLC // My understanding is that this clears the carry flag.

QUESTIONS

  1. Why does SBCL generate a TEST instruction but never use the flags it sets?
  2. Why does SBCL clear the carry flag before returning from a function?

Answer

As the disassembler’s comment hints, the TEST instruction is a safepoint. It is used to synchronize threads with the garbage collector: safepoints are inserted at places where the compiler knows the thread is in a state in which garbage collection can safely occur.

The form of the safepoint is defined in compiler/x86-64/macros.lisp:

#!+sb-safepoint
(defun emit-safepoint ()
  (inst test al-tn (make-ea :byte :disp sb!vm::gc-safepoint-page-addr)))

You are of course correct that the result of the operation is never used. SBCL is interested only in a side effect of the instruction: if the page containing the address is protected, the read triggers a page fault; if the page is accessible, the instruction costs only a very small amount of time. I should point out that this is probably much, much faster than explicitly checking a global variable, since the fast path is a single memory read with no conditional branch.

On Windows, the C functions map_gc_page and unmap_gc_page in runtime/win32-os.c make the page accessible or inaccessible by changing its protection with VirtualProtect:

void map_gc_page()
{
    DWORD oldProt;
    AVER(VirtualProtect((void*) GC_SAFEPOINT_PAGE_ADDR, sizeof(lispobj),
                        PAGE_READWRITE, &oldProt));
}

void unmap_gc_page()
{
    DWORD oldProt;
    AVER(VirtualProtect((void*) GC_SAFEPOINT_PAGE_ADDR, sizeof(lispobj),
                        PAGE_NOACCESS, &oldProt));
}
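To make the trick concrete, here is a minimal single-threaded sketch of the same idea for Linux or another POSIX system. It is not SBCL’s code: mprotect stands in for VirtualProtect, a SIGSEGV handler stands in for the fault handler I couldn’t find, and every name (poll_page, safepoint_arm, safepoint_poll and so on) is hypothetical. For simplicity the handler re-enables the page itself, and it assumes that calling mprotect from a signal handler works, which is true in practice on Linux even though POSIX does not list mprotect as async-signal-safe.

```c
/* Single-threaded sketch of a guard-page safepoint (Linux/POSIX assumption).
 * Not SBCL's code: mprotect stands in for VirtualProtect and all names
 * here are hypothetical. */
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile char *poll_page;             /* stand-in for GC_SAFEPOINT_PAGE_ADDR */
static size_t page_size;
static volatile sig_atomic_t safepoint_hits; /* faults taken at a safepoint */

/* On a fault inside the poll page: count it and make the page readable,
 * so the faulting load is retried and succeeds. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr >= (char *)poll_page && addr < (char *)poll_page + page_size) {
        safepoint_hits++;
        /* Assumption: mprotect is safe to call here (true in practice on Linux). */
        mprotect((void *)poll_page, page_size, PROT_READ);
        return;
    }
    signal(sig, SIG_DFL); /* unrelated fault: the retried access crashes normally */
}

/* Map the poll page (readable) and install the fault handler; 0 on success. */
int safepoint_init(void)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page_size, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    poll_page = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Like unmap_gc_page: the next poll will fault. */
int safepoint_arm(void) { return mprotect((void *)poll_page, page_size, PROT_NONE); }

/* Like map_gc_page: polls go back to being a cheap load whose value is ignored. */
int safepoint_disarm(void) { return mprotect((void *)poll_page, page_size, PROT_READ); }

/* The poll itself, like TEST AL, [page]: the loaded value is discarded,
 * only the possible fault matters. */
void safepoint_poll(void) { (void)poll_page[0]; }
```

After safepoint_init(), calling safepoint_poll() on a disarmed page leaves safepoint_hits unchanged; after safepoint_arm(), the next poll faults once, the handler bumps the counter and restores access, and the retried load then completes normally.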

Unfortunately, I haven’t been able to track down the page fault handler, but the general idea seems to be this: when a collection is needed, unmap_gc_page is called. Each thread keeps running until it reaches one of these safepoints, at which point its read of the page causes a page fault. Presumably the fault handler then pauses that thread; once all threads have been paused, garbage collection runs, map_gc_page is called again, and the threads are allowed to resume.
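The presumed flow can be sketched as a toy stop-the-world protocol, again for Linux/POSIX rather than Windows. This only illustrates the guessed mechanism and is not SBCL’s implementation: the thread count, the spin-wait inside the fault handler, and all names (run_demo, collect_once, gc_page and so on) are assumptions. Mutator threads do some "work" and poll a guard page; the collector protects the page, waits until every thread has faulted and parked in the handler, checks that no thread makes progress while it "collects", then makes the page readable again and releases them.

```c
/* Toy stop-the-world protocol on a guard page (Linux/POSIX assumption).
 * Illustrative only: not SBCL's runtime; all names are hypothetical. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NWORKERS 4                    /* hypothetical number of Lisp threads */
static volatile char *gc_page;        /* stand-in for GC_SAFEPOINT_PAGE_ADDR */
static size_t page_size;
static atomic_int parked;             /* threads currently stopped at a safepoint */
static atomic_int world_stopped;      /* 1 while a collection is pending or running */
static atomic_int quit;               /* tells the mutators to exit */
static atomic_long work[NWORKERS];    /* per-thread progress counters */

/* A thread that touches the protected page parks here until the collector is done. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr < (char *)gc_page || addr >= (char *)gc_page + page_size) {
        signal(sig, SIG_DFL);         /* unrelated fault: the retried access crashes */
        return;
    }
    atomic_fetch_add(&parked, 1);
    while (atomic_load(&world_stopped))
        ;                             /* spin; lock-free atomics are usable in handlers */
    atomic_fetch_sub(&parked, 1);
    /* Returning retries the poll, which succeeds once the page is readable. */
}

static void *mutator(void *arg)
{
    atomic_long *counter = arg;
    while (!atomic_load(&quit)) {
        atomic_fetch_add(counter, 1); /* pretend to do some Lisp work */
        (void)gc_page[0];             /* safepoint poll, like TEST AL, [page] */
    }
    return NULL;
}

static long total_work(void)
{
    long sum = 0;
    for (int i = 0; i < NWORKERS; i++)
        sum += atomic_load(&work[i]);
    return sum;
}

/* One collection; returns 1 if no mutator made progress while stopped. */
static int collect_once(void)
{
    atomic_store(&world_stopped, 1);
    mprotect((void *)gc_page, page_size, PROT_NONE); /* like unmap_gc_page */
    while (atomic_load(&parked) < NWORKERS)
        sched_yield();                               /* wait until every thread has parked */
    long before = total_work();
    usleep(1000);                                    /* pretend to collect garbage */
    long after = total_work();
    mprotect((void *)gc_page, page_size, PROT_READ); /* like map_gc_page */
    atomic_store(&world_stopped, 0);                 /* release the parked threads */
    return before == after;
}

/* Runs `rounds` collections; returns how many saw a fully stopped world. */
int run_demo(int rounds)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page_size, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    gc_page = p;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    pthread_t threads[NWORKERS];
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&threads[i], NULL, mutator, &work[i]);
    int ok = 0;
    for (int r = 0; r < rounds; r++) {
        usleep(1000);                                /* let the mutators run */
        ok += collect_once();
    }
    atomic_store(&quit, 1);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);
    munmap(p, page_size);
    return ok;
}
```

With this sketch, run_demo(3) should return 3: in every round, no mutator’s counter moves between the moment all threads have parked and the moment the page is made readable again. Spinning inside the signal handler is just the simplest way to keep the example self-contained; it says nothing about how SBCL actually suspends a thread.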

SBCL’s credits file attributes the introduction of this mechanism to Anton Kovalenko.

On Linux and Mac OS X, a different synchronization mechanism is used by default, which is why the instruction isn’t generated in default builds for those platforms. (I’m not sure whether the PowerPC ports use safepoints by default, but they obviously wouldn’t use x86 instructions.)

On the other hand, I have no idea about the CLC instruction.
