`asm goto` is really useful in this context: you pick the branch instructions yourself and define the exact control flow graph you want. Between intrinsics, normal C, inline asm defining the basic blocks and `asm goto` defining the CFG, you can emit exactly the instructions you want, albeit with somewhat challenging syntax.
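As a rough sketch of what that looks like, here is a tiny `asm goto` where the asm picks the branch instruction and the C labels form the CFG. It assumes x86-64 and a GCC or clang recent enough to support `asm goto`; the function name `bit_is_set` is made up for illustration, and the plain C fallback is only there so the snippet builds elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if bit n of x is set, else 0. */
int bit_is_set(uint64_t x, uint64_t n)
{
    assert(n < 64);
#if defined(__x86_64__) && defined(__GNUC__)
    /* bt copies bit n of x into the carry flag; jc is the branch we chose.
     * The label list at the end tells the compiler where the asm may jump. */
    __asm__ goto("bt %1, %0\n\t"
                 "jc %l[set]"
                 : /* no outputs */
                 : "r"(x), "r"(n)
                 : "cc"
                 : set);
    return 0; /* fall-through edge: carry clear */
set:
    return 1; /* taken edge: carry set */
#else
    /* Assumption: non-x86-64 targets just get ordinary C. */
    return (int)((x >> n) & 1);
#endif
}
```

Calling `bit_is_set(5, 0)` takes the `set` edge and `bit_is_set(5, 1)` falls through. The compiler still lays out the blocks, but the branch instruction itself is the one written in the asm.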
Pinning local variables to registers didn't work in LLVM a couple of years ago (which seems consistent with the GCC docs), but it does work at the boundaries of inline asm, and that's generally enough. I'd like a pin-register intrinsic, something like `u64 pin(u64, enum reg)`, where the compile-time constant enum names the register and the semantics are a no-op other than constraining the register allocator, but that doesn't seem to be readily available in gcc/clang.
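For what it's worth, the "works at the asm boundary" part can be sketched with GNU local register variables. This assumes x86-64 and GCC/clang; `pin_rax` and `add_pinned` are hypothetical names, and `pin_rax` only approximates the `pin` intrinsic described above, since the value is guaranteed to sit in `rax` at the asm statement and nowhere else.

```c
#include <stdint.h>

/* Hypothetical approximation of pin(v, RAX): an empty asm that requires
 * v to be in rax at this point. No instruction is emitted by the asm. */
static inline uint64_t pin_rax(uint64_t v)
{
#if defined(__x86_64__) && defined(__GNUC__)
    register uint64_t t __asm__("rax") = v;
    __asm__ volatile("" : "+r"(t));
    return t;
#else
    return v; /* Assumption: no pinning on other targets. */
#endif
}

/* Operands pinned at the boundary of a real instruction:
 * x lives in rax and y in rcx when the add executes. */
uint64_t add_pinned(uint64_t a, uint64_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    register uint64_t x __asm__("rax") = a;
    register uint64_t y __asm__("rcx") = b;
    __asm__("add %1, %0" : "+r"(x) : "r"(y) : "cc");
    return x;
#else
    return a + b;
#endif
}
```

Outside the asm statements the register allocator is free to move the values around, which is the gap a real pin intrinsic would have to close.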
I don't have a good answer to constraining instruction scheduling.
On reflection, it's all somewhat more horrible than it needs to be; perhaps inline compiler IR is a better idea.