Yikes! Assembly!

Author

Clayton Cafiero

Published

2025-09-11

To many, assembly may seem daunting. Why? Because (almost) all the layers of abstraction are stripped away. With assembly, there are registers and memory addresses, flags and instructions. That’s almost all there is to it!

According to Wikipedia, assembly can be any “low-level programming language with a very strong correspondence between the instructions in the language and the architecture’s machine code instructions. Assembly language usually has one statement per machine code instruction… but constants, comments, assembler directives, symbolic labels of, e.g., memory locations, registers, and macros are generally also supported.”¹

Assembly language instructions typically include an opcode, e.g., mov, add, sub, or str, which indicates the operation to be performed. Some opcodes require one or more operands. For example, an addition instruction might look like this:

add r1, r3, r2

which means: add the values held in registers r3 and r2, and leave the result in register r1.
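
If it helps, here's one way to picture that instruction in C. This is only a sketch: the array reg and the function add are hypothetical names standing in for a register file and the add instruction, and the choice of 16 registers is an assumption, not a property of any particular architecture.

```c
#include <stdint.h>

/* Hypothetical model: a small register file, indexed by register number. */
static uint32_t reg[16];

/* Models "add rd, rn, rm": the destination comes first, then the two sources. */
static void add(int rd, int rn, int rm)
{
    reg[rd] = reg[rn] + reg[rm];
}
```

With reg[3] set to 40 and reg[2] set to 2, calling add(1, 3, 2) plays the role of add r1, r3, r2 and leaves 42 in reg[1]. The source registers are left unchanged.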

As you might expect, there are opcodes to load data into registers (e.g., constants or values retrieved from memory locations), and opcodes to write data back to memory.

Every architecture has its own set of supported instructions, so there is no one, single “Assembly language”; there are many. Instruction sets vary by architecture (and sometimes by operating system), but they all share many common features, and all have a close connection to hardware.

Depending on the architecture of the machine you’re working on, you could have instruction sets of different sizes. However, there’s a lot we can do with a small subset of instructions, and that’s enough to get us started.

Here’s some C code. Try not to be disappointed—it doesn’t do much.

/**
 * A small example to get started with assembly
 *
 * CS / CMPE 2210
 * Computer Organization
 * Clayton Cafiero <cbcafier@uvm.edu>
 * 2025-09-09
 */
int main(void)
{
   volatile int i = 1;
   volatile int j = 2;
   volatile int k = i + j;

   return 0;
}

All this does is assign 1 to i, 2 to j, and the sum of i and j to k. When done, main() returns 0, and execution halts. Not much, right? But let’s see what happens when we compile this to (ARM) assembly.
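
A side note on volatile, sketched under one assumption: the function name compute_k is hypothetical and not part of the original program. The volatile qualifier tells the compiler that every read and write of the variable must actually be performed, so the loads and stores can't be optimized away even though nothing ever uses k. That is presumably why it appears here. The variant below does the same three assignments but returns k, so the result can be checked.

```c
/* Hypothetical variant of main(): same three assignments, but k is returned. */
int compute_k(void)
{
    volatile int i = 1;      /* write 1 to i */
    volatile int j = 2;      /* write 2 to j */
    volatile int k = i + j;  /* read i, read j, write the sum to k */

    return k;                /* read k back and return it */
}
```

Calling compute_k() returns 3.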

    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 15, 0 sdk_version 15, 5
    .globl  _main                           ; -- Begin function main
    .p2align    2
_main:                                  ; @main
    .cfi_startproc
; %bb.0:
    sub sp, sp, #16
    .cfi_def_cfa_offset 16
    mov w8, #1                          ; =0x1
    str w8, [sp, #12]
    mov w8, #2                          ; =0x2
    str w8, [sp, #8]
    ldr w8, [sp, #12]
    ldr w9, [sp, #8]
    add w8, w9, w8
    str w8, [sp, #4]
    mov w0, #0                          ; =0x0
    add sp, sp, #16
    ret
    .cfi_endproc
                                        ; -- End function
.subsections_via_symbols

That’s not too bad. Fewer than two dozen lines of code. But what does it mean?

Keep in mind that the CPU has a control / logic unit, arithmetic logic unit(s), some general-purpose registers, some special-purpose registers, and a few other doodads.

Here I’ve added annotations (in this assembly syntax, ; starts a single-line comment).

    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 15, 0 sdk_version 15, 5
    .globl  _main                       ; -- Begin function main
    .p2align    2                       ; align the following code on a 4-byte boundary (2**2 = 4)
_main:                                  ; @main
    .cfi_startproc
; %bb.0:
    sub sp, sp, #16                     ; stack frame setup: reserve 16 bytes
                                        ; subtract 16 from current stack pointer
                                        ; make that the new stack pointer
    .cfi_def_cfa_offset 16              ; for debugging (ignore)
    mov w8, #1                          ; =0x1 move the value 1 (literal) into register w8
                                        ; on AArch64 wN is a 32-bit register, xN is 64-bit
    str w8, [sp, #12]                   ; store the value in w8 to the address: stack pointer + 12 bytes
                                        ; compiler set up stack frame of 16 bytes
                                        ; i is at [sp, #12], j is at [sp, #8], k is at [sp, #4]
                                        ; so this is writing i = 1
    mov w8, #2                          ; =0x2 move the value 2 (literal) into register w8
    str w8, [sp, #8]                    ; write value in w8 to address: stack pointer + 8 bytes
                                        ; j = 2
    ldr w8, [sp, #12]                   ; load value at sp + 12 (i) into register w8
    ldr w9, [sp, #8]                    ; load value at sp + 8 (j) into register w9
    add w8, w9, w8                      ; add operands in w9 and w8, and put result into w8
    str w8, [sp, #4]                    ; store the value in w8 to stack pointer + 4 (k)
    mov w0, #0                          ; =0x0 move 0 to register w0
    add sp, sp, #16                     ; release the 16-byte stack frame (restore sp)
    ret                                 ; return (return value, 0, is in w0)
    .cfi_endproc
                                        ; -- End function
.subsections_via_symbols
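
To tie the annotations together, here is a rough C model of what those instructions do to the stack frame. Treat it as a sketch, not as how the hardware works: frame, str32, ldr32, and run_main_body are hypothetical names, the 16-byte array stands in for the memory reserved by sub sp, sp, #16, and index 0 of the array plays the role of the address in sp.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the 16-byte frame; frame[0] is the byte at sp. */
static uint8_t frame[16];

/* Models "str wN, [sp, #offset]": write a 32-bit value at sp + offset. */
static void str32(uint32_t value, size_t offset)
{
    memcpy(&frame[offset], &value, sizeof value);
}

/* Models "ldr wN, [sp, #offset]": read a 32-bit value from sp + offset. */
static uint32_t ldr32(size_t offset)
{
    uint32_t value;
    memcpy(&value, &frame[offset], sizeof value);
    return value;
}

/* Replays the body of _main and returns what ends up at [sp, #4] (k). */
static uint32_t run_main_body(void)
{
    uint32_t w8, w9;     /* stand-ins for registers w8 and w9 */

    w8 = 1;              /* mov w8, #1          */
    str32(w8, 12);       /* str w8, [sp, #12]   i = 1 */
    w8 = 2;              /* mov w8, #2          */
    str32(w8, 8);        /* str w8, [sp, #8]    j = 2 */
    w8 = ldr32(12);      /* ldr w8, [sp, #12]   load i */
    w9 = ldr32(8);       /* ldr w9, [sp, #8]    load j */
    w8 = w9 + w8;        /* add w8, w9, w8      */
    str32(w8, 4);        /* str w8, [sp, #4]    k = i + j */

    return ldr32(4);     /* read k back from the frame */
}
```

After run_main_body() runs, the frame holds 1 at offset 12, 2 at offset 8, and 3 at offset 4. The four bytes at offsets 0 through 3 are reserved but never used, which matches the annotated listing: three 4-byte variables in a 16-byte frame.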

No generative AI was used in writing this material. This was written the old-fashioned way.

Footnotes

1. https://en.wikipedia.org/wiki/Assembly_language

Reuse

CC BY-NC-SA 4.0