Running Apple M1 Assembly with the Hello World program

Sigrid Jin
8 min read · Aug 20



macOS uses LLVM (Low-Level Virtual Machine) by default: Apple has given extensive support to the LLVM toolchain, ensuring its compatibility with new Macs.

Apple moved away from GNU GCC in favor of LLVM, while Linux-based systems typically use the GNU Compiler Collection (GCC).

These compiler differences affect the command line arguments in the Makefile for this article.

Loading Addresses

ADR instruction

You often need to load an address into a register on ARM-based architectures.

The ADR instruction loads the address of a label into a register. Instead of loading a data value, it produces the memory address where some data or instruction is stored.

That address is not stored as a fixed constant. The instruction computes it at run time relative to the program counter, that is, relative to where the program is currently executing.

In Assembly, a label is an identifier placed at the beginning of an instruction to indicate a specific location within the program.

In the example below, startLabel is a label marking some location in the program, and the ADR instruction loads that location's address.

ADR x0, startLabel // x0 will contain the memory address of startLabel
  • When the ADR instruction is executed, it takes the memory address associated with the given label and loads that address into the destination register. In the example above, after executing ADR x0, startLabel, the X0 register contains the memory address of startLabel.
  • On AArch64, the program counter value that ADR uses is the address of the ADR instruction itself. The instruction encodes a signed offset from that address to the label (reaching roughly ±1 MB) and adds the two to produce the label's address.
  • macOS on Apple Silicon favors PC-relative instructions such as ADR for loading addresses because its linker and loader are designed to minimize relocations. Relocations are adjustments that the linker or loader makes to addresses embedded in the program once the final load address is known; the fewer there are, the faster and more efficient loading becomes.
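The PC-relative arithmetic behind ADR can be sketched in Go. This is only an illustration, assuming the AArch64 rule that ADR adds a signed 21-bit byte offset to the address of the ADR instruction itself; the function name adrOffset and the sample addresses are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// adrOffset returns the byte offset that an ADR located at address pc would
// need in order to reach target. ADR holds a signed 21-bit immediate, so the
// offset must lie in [-1 MiB, +1 MiB).
func adrOffset(pc, target uint64) (int64, error) {
	off := int64(target) - int64(pc)
	if off < -(1<<20) || off >= 1<<20 {
		return 0, errors.New("label is out of ADR range")
	}
	return off, nil
}

func main() {
	// Hypothetical addresses, chosen only for illustration.
	off, err := adrOffset(0x1000, 0x1010)
	fmt.Println(off, err)

	// A label exactly 1 MiB ahead no longer fits in the immediate.
	_, err = adrOffset(0x1000, 0x1000+(1<<20))
	fmt.Println(err)
}
```

Because only the distance between the instruction and the label is encoded, the same instruction bytes stay valid wherever the program is loaded.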

LDR (Load Register) instruction

  • The instruction loads a value from memory into a register. The memory location is typically given by an address held in a base register, optionally with an offset.
  • The example below loads the value stored at the memory address contained in register X2 into register X1 (on 32-bit ARM the registers would be named R1 and R2).
LDR X1, [X2] // load the value at the address held in X2 into X1


  • The ADR instruction is typically used when you want an address that is relative to the current position in the program (the program counter), which keeps the code position-independent (PIC). The LDR instruction is suitable when an absolute address is needed and you know the location will not change after the program is linked.
  • Imagine that we have the following program (running on macOS):
.globl _main
.p2align 2
_main:
adr x1, label // loads the PC-relative address of label into register x1
ldr x2, =label // loads the absolute address of label into register x2, read from a literal in memory
mov w0, 0
ret
label:
mov w0, 0x1234
ret
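  • Assembling the program above and inspecting it in a disassembler yields the listing below.
  • The adr instruction directly references label (sym.func.100003fa8).
  • The ldr instruction references a memory location (0x100003fb0) that contains the absolute address of label. The disassembler shows those data bytes as "invalid" because they are not instructions.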
;-- _main:
0x100003f98 81000010 adr x1, sym.func.100003fa8 # see here
0x100003f9c a2000058 ldr x2, 0x100003fb0 # see here
0x100003fa0 00008052 mov w0, 0
0x100003fa4 c0035fd6 ret
;-- func.100003fa8:
0x100003fa8 80468252 mov w0, 0x1234
0x100003fac c0035fd6 ret
0x100003fb0 a83f0000 invalid
0x100003fb4 01000000 invalid
  • In environments where the program's load address might change (for example, due to dynamic linking), adr offers more flexibility.
  • With adr, the address is calculated from where the code is currently running. This avoids hardcoding specific addresses, which becomes a problem if the code is loaded somewhere else.
  • In contrast, if you use ldr to load an address and the program's base address changes, every hardcoded address in the program has to be adjusted. The dynamic linker must patch each stored address to account for the new base.
  • Given macOS's aversion to the relocations that LDR would require, and the fact that ADR works seamlessly on both platforms, ADR is the more universal choice.

Service Calls

A system call, or service call, is how a user-space program requests a service from the operating system's kernel. These services range from file operations to network operations and process management.

When a program makes a system call, it communicates a few pieces of information to the kernel:

  1. Which service is being requested: This is specified by a system call number. Each type of service (e.g., open, read, write) has a unique identifier.
  • On macOS (ARM64), the system call number is placed in register X16.
  • On Linux (ARM64), the system call number is placed in register X8.

  2. Any necessary arguments: These might be file paths, data to write, buffer locations, etc. The exact arguments vary depending on the system call. The first few arguments (if any) are passed in registers X0 through X7.

  3. Service trigger: After loading the appropriate values into the registers, a special instruction, often SVC 0, is executed to signal the kernel to perform the system call.

For example, the following Go program asks the kernel to open a file through the open system call:

package main

import (
	"fmt"
	"syscall"
)

func main() {
	filename := "/tmp/testfile.txt"
	flags := syscall.O_RDONLY // open file in read-only mode
	mode := uint32(0666)      // file permissions - rw-rw-rw-
	fd, err := syscall.Open(filename, flags, mode)
	if err != nil {
		fmt.Printf("Error opening file %s: %s\n", filename, err)
		return
	}
	// Close the file descriptor when main returns.
	defer syscall.Close(fd)
	fmt.Printf("File %s opened successfully!\n", filename)
}

  • Remarks: macOS, which runs in a more controlled ecosystem on Apple's own hardware, may have prioritized compatibility with and optimization for Apple silicon, and its implementation has therefore diverged from the Linux convention.

Library and Include Paths

Linux follows the Filesystem Hierarchy Standard (FHS) which standardizes the directory structure and directory contents in Linux distributions.

  1. /usr/lib: This directory contains libraries for the binaries located in /usr/bin and /usr/sbin. It’s one of the primary directories for shared libraries, modules, and kernel drivers.
  2. /usr/include: This is the directory for the header files that the C compiler uses to find library declarations. When developing in C or C++, these header files are needed to compile programs that use the installed libraries.

macOS takes a different approach. One major reason is that Apple manages multiple OS platforms (macOS, iOS, iPadOS, watchOS, etc.), each with its own set of libraries and headers.

Xcode and SDKs: When you install Xcode, it comes bundled with Software Development Kits (SDKs) for the various Apple platforms. Each SDK contains the libraries, headers, and other essential tools required to develop for that platform.

Path Structure: Instead of a simple folder like /usr/include, the libraries and headers in macOS are tucked away inside these SDKs, each with its own version. This structure supports developing for different Apple platforms and versions from a single machine.

Tools like xcrun: Given the complexity of these paths and the potential for them to change between Xcode versions, developers are discouraged from hardcoding them. Instead, tools like xcrun provide a way to find the right toolchain or path dynamically.

For instance, xcrun --show-sdk-path will show the path to the currently active SDK, ensuring scripts and makefiles remain valid across different setups and versions.
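As a rough sketch, a build helper could ask xcrun for the SDK root and derive the header and library directories from it instead of hardcoding them. The function names are hypothetical, and the usr/include and usr/lib layout inside the SDK is an assumption about its contents.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
	"strings"
)

// sdkDirs returns the header and library directories inside an SDK root.
// The usr/include and usr/lib layout is an assumption about the SDK contents.
func sdkDirs(sdkRoot string) (include, lib string) {
	return filepath.Join(sdkRoot, "usr", "include"), filepath.Join(sdkRoot, "usr", "lib")
}

// activeSDK asks xcrun for the active SDK path instead of hardcoding it.
func activeSDK() (string, error) {
	out, err := exec.Command("xcrun", "--show-sdk-path").Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	root, err := activeSDK()
	if err != nil {
		fmt.Println("xcrun not available (not macOS, or no developer tools):", err)
		return
	}
	inc, lib := sdkDirs(root)
	fmt.Println(inc)
	fmt.Println(lib)
}
```

Because the root is queried at run time, the helper keeps working when Xcode is updated or a different SDK is selected.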


  • Alignment refers to placing data at memory addresses that satisfy constraints imposed by the processor and bus architecture. Proper alignment helps the CPU fetch data more efficiently and avoids unnecessary bus transactions.
  • macOS requires programs to start on a 64-bit boundary, which means the start address must be divisible by 8 bytes. The argument of .align is a power of two, so .align 3 aligns to 2^3 = 8 bytes.
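The power-of-two convention is easy to check with a little arithmetic. The Go sketch below uses hypothetical helper names and sample addresses; it mimics what an assembler does when it pads up to a 2^3 = 8 byte boundary.

```go
package main

import "fmt"

// alignUp rounds addr up to the next multiple of 2^pow, the same
// power-of-two convention described for .align.
func alignUp(addr uint64, pow uint) uint64 {
	mask := uint64(1)<<pow - 1
	return (addr + mask) &^ mask
}

// isAligned reports whether addr is a multiple of 2^pow.
func isAligned(addr uint64, pow uint) bool {
	return addr&(uint64(1)<<pow-1) == 0
}

func main() {
	fmt.Printf("%#x\n", alignUp(0x100003f9a, 3)) // rounded up to an 8-byte boundary
	fmt.Println(isAligned(0x100003fa0, 3))
	fmt.Println(isAligned(0x100003fa4, 3))
}
```

An address is 8-byte aligned exactly when its three lowest bits are zero, which is why the exponent 3 is enough to describe the boundary.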


On macOS you need to link against the System library even if you never call anything in it; otherwise you get a linker error.

The sample Hello World program below uses software interrupts to make its system calls rather than the API in the System library, so in principle it should not need to link to it.

On macOS the default entry point is _main, whereas on Linux it is _start. The entry point can be changed with a command line argument to the linker.

Hello World Code


  • The first rule in the Makefile below links the HelloWorld.o object file to produce the executable HelloWorld.
  • -macosx_version_min 11.0.0: Specifies the minimum macOS version for the binary.
  • -lSystem: Links against the System library.
  • -syslibroot `xcrun -sdk macosx --show-sdk-path`: Sets the root directory for system libraries, using xcrun to fetch the path dynamically.
  • -e _start: Specifies the entry point for the program, which is the _start label defined in the assembly code.
  • -arch arm64: Specifies the target architecture, which is ARM64.
HelloWorld: HelloWorld.o
	ld -macosx_version_min 11.0.0 -o HelloWorld HelloWorld.o -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path` -e _start -arch arm64

# This rule assembles the HelloWorld.s assembly file into an object file HelloWorld.o.
HelloWorld.o: HelloWorld.s
	as -o HelloWorld.o HelloWorld.s


  • mov: copies a value into a register, either from another register or from an immediate constant. It does not access memory; loads and stores are used for that.
// Assembler program to print "Hello World!"
// to stdout.
// X0-X2 - parameters to macOS system services
// X16 - macOS system call number
.global _start // Provide program starting address to linker
.align 3 // Align on a 64-bit boundary: 2^3 = 8 bytes

// Set up the parameters to print hello world
// and then call macOS to do it.

_start: mov X0, #1 // 1 = StdOut
// Sets the X0 register to 1, the file descriptor for stdout.
adr X1, helloworld // string to print
// Sets the X1 register to the address of the `helloworld` string.
mov X2, #13 // length of our string
// Sets the X2 register to 13, the length of the "Hello World!\n" string.
mov X16, #4 // macOS write system call
svc 0 // Call macOS to output the string
// This instruction triggers a system call.
// It tells the OS to perform the system call specified by the value in the X16 register.

// Set up the parameters to exit the program
// and then call macOS to do it.

mov X0, #0 // Use 0 return code
mov X16, #1 // System call number 1 terminates this program
svc 0 // Call macOS to terminate the program

helloworld: .ascii "Hello World!\n"

Running the code

./HelloWorld # Hello World!