A deep dive on macOS universal binaries


TL;DR: This article describes in detail how Mach-O universal binaries work

In 2006, Apple transitioned macOS from PowerPC to Intel processors. In 2020, Apple is transitioning macOS from Intel to ARM. In both cases, universal binaries played a key role on enabling a smooth CPU architecture transition for both developers and end-users.

Universal binaries, internally referred to as fat binaries, are not Mach-O objects. Instead, Apple defines fat binaries as simple archives that embed one or more Mach-O objects.

Let’s take the /usr/bin/wc utility as an example. The file(1) command detects /usr/bin/wc as being an x86_64 and arm64 universal binary:

$ file /usr/bin/wc
/usr/bin/wc: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/usr/bin/wc (for architecture x86_64):  Mach-O 64-bit executable x86_64
/usr/bin/wc (for architecture arm64e):  Mach-O 64-bit executable arm64e

Parsing the fat_header structure

The format of a fat binary is defined in the /usr/include/mach-o/fat.h header within your macOS SDK installation. Assuming you have Xcode installed, the full path to this header file is the following:

$(xcrun --show-sdk-path)/usr/include/mach-o/fat.h

This file defines the following structures:

struct fat_header {
    uint32_t    magic;      /* FAT_MAGIC or FAT_MAGIC_64 */
    uint32_t    nfat_arch;  /* number of structs that follow */
};

struct fat_arch {
    cpu_type_t  cputype;    /* cpu specifier (int) */
    cpu_subtype_t   cpusubtype; /* machine specifier (int) */
    uint32_t    offset;     /* file offset to this object file */
    uint32_t    size;       /* size of this object file */
    uint32_t    align;      /* alignment as a power of 2 */
};

struct fat_arch_64 {
    cpu_type_t  cputype;    /* cpu specifier (int) */
    cpu_subtype_t   cpusubtype; /* machine specifier (int) */
    uint64_t    offset;     /* file offset to this object file */
    uint64_t    size;       /* size of this object file */
    uint32_t    align;      /* alignment as a power of 2 */
    uint32_t    reserved;   /* reserved */
};

A fat binary consists of a fat_header structure followed by N fat_arch or fat_arch_64 structures followed by the corresponding Mach-O objects in order. A fat binary can define a single architecture. However a fat binary cannot declare the same architecture more than once.

The 4-byte magic constant is used by tools such as file(1) to determine whether the file is a fat binary. Additionally, the magic constant determines whether fat_arch or fat_arch_64 structures will be used. In comparison to fat_arch, fat_arch_64 uses 64-bit unsigned integers for the offset and size. As a consequence, fat_arch_64 can describe larger Mach-O objects than fat_arch.

Let’s dump the first 48 octets of /usr/bin/wc:

$ xxd -l 48 -c 12 /usr/bin/wc
00000000: cafe babe 0000 0002 0100 0007  ............
0000000c: 0000 0003 0000 4000 0000 dba0  ......@.....
00000018: 0000 000e 0100 000c 8000 0002  ............
00000024: 0001 4000 0000 daf0 0000 000e  ..@.........

The fat binary starts with the constant 0xCA 0xFE 0xBA 0xBE. This means that the fat binary will use fat_arch structures, according to the following definitions from mach-o/fat.h:

#define FAT_MAGIC     0xcafebabe
#define FAT_MAGIC_64  0xcafebabf

The magic constant is followed by the 32-bit unsigned integer 2, which means that fat_header is followed by two fat_arch structures.

Parsing fat_arch structures

The first fat_arch structure looks like this:

cpu_type_t cputype       = 0100 0007 = CPU_TYPE_X86_64 (CPU_TYPE_X86 | CPU_ARCH_ABI64)
cpu_subtype_t cpusubtype = 0000 0003 = CPU_SUBTYPE_X86_64_ALL
uint32_t offset          = 0000 4000 = 16384
uint32_t size            = 0000 dba0 = 56224
uint32_t align           = 0000 000e = 14

The cpu_type_t and cpu_subtype_t fields represent the target architecture of the corresponding Mach-O object. The $(xcrun --show-sdk-path)/usr/include/mach/machine.h header defines these CPU types in terms of the legacy Mach integer_t type:

typedef integer_t       cpu_type_t;
typedef integer_t       cpu_subtype_t;

In turn, integer_t is defined as a 32-bit signed integer by $(xcrun --show-sdk-path)/usr/include/mach/machine/machine_types.defs:

type integer_t = int32_t;

The mach/machine.h header defines the valid cpu_type_t and cpu_subtype_t values. In this case, this fat_arch structure represents a generic x86_64 Mach-O object (CPU_TYPE_X86_64 with subtype CPU_SUBTYPE_X86_64_ALL).

The fat binary defines that the Mach-O object starts at the offset 16384. We can corroborate that the offset is correct by reading the first two octects at such position. The result corresponds to the Little Endian MH_CIGAM Mach-O magic constant defined in $(xcrun --show-sdk-path)/usr/include/mach-o/loader.h:

$ xxd -s 16384 -l 4 /usr/bin/wc
00004000: cffa edfe                                ....

The remaining fields of the fat_arch structure tell us that the Mach-O object has a size of 56224 bytes and that the object is aligned to 16384 (2 ^ 14) bytes.

The second fat_arch structure looks like this:

cpu_type_t cputype       = 0100 000c = CPU_TYPE_ARM64 (CPU_TYPE_ARM | CPU_ARCH_ABI64)
cpu_subtype_t cpusubtype = 8000 0002 = CPU_SUBTYPE_ARM64E
uint32_t offset          = 0001 4000 = 81920
uint32_t size            = 0000 daf0 = 56048
uint32_t align           = 0000 000e = 14

This time, the CPU information refers to 64-bit ARMv8.3 (CPU_TYPE_ARM64 with subtype CPU_SUBTYPE_ARM64E). The offset of this Mach-O object is 81920, the next multiple of 16384 (due to align) after the offset of the previous fat_arch structure (16384 + 56224 = 72608). We can corroborate that this offset points to the start of a valid Mach-O object like we did before:

$ xxd -s 81920 -l 4 /usr/bin/wc
00014000: cffa edfe                                ....

The remaining fields of the fat_arch structure tell us that the Mach-O object has a size of 56048 bytes and that the object is again aligned to 16384 (2 ^ 14) bytes.

To automate this process, we can parse fat binaries using the lipo(1) utility tool that ships with macOS along with its -detailed_info option:

$ lipo -detailed_info /usr/bin/wc
Fat header in: /usr/bin/wc
fat_magic 0xcafebabe
nfat_arch 2
architecture x86_64
    cputype CPU_TYPE_X86_64
    cpusubtype CPU_SUBTYPE_X86_64_ALL
    capabilities 0x0
    offset 16384
    size 56224
    align 2^14 (16384)
architecture arm64e
    cputype CPU_TYPE_ARM64
    cpusubtype CPU_SUBTYPE_ARM64E
    capabilities PTR_AUTH_VERSION USERSPACE 0
    offset 81920
    size 56048
    align 2^14 (16384)

Extracting Mach-O objects

A fat binary is a simple uncompressed archive format to embed more than one standalone Mach-O object in a single file. Each Mach-O object is associated with a fat_arch or fat_arch_64 structure that defines its offset and size within the fat binary.

Let’s use this knowledge to extract the arm64 Mach-O object from the /usr/bin/wc fat binary. We know that the offset and the size of the arm64 variant is 81920 and 56048, respectively. Therefore, we can use dd(1) to extract the executable into a file called wc-arm as follows:

$ dd if=/usr/bin/wc of=wc-arm iseek=81920 count=56048 bs=1
56048+0 records in
56048+0 records out
56048 bytes transferred in 0.123558 secs (453617 bytes/sec)

Running file(1) over the newly created file correctly reports that the file is an arm64 Mach-O object:

$ file wc-arm
wc-arm: Mach-O 64-bit executable arm64e

We can confirm that the binary works by giving execution permissions to it and using it to count the number of characters in a given string:

$ chmod +x wc-arm
$ echo "hello" | ./wc-arm -c
       6

Instead of manually calculating the start and end offsets and extracting the objects using dd(1), we can use lipo(1) with the -thin option by passing the architecture that we want to extract as a command-line argument:

$ lipo /usr/bin/wc -thin arm64e -output wc-arm
$ file wc-arm
wc-arm: Mach-O 64-bit executable arm64e

Creating Fat Executable Binaries

The lipo(1) utility can be used to create fat binaries out of existing Mach-O objects using the -create option. Let’s create a basic C program called test.c that prints a message passed by the pre-processor at build time:

#include <stdio.h>

int main() {
  printf("Test %s\n", ARCH);
  return 0;
}

We will compile test.c to both arm64 and x86_64, using -D to pass a different architecture message in each case. The arm64 executable will print Test arm64 while the Intel x64 executable will print Test x86_64:

$ clang test.c -o arm64-test -arch arm64 -DARCH=\"arm64\"
$ clang test.c -o x86_64-test -arch x86_64 -DARCH=\"x86_64\"

We will merge the arm64-test and x86_64-test Mach-O objects into a fat binary called universal-test using lipo(1):

$ lipo -create arm64-test x86_64-test -output universal-test
$ file universal-test
universal-test: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]
universal-test (for architecture x86_64):       Mach-O 64-bit executable x86_64
universal-test (for architecture arm64):        Mach-O 64-bit executable arm64

In order to test the universal binary, we will execute it using the arch(1) utility tool that comes with macOS. This tool takes a fat binary and a desired architecture to execute inputs. I have an Apple Silicon MacBook Pro, in which case the arm64 variant will run natively and the x86_64 variant will run on Rosetta 2:

$ arch -arm64 ./universal-test
Test arm64
$ arch -x86_64 ./universal-test
Test x86_64

Creating Universal Objects

Fat binaries can bundle any type of Mach-O objects, not only executables of type MH_EXECUTE, and the linker will resolve the right variant automatically. Let’s write a basic C module that exposes a function to print the current architecture and write different implements for both arm64 and x86_64:

// arch.h
#ifndef ARCH_H_
#define ARCH_H_
#include <stdio.h>
void print_arch();
#endif

// arch-arm64.c
#include "arch.h"
void print_arch() {
  printf("arm64\n");
}

// arch-x86_64.c
void print_arch() {
  printf("x86_64\n");
}

Let’s separately compile the module for both architectures and create a fat binary called arch-universal out of the results:

$ clang -c arch-arm64.c -arch arm64
$ clang -c arch-x86_64.c -arch x86_64
$ lipo -create arch-arm64.o arch-x86_64.o -output arch-universal.o
$ file arch-universal.o
arch-universal.o: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit object x86_64] [arm64:Mach-O 64-bit object arm64]
arch-universal.o (for architecture x86_64):     Mach-O 64-bit object x86_64
arch-universal.o (for architecture arm64):      Mach-O 64-bit object arm64

Finally, let’s write a sample executable program that makes use of this module:

// main.c
#include "arch.h"

int main() {
  print_arch();
}

If we compile main.c to arm64 and link it to arch-universal.o, the linker will create a Mach-O executable object (not a fat binary) that makes use of the arm64 implementation of print_arch:

$ clang main.c arch-universal.o -arch arm64 -o main-arm64
$ file main-arm64
main-arm64: Mach-O 64-bit executable arm64
$ ./main-arm64
arm64

Similarly, if we compile main.c to x86_64 and link it to arch-universal.o, the linker will create a Mach-O executable object that makes use of the x86_64 implementation of print_arch:

$ clang main.c arch-universal.o -arch x86_64 -o main-x86_64
$ file main-x86_64
main-x86_64: Mach-O 64-bit executable x86_64
$ ./main-x86_64
x86_64