Mach-O

Mac OS X, released in 2001, is built on the XNU kernel, a derivative of the Mach kernel, which dates from 1985 and was originally developed for operating system research. This explains why Apple devices employ the Mach-O format for structuring binaries.

Here’s a brief overview of the Mach-O file structure:

  • Header: Contains metadata about the file (CPU type, file type: exec, dylib, etc.). Its size is fixed: 32 bytes for 64-bit binaries, 28 bytes for 32-bit ones.
  • Load Commands: Describe the memory layout and linking information, with a variable size.
  • Data: segments to be loaded into memory, notably these three:
    • __TEXT: immutable code and read-only data.
    • __DATA: mutable data.
    • __LINKEDIT: dyld instructions to relocate the library, import functions it needs, and export functions it implements.

Mach-O Header

The header for all 64-bit binaries is mach_header_64, defined in <mach-o/loader.h>. The comments in that file document it better than the mach_header_64 API docs, or the translated Swift version you can read by command-clicking this code:

import Darwin
let header = mach_header_64()

With this structure we can read the header of a Mach-O binary from Swift, although in practice it’s faster to use the otool command from the terminal.
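
To make the layout concrete, here is a minimal sketch that decodes the eight 32-bit fields by hand instead of using the Darwin struct, so it carries no platform dependency. The type MachHeader64 and the helpers littleEndianBytes, readUInt32LE and parseMachHeader64 are hypothetical names for this illustration, and the sketch assumes a thin, little-endian, 64-bit file. The sample bytes are built from the otool -h values shown in the next section rather than read from disk.

```swift
// Mirrors the eight 32-bit fields of mach_header_64 in <mach-o/loader.h>.
struct MachHeader64 {
    let magic: UInt32
    let cputype: Int32
    let cpusubtype: Int32
    let filetype: UInt32
    let ncmds: UInt32
    let sizeofcmds: UInt32
    let flags: UInt32
    let reserved: UInt32
}

// Hypothetical helper: encodes a value as four little-endian bytes.
func littleEndianBytes(_ value: UInt32) -> [UInt8] {
    var bytes: [UInt8] = []
    for shift in [0, 8, 16, 24] {
        bytes.append(UInt8((value >> UInt32(shift)) & 0xff))
    }
    return bytes
}

// Hypothetical helper: decodes four little-endian bytes starting at `offset`.
// The caller is responsible for checking bounds.
func readUInt32LE(_ bytes: [UInt8], at offset: Int) -> UInt32 {
    var value: UInt32 = 0
    for i in 0..<4 {
        value |= UInt32(bytes[offset + i]) << UInt32(8 * i)
    }
    return value
}

// Returns nil when the buffer is too short or the magic is not 0xfeedfacf.
func parseMachHeader64(_ bytes: [UInt8]) -> MachHeader64? {
    guard bytes.count >= 32 else { return nil }   // 8 fields x 4 bytes
    let f = (0..<8).map { readUInt32LE(bytes, at: $0 * 4) }
    guard f[0] == 0xfeedfacf else { return nil }
    return MachHeader64(
        magic: f[0],
        cputype: Int32(bitPattern: f[1]),
        cpusubtype: Int32(bitPattern: f[2]),
        filetype: f[3],
        ncmds: f[4],
        sizeofcmds: f[5],
        flags: f[6],
        reserved: f[7]
    )
}

// Sample header assembled from the otool -h output for Hello.o.
let sampleFields: [UInt32] = [0xfeedfacf, 16777228, 0, 1, 9, 1480, 0x2000, 0]
let sampleBytes = sampleFields.flatMap { littleEndianBytes($0) }

if let header = parseMachHeader64(sampleBytes) {
    print("filetype \(header.filetype), ncmds \(header.ncmds), sizeofcmds \(header.sizeofcmds)")
}
```

To try it on a real file, the bytes could come from Foundation’s Data(contentsOf:) converted to [UInt8]; a universal binary would be rejected here because it starts with a different magic number.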

Mach-O File Type

To see the file type of a Mach-O product, dump the header with otool -h:

% otool -h Hello.o              
Hello.o:
Mach header
      magic  cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777228          0  0x00           1     9       1480 0x00002000

Note the filetype column: its value corresponds to one of the MH_* constants defined in <mach-o/loader.h>. Here are the ones you’ll find during development:

  • MH_OBJECT 0x1: Relocatable object file (.o)
  • MH_EXECUTE 0x2: Executable
  • MH_DYLIB 0x6: Dynamic library (.dylib)
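  • Static library (.a): has no Mach-O file type of its own. It is an ar archive of MH_OBJECT files and starts with the bytes !<arch> (the leading ! is 0x21).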
  • (…) core dumps, legacy formats, and others.
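
As a small illustration, the filetype values listed above could be mapped to readable names like this. The enum MachOFileType and the function describeFileType are hypothetical names, and only the three MH_* values from the list are covered; anything else falls through to a generic description.

```swift
// Raw values are the MH_* constants from <mach-o/loader.h>.
enum MachOFileType: UInt32 {
    case object = 0x1    // MH_OBJECT
    case execute = 0x2   // MH_EXECUTE
    case dylib = 0x6     // MH_DYLIB

    var summary: String {
        switch self {
        case .object: return "Relocatable object file (.o)"
        case .execute: return "Executable"
        case .dylib: return "Dynamic library (.dylib)"
        }
    }
}

// Describes the filetype column printed by otool -h.
func describeFileType(_ raw: UInt32) -> String {
    guard let known = MachOFileType(rawValue: raw) else {
        return "Other file type 0x" + String(raw, radix: 16)
    }
    return known.summary
}

print(describeFileType(1))   // the filetype of Hello.o in the dump above
```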

The following don’t have specific types:

  • Frameworks and xcframeworks: not a Mach-O but a bundle that contains a static or dynamic library.
  • Mergeable libraries: use the dylib file type (MH_DYLIB) despite the additional metadata.
  • TBD: a text file that works as a placeholder for a dylib. At link time it is enough to know which symbols the library defines. The code itself isn’t needed until the program runs, and by then the real dylib is present in the environment.

Mach-O Magic number

Universal binaries have a special header:

% otool -f UseHello
Fat headers
fat_magic 0xcafebabe
nfat_arch 2
architecture 0
    cputype 16777223
    cpusubtype 3
    capabilities 0x0
    offset 4096
    size 43400
    align 2^12 (4096)
architecture 1
    cputype 16777228
    cpusubtype 0
    capabilities 0x0
    offset 49152
    size 63792
    align 2^14 (16384)
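
The cputype values in this listing are easier to read in hexadecimal: 16777223 is 0x01000007 and 16777228 is 0x0100000C, that is, a CPU family (7 for x86, 12 for ARM) combined with the 64-bit ABI flag 0x01000000. The following sketch decodes them; describeCPUType is a hypothetical helper and it only knows the four combinations shown.

```swift
// Splits a cputype into its family (low 24 bits) and architecture flags
// (high 8 bits), following the constants in <mach/machine.h>.
func describeCPUType(_ cputype: UInt32) -> String {
    let family = cputype & 0x00ff_ffff     // CPU_TYPE_X86 = 7, CPU_TYPE_ARM = 12
    let archBits = cputype & 0xff00_0000   // CPU_ARCH_ABI64 = 0x01000000
    switch (family, archBits) {
    case (7, 0): return "i386"
    case (7, 0x0100_0000): return "x86_64"
    case (12, 0): return "arm"
    case (12, 0x0100_0000): return "arm64"
    default: return "unknown (0x" + String(cputype, radix: 16) + ")"
    }
}

// The two architectures reported by otool -f UseHello.
for value in [UInt32(16777223), UInt32(16777228)] {
    print(value, "->", describeCPUType(value))
}
```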

The fat_magic 0xcafebabe is the magic number of a universal (fat) binary. Dump the first 4 bytes of the executable to see its value:

% xxd -l 4 UseHello   
00000000: cafe babe 

Magic numbers can be:

  • 0xcafebabe (cafe babe) for a universal binary
  • 0xfeedface (feed face) for 32-bit Mach-O
  • 0xfeedfacf (feed facf) for 64-bit Mach-O

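These are the macOS equivalent of the bytes 0x7F 'E' 'L' 'F' that open an ELF file, or MZ for Windows Portable Executables. Note that xxd prints bytes in file order: the universal header is stored big-endian, so it reads cafe babe, while a thin arm64 or x86_64 binary stores its magic little-endian and reads cffa edfe.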
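
A minimal sketch of how a tool might classify a file from its first four bytes follows. BinaryKind and classifyMagic are hypothetical names. The bytes are read in file order, as xxd prints them, so the byte-swapped forms of the thin magics (MH_CIGAM and MH_CIGAM_64 in loader.h) are matched as well.

```swift
enum BinaryKind {
    case universal
    case machO32
    case machO64
    case unknown
}

// Classifies a file by its first four bytes, read in file order (big-endian).
func classifyMagic(_ bytes: [UInt8]) -> BinaryKind {
    guard bytes.count >= 4 else { return .unknown }
    let magic = bytes.prefix(4).reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
    switch magic {
    case 0xcafebabe:
        return .universal
    case 0xfeedface, 0xcefaedfe:   // 32-bit magic, as stored big- or little-endian
        return .machO32
    case 0xfeedfacf, 0xcffaedfe:   // 64-bit magic, as stored big- or little-endian
        return .machO64
    default:
        return .unknown
    }
}

print(classifyMagic([0xca, 0xfe, 0xba, 0xbe]))   // first bytes of UseHello
print(classifyMagic([0xcf, 0xfa, 0xed, 0xfe]))   // a thin little-endian 64-bit file
```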

Mach-O Load Commands

A Mach-O binary contains segments of data, and the load commands (LC) that follow the header tell the kernel and the dynamic linker how to handle them. Here’s an example of the load commands in the object file from the previous article (the text after each dash is my annotation, not otool output):

% otool -l Hello.o | grep LC_
      cmd LC_SEGMENT_64      - maps sections into the process address space
      cmd LC_BUILD_VERSION   - sets the minimum OS and SDK version
      cmd LC_SYMTAB          - location and structure of the symbol table and string table
      cmd LC_DYSYMTAB        - same for the dynamic symbol table
      cmd LC_LINKER_OPTION   - load swiftSwiftOnoneSupport
      cmd LC_LINKER_OPTION   - load swiftCore
      cmd LC_LINKER_OPTION   - load swift_Concurrency
      cmd LC_LINKER_OPTION   - load swift_StringProcessing
      cmd LC_LINKER_OPTION   - load objc runtime interoperability

Executable files may have additional load commands like LC_LOAD_DYLIB, LC_MAIN, etc. All possible load commands are defined in <mach-o/loader.h>.
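
Every load command starts with two 32-bit fields, cmd and cmdsize, so the whole list can be walked without understanding each command: start right after the 32-byte header and advance by cmdsize, ncmds times. The sketch below does that for a thin, little-endian, 64-bit buffer. listLoadCommands, loadCommandNames and the helper le are hypothetical names, the name table covers only a handful of commands, and the sample buffer is synthetic (a header followed by two 24-byte commands), not a real object file.

```swift
// A few cmd values from <mach-o/loader.h>.
let loadCommandNames: [UInt32: String] = [
    0x2: "LC_SYMTAB",
    0xb: "LC_DYSYMTAB",
    0xc: "LC_LOAD_DYLIB",
    0xe: "LC_LOAD_DYLINKER",
    0x19: "LC_SEGMENT_64",
    0x2d: "LC_LINKER_OPTION",
    0x32: "LC_BUILD_VERSION",
    0x8000_0028: "LC_MAIN",
]

// Returns the names of the load commands found in a thin 64-bit Mach-O buffer.
func listLoadCommands(_ bytes: [UInt8]) -> [String] {
    // Bounds-checked little-endian read.
    func u32(_ offset: Int) -> UInt32? {
        guard offset >= 0, offset + 4 <= bytes.count else { return nil }
        var value: UInt32 = 0
        for i in 0..<4 {
            value |= UInt32(bytes[offset + i]) << UInt32(8 * i)
        }
        return value
    }

    guard let magic = u32(0), magic == 0xfeedfacf, let ncmds = u32(16) else {
        return []
    }
    var names: [String] = []
    var offset = 32   // load commands start right after mach_header_64
    for _ in 0..<ncmds {
        // Stop on truncated input or a cmdsize too small to hold cmd + cmdsize.
        guard let cmd = u32(offset), let cmdsize = u32(offset + 4), cmdsize >= 8 else {
            break
        }
        names.append(loadCommandNames[cmd] ?? ("0x" + String(cmd, radix: 16)))
        offset += Int(cmdsize)
    }
    return names
}

// Hypothetical helper to build the synthetic sample.
func le(_ value: UInt32) -> [UInt8] {
    return [0, 8, 16, 24].map { UInt8((value >> UInt32($0)) & 0xff) }
}

let sampleWords: [UInt32] = [
    0xfeedfacf, 16777228, 0, 1, 2, 48, 0, 0,   // header: ncmds = 2, sizeofcmds = 48
    0x32, 24, 1, 0x000f_0000, 0x000f_0000, 0,  // LC_BUILD_VERSION
    0x2, 24, 0, 0, 0, 0,                       // LC_SYMTAB
]
let sampleObject = sampleWords.flatMap { le($0) }
print(listLoadCommands(sampleObject))
```

The guard on cmdsize keeps the loop from spinning in place or running past the buffer when the input is malformed.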

LC_SEGMENT_64

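LC_SEGMENT_64 tells the kernel how to map a segment of the file into virtual memory so the code is ready for execution: the segment name, its address and size in memory (vmaddr, vmsize), its offset and size in the file (fileoff, filesize), its memory protections (maxprot, initprot), and the sections it contains. There are segments for program code, program data, and the symbol tables used by the linker.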
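
One of those parameters, the initial protection (initprot), is a bit mask of read (1), write (2) and execute (4). The sketch below renders it the way otool and vmmap-style tools usually do. protectionString is a hypothetical helper, and the three values in the loop are typical ones for these segments, assumed for illustration rather than read from Hello.o or UseHello.

```swift
// Renders a VM protection mask: VM_PROT_READ = 1, VM_PROT_WRITE = 2, VM_PROT_EXECUTE = 4.
func protectionString(_ prot: Int32) -> String {
    let read = (prot & 0x1) != 0 ? "r" : "-"
    let write = (prot & 0x2) != 0 ? "w" : "-"
    let execute = (prot & 0x4) != 0 ? "x" : "-"
    return read + write + execute
}

// Assumed, typical initprot values; a real binary may differ.
let typicalSegments: [(name: String, initprot: Int32)] = [
    ("__TEXT", 5),
    ("__DATA", 3),
    ("__LINKEDIT", 1),
]
for segment in typicalSegments {
    print(segment.name, protectionString(segment.initprot))
}
```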

LC_LINKER_OPTION

The LC_LINKER_OPTION commands show that this code links against several libraries. Some of them are:

  • swiftSwiftOnoneSupport is a support library for Swift’s “Onone” optimization level, which is typically used for debug builds.
  • swift_StringProcessing provides declarative string processing APIs.
  • The Objective-C runtime is still linked by default on Apple platforms. Swift features that bridge to Objective-C, such as @objc declarations and dynamic message dispatch, use it. It’s also needed to interact with Apple frameworks written in Objective-C.

LC_BUILD_VERSION

As for the LC_BUILD_VERSION, we can take a look with vtool -show Hello.o.

% vtool -show Hello.o 
Hello.o:
Load command 1
      cmd LC_BUILD_VERSION
  cmdsize 24
 platform MACOS
    minos 15.0
      sdk 15.0
   ntools 0
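
In the file, minos and sdk are not stored as text. Each is a 32-bit value that packs X.Y.Z as 16 bits of major, 8 bits of minor and 8 bits of patch, so 15.0 is stored as 0x000F0000. Here is a small sketch of the decoding; decodeVersion is a hypothetical helper, and dropping a zero patch component is a formatting choice made here to match the vtool output above.

```swift
// Decodes the xxxx.yy.zz packing used by the minos and sdk fields of LC_BUILD_VERSION.
func decodeVersion(_ packed: UInt32) -> String {
    let major = packed >> 16
    let minor = (packed >> 8) & 0xff
    let patch = packed & 0xff
    // Omit the patch component when it is zero.
    return patch == 0 ? "\(major).\(minor)" : "\(major).\(minor).\(patch)"
}

print(decodeVersion(0x000f_0000))   // the value behind "minos 15.0"
```

The command itself is six 32-bit fields (cmd, cmdsize, platform, minos, sdk, ntools), which accounts for the cmdsize of 24 when ntools is 0.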

LC_LOAD_DYLIB

An executable has many load commands; try, for instance, otool -l UseHello | grep LC_. It is a lot of information, but it’s not impossible to decipher; it just takes time.

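The load commands can help you see the dependencies of a program. For instance, here is the UseHello executable telling the kernel to load the dynamic linker (dyld), which then looks for symbols in a number of libraries. The list appears twice because UseHello is a universal binary and otool prints the load commands of each architecture.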

% otool -l UseHello | grep -E "LC_LOAD_DYLINKER|LC_LOAD_DYLIB| name"
          cmd LC_LOAD_DYLINKER
         name /usr/lib/dyld (offset 12)
          cmd LC_LOAD_DYLIB
         name @rpath/Hello.framework/Hello (offset 24)
          cmd LC_LOAD_DYLIB
         name /usr/lib/libSystem.B.dylib (offset 24)
          cmd LC_LOAD_DYLIB
         name /usr/lib/libc++.1.dylib (offset 24)
          cmd LC_LOAD_DYLIB
         name /usr/lib/libobjc.A.dylib (offset 24)
          cmd LC_LOAD_DYLIB
         name @rpath/libswiftCore.dylib (offset 24)
          cmd LC_LOAD_DYLINKER
         name /usr/lib/dyld (offset 12)
          cmd LC_LOAD_DYLIB
         name @rpath/Hello.framework/Hello (offset 24)
          cmd LC_LOAD_DYLIB
         name /usr/lib/libSystem.B.dylib (offset 24)
          cmd LC_LOAD_DYLIB
         name /usr/lib/libc++.1.dylib (offset 24)
          cmd LC_LOAD_DYLIB
         name /usr/lib/swift/libswiftCore.dylib (offset 24)

Conclusion

This knowledge seems arcane, but it gives us the ability to peek into executables to see their architectures and dependencies. I talked about universal binaries; on the next page I’ll build one from the terminal.
