Fortran - ARM 64-Bit Platform - Modulo Two Integers#

Introduction#

In this section we will be disassembling simple binaries generated by the Fortran high-level language compiled for the 64-bit ARM platform.

Project code for this section is contained in my markkhusid/Disassembling-Binaries repository.

The program mod_int.f08#

program mod_int

        implicit none

        integer         :: a, b, c

        a = 10
        b = 4

        c = MOD(a, b)

end program mod_int

The listing above shows the contents of mod_int.f08. The program declares three integers: a, b, and c. a is assigned the value 10, b is assigned the value 4, and c is assigned the result of MOD(a, b). MOD() is a Fortran intrinsic function that returns the remainder of dividing a by b; for integers, the result takes the sign of a.

The program is obviously very simple, with no inputs or outputs. The idea is to generate the binary and look at the disassembly to learn about the workings of the 64-bit ARM processor platform.

The chosen test system is my trusty Samsung Chromebook Plus V2, which has a 64-bit ARMv8 processor. This is a very convenient platform for this exercise due to its availability and ease of access from multiple remote systems via SSH.

The program is compiled with the GNU Fortran compiler, gfortran, which is available on the ARM platform. The program is compiled with debugging information using the -ggdb3 option, which allows us to debug the program in GDB and see the source code interspersed with the assembly instructions.

The program is compiled with:

$ gfortran -ggdb3 mod_int.f08 -o mod_int_Fortran_aarch64_ggdb3

For general edification, we also have gfortran produce generic assembly with the -S option and an object file with the -c option, and we generate object dumps of the object and executable files.

The generic assembly is generated by using the -S (assembly) option:

$ gfortran -S -ggdb3 mod_int.f08 -o mod_int.s

The object file is generated from that assembly file by using the -c (compile, but do not link) option:

$ gfortran -c -ggdb3 mod_int.s -o mod_int.o

The objdump files are generated by using the following command and options:

$ objdump -x -D -S -s -g -t mod_int.o > objdump_of_dot_o.txt
$ objdump -x -D -S -s -g -t mod_int_Fortran_aarch64_ggdb3 > objdump_of_dot_exe.txt

A rundown of the objdump options is shown here:

$ objdump
Usage: objdump <option(s)> <file(s)>
 Display information from object <file(s)>.
 At least one of the following switches must be given:
  -a, --archive-headers    Display archive header information
  -f, --file-headers       Display the contents of the overall file header
  -p, --private-headers    Display object format specific file header contents
  -P, --private=OPT,OPT... Display object format specific contents
  -h, --[section-]headers  Display the contents of the section headers
  -x, --all-headers        Display the contents of all headers
  -d, --disassemble        Display assembler contents of executable sections
  -D, --disassemble-all    Display assembler contents of all sections
      --disassemble=<sym>  Display assembler contents from <sym>
  -S, --source             Intermix source code with disassembly
      --source-comment[=<txt>] Prefix lines of source code with <txt>
  -s, --full-contents      Display the full contents of all sections requested
  -g, --debugging          Display debug information in object file
  -e, --debugging-tags     Display debug information using ctags style
  -G, --stabs              Display (in raw form) any STABS info in the file
  -W, --dwarf[a/=abbrev, A/=addr, r/=aranges, c/=cu_index, L/=decodedline,
              f/=frames, F/=frames-interp, g/=gdb_index, i/=info, o/=loc,
              m/=macro, p/=pubnames, t/=pubtypes, R/=Ranges, l/=rawline,
              s/=str, O/=str-offsets, u/=trace_abbrev, T/=trace_aranges,
              U/=trace_info]
                           Display the contents of DWARF debug sections
  -Wk,--dwarf=links        Display the contents of sections that link to
                            separate debuginfo files
  -WK,--dwarf=follow-links
                           Follow links to separate debug info files (default)
  -WN,--dwarf=no-follow-links
                           Do not follow links to separate debug info files
  -L, --process-links      Display the contents of non-debug sections in
                            separate debuginfo files.  (Implies -WK)
      --ctf[=SECTION]      Display CTF info from SECTION, (default `.ctf')
      --sframe[=SECTION]   Display SFrame info from SECTION, (default '.sframe')
  -t, --syms               Display the contents of the symbol table(s)
  -T, --dynamic-syms       Display the contents of the dynamic symbol table
  -r, --reloc              Display the relocation entries in the file
  -R, --dynamic-reloc      Display the dynamic relocation entries in the file
  @<file>                  Read options from <file>
  -v, --version            Display this program's version number
  -i, --info               List object formats and architectures supported
  -H, --help               Display this information

In our case, we want -x (all headers), -D (disassemble all), -S (display source code with assembly), -s (full contents of all sections), -g (debug info), and finally, -t (display contents of the symbol tables).

We will now disassemble this program on the 64-bit ARM platform and step through the assembly instructions.

Disassembling mod_int_Fortran_aarch64_ggdb3#

When we look at the executable’s objdump, we notice two functions of interest: main and MAIN__. The Fortran compiler sets up the program arguments and options in main, while the actual program is contained within MAIN__ (that is capital MAIN followed by two underscores).

The following text from the executable’s objdump illustrates this:

Disassembly of section .text:

0000000000000814 <MAIN__>:
program mod_int
 814:	d10043ff 	sub	sp, sp, #0x10

        implicit none

        integer         :: a, b, c

        a = 10
 818:	52800140 	mov	w0, #0xa                   	// #10
 81c:	b9000fe0 	str	w0, [sp, #12]
        b = 4
 820:	52800080 	mov	w0, #0x4                   	// #4
 824:	b9000be0 	str	w0, [sp, #8]

        c = MOD(a, b)
 828:	b9400fe0 	ldr	w0, [sp, #12]
 82c:	b9400be1 	ldr	w1, [sp, #8]
 830:	1ac10c02 	sdiv	w2, w0, w1
 834:	b9400be1 	ldr	w1, [sp, #8]
 838:	1b017c41 	mul	w1, w2, w1
 83c:	4b010000 	sub	w0, w0, w1
 840:	b90007e0 	str	w0, [sp, #4]

end program mod_int
 844:	d503201f 	nop
 848:	910043ff 	add	sp, sp, #0x10
 84c:	d65f03c0 	ret

0000000000000850 <main>:
 850:	a9be7bfd 	stp	x29, x30, [sp, #-32]!
 854:	910003fd 	mov	x29, sp
 858:	b9001fe0 	str	w0, [sp, #28]
 85c:	f9000be1 	str	x1, [sp, #16]
 860:	f9400be1 	ldr	x1, [sp, #16]
 864:	b9401fe0 	ldr	w0, [sp, #28]
 868:	97ffffa2 	bl	6f0 <_gfortran_set_args@plt>
 86c:	90000000 	adrp	x0, 0 <_init-0x660>
 870:	9124c001 	add	x1, x0, #0x930
 874:	528000e0 	mov	w0, #0x7                   	// #7
 878:	97ffff9a 	bl	6e0 <_gfortran_set_options@plt>
 87c:	97ffffe6 	bl	814 <MAIN__>
 880:	52800000 	mov	w0, #0x0                   	// #0
 884:	a8c27bfd 	ldp	x29, x30, [sp], #32
 888:	d65f03c0 	ret
 88c:	d503201f 	nop
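The full objdump output is long, so it can help to pull out just one function, such as MAIN__. The following is a minimal Python sketch of that idea. The helper name extract_function and the shortened sample text are made up for illustration, and the pattern assumes function headers look like the ones in the listing above.

```python
import re

# Matches objdump function headers such as "0000000000000814 <MAIN__>:".
HEADER = re.compile(r"^([0-9a-f]+) <([^>]+)>:$")


def extract_function(dump_text, name):
    """Return the objdump lines that belong to the function `name`."""
    lines = []
    inside = False
    for line in dump_text.splitlines():
        if line.startswith("Disassembly of section"):
            inside = False  # a new section ends any function we were in
            continue
        match = HEADER.match(line.strip())
        if match:
            # A header either starts the function we want or ends it.
            inside = match.group(2) == name
        if inside:
            lines.append(line)
    # Drop blank lines left over at the end of the function.
    while lines and not lines[-1].strip():
        lines.pop()
    return lines


# Shortened, hypothetical sample in the same shape as the listing above.
sample = """\
0000000000000814 <MAIN__>:
 814:\td10043ff \tsub\tsp, sp, #0x10
 84c:\td65f03c0 \tret

0000000000000850 <main>:
 850:\ta9be7bfd \tstp\tx29, x30, [sp, #-32]!
"""

for line in extract_function(sample, "MAIN__"):
    print(line)
```

To use it on the real dump, read objdump_of_dot_exe.txt into a string and pass that string in place of sample.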

Explaining the Disassembly by ChatGPT 4o#

This disassembly represents the ARM AArch64 machine code for a Fortran program that performs an integer modulus operation using MOD(a, b). In AArch64, there is no single instruction for modulo, so the operation must be implemented using multiple primitive instructions.

Let’s examine in great detail why and how the compiler synthesizes MOD(a, b) using multiple instructions.

High-Level Fortran Code#

program mod_int
  implicit none
  integer :: a, b, c

  a = 10
  b = 4
  c = MOD(a, b)
end program mod_int

Disassembly Overview#

0000000000000814 <MAIN__>:

This is the Fortran-compiled MAIN__ subroutine. Let’s go through it line by line, especially focusing on the MOD operation.

1. Stack Frame Setup#

814:	d10043ff 	sub	sp, sp, #0x10

Allocate 16 bytes on the stack for local variables a, b, and c.
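The three integers need only 12 bytes, but the stack pointer must stay 16-byte aligned on AArch64, so the frame is rounded up to 16. A small Python sketch of that rounding follows. The helper name align_up is hypothetical, and the compiler's real frame layout logic is more involved than this.

```python
def align_up(size, alignment=16):
    """Round `size` up to the next multiple of `alignment` (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


locals_size = 3 * 4  # a, b, c: three 4-byte default integers
print(align_up(locals_size))  # 16, matching "sub sp, sp, #0x10"
```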

2. Initialize `a = 10`#

818:	52800140 	mov	w0, #0xa           // w0 = 10
81c:	b9000fe0 	str	w0, [sp, #12]      // store a at sp+12

3. Initialize `b = 4`#

820:	52800080 	mov	w0, #0x4           // w0 = 4
824:	b9000be0 	str	w0, [sp, #8]       // store b at sp+8

4. Perform `c = MOD(a, b)`#

Here is the key section of interest:

828:	b9400fe0 	ldr	w0, [sp, #12]      // load a into w0
82c:	b9400be1 	ldr	w1, [sp, #8]       // load b into w1
830:	1ac10c02 	sdiv	w2, w0, w1         // w2 = a / b (integer division)
834:	b9400be1 	ldr	w1, [sp, #8]       // reload b into w1
838:	1b017c41 	mul	w1, w2, w1         // w1 = (a / b) * b
83c:	4b010000 	sub	w0, w0, w1         // w0 = a - (a / b) * b = MOD(a, b)
840:	b90007e0 	str	w0, [sp, #4]       // store result in c

Step-by-step Breakdown of `MOD(a, b)`:#

The formula implemented is:

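\[ \text{MOD}(a, b) = a - \left(\operatorname{trunc}\left(\frac{a}{b}\right) \times b\right) \]

Here trunc means rounding toward zero, which is what sdiv does. For a negative quotient this differs from the floor.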

In ARM64:

No mod or remainder instruction exists.
We must simulate the modulus using:
1. Signed integer division: sdiv
2. Multiplication: mul
3. Subtraction: sub
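The three-instruction sequence can be modeled in a few lines of Python. This is only a sketch: the helper names (to_s32, sdiv32, mod_sequence) are invented for illustration, and the model covers just the behavior relevant here, namely 32-bit wraparound, sdiv truncating toward zero, and sdiv returning 0 rather than trapping when the divisor is zero.

```python
def to_s32(value):
    """Wrap a Python int to a signed 32-bit value, as a w register holds it."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def sdiv32(n, m):
    """Model of sdiv: truncate toward zero; dividing by zero yields 0."""
    if m == 0:
        return 0
    q = abs(n) // abs(m)
    if (n < 0) != (m < 0):
        q = -q
    return to_s32(q)


def mod_sequence(a, b):
    """Mirror the sdiv / mul / sub sequence emitted for c = MOD(a, b)."""
    w0, w1 = a, b
    w2 = sdiv32(w0, w1)   # sdiv w2, w0, w1
    w1 = to_s32(w2 * w1)  # mul  w1, w2, w1
    w0 = to_s32(w0 - w1)  # sub  w0, w0, w1
    return w0


print(mod_sequence(10, 4))   # 2
print(mod_sequence(-10, 4))  # -2
```

Python's own // and % operators floor rather than truncate, which is why sdiv32 works on absolute values and fixes up the sign afterwards.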

Why multiple operations?#

There’s no native `mod` instruction#

Unlike some architectures (e.g., x86 with idiv producing remainder), ARM AArch64's sdiv (like its unsigned counterpart udiv) gives you:

The quotient, but not the remainder.

To compute the modulus, we must do:

\[ \text{MOD}(a, b) = a - \left(\left( \text{a div b} \right) \cdot b\right) \]

This is mathematically sound because:

\[ a = (a \div b) \cdot b + (a \mod b) \Rightarrow a \mod b = a - (a \div b) \cdot b \]
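As a worked check of this identity with a negative dividend, take a div b to mean division truncated toward zero, which is how sdiv behaves:

\[ a = -10,\ b = 4: \quad a \div b = -2, \quad a \mod b = -10 - (-2) \cdot 4 = -2 \]

Substituting back gives (-2) · 4 + (-2) = -10, so the identity holds and the remainder carries the sign of a.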

Instruction-by-instruction: `MOD(a, b)`#

Load values:#

828:	b9400fe0 	ldr	w0, [sp, #12]      // w0 = a = 10
82c:	b9400be1 	ldr	w1, [sp, #8]       // w1 = b = 4

Integer Division:#

830:	1ac10c02 	sdiv	w2, w0, w1         // w2 = w0 / w1 = 10 / 4 = 2

Signed division: result in w2 = 2

Reload `b`:#

834:	b9400be1 	ldr	w1, [sp, #8]       // w1 = b = 4

Multiply quotient × b:#

838:	1b017c41 	mul	w1, w2, w1         // w1 = 2 * 4 = 8

Subtract to get modulus:#

83c:	4b010000 	sub	w0, w0, w1         // w0 = 10 - 8 = 2

Store result:#

840:	b90007e0 	str	w0, [sp, #4]       // store c = 2

Function Epilogue#

844:	d503201f 	nop
848:	910043ff 	add	sp, sp, #0x10      // restore original stack pointer
84c:	d65f03c0 	ret                   // return

Final Stack Layout#

Offset	Variable	Value
+12	`a`	10
+8	`b`	4
+4	`c`	2
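To see what these offsets mean in bytes, the 16-byte frame can be modeled with Python's struct module. This is an illustrative sketch that assumes a little-endian system, which is the usual configuration for AArch64 Linux; the variable names are chosen to match the source program.

```python
import struct

frame = bytearray(16)                  # sub sp, sp, #0x10
struct.pack_into("<i", frame, 12, 10)  # str w0, [sp, #12]  -> a
struct.pack_into("<i", frame, 8, 4)    # str w0, [sp, #8]   -> b

a, = struct.unpack_from("<i", frame, 12)  # ldr w0, [sp, #12]
b, = struct.unpack_from("<i", frame, 8)   # ldr w1, [sp, #8]
c = a - (a // b) * b  # plain // is fine here because a and b are positive
struct.pack_into("<i", frame, 4, c)    # str w0, [sp, #4]   -> c

# Bytes from sp+0 upward, grouped into 4-byte words.
print(frame.hex(" ", 4))  # 00000000 02000000 04000000 0a000000
```

The first word (sp+0) is the unused padding, followed by c, b, and a.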

Summary: Why Multiple Instructions Are Needed for `MOD`#

Operation	ARM64 Instruction	Purpose
Integer Division	`sdiv`	Compute quotient
Multiply	`mul`	Compute `(a / b) * b`
Subtract	`sub`	Get remainder: `a - (a / b) * b`

Because ARM64 lacks a direct `mod` instruction, the compiler emulates it via:#

\[ \text{MOD}(a, b) = a - \left(\operatorname{trunc}\left(\frac{a}{b}\right) \times b\right) \]

where trunc rounds toward zero, matching sdiv.

This sequence already handles negative operands: because sdiv truncates toward zero, the result takes the sign of a, as MOD requires. Fortran’s MODULO() intrinsic differs from MOD in that its result takes the sign of b.
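The difference between MOD and MODULO only shows up when the operands have opposite signs. The sketch below models both in Python. The function names fortran_mod and fortran_modulo are hypothetical stand-ins for the intrinsics, not part of any library.

```python
def fortran_mod(a, p):
    """MOD(a, p): remainder of truncating division; takes the sign of a."""
    q = abs(a) // abs(p)
    if (a < 0) != (p < 0):
        q = -q
    return a - q * p


def fortran_modulo(a, p):
    """MODULO(a, p): remainder of floored division; takes the sign of p."""
    return a - (a // p) * p  # Python's // already floors


for a, p in [(10, 4), (-10, 4), (10, -4), (-10, -4)]:
    print(a, p, fortran_mod(a, p), fortran_modulo(a, p))
```

For (-10, 4) the two functions return -2 and 2 respectively; for (10, -4) they return 2 and -2.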

Explaining the Disassembly by Grok 4#

Overview of the Disassembly#

This disassembly is in ARM64 (AArch64) assembly language, representing the machine code generated from a simple Fortran program named mod_int. The Fortran source is provided inline for context:

program mod_int
    implicit none
    integer :: a, b, c
    a = 10
    b = 4
    c = MOD(a, b)
end program mod_int

This program declares three 32-bit integers (a, b, c), assigns a = 10 and b = 4, and computes c = MOD(10, 4), which should yield 2 (the remainder of 10 divided by 4). The result is stored but not used further (e.g., no printing or return value), so the program has no visible side effects when run—it simply computes and exits.

The disassembly consists of two main functions:

MAIN__ at address 0x814: This is the compiler-generated name for the Fortran program’s body (common in gfortran). It allocates stack space for the three integers, performs the assignments and modulo operation, and returns.
main at address 0x850: The C-style entry point for the executable, which handles command-line arguments, initializes the GNU Fortran (gfortran) runtime library, calls MAIN__, and exits with status 0.

This code was likely compiled with gfortran (evident from calls to _gfortran_set_args and _gfortran_set_options). It’s unoptimized, as seen in redundant loads (e.g., reloading b after division). The program uses 32-bit operations (w registers) for integers, matching Fortran’s default INTEGER type.

A key focus is the MOD(a, b) operation, which requires multiple instructions because ARM64 has no remainder instruction, signed or unsigned. Instead, the compiler implements it using signed division (sdiv), multiplication (mul), and subtraction (sub). This is the standard way to derive a remainder from a quotient: a % b = a - ((a / b) * b), where / is integer division truncated toward zero. Because sdiv truncates toward zero, the result takes the sign of a, which is what Fortran’s MOD requires (negative values are not used here). ARM64 also provides msub (multiply-subtract), which fuses the mul and sub steps into one instruction and works equally well for signed and unsigned values; this unoptimized build simply does not use it. If optimization were enabled (e.g., -O2), the compiler would likely fold the constants or drop the unused computation altogether.
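The msub instruction mentioned above computes Ra - Rn * Rm in one step, so a remainder can also be formed as sdiv followed by msub. The Python model below sketches that pairing for signed 32-bit values. The helper names (s32, sdiv, msub, rem_fused) are invented for illustration, and whether a given compiler emits exactly this pair is an assumption rather than something shown in this listing.

```python
def s32(value):
    """Wrap a Python int to a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def sdiv(n, m):
    """Model of sdiv: truncate toward zero; dividing by zero yields 0."""
    if m == 0:
        return 0
    q = abs(n) // abs(m)
    return s32(-q if (n < 0) != (m < 0) else q)


def msub(rn, rm, ra):
    """Model of msub Rd, Rn, Rm, Ra: computes Ra - Rn * Rm."""
    return s32(ra - rn * rm)


def rem_fused(a, b):
    q = sdiv(a, b)        # sdiv w2, w0, w1
    return msub(q, b, a)  # msub w0, w2, w1, w0


print(rem_fused(10, 4), rem_fused(-10, 4))  # 2 -2
```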

I’ll explain each instruction in detail, including opcode, semantics, context, and binary encoding. ARM64 conventions: Registers x0-x30 (64-bit), w0-w30 (lower 32 bits); stack grows downward; ABI uses x0-x7 for params, x0 for returns, x29 for frame pointer, x30 for link register.

Detailed Explanation of `MAIN__` (Fortran Program Logic)#

This function allocates 16 bytes on the stack (for three 4-byte integers, padded for 16-byte alignment), assigns values to a and b, computes c = MOD(a, b) via the multi-step sequence, and cleans up. The nop does no work; the listing attributes it to the end program line.

814: d10043ff sub sp, sp, #0x10
Binary: d1 00 43 ff
Semantics: Subtracts 0x10 (16 decimal) from the stack pointer sp, allocating 16 bytes of local stack space. Result written back to sp. ARM64 requires manual stack management; allocations must be 16-byte aligned.
Context: Function prologue. Reserves space for a at sp+12 (4 bytes), b at sp+8, c at sp+4, with padding at sp+0 for alignment. After this, sp points to the frame base.
818: 52800140 mov w0, #0xa
Binary: 52 80 01 40
Semantics: Moves immediate 0xa (10 decimal) into w0 (lower 32 bits of x0; upper bits zeroed). This is a 32-bit move immediate.
Context: Initializes a temporary for a = 10. Constants are loaded into registers before storing to memory.
81c: b9000fe0 str w0, [sp, #12]
Binary: b9 00 0f e0
Semantics: Stores 32-bit w0 (10) to memory at sp + 12. str is “store register”.
Context: Assigns to a on the stack. Stack offsets are positive from current sp.
820: 52800080 mov w0, #0x4
Binary: 52 80 00 80
Semantics: Moves 0x4 (4) into w0, overwriting previous value.
Context: Initializes temporary for b = 4.
824: b9000be0 str w0, [sp, #8]
Binary: b9 00 0b e0
Semantics: Stores w0 (4) at sp + 8.
Context: Assigns to b.
828: b9400fe0 ldr w0, [sp, #12]
Binary: b9 40 0f e0
Semantics: Loads 32-bit value from sp + 12 (10, value of a) into w0. ldr is “load register”.
Context: Loads a into w0 as the dividend for the modulo operation.
82c: b9400be1 ldr w1, [sp, #8]
Binary: b9 40 0b e1
Semantics: Loads from sp + 8 (4, value of b) into w1.
Context: Loads b into w1 as the divisor.
830: 1ac10c02 sdiv w2, w0, w1
Binary: 1a c1 0c 02
Semantics: Performs signed integer division: w2 = w0 / w1 (10 / 4 = 2). sdiv handles negative numbers by rounding toward zero (Fortran MOD semantics require this for consistency with positive/negative remainders). No remainder is produced directly.
Context: First step of modulo: Compute quotient q = a / b (stored in w2). This is necessary because ARM64 has no single-instruction signed modulo. sdiv is used over udiv (unsigned) to match Fortran’s signed integer behavior.
834: b9400be1 ldr w1, [sp, #8]
Binary: b9 40 0b e1
Semantics: Reloads b (4) from sp + 8 into w1.
Context: Prepares b for multiplication. This reload is redundant (w1 already held b before sdiv), likely due to no optimization; an optimized build might eliminate it.
838: 1b017c41 mul w1, w2, w1
Binary: 1b 01 7c 41
Semantics: Multiplies w1 = w2 * w1 (2 * 4 = 8). mul is integer multiplication, with overflow wrapping (but no issue here).
Context: Second step of modulo: Compute q * b (quotient times divisor). This reconstructs the largest multiple of b less than or equal to a (for positive values).
83c: 4b010000 sub w0, w0, w1
Binary: 4b 01 00 00
Semantics: Subtracts w0 = w0 - w1 (10 - 8 = 2). Plain sub does not update the condition flags (subs would).
Context: Final step of modulo: Remainder r = a - (q * b). This yields the modulo result (2). The multi-step process is needed because:
- ARM64’s integer division instructions (sdiv, udiv) only produce the quotient, not the remainder.
- For signed modulo, the compiler must ensure the remainder has the sign of the dividend (Fortran MOD rule: remainder same sign as a), which sdiv + mul + sub achieves.
- The msub (multiply-subtract) instruction can fuse the mul and sub steps, for signed and unsigned values alike; this unoptimized build emits them separately.
- Alternatives like bit manipulation or loops aren’t efficient for general cases. If b is a known power of 2, an optimizing compiler can replace the division with masking, but for signed values a & 3 alone is only correct when a is non-negative, so extra sign handling is needed. Unoptimized code doesn’t attempt this even though b = 4 is a constant here.
- Edge cases: sdiv does not trap on division by zero; it returns 0, so with b = 0 this sequence produces a. Negative a or b are handled by sdiv’s truncating signed semantics.
840: b90007e0 str w0, [sp, #4]
Binary: b9 00 07 e0
Semantics: Stores w0 (2) at sp + 4.
Context: Assigns result to c. Stored but unused—Fortran doesn’t require using variables.
844: d503201f nop
Binary: d5 03 20 1f
Semantics: No operation; advances the program counter without any other effect.
Context: Compiler-inserted with no functional role. In the objdump listing it is the instruction attributed to the end program mod_int source line.
848: 910043ff add sp, sp, #0x10
Binary: 91 00 43 ff
Semantics: Adds 0x10 to sp, deallocating the frame.
Context: Epilogue—restores caller’s stack.
84c: d65f03c0 ret
Binary: d6 5f 03 c0
Semantics: Returns by jumping to x30 (link register). No explicit return value.
Context: Ends MAIN__.
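The "Binary" values above are the 32-bit instruction words as objdump prints them (in memory on a little-endian system the bytes appear in reverse order). As an illustration of how such a word breaks down, the sketch below splits the prologue and epilogue instructions into the fields of the A64 ADD/SUB (immediate) encoding. The function name decode_add_sub_imm is hypothetical, and only this one instruction class is handled.

```python
def decode_add_sub_imm(word):
    """Split a 32-bit A64 ADD/SUB (immediate) encoding into its fields."""
    if (word >> 23) & 0x3F != 0b100010:
        raise ValueError("not an ADD/SUB (immediate) encoding")
    return {
        "sf": (word >> 31) & 1,     # 1 = 64-bit operands
        "op": (word >> 30) & 1,     # 0 = add, 1 = sub
        "S": (word >> 29) & 1,      # 1 = also set condition flags
        "shift": (word >> 22) & 1,  # 1 = immediate is shifted left by 12
        "imm12": (word >> 10) & 0xFFF,
        "Rn": (word >> 5) & 0x1F,   # 31 means sp in this encoding
        "Rd": word & 0x1F,          # 31 means sp when S is 0
    }


print(decode_add_sub_imm(0xD10043FF))  # sub sp, sp, #0x10
print(decode_add_sub_imm(0x910043FF))  # add sp, sp, #0x10
```

Both words carry imm12 = 16 with Rn = Rd = 31 (sp); only the op bit differs between the allocation and the deallocation.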

Detailed Explanation of `main` (Program Entry Point)#

This is the runtime wrapper for Fortran executables. It processes args, sets up gfortran, calls MAIN__, and exits.

850: a9be7bfd stp x29, x30, [sp, #-32]!
Semantics: Allocates 32 bytes (pre-index subtract), stores frame pointer x29 and link x30 at new sp.
Context: Prologue—saves caller state, allocates for locals (argc at +28, argv at +16).
854: 910003fd mov x29, sp
Semantics: Sets x29 to current sp for frame-relative access.
Context: ABI frame pointer setup.
858: b9001fe0 str w0, [sp, #28]
Semantics: Stores w0 (argc) locally.
Context: Saves argument count.
85c: f9000be1 str x1, [sp, #16]
Semantics: Stores x1 (argv) locally.
Context: Saves argument vector.
860: f9400be1 ldr x1, [sp, #16]
Semantics: Loads argv into x1 for call.
Context: Prepares second arg for runtime.
864: b9401fe0 ldr w0, [sp, #28]
Semantics: Loads argc into w0.
Context: Prepares first arg.
868: 97ffffa2 bl 6f0 <_gfortran_set_args@plt>
Semantics: Calls gfortran arg setup via PLT (dynamic link).
Context: Passes args to Fortran runtime for handling (e.g., if program used COMMAND_ARGUMENT_COUNT).
86c: 90000000 adrp x0, 0
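Semantics: Computes a 4 KB page address relative to the PC and places it in x0. The page offset is 0 and this instruction sits at 0x86c, so in the unrelocated file the result is page 0x0.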
Context: Prepares for options address computation.
870: 9124c001 add x1, x0, #0x930
Semantics: Adds 0x930 (2352) to x0, result in x1 (points to options data).
Context: Address of gfortran options array.
874: 528000e0 mov w0, #0x7
Semantics: Moves 7 into w0 (likely option count).
Context: Arg for options call.
878: 97ffff9a bl 6e0 <_gfortran_set_options@plt>
Semantics: Calls gfortran options setup.
Context: Configures runtime (e.g., error modes, precision).
87c: 97ffffe6 bl 814 <MAIN__>
Semantics: Calls Fortran body.
Context: Executes the program.
880: 52800000 mov w0, #0x0
Semantics: Sets w0 to 0 (exit success).
Context: Prepares exit code.
884: a8c27bfd ldp x29, x30, [sp], #32
Semantics: Loads x29/x30, deallocates 32 bytes (post-index add).
Context: Epilogue—restores state.
888: d65f03c0 ret
Semantics: Returns to the caller (the C runtime startup code) with 0 in w0, which becomes the process exit status.
Context: Program end.
88c: d503201f nop
Semantics: Padding nop, possibly for section alignment.
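The branch targets and the options address in main can be recomputed from the instruction words and their addresses. The Python sketch below does this for the three bl instructions and for the adrp/add pair. The function names bl_target and adrp_result are hypothetical, and adrp_result takes the already-decoded page offset (0 here) rather than decoding it from the word.

```python
def bl_target(pc, word):
    """Return the branch target of a BL instruction located at address pc."""
    if word >> 26 != 0b100101:
        raise ValueError("not a BL encoding")
    imm26 = word & 0x3FFFFFF
    if imm26 & 0x2000000:  # sign-extend the 26-bit field
        imm26 -= 0x4000000
    return pc + imm26 * 4  # the offset counts 4-byte instructions


def adrp_result(pc, page_delta):
    """Return what adrp produces: the 4 KB page of pc plus page_delta pages."""
    return (pc & ~0xFFF) + (page_delta << 12)


print(hex(bl_target(0x868, 0x97FFFFA2)))   # 0x6f0, _gfortran_set_args@plt
print(hex(bl_target(0x878, 0x97FFFF9A)))   # 0x6e0, _gfortran_set_options@plt
print(hex(bl_target(0x87C, 0x97FFFFE6)))   # 0x814, MAIN__
print(hex(adrp_result(0x86C, 0) + 0x930))  # 0x930, the options data
```

The results match the addresses objdump prints next to each bl, which is a useful sanity check when reading raw encodings.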

Overall, the program computes but discards the modulo result. Running it produces no output. The multi-instruction modulo is needed because ARM64 has no remainder instruction.