I have been learning some x86 assembly recently, and one of the suggestions in Jeff Dunteman’s excellent book, Assembly Language: Step by Step is to look at the asm output of a compiler such as GCC. I decided to start with a basic mathematical function, the factorial. Here is my C code:

factorial.c

unsigned int factorial(unsigned int n){ if (n == 0) return 1; if (n == 1) return 1; unsigned int result = 2; while (n > 2){ result *= n; n--; } return result; }

From Wikipedia we get the following definition for the factorial: “In mathematics, the factorial of a non-negative integer n, denoted by n!, is the product of all positive integers less than or equal to n.” With the important identification of 0! = 1, this becomes clear if you consider the answer to the question “How many permutations of n unique objects are there?” to be n!, there is only one permutation of no objects.

Anyway, back to the assembly. I compile the C source code into assembly with the -S option to gcc, I specify I would like intel syntax rather than the AT&T syntax, and I also enable the O2 optimisation level (this eliminates many superfluous x86 instructions and assembly labels).

% gcc -S -masm=intel -O2 factorial.c -o factorial.asm

Now located in factorial.asm (compiled with gcc 4.5.2 on 32 bit Ubuntu 11.04) I have the following code:

.file "factorial.c" .intel_syntax noprefix .text .p2align 4,,15 .globl factorial .type factorial, @function factorial: push ebp mov eax, 1 mov ebp, esp mov edx, DWORD PTR [ebp+8] test edx, edx je .L2 cmp edx, 1 je .L2 cmp edx, 2 mov al, 2 jbe .L2 .p2align 4,,7 .p2align 3 .L3: imul eax, edx sub edx, 1 cmp edx, 2 jne .L3 .L2: pop ebp ret .size factorial, .-factorial .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2" .section .note.GNU-stack,"",@progbits

Starting at the function .factorial, and slightly cleaning up to more closely correspond with nasm syntax, we have:

push ebp mov eax, 1 mov ebp, esp mov edx, DWORD [ebp+8]

The C ABI demands that the contents of the ebp, ebx, esi, & edi registers must be preserved. GCC has not generated any code that affects the ebx, esi or edi registers, so the first instruction simply pushes ebp onto the stack to restore later. Next eax is set to 1, and the stack pointer is copied to ebp. The fourth instruction had me confused for a while. This seems to be copying the 3rd 32 bit quantity on the stack (ebp = esp currently) into edx. I thought the stack would look something like:

…

0x000000A0 0x????????

And so we would reference our factorial input with mov edx, DWORD [ebp+4]. A quick check in a debugger shows that when the factorial function is called, the stack looks like

…

0x000000A0 0x???????? 0x????????

Ah ha, it all makes sense. Of course, the ret instruction needs to know where in the code to return to, so the address of the instruction after factorial was called is pushed onto the stack, and mov edx, DWORD [epb+8] correctly copies the input to our factorial function into eax. Next we have:

test edx, edx je .L2

I didn’t recognise the test instruction, so it’s time to check some references. Unfortunately it wasn’t covered in Step by Step Assembly, so I ventured further afield. The Nasm Documentation contained the following description “TEST performs a `mental’ bitwise AND of its two operands, and affects the flags as if the operation had taken place, but does not store the result of the operation anywhere.” Okay, so this instruction is performing edx AND edx and updating eflags, but preserving edx. JE (Jump if Equal) which is identical to JZ (Jump if Zero) performs a jump if the zero flag, ZF = 1. So the purpose of the test edx, edx instruction is to see if edx is equal to 0. GCC has an interesting way of optimising “if (n == 0)”. Looking at the unoptimised version GCC uses a CMP instruction between DWORD [ebp+8] and 0.

So if edx is zero, t code jumps to .L2, let’s analyse that section:

.L2: pop ebp ret

Ah, not complicated at all, simply restore the stack frame and return from the function. This implies the return value is already stored in eax. And sure enough the first thing this function did to eax was to set it to 1. Okay, so if edx was not equal to one, the conditional jump would not have been triggered, and we’d now be at the following bit of code.

cmp edx, 1 je .L2 cmp edx, 2 mov al, 2 jbe .L2

Here GCC is using CMP to see if edx is equal to 1, if yes we are done, eax already contains the correct value 1! = 1, so jump to .L2 and return. Next, compare edx to 2. Now before the conditional jump we have the instruction mov al, 2. Here we are setting the lower 8 bits of eax to 2. GCC knows that at this point eax has no bits higher than bit 8 set, so it can use the al sub register to update eax. JBE (Jump if below or equal) jumps if the zero flag or the carry flag are set to 1. What is CMP a,b actually doing internally, it performs a subtraction between a and b and sets the flags but doesn’t store the result. CMP sets the zero flag is the result of the subtraction is zero, and sets the carry flag if the result is negative. So JBE will jump if edx less than or equal to 2, precisely what you would expect below or equal to mean. For our purposes we are only interested in whether edx is equal to 2, but we are using unsigned integers and have already accounted for the cases edx == 0 and edx == 1, so this assembly is perfectly valid if not a particularly literal translation of the C. Again if edx is equal to 2 then the code jumps to .L2 and exits with the correct value 2! = 2 in eax.

.L3: imul eax, edx sub edx, 1 cmp edx, 2 jne .L3

This next bit of code is the main loop of the function. the first line multiplies eax by edx and stores the result in eax, then we subtract one from edx, then if edx is not equal to 2 we repeat the whole procedure. Why does this work? Well edx must contain at least 3, because it is an unsigned 32 bit integer and it is not 0, 1 or 2. So the first run of the loop multiplies eax by edx and subtracts one from edx, if edx is equal to 2 now then eax = 2 * 3 = 3!, if edx is not equal to 2, for example of edx was originally 7, then eax = 2*7, so the loop continues until eax = 2*3*4*5*6*7 = 7!. Once edx is equal to 2, then eax contains n! and the code falls through to .L2 and the function returns.

Of course this function (like it’s C counterpart) is subject to integer overflow. n! rises very quickly as n increases, and this function is only capable of calculating correctly the factorials of 0 to 12, 13! = 6227020800 which is larger than 4294967295 (the largest 32-bit unsigned integer). Moving to 64 bit unsigned integers would only allow you to correctly calculate up to 20!. In fact it is a nice problem in basic functional analysis to show that the factorial function n! always grows faster than any exponential function (i.e. one of the form 2^x).