Just put all of those files somewhere in your Matlab path.

]]>A few people commented recently that they were unable to compile the NFFT library for 64-bit Windows and Matlab, following my previous instructions. It has been a while since I’ve actually used these libraries on windows, so I wanted to see if my instructions were still valid. Here are the results of my attempts, with the latest versions I could find.

**Step 1:**

You need a suitable compilation environment. I wanted native 64 bit binaries, so my only real choice was to use the 64 bit mingw-32 compiler. I chose to use the 64-bit cygwin environment to ease the setup of this compiler. Download the cygwin setup-x86_64.exe from the cygwin homepage.

Run the setup.exe file and when you come to select packages, check at least make and mingw64-x86_64-gcc-core under the devel section.

**Step 2:**

Open the cygwin terminal. Now we need to obtain the fftw3 library. The FFTW project provides precompied windows binaries, and the best choice here is to use these. Go to the FFTW windows download page, and select the download link for the 64 bit binaries.

As we are using cygwin and have a terminal open I ran

$ wget ftp://ftp.fftw.org/pub/fftw/fftw-3.3.3-dll64.zip

to download the zip file into my cygwin home directory, and

$ unzip -d fftw-3.3.3 fftw-3.3.3-dll64.zip

to unzip it into a subdirectory.

The final stage is to rename the fftw dll so that it can be found and properly linked. Simply change into your fftw directory and rename or copy libfftw3-3.dll to libfftw3.dll

$ cd fftw-3.3.3

cp libfftw3-3.dll libfftw3.dll

cd ..

cp libfftw3-3.dll libfftw3.dll

cd ..

**Step 3:**

Obtain the NFFT source code. The latest version as of November 2013 is 3.2.3 and is available from the NFFT download page.

I ran the following to download and unzip the source:

$ wget http://www-user.tu-chemnitz.de/~potts/nfft/download/nfft-3.2.3.tar.gz

tar zxvf nfft-3.2.3.tar.gz

tar zxvf nfft-3.2.3.tar.gz

Now cd into nfft-3.2.3 and configure the build. The options I chose worked well for me, and they assume that your matlab installation is located in C:\MATLAB\R2013a, this needs to be changed for different releases and installation locations (nb. if your matlab folder is of the form C:\Program Files\MATLAB\R2013a, then the space in the path may cause problems, you can try making a windows symlink between C:\Program Files\MATLAB and C:\MATLAB, to do this open an elevated command prompt, i.e. Start Menu, Type ‘cmd’ and then right click on cmd.exe and Run as administrator. Then run ‘mklink /D C:\MATLAB”C:\Program Files\MATLAB”‘)

So I ran the following command in the cygwin terminal: (replace jpowell with your username, I didn’t use ~/ style as this seemed to cause problems when building)

$ cd nfft-3.2.3

export LIBS=-lws2_32

./configure --host=x86_64-w64-mingw32 --enable-all --enable-openmp --with-fftw3-libdir=/cygdrive/c/Users/jpowell/fftw-3.3.3/ --with-fftw3-includedir=/cygdrive/c/Users/jpowell/fftw-3.3.3/ --with-matlab=/cygdrive/c/MATLAB/R2013a/ --with-matlab-arch=win64 --with-matlab-fftw3-libdir=/cygdrive/c/Users/jpowell/fftw-3.3.3/

export LIBS=-lws2_32

./configure --host=x86_64-w64-mingw32 --enable-all --enable-openmp --with-fftw3-libdir=/cygdrive/c/Users/jpowell/fftw-3.3.3/ --with-fftw3-includedir=/cygdrive/c/Users/jpowell/fftw-3.3.3/ --with-matlab=/cygdrive/c/MATLAB/R2013a/ --with-matlab-arch=win64 --with-matlab-fftw3-libdir=/cygdrive/c/Users/jpowell/fftw-3.3.3/

The critical command here is –host=x86_64-w64-mingw32 this configures the compilation for a mingw-w64 cross compile, and produces binaries that are not linked to cygwin. If the configure script runs without errors you can type

$ make

to compile.

**Step 4:**

The final step for me was to get the compiled binaries working in my Matlab environment. For me this involved copying the nfft helper functions in C:\Users\jpowell\nfft-3.2.3\matlab\nfft into a subdirectory in my matlab folder. To simplify things I copied the libnfft.mexw64 file from C:\Users\jpowell\nfft-3.2.3\matlab\nfft\.libs into the same directory as the helper functions and then renamed it to nfftmex.mexw64.

This nfftmex binary relies on a couple of other dynamic libraries, including libfftw3-3.dll so I copied that into the same directory from C:\Users\jpowell\fftw-3.3.3. I also need some additional mingw libraries, libgomp-1.dll, libgcc_s_seh-1.dll and libwinpthread-1.dll, which can be found in your cygwin install at C:\cygwin64\usr\x86_64-w64-mingw32\sys-root\mingw\bin

If you still have other dependancies causing Matlab to not recognise the mex file as valid, you can try using Dependancy Walker to see track these down.

With all of those installed I can confirm everything is working by running the matlab command nfft_get_num_threads to show that the nfft library is working and taking advantage of all the available threads of my processor.

Let me know if this helps, or if you encounter any problems following the above instructions. Hopefully the changes to make for a 32 bit system or different version of matlab will be easy enough to work out.

]]>It was a bit of a pain, but I finally succeeded in getting some working binaries up and running. Maybe this will save someone else some time and pain.

**Step 1:**

You need a suitable compilation environment. I wanted native 64 bit binaries, so my only real choice was to use the 64 bit mingw-32 compiler. I chose to use the cygwin environment to ease the setup of this compiler. Download the cygwin setup.exe from the cygwin homepage.

Run the setup.exe file and when you come to select packages, check at least make and mingw64-x86_64-gcc-core under the devel section.

**Step 2:**

Open the cygwin terminal. Now we need to obtain the fftw3 library. The FFTW project provides precompied windows binaries, and the best choice here is to use these. Go to the FFTW windows download page, and select the download link for the 64 bit binaries.

As we are using cygwin and have a terminal open I ran

$ wget ftp://ftp.fftw.org/pub/fftw/fftw-3.3.2-dll64.zip

to download the zip file into my cygwin home directory, and

$ unzip -d fftw-3.3.2 fftw-3.3.2-dll64.zip

to unzip it into a subdirectory.

The final stage is to rename the fftw dll so that it can be found and properly linked. Simply change into your fftw directory and rename or copy libfftw3-3.dll to libfftw3.dll

$ cd fftw-3.3.2

cp libfftw3-3.dll libfftw3.dll

cd ..

cp libfftw3-3.dll libfftw3.dll

cd ..

**Step 3:**

Obtain the NFFT source code. The latest version as of July 2012 is 3.2.0 and is available from the NFFT download page.

I ran the following to download and unzip the source:

$ wget http://www-user.tu-chemnitz.de/~potts/nfft/download/nfft-3.2.0.tar.gz

tar zxvf nfft-3.2.0.tar.gz

tar zxvf nfft-3.2.0.tar.gz

Now cd into nfft-3.2.0 and configure the build. The options I chose worked well for me, and they assume that your matlab installation is located in C:\matlab\R2012a, this needs to be changed for different releases and installation locations (nb. My matlab directory was actually C:\Program Files\MATLAB\R2012a, but this caused problems for the nfft build, so I made a windows symlink between C:\Program Files\MATLAB and C:\matlab, to do this open an elevated command prompt, i.e. Start Menu, Type ‘cmd’ and then right click on cmd.exe and Run as administrator. Then run ‘mklink /D C:\matlab “C:\Program Files\MATLAB”‘)

So I ran the following command in the cygwin terminal: (replace jpowell with your username, I didn’t use ~/ style as this seemed to cause problems when building)

$ cd nfft-3.2.0

./configure --host=x86_64-w64-mingw32 --enable-all -enable-openmp --with-fftw3-libdir=/home/jpowell/fftw-3.3.2/ --with-fftw3-includedir=/home/jpowell/fftw-3.3.2/ --with-matlab=/cygdrive/c/matlab/R2012a/ --with-matlab-arch=win64 --with-matlab-fftw3-libdir=/home/jpowell/fftw-3.3.2/

./configure --host=x86_64-w64-mingw32 --enable-all -enable-openmp --with-fftw3-libdir=/home/jpowell/fftw-3.3.2/ --with-fftw3-includedir=/home/jpowell/fftw-3.3.2/ --with-matlab=/cygdrive/c/matlab/R2012a/ --with-matlab-arch=win64 --with-matlab-fftw3-libdir=/home/jpowell/fftw-3.3.2/

The critical command here is –host=x86_64-w64-mingw32 this configures the compilation for a mingw-w64 cross compile, and produces binaries that are not linked to cygwin. If the configure script runs without errors you can type

$ make

to compile.

**Step 4:**

The final step for me was to get the compiled binaries working in my Matlab environment. For me this involved copying the nfft helper functions in C:\cygwin\home\jpowell\nfft-3.2.0\matlab\nfft into a subdirectory in my matlab folder. To simplify things I copied the libnfft.mexw64 file from C:\cygwin\home\jpowell\nfft-3.2.0\matlab\nfft\.libs into the same directory as the helper functions and then renamed it to nfftmex.mexw64.

This nfftmex binary relies on a couple of other dynamic libraries, including libfftw3-3.dll so I copied that into the same directory from C:\cygwin\home\jpowell\fftw-3.3.2. I also need some additional mingw libraries, libgomp-1.dll and pthreadGC2.dll, which can be found in your cygwin install at C:\cygwin\usr\x86_64-w64-mingw32\sys-root\mingw\bin

With all of those installed I can confirm everything is working by running the matlab command nfft_get_num_threads to show that the nfft library is working and taking advantage of all the available threads of my processor.

Let me know if this helps, or if you encounter any problems following the above instructions. Hopefully the changes to make for a 32 bit system or different version of matlab will be easy enough to work out.

]]>factorial.c

unsigned int factorial(unsigned int n){ if (n == 0) return 1; if (n == 1) return 1; unsigned int result = 2; while (n > 2){ result *= n; n--; } return result; }

From Wikipedia we get the following definition for the factorial: “In mathematics, the factorial of a non-negative integer n, denoted by n!, is the product of all positive integers less than or equal to n.” With the important identification of 0! = 1, this becomes clear if you consider the answer to the question “How many permutations of n unique objects are there?” to be n!, there is only one permutation of no objects.

Anyway, back to the assembly. I compile the C source code into assembly with the -S option to gcc, I specify I would like intel syntax rather than the AT&T syntax, and I also enable the O2 optimisation level (this eliminates many superfluous x86 instructions and assembly labels).

% gcc -S -masm=intel -O2 factorial.c -o factorial.asm

Now located in factorial.asm (compiled with gcc 4.5.2 on 32 bit Ubuntu 11.04) I have the following code:

.file "factorial.c" .intel_syntax noprefix .text .p2align 4,,15 .globl factorial .type factorial, @function factorial: push ebp mov eax, 1 mov ebp, esp mov edx, DWORD PTR [ebp+8] test edx, edx je .L2 cmp edx, 1 je .L2 cmp edx, 2 mov al, 2 jbe .L2 .p2align 4,,7 .p2align 3 .L3: imul eax, edx sub edx, 1 cmp edx, 2 jne .L3 .L2: pop ebp ret .size factorial, .-factorial .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2" .section .note.GNU-stack,"",@progbits

Starting at the function .factorial, and slightly cleaning up to more closely correspond with nasm syntax, we have:

push ebp mov eax, 1 mov ebp, esp mov edx, DWORD [ebp+8]

The C ABI demands that the contents of the ebp, ebx, esi, & edi registers must be preserved. GCC has not generated any code that affects the ebx, esi or edi registers, so the first instruction simply pushes ebp onto the stack to restore later. Next eax is set to 1, and the stack pointer is copied to ebp. The fourth instruction had me confused for a while. This seems to be copying the 3rd 32 bit quantity on the stack (ebp = esp currently) into edx. I thought the stack would look something like:

…

0x000000A0 0x????????

And so we would reference our factorial input with mov edx, DWORD [ebp+4]. A quick check in a debugger shows that when the factorial function is called, the stack looks like

…

0x000000A0 0x???????? 0x????????

Ah ha, it all makes sense. Of course, the ret instruction needs to know where in the code to return to, so the address of the instruction after factorial was called is pushed onto the stack, and mov edx, DWORD [epb+8] correctly copies the input to our factorial function into eax. Next we have:

test edx, edx je .L2

I didn’t recognise the test instruction, so it’s time to check some references. Unfortunately it wasn’t covered in Step by Step Assembly, so I ventured further afield. The Nasm Documentation contained the following description “TEST performs a `mental’ bitwise AND of its two operands, and affects the flags as if the operation had taken place, but does not store the result of the operation anywhere.” Okay, so this instruction is performing edx AND edx and updating eflags, but preserving edx. JE (Jump if Equal) which is identical to JZ (Jump if Zero) performs a jump if the zero flag, ZF = 1. So the purpose of the test edx, edx instruction is to see if edx is equal to 0. GCC has an interesting way of optimising “if (n == 0)”. Looking at the unoptimised version GCC uses a CMP instruction between DWORD [ebp+8] and 0.

So if edx is zero, t code jumps to .L2, let’s analyse that section:

.L2: pop ebp ret

Ah, not complicated at all, simply restore the stack frame and return from the function. This implies the return value is already stored in eax. And sure enough the first thing this function did to eax was to set it to 1. Okay, so if edx was not equal to one, the conditional jump would not have been triggered, and we’d now be at the following bit of code.

cmp edx, 1 je .L2 cmp edx, 2 mov al, 2 jbe .L2

Here GCC is using CMP to see if edx is equal to 1, if yes we are done, eax already contains the correct value 1! = 1, so jump to .L2 and return. Next, compare edx to 2. Now before the conditional jump we have the instruction mov al, 2. Here we are setting the lower 8 bits of eax to 2. GCC knows that at this point eax has no bits higher than bit 8 set, so it can use the al sub register to update eax. JBE (Jump if below or equal) jumps if the zero flag or the carry flag are set to 1. What is CMP a,b actually doing internally, it performs a subtraction between a and b and sets the flags but doesn’t store the result. CMP sets the zero flag is the result of the subtraction is zero, and sets the carry flag if the result is negative. So JBE will jump if edx less than or equal to 2, precisely what you would expect below or equal to mean. For our purposes we are only interested in whether edx is equal to 2, but we are using unsigned integers and have already accounted for the cases edx == 0 and edx == 1, so this assembly is perfectly valid if not a particularly literal translation of the C. Again if edx is equal to 2 then the code jumps to .L2 and exits with the correct value 2! = 2 in eax.

.L3: imul eax, edx sub edx, 1 cmp edx, 2 jne .L3

This next bit of code is the main loop of the function. the first line multiplies eax by edx and stores the result in eax, then we subtract one from edx, then if edx is not equal to 2 we repeat the whole procedure. Why does this work? Well edx must contain at least 3, because it is an unsigned 32 bit integer and it is not 0, 1 or 2. So the first run of the loop multiplies eax by edx and subtracts one from edx, if edx is equal to 2 now then eax = 2 * 3 = 3!, if edx is not equal to 2, for example of edx was originally 7, then eax = 2*7, so the loop continues until eax = 2*3*4*5*6*7 = 7!. Once edx is equal to 2, then eax contains n! and the code falls through to .L2 and the function returns.

Of course this function (like it’s C counterpart) is subject to integer overflow. n! rises very quickly as n increases, and this function is only capable of calculating correctly the factorials of 0 to 12, 13! = 6227020800 which is larger than 4294967295 (the largest 32-bit unsigned integer). Moving to 64 bit unsigned integers would only allow you to correctly calculate up to 20!. In fact it is a nice problem in basic functional analysis to show that the factorial function n! always grows faster than any exponential function (i.e. one of the form 2^x).

]]>