(转载自original version by Alfredo Correa)
FFTW3 is a library designed to compute discrete Fourier transforms. As described in the the official FFTW site, there are various versions available, with different features and different levels of maturity. In this tutorial I deal with the installation of version 3, including the experimental MPI version. However the installation instructions seems to be valid (although not tested) also for the more popular version 2.
Contents
[hide]
3 Compilation with Shared Libraries
4.2.1 Important notes about MPI-FFTW3 memory distribution (do not skip)
General Remarks
As usual we would like to install the libraries in the user space, so we will create a couple of directories for that purpose:
mkdir $HOME/usr
mkdir $HOME/soft
To install FFTW3, download the package from the FFTW3 download page and decompress it:
cd ~/soft
wgethttp://www.fftw.org/fftw-3.3alpha1.tar.gz
tar -zxvf fftw-3.3alpha1.tar.gz
cd fftw-3.3alpha1
Ubuntu only: If you want to install FFTW3 (serial version) in your local Ubuntu you can skip this installation section altogether and just run:
sudo apt-get install libfftw3-dev libfftw3-doc
However the MPI version (e.g. for testing) will not be available. If you want to have the MPI version follow the instructions in the other sections.
Build and Install
The configure/make/install procedure works well for installation in wcr.stanford.edu. We have the option of building and using static or shared libraries. If you are going to use shared libraries read last section.
Serial version only
Then configure, make and install:
./configure --prefix=$HOME/usr --enable-shared=yes
make --jobs=8
make install
(1 minute). The following files will be installed in:
~/usr/include/fftw3.h
~/usr/include/fftw3.f
~/usr/lib/libfftw3.a
~/usr/lib/libfftw3.la
~/usr/lib/libfftw3.so
The typical compilation options will be
export LD_RUN_PATH=$HOME/usr/lib #do this once *before* compiling
cc -I$HOME/usr/include program.c -L$HOME/usr/lib -lfftw3 -o program
The official tutorial on the usage of FFTW3 (which is different from FFTW 2) is located here.
MPI version
To install the experimental MPI version of FFTW3, make sure you downloaded fftw-3.3alpha1 (and not fftw-3.2 for example). Also make sure that there is an MPI compiler available:
$which mpicc
/usr/bin/mpicc
If it is not available, you can choose one with the command 'mpi-selector-menu' in wcr. I tested this with the 'openmpi_gcc-1.2.2' compiler. If that is not possible then set the variable MPICC, for example: export MPICC=$HOME/usr/bin/mpicc.
Do the same procedure of downloading the file and decompressing it, but add the --enable-mpi flag:
./configure --prefix=$HOME/usr --enable-mpi --enable-shared=yes
make install
now the library will be installed in your home directory, besides the files mentioned above, you will find also:
~/usr/include/fftw3-mpi.h
~/usr/lib/libfftw3_mpi.a
~/usr/lib/libfftw3_mpi.la
~/usr/lib/libfftw3_mpi.so
The typical command line for compilation will be
export LD_RUN_PATH=$HOME/usr/lib #do this *before* compiling
mpicc -I$HOME/usr mpi_program.c -L$HOME/usr/lib -lfftw3_mpi -lfftw3 -o mpi_program
Make sure to link *first* to fftw3_mpi and *later* to fftw3. For some MPI platforms (notably openmpi-gcc) setting LD_RUN_PATH does not do the job of storing the library path inside the executables (see note in previous section), it may be necessary to use the following command:
mpicc -Wl,-rpath=$HOME/usr/lib -I$HOME/usr mpi_program.c -L$HOME/usr/lib -lfftw3_mpi -lfftw3 -o mpi_program
In any case we should always check that the executable is properly linked by doing
ldd ./mpi_program
and checking that all shared libraries are "found".
The official tutorial for the MPI version of FFTW3 can be found here.
Compilation with Shared Libraries
In the command line compilation examples above I set the variables LD_RUN_PATH. Using LD_RUN_PATH saves us from having to set path variables before *running* the program, such as LD_LIBRARY_PATH (which is a bad practice). When LD_RUN_PATH is set before compilation, the created executable will store the search path of the shared library internally (but will not enforce it). I learned this trick from http://gcc.gnu.org/faq.html#rpath and it works well with gcc at least. Setting this variable before compilation can be annoying, but is better than having to set variables *each time we use* the executable. This seems to be the only good option left when using libraries installed in the home directory (does anybody know a better alternative?).
Update on LD_RUN_PATH: It seems that the trick of setting this variable does not work with mpicxx, at least with the openmpi implementation. A more general way to store the path of the libraries in the executable seems to be to use the option '-Wl,-rpath=$HOME/usr/lib' (sic); as described here.
Another option to avoid this issues all together is to use the -static option when compiling to ensure that the static fftw3 library files are embedded in the executable.原网页
Important notes about MPI-FFTW3 memory distribution (do not skip)
Although the resulting distribution of the matrix and allocated memory seems to very reasonable in the previous example, this is only because of the simple dimensions and process number used in the example. There are some characteristics of the memory distribution for which the previous example can be misleading, and it is worth noting them:
Only FFTW (via fftw_mpi_local_size_2d) knows how to split the data and allocated memory size among processes, so don't try to predict the splitting yourself.
The splitting of the global matrix is only done over rows (first dimension, for N>2-dimensional arrays).
1D FFT's usage is totally different, because the splitting depends on the type of transform to be performed (e.g. in-place, out-of-place).
The splitting of the matrix can be uneven (for example, #columns is not divisible by #processes).
Some processes can be assigned zero columns (for example if there are more processes than columns). Even in this case, the allocated memory in that process can be different from zero! (in general, it is 1*sizeof(complex)).
The logical size of the submatrix (local matrix) can be different from the local allocated memory, i.e. local_n0*N1 <= alloc_local. The outputs of local_size are not redundant. FFTW uses the (small) extra memory as a scratch space.