How to compile SFMT

This document explains how to compile SFMT19937 from a terminal on UNIX-like systems (for example Linux, FreeBSD, Cygwin, and OS X). It does not cover IDEs (Integrated Development Environments); if you use one, please consult its help to find out how to enable the SIMD features of your CPU.

1. First Step: Compile test programs using Makefile.

1-1. Compile standard C test program.

Check that sfmt19937.c and Makefile are in your current directory. If not, cd to the directory that contains them. Then type

make std

If that causes an error, try typing

cc -o test-std32 test32.c

or try typing

gcc -o test-std32 test32.c

If the compilation succeeds, run the test program. Type

./test-std32

You will see many random numbers displayed on your screen. To check that these random numbers are the correct output, redirect the output to a file and diff it with sfmt19937-32.out.txt, like this:

./test-std32 > foo.txt
diff -w foo.txt sfmt19937-32.out.txt

Silence means the two files are the same, because diff reports only the differences between them.

If you want to know the generation speed of SFMT, type

./test-std32 -s

Without optimization it is very slow. To make it fast, compile it with the -O3 option. If your compiler is gcc, you should also specify the -fno-strict-aliasing option together with -O3. Type

gcc -O3 -fno-strict-aliasing -o test-std32 test32.c
./test-std32 -s
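
The -fno-strict-aliasing flag tells gcc not to assume that pointers to different types never refer to the same memory, an assumption it otherwise makes at -O2 and above. Code that views one buffer as both 32-bit and 64-bit integers is sensitive to this. The following is a minimal sketch, not taken from the SFMT source, of how such a reinterpretation can be written with memcpy so that it is well defined with or without the flag. The names store_u64 and load_u64 are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration; they are not part of SFMT. */

/* Write a 64-bit value into two adjacent 32-bit words. */
void store_u64(uint32_t words[2], uint64_t value) {
    memcpy(words, &value, sizeof value);
}

/* Read the same two words back as one 64-bit value. */
uint64_t load_u64(const uint32_t words[2]) {
    uint64_t value;
    memcpy(&value, words, sizeof value);
    return value;
}
```

Which half of the value lands in words[0] and which in words[1] depends on the byte order of the machine.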

1-2. Compile SSE2 test program.

If your CPU supports SSE2 and you can use gcc version 3.4 or later, you can make test-sse32. To do this, type

make sse2

or type

gcc -O3 -msse2 -fno-strict-aliasing -DSSE2=1 -o test-sse32 test32.c

If everything works well,

./test-sse32 -s

shows a much shorter time than ./test-std32 -s.

1-3. Compile AltiVec test program.

If you are using a Macintosh computer with a PowerPC G4 or G5 and your gcc version is 3.3 or later, you can make test-alti32. To do this, type

make alti

or type

gcc -O3 -faltivec -fno-strict-aliasing -DALTIVEC=1 -o test-alti32 test32.c

If everything works well,

./test-alti32 -s

shows a much shorter time than ./test-std32 -s.
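
Before choosing between the SSE2 and AltiVec builds, it can help to see which SIMD instruction set the compiler is actually targeting with a given set of flags. The sketch below relies on gcc's predefined macros __SSE2__ and __ALTIVEC__, which are separate from the -DSSE2=1 and -DALTIVEC=1 definitions passed on the command lines above. The function name simd_target is hypothetical.

```c
/* Hypothetical helper for illustration; it is not part of SFMT.
 * Reports which SIMD instruction set the compiler is targeting,
 * based on macros that gcc predefines. */
const char *simd_target(void) {
#if defined(__SSE2__)
    return "SSE2";      /* defined with -msse2, and by default on x86-64 */
#elif defined(__ALTIVEC__)
    return "AltiVec";   /* defined when AltiVec is enabled on PowerPC */
#else
    return "none";
#endif
}
```

Printing the result from a small main, compiled with the same flags you plan to use for the test program, shows whether those flags took effect.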

1-4. Compile and check output automatically.

To build the test programs and check their 32-bit and 64-bit output automatically, type

make std-check

To check the test program optimized for SSE2, type

make sse2-check

To check the test program optimized for AltiVec, type

make alti-check

2. Second Step: Use SFMT pseudorandom number generator with your C program.

2-1. Use sequential call and static link.

Here is a very simple program, sample1.c, which calculates pi using the Monte Carlo method.

#include <stdio.h>
#include <stdlib.h>
#include "sfmt19937.h"

int main(int argc, char* argv[]) {
    int i, cnt, seed;
    double x, y, pi;
    const int NUM = 10000;

    if (argc >= 2) {
	seed = strtol(argv[1], NULL, 10);
    } else {
	seed = 12345;
    }
    cnt = 0;
    init_gen_rand(seed);
    for (i = 0; i < NUM; i++) {
	x = genrand_res53();
	y = genrand_res53();
	if (x * x + y * y < 1.0) {
	    cnt++;
	}
    }
    pi = (double)cnt / NUM * 4;
    printf("%lf\n", pi);
    return 0;
}
      

To compile sample1.c with sfmt19937.c, type

gcc -o sample1 sfmt19937.c sample1.c

If your CPU supports SSE2 and you want to use optimized SFMT for SSE2, type

gcc -msse2 -o sample1 sfmt19937-sse2.c sample1.c

If your CPU supports AltiVec and you want to use optimized SFMT for AltiVec, type

gcc -faltivec -o sample1 sfmt19937-alti32.c sample1.c
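
In sample1.c, genrand_res53 returns a double in the interval [0, 1) with 53-bit resolution. The fraction of points (x, y) with x * x + y * y < 1.0 therefore approximates the area of a quarter circle, pi / 4. As an illustration of what 53-bit resolution means, here is a minimal sketch that maps a 64-bit integer onto [0, 1) by keeping its top 53 bits. The function name u64_to_unit_double is hypothetical, and this is not necessarily how SFMT performs the conversion.

```c
#include <stdint.h>

/* Hypothetical helper for illustration; it is not part of SFMT.
 * Maps a 64-bit integer to a double in [0, 1). A double has a 53-bit
 * significand, so only the top 53 bits of the input are kept. */
double u64_to_unit_double(uint64_t v) {
    /* 9007199254740992 is 2^53. */
    return (double)(v >> 11) * (1.0 / 9007199254740992.0);
}
```

With this mapping the largest possible input still gives a value strictly below 1.0, so a generated point never lies on the outer edge of the unit square.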

2-2. Use block call and static link.

Here is sample2.c, which modifies sample1.c. The block call fill_array64 is much faster than sequential calls, but it needs aligned memory. The standard function for getting aligned memory is posix_memalign, but it is not available on every OS.

#define _XOPEN_SOURCE 600 /* must come before every #include to take effect */
#include <stdio.h>
#if !defined(_POSIX_C_SOURCE)
#include <malloc.h>
#endif
#include <stdlib.h>
#include "sfmt19937.h"

int main(int argc, char* argv[]) {
    int i, j, cnt, seed;
    double x, y, pi;
    const int NUM = 10000;
    const int R_SIZE = 2 * NUM;
    uint64_t *array;

    if (argc >= 2) {
	seed = strtol(argv[1], NULL, 10);
    } else {
	seed = 12345;
    }
#if defined(__APPLE__)
    printf("malloc used\n");
    array = malloc(sizeof(uint64_t) * R_SIZE);
    if (array == NULL) {
	printf("can't allocate memory.\n");
	return 1;
    }
#elif defined(_POSIX_C_SOURCE)
    printf("posix_memalign used\n");
    if (posix_memalign((void **)&array, 16, sizeof(uint64_t) * R_SIZE) != 0) {
	printf("can't allocate memory.\n");
	return 1;
    }
#elif defined(__GNUC__) && (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 3))
    printf("memalign used\n");
    array = memalign(16, sizeof(uint64_t) * R_SIZE);
    if (array == NULL) {
	printf("can't allocate memory.\n");
	return 1;
    }
#else /* in this case, gcc doesn't support SSE2 */
    printf("malloc used\n");
    array = malloc(sizeof(uint64_t) * R_SIZE);
    if (array == NULL) {
	printf("can't allocate memory.\n");
	return 1;
    }
#endif
    cnt = 0;
    j = 0;
    init_gen_rand(seed);
    fill_array64(array, R_SIZE);
    for (i = 0; i < NUM; i++) {
	x = to_res53(array[j++]);
	y = to_res53(array[j++]);
	if (x * x + y * y < 1.0) {
	    cnt++;
	}
    }
    free(array);
    pi = (double)cnt / NUM * 4;
    printf("%lf\n", pi);
    return 0;
}
      

To compile sample2.c with sfmt19937.c, type

gcc -o sample2 sfmt19937.c sample2.c

If your CPU supports SSE2 and you want to use optimized SFMT for SSE2, type

gcc -msse2 -o sample2 sfmt19937-sse2.c sample2.c

If your CPU supports AltiVec and you want to use optimized SFMT for AltiVec, type

gcc -faltivec -o sample2 sfmt19937-alti32.c sample2.c

or type

gcc -faltivec -o sample2 sfmt19937-alti64.c sample2.c

The difference between sfmt19937-alti32.c and sfmt19937-alti64.c is this: sfmt19937-alti32.c supports both 32-bit and 64-bit output, but its 64-bit output is slower. sfmt19937-alti64.c supports 64-bit output only, but it is faster than sfmt19937-alti32.c.
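
The aligned-memory requirement of the block call can be isolated in a small helper. The following is a minimal sketch for systems that provide posix_memalign; the names is_aligned16 and alloc_u64_aligned16 are hypothetical. Passing the address of a void pointer avoids casting a uint64_t pointer to void **.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* request posix_memalign; must precede the includes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helpers for illustration; they are not part of SFMT. */

/* Return nonzero if ptr lies on a 16-byte boundary. */
int is_aligned16(const void *ptr) {
    return ((uintptr_t)ptr % 16) == 0;
}

/* Allocate count 64-bit integers on a 16-byte boundary.
 * Returns NULL on failure; release the memory with free(). */
uint64_t *alloc_u64_aligned16(size_t count) {
    void *ptr = NULL;
    if (posix_memalign(&ptr, 16, count * sizeof(uint64_t)) != 0) {
        return NULL;
    }
    return ptr;
}
```

A buffer obtained this way can be checked with is_aligned16 before it is handed to a block call such as fill_array64.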

2-3. Use sequential call and inline functions.

Here is sample3.c, which modifies sample1.c. It is very similar to sample1.c; only one line differs. It includes "sfmt19937.c" instead of "sfmt19937.h", so the generator functions are compiled together with your program and can be inlined.

#include <stdio.h>
#include <stdlib.h>
#include "sfmt19937.c"

int main(int argc, char* argv[]) {
    int i, cnt, seed;
    double x, y, pi;
    const int NUM = 10000;

    if (argc >= 2) {
	seed = strtol(argv[1], NULL, 10);
    } else {
	seed = 12345;
    }
    cnt = 0;
    init_gen_rand(seed);
    for (i = 0; i < NUM; i++) {
	x = genrand_res53();
	y = genrand_res53();
	if (x * x + y * y < 1.0) {
	    cnt++;
	}
    }
    pi = (double)cnt / NUM * 4;
    printf("%lf\n", pi);
    return 0;
}
      

To compile sample3.c, type

gcc -o sample3 sample3.c

If your CPU supports SSE2 and you want to use optimized SFMT for SSE2, change "sfmt19937.c" in sample3.c to "sfmt19937-sse2.c" and type

gcc -msse2 -o sample3 sample3.c

If your CPU supports AltiVec and you want to use optimized SFMT for AltiVec, change "sfmt19937.c" in sample3.c to "sfmt19937-alti32.c" or "sfmt19937-alti64.c" and type

gcc -faltivec -o sample3 sample3.c

2-4. Initialize SFMT using init_by_array function.

Here is sample4.c, which modifies sample1.c. A single 32-bit integer seed can produce only 2^32 different initial states. To avoid this limitation, SFMT provides the init_by_array function, which initializes the internal state array from an array of 32-bit integers. The seed array can be larger than the internal state array, and all of its elements are used for initialization, but a very large array is wasteful.

#include <stdio.h>
#include <string.h>
#include "sfmt19937.h"

int main(int argc, char* argv[]) {
    int i, cnt, seed_cnt;
    double x, y, pi;
    const int NUM = 10000;
    uint32_t seeds[100];

    if (argc >= 2) {
	seed_cnt = 0;
	for (i = 0; (i < 100) && (i < strlen(argv[1])); i++) {
	    seeds[i] = argv[1][i];
	    seed_cnt++;
	}
    } else {
	seeds[0] = 12345;
	seed_cnt = 1;
    }
    cnt = 0;
    init_by_array(seeds, seed_cnt);
    for (i = 0; i < NUM; i++) {
	x = genrand_res53();
	y = genrand_res53();
	if (x * x + y * y < 1.0) {
	    cnt++;
	}
    }
    pi = (double)cnt / NUM * 4;
    printf("%lf\n", pi);
    return 0;
}
      

To compile sample4.c, type

gcc -o sample4 sfmt19937.c sample4.c

Now the seed can be a string, like this:

./sample4 your-full-name
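
The loop in sample4.c that turns argv[1] into a seed array can be factored into a small helper. The following is a sketch of that idea; the name seeds_from_string is hypothetical. It stops at the end of the string instead of calling strlen on every iteration, and it converts each character through unsigned char, so for plain ASCII input it stores the same values as the loop in sample4.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration; it is not part of SFMT.
 * Copies up to max_seeds characters of text into seeds, one character
 * per 32-bit element, and returns the number of elements written. */
size_t seeds_from_string(const char *text, uint32_t *seeds, size_t max_seeds) {
    size_t n = 0;
    while (n < max_seeds && text[n] != '\0') {
        seeds[n] = (unsigned char)text[n];
        n++;
    }
    return n;
}
```

The returned count plays the role of seed_cnt in sample4.c and can be passed to init_by_array after conversion to int.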