Questions and Answers about SIMD-oriented Fast Mersenne Twister (SFMT)

About the source code

MSBs of MASKs seem to have no effect.

Question:

The MSBs of the masks seem to have no effect. Can I set the MSBs of MSK1 to MSK4 to 0? I intend to port the code to another language that has no unsigned integer type.

Answer:

Yes. As you can see in the do_recursion function, in the expression
(b->u[0] >> SR1) & MSK1
the upper SR1 bits of b->u[0] are filled with 0 by the shift, so the SR1 most significant bits of MSK1 have no effect on the result. You can therefore set the SR1 MSBs of the masks arbitrarily, for example to 0. The same holds for MSK2 to MSK4. (The masks were selected from many randomly chosen candidates to attain good pseudorandomness, and they are listed as they were found.) However, when you implement SFMT in another language, please check carefully that the output sequence does not change from the original.

About dSFMT and MinGW

dSFMT ver. 1.3 does not work on MinGW

Answer: Several people have reported this problem. Yusdi Santoso sent us an email describing a workaround:

I found out that this was due to stack misalignment. Instead of aligning the __m128i variables on a 16-byte boundary, the compiler aligned them at an 8-byte offset. In any case, I found that I can work around this by commenting out one line in dSFMT.c:
	  /** dsfmt initialized flag */
	  //int dsfmt_global_is_initialized = 0;                  //this line is commented
	
Instead, I added this line to the file that contains the main function (in your example, test.c). This hack somehow makes the alignment normal again.

Answer: dSFMT ver. 2.0 works on Cygwin. (I hope it also works on MinGW.)

Speed comparison

Question:

I downloaded and installed dSFMT-src-2.0 and SFMT-src-1.3.3, and I timed them against the original version that we have been using for many years now. I wrote some test programs to compare the original one and the new ones. (The attached files are file1, file2, and file3.) However, I find that the new versions are significantly slower than the original one. I wonder what I am doing wrong.

Answer:

My guess is that:

Your compiler turns the call to Genrand() into an inline function. That is, the code of Genrand() is expanded at the place where the call appears. This removes the cost of the function call and increases the speed considerably.
Your test program calls Genrand() but does not use its result. The compiler sees that the result is unused and concludes that the conversion to float is unnecessary. For the same reason, the tempering is unnecessary as well.

To make a fair comparison, you should:

separate the MT code from the test program, so that the compiler cannot inline Genrand().
use the result of Genrand() (for example, by simply adding the values up).

The cost of a function call is very high, so please consider using the fill_array function of (d)SFMT in your application. fill_array is designed to reduce this cost.

Reply:

I truly appreciate your help. Indeed you are right: when used correctly, your algorithms are significantly faster than the older version. I saw a speedup by a factor of 10 when using dSFMT, and 20 if I use the arrays. Your help has been very valuable. Thank you.
