Comparison of speed

Table 1. The CPU time (sec.) for 10^8 generations of 32-bit integers, for four different CPUs and two different return-value methods. The ratio to the time of SFMT(SIMD) block generation is listed, too.

CPU / Compiler                          Output   MT      MT(SIMD)  SFMT    SFMT(SIMD)
Pentium-M 1.4GHz, Intel C/C++ ver. 9.0  block    1.122   0.627     0.689   0.298
                                        (ratio)  3.77    2.10      2.31    1.00
                                        seq      1.511   1.221     1.017   0.597
                                        (ratio)  5.07    4.10      3.41    2.00
Pentium 4 3GHz, Intel C/C++ ver. 9.0    block    0.633   0.391     0.412   0.217
                                        (ratio)  2.92    1.80      1.90    1.00
                                        seq      1.014   0.757     0.736   0.412
                                        (ratio)  4.67    3.49      3.39    1.90
Athlon 64 3800+ 2.4GHz, gcc ver. 4.0.2  block    0.686   0.376     0.318   0.156
                                        (ratio)  4.40    2.41      2.04    1.00
                                        seq      0.756   0.607     0.552   0.428
                                        (ratio)  4.85    3.89      3.54    2.74
PowerPC G4 1.33GHz, gcc ver. 4.0.0      block    1.089   0.490     0.914   0.235
                                        (ratio)  4.63    2.09      3.89    1.00
                                        seq      1.794   1.358     1.645   0.701
                                        (ratio)  7.63    5.78      7.00    2.98

We compared two algorithms, MT19937 and SFMT19937, each in one implementation that uses SIMD instructions and one that does not.

We measured the speeds on four different CPUs: Pentium M 1.4GHz, Pentium 4 3GHz, AMD Athlon 64 3800+, and PowerPC G4 1.33GHz. We used two different methods of returning the random values. One is sequential generation, where each call returns one 32-bit random integer. The other is block generation, where each call fills an array with random integers.
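To make the two return-value methods concrete, here is a minimal C99 sketch of MT19937 offering both interfaces. It is written for illustration only: it is not the authors' benchmarked code, it uses no SIMD, and the names mt_state, mt_init, mt_next (sequential) and mt_fill (block) are hypothetical. The block version tempers whole runs of the state array in a branch-free inner loop; this is a simplification of, not the same as, the array-filling routines of the reference MT and SFMT implementations.

```c
#include <stddef.h>
#include <stdint.h>

#define MT_N 624
#define MT_M 397
#define MT_MATRIX_A   0x9908b0dfu /* constant vector a */
#define MT_UPPER_MASK 0x80000000u /* most significant w-r bits */
#define MT_LOWER_MASK 0x7fffffffu /* least significant r bits */

typedef struct {
    uint32_t mt[MT_N]; /* state vector */
    int mti;           /* index of the next state word to temper */
} mt_state;

/* Standard MT19937 seeding (as in init_genrand of mt19937ar.c). */
static void mt_init(mt_state *s, uint32_t seed)
{
    s->mt[0] = seed;
    for (int i = 1; i < MT_N; i++) {
        uint32_t prev = s->mt[i - 1];
        s->mt[i] = 1812433253u * (prev ^ (prev >> 30)) + (uint32_t)i;
    }
    s->mti = MT_N; /* force a regeneration on first use */
}

/* Recompute all N state words with the MT19937 recursion, in place. */
static void mt_regenerate(mt_state *s)
{
    for (int k = 0; k < MT_N; k++) {
        uint32_t y = (s->mt[k] & MT_UPPER_MASK) | (s->mt[(k + 1) % MT_N] & MT_LOWER_MASK);
        s->mt[k] = s->mt[(k + MT_M) % MT_N] ^ (y >> 1) ^ ((y & 1u) ? MT_MATRIX_A : 0u);
    }
    s->mti = 0;
}

/* MT19937 tempering transformation. */
static uint32_t mt_temper(uint32_t y)
{
    y ^= y >> 11;
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    y ^= y >> 18;
    return y;
}

/* Sequential generation: one 32-bit integer per call. */
static inline uint32_t mt_next(mt_state *s)
{
    if (s->mti >= MT_N) {
        mt_regenerate(s);
    }
    return mt_temper(s->mt[s->mti++]);
}

/* Block generation: fill out[0..n-1] in one call, same stream as mt_next. */
static void mt_fill(mt_state *s, uint32_t *out, size_t n)
{
    size_t done = 0;
    while (done < n) {
        if (s->mti >= MT_N) {
            mt_regenerate(s);
        }
        size_t avail = (size_t)(MT_N - s->mti);
        size_t take = (n - done < avail) ? n - done : avail;
        /* Branch-free inner loop over a run of state words. */
        for (size_t j = 0; j < take; j++) {
            out[done + j] = mt_temper(s->mt[(size_t)s->mti + j]);
        }
        s->mti += (int)take;
        done += take;
    }
}
```

Seeded with mt_init(&s, 5489), both interfaces yield the same standard MT19937 stream, whose first value is 3499211612. One mt_fill call replaces many mt_next calls, each of which pays for an index check and, unless inlined, a function call.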

We measured the consumed CPU time, in seconds, for 10^8 generations of 32-bit integers. More precisely, for block generation, each call generates 10^5 32-bit random integers, and the call is repeated 10^3 times. For sequential generation, the same number (10^8) of 32-bit integers is generated, one per call. We used the inline declaration inline to avoid function-call overhead, and the unsigned 32-bit and 64-bit integer types uint32_t and uint64_t defined in INTERNATIONAL STANDARD ISO/IEC 9899 : 1999(E) Programming Language-C, Second Edition (which we shall refer to as C99 in the rest of this article). Implementations without SIMD are written in C99, whereas those with SIMD use SIMD extensions of C99 supported by the compilers icl (Intel C compiler) and gcc.
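The measurement procedure can be sketched as a small C99 harness. This is an illustrative sketch, not the authors' benchmark: to stay self-contained it times a stand-in generator (Marsaglia's xorshift32, which is neither MT nor SFMT), measures CPU time with the standard clock() function, and the names bench_block and bench_seq are hypothetical. Both functions XOR every output into a checksum, so the compiler cannot discard the work and the two schemes can be checked to produce the same integers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Stand-in generator: Marsaglia's xorshift32 (NOT MT or SFMT). Seed must be nonzero. */
typedef struct { uint32_t x; } gen_state;

/* Sequential interface: one value per call. */
static inline uint32_t gen_next(gen_state *g)
{
    uint32_t x = g->x;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    g->x = x;
    return x;
}

/* Block interface: n values per call. */
static void gen_fill(gen_state *g, uint32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = gen_next(g);
    }
}

typedef struct {
    double seconds;    /* CPU time from clock(), or -1.0 on allocation failure */
    uint32_t checksum; /* XOR of every generated value */
} bench_result;

/* Block scheme: `calls` calls, each filling an array of `block` integers. */
static bench_result bench_block(uint32_t seed, size_t block, size_t calls)
{
    bench_result r = { -1.0, 0u };
    uint32_t *buf = malloc(block * sizeof *buf);
    if (buf == NULL) {
        return r;
    }
    gen_state g = { seed };
    clock_t start = clock();
    for (size_t c = 0; c < calls; c++) {
        gen_fill(&g, buf, block);
        for (size_t i = 0; i < block; i++) {
            r.checksum ^= buf[i];
        }
    }
    r.seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    free(buf);
    return r;
}

/* Sequential scheme: `total` calls, each returning one integer. */
static bench_result bench_seq(uint32_t seed, size_t total)
{
    bench_result r = { -1.0, 0u };
    gen_state g = { seed };
    clock_t start = clock();
    for (size_t i = 0; i < total; i++) {
        r.checksum ^= gen_next(&g);
    }
    r.seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return r;
}
```

Here bench_block(seed, 100000, 1000) corresponds to the block scheme above (10^5 integers per call, repeated 10^3 times) and bench_seq(seed, 100000000) to the 10^8 sequential calls; equal checksums confirm that both runs produced the same integers.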

Table 1 summarises the speed comparisons. The first four lines give the results for a Pentium-M CPU with the Intel C/C++ compiler. The first line lists the CPU time (in seconds) needed to generate 10^8 32-bit integers with the block-generation scheme. The second line shows the ratio of each CPU time to that of SFMT(SIMD) block generation. Thus, SFMT coded in SIMD is 2.10 times faster than MT coded in SIMD, and 3.77 times faster than MT without SIMD. The third line lists the seconds for the sequential-generation scheme. The fourth line lists the ratios, again taking SFMT(SIMD) block generation (not sequential generation) as the basis. Thus, block generation with SFMT(SIMD) is 2.00 times faster than sequential generation with SFMT(SIMD).
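The normalisation in Table 1 divides every timing on a given CPU, including the sequential-generation timings, by that CPU's single SFMT(SIMD) block-generation time. A minimal sketch of this arithmetic, fed with the Pentium-M timings from Table 1 (ratio_row and print_row are hypothetical helper names):

```c
#include <stdio.h>

enum { N_IMPL = 4 }; /* columns: MT, MT(SIMD), SFMT, SFMT(SIMD) */

/* Divide each timing by the common baseline (SFMT(SIMD) block generation). */
static void ratio_row(const double seconds[N_IMPL], double baseline,
                      double ratios[N_IMPL])
{
    for (int i = 0; i < N_IMPL; i++) {
        ratios[i] = seconds[i] / baseline;
    }
}

/* Print one row with two decimals, as in Table 1. */
static void print_row(const char *label, const double v[N_IMPL])
{
    printf("%-8s", label);
    for (int i = 0; i < N_IMPL; i++) {
        printf(" %6.2f", v[i]);
    }
    printf("\n");
}

/* Pentium-M / Intel C/C++ ver. 9.0 timings (seconds) from Table 1. */
static const double pm_block[N_IMPL] = { 1.122, 0.627, 0.689, 0.298 };
static const double pm_seq[N_IMPL]   = { 1.511, 1.221, 1.017, 0.597 };
```

Calling ratio_row(pm_block, pm_block[3], r) reproduces the ratios 3.77, 2.10, 2.31, 1.00, and ratio_row(pm_seq, pm_block[3], r) reproduces 5.07, 4.10, 3.41, 2.00 when printed to two decimals. Using pm_block[3] rather than pm_seq[3] as the baseline is what makes the last sequential ratio read as the 2.00-fold gain of block over sequential generation.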