I found out that this was due to stack misalignment: instead of aligning the __m128i variables on a 16-byte boundary, the compiler placed them at an 8-byte offset. In any case, I found that I can work around it by commenting out one line in dSFMT.c:

    /** dsfmt initialized flag */
    //int dsfmt_global_is_initialized = 0; // this line is commented out

Instead, I added that definition (uncommented) to the file that contains the main function; in your example, that is test.c. This hack somehow makes the alignment normal again.
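If you want to confirm the misalignment before and after the change, a check along these lines may help. This is only a minimal sketch and not part of dSFMT: the helper names misalignment_16 and require_aligned_16 are made up for illustration, and it works on plain pointers so that it does not depend on the SSE headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Number of bytes by which p lies past the previous 16-byte boundary.
   0 means p is 16-byte aligned; 8 is the offset described above. */
static unsigned misalignment_16(const void *p) {
    return (unsigned)((uintptr_t)p & 15u);
}

/* Hypothetical debug helper: print the offset, then abort (via assert)
   if the address is not 16-byte aligned. */
static void require_aligned_16(const char *name, const void *p) {
    unsigned off = misalignment_16(p);
    fprintf(stderr, "%s at %p: offset from 16-byte boundary = %u\n",
            name, p, off);
    assert(off == 0);
}
```

Calling require_aligned_16("w", &w) on a local __m128i w should report an offset of 8 in the failing build and 0 once the variables are aligned again. Note that assert is compiled out when NDEBUG is defined, so the printed offset is the more reliable signal in a release build.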