Call _mm256_zeroupper() when leaving avx512 code · joe-nano/fftw3@eba07c4

Commit

Call _mm256_zeroupper() when leaving avx512 code

Carsten Steger says:

   simd-avx512.h defines VLEAVE as nothing in FFTW 3.3.7.  However, the
   current Intel® 64 and IA-32 Architectures Optimization Reference Manual,
   chapter 15.18, recommends the following:
   - When you have to mix group B instructions with Intel SSE instructions,
     or you suspect that such a mixture might occur, use the VZEROUPPER
     instruction whenever a transition is expected.
   - Add VZEROUPPER after group B instructions were executed and before any
     function call that might lead to Intel SSE instruction execution.
   - Add VZEROUPPER at the end of any function that uses group B instructions.
   - Add VZEROUPPER before thread creation if not already in a clean state
     so that the thread does not inherit Dirty Upper State.
   (Group B are instruction types that modify bits 128-511 of vector
   registers 0-15.)

   Therefore, I believe it would be prudent to define VLEAVE as
   _mm256_zeroupper in simd-avx512.h (see the attached patch).

At https://software.intel.com/en-us/forums/intel-isa-extensions/topic/704023
Mark Charney says:

   To be clear, we very much still recommend using VZEROUPPER on
   Skylake. Even though it does not have the same penalties as earlier
   designs in that family for mixing AVX and SSE code, we definitely
   recommend using VZEROUPPER on Skylake.

   Yes it would obviously be better if there were one solution.  For
   code that has to run on both families, the "common code" solution
   is to use the Xeon guidelines.

If Mark Charney recommends VZEROUPPER, that's good enough for me.

Loading branch information

matteo-frigo committed Feb 19, 2018

1 parent b267008 commit eba07c4

simd-support/simd-avx512.h

            
                      Original file line number
                      Diff line number
                      Diff line change
                  
    @@ -311,6 +311,6 @@ static inline V BYTWJ2(const R *t, V sr)
  
    #endif /* FFTW_SINGLE */

    #define TWVLS (2 * VL)

    #define VLEAVE() /* nothing */

    #define VLEAVE _mm256_zeroupper

    #include "simd-common.h"

0 comments on commit `eba07c4`

Please sign in to comment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Commit

There are no files selected for viewing

0 comments on commit `eba07c4`

Commit

There are no files selected for viewing

0 comments on commit eba07c4

0 comments on commit `eba07c4`