Skip to content

Commit

Permalink
Call _mm256_zeroupper() when leaving avx512 code
Browse files Browse the repository at this point in the history
Carsten Steger says:

   simd-avx512.h defines VLEAVE as nothing in FFTW 3.3.7.  However, the
   current Intel® 64 and IA-32 Architectures Optimization Reference Manual,
   chapter 15.18, recommends the following:
   - When you have to mix group B instructions with Intel SSE instructions,
     or you suspect that such a mixture might occur, use the VZEROUPPER
     instruction whenever a transition is expected.
   - Add VZEROUPPER after group B instructions were executed and before any
     function call that might lead to Intel SSE instruction execution.
   - Add VZEROUPPER at the end of any function that uses group B instructions.
   - Add VZEROUPPER before thread creation if not already in a clean state
     so that the thread does not inherit Dirty Upper State.
   (Group B are instruction types that modify bits 128-511 of vector
   registers 0-15.)

   Therefore, I believe it would be prudent to define VLEAVE as
   _mm256_zeroupper in simd-avx512.h (see the attached patch).

At https://software.intel.com/en-us/forums/intel-isa-extensions/topic/704023
Mark Charney says:

   To be clear, we very much still recommend using VZEROUPPER on
   Skylake. Even though it does not have the same penalties as earlier
   designs in that family for mixing AVX and SSE code, we definitely
   recommend using VZEROUPPER on Skylake.

   Yes it would obviously be better if there were one solution.  For
   code that has to run on both families, the "common code" solution
   is to use the Xeon guidelines.

If Mark Charney recommends VZEROUPPER, that's good enough for me.
  • Loading branch information
matteo-frigo committed Feb 19, 2018
1 parent b267008 commit eba07c4
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion simd-support/simd-avx512.h
Original file line number Diff line number Diff line change
Expand Up @@ -311,6 +311,6 @@ static inline V BYTWJ2(const R *t, V sr)
#endif /* FFTW_SINGLE */
#define TWVLS (2 * VL)

#define VLEAVE() /* nothing */
#define VLEAVE _mm256_zeroupper

#include "simd-common.h"

0 comments on commit eba07c4

Please sign in to comment.